828 resultados para compression parallel


Relevância:

20.00% 20.00%

Publicador:

Resumo:

IEEE International Symposium on Circuits and Systems, pp. 724 – 727, Seattle, EUA

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Single processor architectures are unable to provide the required performance of high performance embedded systems. Parallel processing based on general-purpose processors can achieve these performances with a considerable increase of required resources. However, in many cases, simplified optimized parallel cores can be used instead of general-purpose processors achieving better performance at lower resource utilization. In this paper, we propose a configurable many-core architecture to serve as a co-processor for high-performance embedded computing on Field-Programmable Gate Arrays. The architecture consists of an array of configurable simple cores with support for floating-point operations interconnected with a configurable interconnection network. For each core it is possible to configure the size of the internal memory, the supported operations and number of interfacing ports. The architecture was tested in a ZYNQ-7020 FPGA in the execution of several parallel algorithms. The results show that the proposed many-core architecture achieves better performance than that achieved with a parallel generalpurpose processor and that up to 32 floating-point cores can be implemented in a ZYNQ-7020 SoC FPGA.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Hyperspectral imaging has become one of the main topics in remote sensing applications, which comprise hundreds of spectral bands at different (almost contiguous) wavelength channels over the same area generating large data volumes comprising several GBs per flight. This high spectral resolution can be used for object detection and for discriminate between different objects based on their spectral characteristics. One of the main problems involved in hyperspectral analysis is the presence of mixed pixels, which arise when the spacial resolution of the sensor is not able to separate spectrally distinct materials. Spectral unmixing is one of the most important task for hyperspectral data exploitation. However, the unmixing algorithms can be computationally very expensive, and even high power consuming, which compromises the use in applications under on-board constraints. In recent years, graphics processing units (GPUs) have evolved into highly parallel and programmable systems. Specifically, several hyperspectral imaging algorithms have shown to be able to benefit from this hardware taking advantage of the extremely high floating-point processing performance, compact size, huge memory bandwidth, and relatively low cost of these units, which make them appealing for onboard data processing. In this paper, we propose a parallel implementation of an augmented Lagragian based method for unsupervised hyperspectral linear unmixing on GPUs using CUDA. The method called simplex identification via split augmented Lagrangian (SISAL) aims to identify the endmembers of a scene, i.e., is able to unmix hyperspectral data sets in which the pure pixel assumption is violated. The efficient implementation of SISAL method presented in this work exploits the GPU architecture at low level, using shared memory and coalesced accesses to memory.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The application of compressive sensing (CS) to hyperspectral images is an active area of research over the past few years, both in terms of the hardware and the signal processing algorithms. However, CS algorithms can be computationally very expensive due to the extremely large volumes of data collected by imaging spectrometers, a fact that compromises their use in applications under real-time constraints. This paper proposes four efficient implementations of hyperspectral coded aperture (HYCA) for CS, two of them termed P-HYCA and P-HYCA-FAST and two additional implementations for its constrained version (CHYCA), termed P-CHYCA and P-CHYCA-FAST on commodity graphics processing units (GPUs). HYCA algorithm exploits the high correlation existing among the spectral bands of the hyperspectral data sets and the generally low number of endmembers needed to explain the data, which largely reduces the number of measurements necessary to correctly reconstruct the original data. The proposed P-HYCA and P-CHYCA implementations have been developed using the compute unified device architecture (CUDA) and the cuFFT library. Moreover, this library has been replaced by a fast iterative method in the P-HYCA-FAST and P-CHYCA-FAST implementations that leads to very significant speedup factors in order to achieve real-time requirements. The proposed algorithms are evaluated not only in terms of reconstruction error for different compressions ratios but also in terms of computational performance using two different GPU architectures by NVIDIA: 1) GeForce GTX 590; and 2) GeForce GTX TITAN. Experiments are conducted using both simulated and real data revealing considerable acceleration factors and obtaining good results in the task of compressing remotely sensed hyperspectral data sets.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

One of the main problems of hyperspectral data analysis is the presence of mixed pixels due to the low spatial resolution of such images. Linear spectral unmixing aims at inferring pure spectral signatures and their fractions at each pixel of the scene. The huge data volumes acquired by hyperspectral sensors put stringent requirements on processing and unmixing methods. This letter proposes an efficient implementation of the method called simplex identification via split augmented Lagrangian (SISAL) which exploits the graphics processing unit (GPU) architecture at low level using Compute Unified Device Architecture. SISAL aims to identify the endmembers of a scene, i.e., is able to unmix hyperspectral data sets in which the pure pixel assumption is violated. The proposed implementation is performed in a pixel-by-pixel fashion using coalesced accesses to memory and exploiting shared memory to store temporary data. Furthermore, the kernels have been optimized to minimize the threads divergence, therefore achieving high GPU occupancy. The experimental results obtained for the simulated and real hyperspectral data sets reveal speedups up to 49 times, which demonstrates that the GPU implementation can significantly accelerate the method's execution over big data sets while maintaining the methods accuracy.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Parallel hyperspectral unmixing problem is considered in this paper. A semisupervised approach is developed under the linear mixture model, where the abundance's physical constraints are taken into account. The proposed approach relies on the increasing availability of spectral libraries of materials measured on the ground instead of resorting to endmember extraction methods. Since Libraries are potentially very large and hyperspectral datasets are of high dimensionality a parallel implementation in a pixel-by-pixel fashion is derived to properly exploits the graphics processing units (GPU) architecture at low level, thus taking full advantage of the computational power of GPUs. Experimental results obtained for real hyperspectral datasets reveal significant speedup factors, up to 164 times, with regards to optimized serial implementation.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Many Hyperspectral imagery applications require a response in real time or near-real time. To meet this requirement this paper proposes a parallel unmixing method developed for graphics processing units (GPU). This method is based on the vertex component analysis (VCA), which is a geometrical based method highly parallelizable. VCA is a very fast and accurate method that extracts endmember signatures from large hyperspectral datasets without the use of any a priori knowledge about the constituent spectra. Experimental results obtained for simulated and real hyperspectral datasets reveal considerable acceleration factors, up to 24 times.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Arguably, the most difficult task in text classification is to choose an appropriate set of features that allows machine learning algorithms to provide accurate classification. Most state-of-the-art techniques for this task involve careful feature engineering and a pre-processing stage, which may be too expensive in the emerging context of massive collections of electronic texts. In this paper, we propose efficient methods for text classification based on information-theoretic dissimilarity measures, which are used to define dissimilarity-based representations. These methods dispense with any feature design or engineering, by mapping texts into a feature space using universal dissimilarity measures; in this space, classical classifiers (e.g. nearest neighbor or support vector machines) can then be used. The reported experimental evaluation of the proposed methods, on sentiment polarity analysis and authorship attribution problems, reveals that it approximates, sometimes even outperforms previous state-of-the-art techniques, despite being much simpler, in the sense that they do not require any text pre-processing or feature engineering.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper, a new parallel method for sparse spectral unmixing of remotely sensed hyperspectral data on commodity graphics processing units (GPUs) is presented. A semi-supervised approach is adopted, which relies on the increasing availability of spectral libraries of materials measured on the ground instead of resorting to endmember extraction methods. This method is based on the spectral unmixing by splitting and augmented Lagrangian (SUNSAL) that estimates the material's abundance fractions. The parallel method is performed in a pixel-by-pixel fashion and its implementation properly exploits the GPU architecture at low level, thus taking full advantage of the computational power of GPUs. Experimental results obtained for simulated and real hyperspectral datasets reveal significant speedup factors, up to 1 64 times, with regards to optimized serial implementation.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper presents a methodology for multi-objective day-ahead energy resource scheduling for smart grids considering intensive use of distributed generation and Vehicle- To-Grid (V2G). The main focus is the application of weighted Pareto to a multi-objective parallel particle swarm approach aiming to solve the dual-objective V2G scheduling: minimizing total operation costs and maximizing V2G income. A realistic mathematical formulation, considering the network constraints and V2G charging and discharging efficiencies is presented and parallel computing is applied to the Pareto weights. AC power flow calculation is included in the metaheuristics approach to allow taking into account the network constraints. A case study with a 33-bus distribution network and 1800 V2G resources is used to illustrate the performance of the proposed method.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Dissertação para obtenção do Grau de Mestre em Engenharia Informática

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Coarse Grained Reconfigurable Architectures (CGRAs) are emerging as enabling platforms to meet the high performance demanded by modern applications (e.g. 4G, CDMA, etc.). Recently proposed CGRAs offer time-multiplexing and dynamic applications parallelism to enhance device utilization and reduce energy consumption at the cost of additional memory (up to 50% area of the overall platform). To reduce the memory overheads, novel CGRAs employ either statistical compression, intermediate compact representation, or multicasting. Each compaction technique has different properties (i.e. compression ratio, decompression time and decompression energy) and is best suited for a particular class of applications. However, existing research only deals with these methods separately. Moreover, they only analyze the compaction ratio and do not evaluate the associated energy overheads. To tackle these issues, we propose a polymorphic compression architecture that interleaves these techniques in a unique platform. The proposed architecture allows each application to take advantage of a separate compression/decompression hierarchy (consisting of various types and implementations of hardware/software decoders) tailored to its needs. Simulation results, using different applications (FFT, Matrix multiplication, and WLAN), reveal that the choice of compression hierarchy has a significant impact on compression ratio (up to 52%), decompression energy (up to 4 orders of magnitude), and configuration time (from 33 n to 1.5 s) for the tested applications. Synthesis results reveal that introducing adaptivity incurs negligible additional overheads (1%) compared to the overall platform area.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Euromicro Conference on Digital System Design (DSD 2015), Funchal, Portugal.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

6th Real-Time Scheduling Open Problems Seminar (RTSOPS 2015), Lund, Sweden.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The 30th ACM/SIGAPP Symposium On Applied Computing (SAC 2015). 13 to 17, Apr, 2015, Embedded Systems. Salamanca, Spain.