43 resultados para Compute Unified Device Architecture (CUDA)
em Repositório Científico do Instituto Politécnico de Lisboa - Portugal
Resumo:
Remote hyperspectral sensors collect large amounts of data per flight usually with low spatial resolution. It is known that the bandwidth connection between the satellite/airborne platform and the ground station is reduced, thus a compression onboard method is desirable to reduce the amount of data to be transmitted. This paper presents a parallel implementation of an compressive sensing method, called parallel hyperspectral coded aperture (P-HYCA), for graphics processing units (GPU) using the compute unified device architecture (CUDA). This method takes into account two main properties of hyperspectral dataset, namely the high correlation existing among the spectral bands and the generally low number of endmembers needed to explain the data, which largely reduces the number of measurements necessary to correctly reconstruct the original data. Experimental results conducted using synthetic and real hyperspectral datasets on two different GPU architectures by NVIDIA: GeForce GTX 590 and GeForce GTX TITAN, reveal that the use of GPUs can provide real-time compressive sensing performance. The achieved speedup is up to 20 times when compared with the processing time of HYCA running on one core of the Intel i7-2600 CPU (3.4GHz), with 16 Gbyte memory.
Resumo:
The application of compressive sensing (CS) to hyperspectral images is an active area of research over the past few years, both in terms of the hardware and the signal processing algorithms. However, CS algorithms can be computationally very expensive due to the extremely large volumes of data collected by imaging spectrometers, a fact that compromises their use in applications under real-time constraints. This paper proposes four efficient implementations of hyperspectral coded aperture (HYCA) for CS, two of them termed P-HYCA and P-HYCA-FAST and two additional implementations for its constrained version (CHYCA), termed P-CHYCA and P-CHYCA-FAST on commodity graphics processing units (GPUs). HYCA algorithm exploits the high correlation existing among the spectral bands of the hyperspectral data sets and the generally low number of endmembers needed to explain the data, which largely reduces the number of measurements necessary to correctly reconstruct the original data. The proposed P-HYCA and P-CHYCA implementations have been developed using the compute unified device architecture (CUDA) and the cuFFT library. Moreover, this library has been replaced by a fast iterative method in the P-HYCA-FAST and P-CHYCA-FAST implementations that leads to very significant speedup factors in order to achieve real-time requirements. The proposed algorithms are evaluated not only in terms of reconstruction error for different compressions ratios but also in terms of computational performance using two different GPU architectures by NVIDIA: 1) GeForce GTX 590; and 2) GeForce GTX TITAN. Experiments are conducted using both simulated and real data revealing considerable acceleration factors and obtaining good results in the task of compressing remotely sensed hyperspectral data sets.
Resumo:
One of the main problems of hyperspectral data analysis is the presence of mixed pixels due to the low spatial resolution of such images. Linear spectral unmixing aims at inferring pure spectral signatures and their fractions at each pixel of the scene. The huge data volumes acquired by hyperspectral sensors put stringent requirements on processing and unmixing methods. This letter proposes an efficient implementation of the method called simplex identification via split augmented Lagrangian (SISAL) which exploits the graphics processing unit (GPU) architecture at low level using Compute Unified Device Architecture. SISAL aims to identify the endmembers of a scene, i.e., is able to unmix hyperspectral data sets in which the pure pixel assumption is violated. The proposed implementation is performed in a pixel-by-pixel fashion using coalesced accesses to memory and exploiting shared memory to store temporary data. Furthermore, the kernels have been optimized to minimize the threads divergence, therefore achieving high GPU occupancy. The experimental results obtained for the simulated and real hyperspectral data sets reveal speedups up to 49 times, which demonstrates that the GPU implementation can significantly accelerate the method's execution over big data sets while maintaining the methods accuracy.
Resumo:
A novel high throughput and scalable unified architecture for the computation of the transform operations in video codecs for advanced standards is presented in this paper. This structure can be used as a hardware accelerator in modern embedded systems to efficiently compute all the two-dimensional 4 x 4 and 2 x 2 transforms of the H.264/AVC standard. Moreover, its highly flexible design and hardware efficiency allows it to be easily scaled in terms of performance and hardware cost to meet the specific requirements of any given video coding application. Experimental results obtained using a Xilinx Virtex-5 FPGA demonstrated the superior performance and hardware efficiency levels provided by the proposed structure, which presents a throughput per unit of area relatively higher than other similar recently published designs targeting the H.264/AVC standard. Such results also showed that, when integrated in a multi-core embedded system, this architecture provides speedup factors of about 120x concerning pure software implementations of the transform algorithms, therefore allowing the computation, in real-time, of all the above mentioned transforms for Ultra High Definition Video (UHDV) sequences (4,320 x 7,680 @ 30 fps).
Resumo:
A unified architecture for fast and efficient computation of the set of two-dimensional (2-D) transforms adopted by the most recent state-of-the-art digital video standards is presented in this paper. Contrasting to other designs with similar functionality, the presented architecture is supported on a scalable, modular and completely configurable processing structure. This flexible structure not only allows to easily reconfigure the architecture to support different transform kernels, but it also permits its resizing to efficiently support transforms of different orders (e. g. order-4, order-8, order-16 and order-32). Consequently, not only is it highly suitable to realize high-performance multi-standard transform cores, but it also offers highly efficient implementations of specialized processing structures addressing only a reduced subset of transforms that are used by a specific video standard. The experimental results that were obtained by prototyping several configurations of this processing structure in a Xilinx Virtex-7 FPGA show the superior performance and hardware efficiency levels provided by the proposed unified architecture for the implementation of transform cores for the Advanced Video Coding (AVC), Audio Video coding Standard (AVS), VC-1 and High Efficiency Video Coding (HEVC) standards. In addition, such results also demonstrate the ability of this processing structure to realize multi-standard transform cores supporting all the standards mentioned above and that are capable of processing the 8k Ultra High Definition Television (UHDTV) video format (7,680 x 4,320 at 30 fps) in real time.
Resumo:
A new high performance architecture for the computation of all the DCT operations adopted in the H.264/AVC and HEVC standards is proposed in this paper. Contrasting to other dedicated transform cores, the presented multi-standard transform architecture is supported on a completely configurable, scalable and unified structure, that is able to compute not only the forward and the inverse 8×8 and 4×4 integer DCTs and the 4×4 and 2×2 Hadamard transforms defined in the H.264/AVC standard, but also the 4×4, 8×8, 16×16 and 32×32 integer transforms adopted in HEVC. Experimental results obtained using a Xilinx Virtex-7 FPGA demonstrated the superior performance and hardware efficiency levels provided by the proposed structure, which outperforms its more prominent related designs by at least 1.8 times. When integrated in a multi-core embedded system, this architecture allows the computation, in real-time, of all the transforms mentioned above for resolutions as high as the 8k Ultra High Definition Television (UHDTV) (7680×4320 @ 30fps).
Resumo:
A new high throughput and scalable architecture for unified transform coding in H.264/AVC is proposed in this paper. Such flexible structure is capable of computing all the 4x4 and 2x2 transforms for Ultra High Definition Video (UHDV) applications (4320x7680@ 30fps) in real-time and with low hardware cost. These significantly high performance levels were proven with the implementation of several different configurations of the proposed structure using both FPGA and ASIC 90 nm technologies. In addition, such experimental evaluation also demonstrated the high area efficiency of theproposed architecture, which in terms of Data Throughput per Unit of Area (DTUA) is at least 1.5 times more efficient than its more prominent related designs(1).
Resumo:
Conferência: IEEE 24th International Conference on Application-Specific Systems, Architectures and Processors (ASAP)- Jun 05-07, 2013
Resumo:
This paper presents a new predictive digital control method applied to Matrix Converters (MC) operating as Unified Power Flow Controllers (UPFC). This control method, based on the inverse dynamics model equations of the MC operating as UPFC, just needs to compute the optimal control vector once in each control cycle, in contrast to direct dynamics predictive methods that needs 27 vector calculations. The theoretical principles of the inverse dynamics power flow predictive control of the MC based UPFC with input filter are established. The proposed inverse dynamics predictive power control method is tested using Matlab/Simulink Power Systems toolbox and the obtained results show that the designed power controllers guarantees decoupled active and reactive power control, zero error tracking, fast response times and an overall good dynamic and steady-state response.
Resumo:
This paper proposes a multifunctional architecture to implement field-programmable gate array (FPGA) controllers for power converters and presents a prototype for a pulsed power generator based on a solid-state Marx topology. The massively parallel nature of reconfigurable hardware platforms provides very high processing power and fast response times allowing the implementation of many subsystems in the same device. The prototype includes the controller, a failure detection system, an interface with a safety/emergency subsystem, a graphical user interface, and a virtual oscilloscope to visualize the generated pulse waveforms, using a single FPGA. The proposed architecture employs a modular design that can be easily adapted to other power converter topologies.
Resumo:
Hyperspectral imaging has become one of the main topics in remote sensing applications, which comprise hundreds of spectral bands at different (almost contiguous) wavelength channels over the same area generating large data volumes comprising several GBs per flight. This high spectral resolution can be used for object detection and for discriminate between different objects based on their spectral characteristics. One of the main problems involved in hyperspectral analysis is the presence of mixed pixels, which arise when the spacial resolution of the sensor is not able to separate spectrally distinct materials. Spectral unmixing is one of the most important task for hyperspectral data exploitation. However, the unmixing algorithms can be computationally very expensive, and even high power consuming, which compromises the use in applications under on-board constraints. In recent years, graphics processing units (GPUs) have evolved into highly parallel and programmable systems. Specifically, several hyperspectral imaging algorithms have shown to be able to benefit from this hardware taking advantage of the extremely high floating-point processing performance, compact size, huge memory bandwidth, and relatively low cost of these units, which make them appealing for onboard data processing. In this paper, we propose a parallel implementation of an augmented Lagragian based method for unsupervised hyperspectral linear unmixing on GPUs using CUDA. The method called simplex identification via split augmented Lagrangian (SISAL) aims to identify the endmembers of a scene, i.e., is able to unmix hyperspectral data sets in which the pure pixel assumption is violated. The efficient implementation of SISAL method presented in this work exploits the GPU architecture at low level, using shared memory and coalesced accesses to memory.
Resumo:
We investigate a mechanism that generates exact solutions of scalar field cosmologies in a unified way. The procedure investigated here permits to recover almost all known solutions, and allows one to derive new solutions as well. In particular, we derive and discuss one novel solution defined in terms of the Lambert function. The solutions are organised in a classification which depends on the choice of a generating function which we have denoted by x(phi) that reflects the underlying thermodynamics of the model. We also analyse and discuss the existence of form-invariance dualities between solutions. A general way of defining the latter in an appropriate fashion for scalar fields is put forward.
Resumo:
This paper presents the Direct Power Control of Three-Phase Matrix Converters (DPC-MC) operating as Unified Power Flow Controllers (UPFC). Since matrix converters allow direct AC/AC power conversion without intermediate energy storage link, the resulting UPFC has reduced volume and cost, together with higher reliability. Theoretical principles of DPC-MC method are established based on an UPFC model, together with a new direct power control approach based on sliding mode control techniques. As a result, active and reactive power can be directly controlled by selection of an appropriate switching state of matrix converter. This new direct power control approach associated to matrix converters technology guarantees decoupled active and reactive power control, zero error tracking, fast response times and timely control actions. Simulation results show good performance of the proposed system.
Resumo:
A evolução tecnológica e das sociedades permitiu que, hoje em dia, uma boa parte da população tenha acesso a dispositivos móveis com funcionalidades avançadas. Com este tipo de dispositivos, temos acesso a inúmeras fontes de informação em tempo-real, mas esta característica ainda não é, hoje em dia, aproveitada na sua totalidade. Este projecto tenta tirar partido desta realidade para, utilizando os diversos dispositivos móveis, criar uma rede de troca de informações de trânsito. O utilizador apenas necessita de servir-se do seu dispositivo móvel para, automaticamente, obter as mais recentes informações de trânsito enquanto, paralelamente, partilha com os outros utilizadores a sua informação. Apesar de existirem outras alternativas no mercado, com soluções que permitem usufruir do mesmo tipo de funcionalidades, nenhuma utiliza este tipo de dispositivos (GPS’s convencionais, por exemplo). Um dos requisitos necessário na implementação deste projecto é uma solução de geocoding. Após terem sido testadas várias soluções, nenhuma cumpria, na totalidade, os requisitos deste projecto, o que originou o desenvolvimento de uma nova solução que cumpre esses requisitos. A solução é, toda ela, muito modular, formada por vários componentes, cada um com responsabilidades bem identificadas. A arquitectura desta solução baseia-se nos padrões de desenvolvimento de uma Service Oriented Architecture. Todos os componentes disponibilizam as suas operações através de web services, e a sua descoberta recorre ao protocolo WS-Discovery. Estes vários componentes podem ser divididos em duas categorias: os do núcleo, responsáveis por criar e oferecer as funcionalidades requisitadas neste projecto e os módulos externos, nos quais se incluem as aplicações que apresentam as funcionalidades ao utilizador. Foram criadas duas formas de consumir a informação oferecida pelo serviço SIAT: a aplicação móvel e um website. No âmbito dos dispositivos móveis, foi desenvolvida uma aplicação para o sistema operativo Windows Phone 7.
Resumo:
A two terminal optically addressed image processing device based on two stacked sensing/switching p-i-n a-SiC:H diodes is presented. The charge packets are injected optically into the p-i-n sensing photodiode and confined at the illuminated regions changing locally the electrical field profile across the p-i-n switching diode. A red scanner is used for charge readout. The various design parameters and addressing architecture trade-offs are discussed. The influence on the transfer functions of an a-SiC:H sensing absorber optimized for red transmittance and blue collection or of a floating anode in between is analysed. Results show that the thin a-SiC:H sensing absorber confines the readout to the switching diode and filters the light allowing full colour detection at two appropriated voltages. When the floating anode is used the spectral response broadens, allowing B&W image recognition with improved light-to-dark sensitivity. A physical model supports the image and colour recognition process.