26 results for parallel implementation
Abstract:
This paper presents a single-precision floating-point arithmetic unit with support for multiplication, addition, fused multiply-add, reciprocal, square-root, and inverse square-root, with high performance and low resource usage. The design uses a piecewise 2nd-order polynomial approximation to implement reciprocal, square-root, and inverse square-root. The unit can be configured with any number of operations and is capable of calculating any function with a throughput of one operation per cycle. The floating-point multiplier of the unit is also used to implement the polynomial approximation and the fused multiply-add operation. We have compared our implementation with other state-of-the-art proposals, including the Xilinx Core-Gen operators, and conclude that the approach has a high relative performance/area efficiency. © 2014 Technical University of Munich (TUM).
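As an illustration of the piecewise 2nd-order polynomial idea, the sketch below prototypes a reciprocal approximation over the normalized mantissa range [1, 2). The segment count, fitting method, and precision target are assumptions made for the sketch, not the paper's hardware parameters.

```python
# Hedged sketch (not the paper's RTL): piecewise quadratic approximation of 1/x
# over the normalized mantissa [1, 2). Segment count and fitting are assumptions.
import numpy as np

N_SEGMENTS = 64  # assumed number of segments over [1, 2)

def build_tables(f, n=N_SEGMENTS):
    """Fit one quadratic (c2, c1, c0) per segment of [1, 2) to the target f."""
    edges = np.linspace(1.0, 2.0, n + 1)
    tables = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        xs = np.linspace(lo, hi, 32)
        tables.append(np.polyfit(xs, f(xs), 2))   # least-squares quadratic per segment
    return tables

def approx(x, tables):
    """Evaluate the piecewise polynomial; a hardware multiplier does the two products."""
    i = min(int((x - 1.0) * N_SEGMENTS), N_SEGMENTS - 1)  # segment index from leading mantissa bits
    c2, c1, c0 = tables[i]
    return (c2 * x + c1) * x + c0                 # Horner form: two multiplies, two adds

recip_tab = build_tables(lambda m: 1.0 / m)
print(approx(1.37, recip_tab), 1 / 1.37)          # the two values agree closely
```

The same table-driven evaluation applies to square-root and inverse square-root by fitting the corresponding target functions, which is why a single multiplier can be shared across all three operations.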
Abstract:
This letter presents a new parallel method for hyperspectral unmixing composed by the efficient combination of two popular methods: vertex component analysis (VCA) and sparse unmixing by variable splitting and augmented Lagrangian (SUNSAL). First, VCA extracts the endmember signatures, and then, SUNSAL is used to estimate the abundance fractions. Both techniques are highly parallelizable, which significantly reduces the computing time. A design for the commodity graphics processing units of the two methods is presented and evaluated. Experimental results obtained for simulated and real hyperspectral data sets reveal speedups up to 100 times, which grants real-time response required by many remotely sensed hyperspectral applications.
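For context, both methods operate on the standard linear mixing model; a common formulation (notation assumed here, not taken from the letter) is

\[
\mathbf{Y} \approx \mathbf{M}\mathbf{A} + \mathbf{N}, \qquad
\hat{\mathbf{A}} = \arg\min_{\mathbf{A} \ge 0} \tfrac{1}{2}\,\lVert \mathbf{Y} - \mathbf{M}\mathbf{A} \rVert_F^2 + \lambda\,\lVert \mathbf{A} \rVert_{1,1},
\]

where \(\mathbf{Y}\) holds the observed pixel spectra, \(\mathbf{M}\) the endmember signatures extracted by VCA, \(\mathbf{A}\) the abundance fractions estimated by SUNSAL, and \(\mathbf{N}\) the noise. The per-pixel independence of the abundance estimation is what makes the second stage map well onto a GPU.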
Abstract:
This paper proposes a multifunctional architecture to implement field-programmable gate array (FPGA) controllers for power converters and presents a prototype for a pulsed power generator based on a solid-state Marx topology. The massively parallel nature of reconfigurable hardware platforms provides very high processing power and fast response times, allowing the implementation of many subsystems in the same device. The prototype includes the controller, a failure detection system, an interface with a safety/emergency subsystem, a graphical user interface, and a virtual oscilloscope to visualize the generated pulse waveforms, all in a single FPGA. The proposed architecture employs a modular design that can be easily adapted to other power converter topologies.
Abstract:
This paper presents the implementation of the OFDM demodulator and the Viterbi decoder proposed as part of a wireless High Definition video receiver to be integrated into an FPGA. These blocks were implemented in a Xilinx Virtex-6 FPGA. The complete system was previously modeled and simulated using MATLAB/Simulink to extract important hardware characteristics for the FPGA implementation.
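As a rough functional reference for the OFDM demodulation stage (the FFT size and cyclic-prefix length below are illustrative assumptions, not the receiver's actual parameters), the core per-symbol operation is cyclic-prefix removal followed by an FFT and a one-tap per-subcarrier equalizer:

```python
# Minimal sketch of OFDM symbol demodulation, assuming a 64-point FFT and a
# 16-sample cyclic prefix (illustrative values only, not the paper's design).
import numpy as np

N_FFT, N_CP = 64, 16

def ofdm_demodulate(rx_symbol, channel_freq_response):
    """rx_symbol: N_CP + N_FFT time-domain samples of one OFDM symbol."""
    no_cp = rx_symbol[N_CP:N_CP + N_FFT]               # drop the cyclic prefix
    subcarriers = np.fft.fft(no_cp) / np.sqrt(N_FFT)   # back to the frequency domain
    return subcarriers / channel_freq_response         # one-tap equalizer per subcarrier

# Round trip with an ideal channel: QPSK symbols survive modulation/demodulation.
tx = (np.sign(np.random.randn(N_FFT)) + 1j * np.sign(np.random.randn(N_FFT))) / np.sqrt(2)
time_signal = np.fft.ifft(tx) * np.sqrt(N_FFT)
tx_symbol = np.concatenate([time_signal[-N_CP:], time_signal])  # prepend cyclic prefix
print(np.allclose(ofdm_demodulate(tx_symbol, np.ones(N_FFT)), tx))  # True
```

In the hardware version this is typically a streaming FFT core plus a complex divider or multiplier per subcarrier; the Viterbi decoder then operates on the demapped soft or hard bits.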
Abstract:
Brain dopamine transporter imaging by Single Photon Emission Computed Tomography (SPECT) with 123I-FP-CIT (DaTScan™) has become an important tool in the diagnosis and evaluation of Parkinson syndromes. This diagnostic method allows the visualization of a portion of the striatum, where the healthy pattern resembles two symmetric commas, enabling the evaluation of the dopamine presynaptic system, in which dopamine transporters are responsible for dopamine release into the synaptic cleft and for its reabsorption into the nigrostriatal nerve terminals, where it is stored or degraded. In daily practice, the assessment of DaTScan™ commonly relies on visual evaluation alone. However, this process is complex and subjective, as it depends on the observer's experience and is associated with high intra- and inter-observer variability. Studies have shown that semiquantification can improve the diagnosis of Parkinson syndromes. Semiquantification requires image segmentation methods based on regions of interest (ROI): ROIs are drawn over specific (striatum) and nonspecific (background) uptake areas, and specific binding ratios are then calculated. The low adoption of semiquantification in the diagnosis of Parkinson syndromes is related not only to the time it requires, but also to the need for a database of reference values adapted to the population concerned and to the acquisition protocol of each department. Studies have concluded that this process increases the reproducibility of semiquantification. The aim of this investigation was to create and validate a database of healthy controls for dopamine transporter imaging with DaTScan™, named DBRV. The database was adapted to the protocol of the Nuclear Medicine Department and to the population of the Infanta Cristina Hospital in Badajoz, Spain.
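For reference, the specific binding ratio mentioned above is commonly computed from the mean counts in the striatal and background ROIs as

\[
\mathrm{SBR} = \frac{C_{\mathrm{striatum}} - C_{\mathrm{background}}}{C_{\mathrm{background}}},
\]

where \(C_{\mathrm{striatum}}\) and \(C_{\mathrm{background}}\) are the mean counts per voxel in the specific and nonspecific ROIs; the exact ROI definitions and normalization are protocol dependent, which is why a department- and population-specific reference database is needed.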
Abstract:
Final Master's Project for obtaining the degree of Master in Electronics and Telecommunications Engineering
Abstract:
Final Master's Project for obtaining the degree of Master in Electrical Engineering, branch of Automation and Industrial Electronics
Abstract:
This paper presents the design and implementation of direct power controllers for three-phase matrix converters (MC) operating as Unified Power Flow Controllers (UPFC). Theoretical principles of the decoupled linear power controllers of the MC-UPFC, designed to minimize the cross-coupling between active and reactive power control, are established. From the matrix-converter-based UPFC model with a modified Venturini high-frequency PWM modulator, decoupled controllers for the direct control of the transmission line active (P) and reactive (Q) power are synthesized. Simulation results obtained from Matlab/Simulink are presented in order to confirm the proposed approach. The results show decoupled power control, zero-error tracking, and fast responses with no overshoot and no steady-state error.
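A common starting point for such decoupled controllers (the dq-frame convention below is a standard textbook form, not necessarily the exact formulation used in the paper) expresses the line active and reactive power as

\[
P = \tfrac{3}{2}\,(v_d i_d + v_q i_q), \qquad
Q = \tfrac{3}{2}\,(v_q i_d - v_d i_q),
\]

so that aligning the d-axis with the line voltage (\(v_q = 0\)) makes \(P\) depend only on \(i_d\) and \(Q\) only on \(i_q\), which is what allows the cross-coupling between the two power control loops to be minimized.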
Abstract:
Single-processor architectures are unable to provide the performance required by high-performance embedded systems. Parallel processing based on general-purpose processors can achieve this performance, but at the cost of a considerable increase in required resources. However, in many cases, simplified optimized parallel cores can be used instead of general-purpose processors, achieving better performance at lower resource utilization. In this paper, we propose a configurable many-core architecture to serve as a co-processor for high-performance embedded computing on Field-Programmable Gate Arrays. The architecture consists of an array of configurable simple cores with support for floating-point operations, interconnected with a configurable interconnection network. For each core it is possible to configure the size of the internal memory, the supported operations, and the number of interfacing ports. The architecture was tested in a ZYNQ-7020 FPGA on the execution of several parallel algorithms. The results show that the proposed many-core architecture achieves better performance than a parallel general-purpose processor and that up to 32 floating-point cores can be implemented in a ZYNQ-7020 SoC FPGA.
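To make the per-core configuration space concrete, a hypothetical configuration descriptor could look like the sketch below; the field names, defaults, and interconnect option are invented for illustration and are not the generator parameters of the paper.

```python
# Hypothetical configuration descriptor for the many-core co-processor;
# field names and defaults are assumptions made for illustration only.
from dataclasses import dataclass, field

@dataclass
class CoreConfig:
    local_mem_words: int = 1024                    # size of the core's internal memory
    fp_ops: tuple = ("add", "mul", "fma")          # floating-point operations to instantiate
    num_ports: int = 2                             # interfacing ports to the interconnect

@dataclass
class ManyCoreConfig:
    cores: list = field(default_factory=list)      # one entry per instantiated core
    network: str = "ring"                          # assumed interconnect topology option

# Example: 32 identical floating-point cores, as in the ZYNQ-7020 experiment.
cfg = ManyCoreConfig(cores=[CoreConfig() for _ in range(32)])
print(len(cfg.cores), cfg.cores[0].fp_ops)
```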
Abstract:
The application of compressive sensing (CS) to hyperspectral images has been an active area of research over the past few years, both in terms of the hardware and the signal processing algorithms. However, CS algorithms can be computationally very expensive due to the extremely large volumes of data collected by imaging spectrometers, a fact that compromises their use in applications under real-time constraints. This paper proposes four efficient implementations of hyperspectral coded aperture (HYCA) for CS on commodity graphics processing units (GPUs): two of them, termed P-HYCA and P-HYCA-FAST, and two additional implementations of its constrained version (CHYCA), termed P-CHYCA and P-CHYCA-FAST. The HYCA algorithm exploits the high correlation existing among the spectral bands of the hyperspectral data sets and the generally low number of endmembers needed to explain the data, which largely reduces the number of measurements necessary to correctly reconstruct the original data. The proposed P-HYCA and P-CHYCA implementations have been developed using the compute unified device architecture (CUDA) and the cuFFT library. Moreover, this library has been replaced by a fast iterative method in the P-HYCA-FAST and P-CHYCA-FAST implementations, which leads to very significant speedup factors that help meet real-time requirements. The proposed algorithms are evaluated not only in terms of reconstruction error for different compression ratios but also in terms of computational performance using two different GPU architectures by NVIDIA: 1) GeForce GTX 590 and 2) GeForce GTX TITAN. Experiments conducted using both simulated and real data reveal considerable acceleration factors and good results in the task of compressing remotely sensed hyperspectral data sets.
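As a simplified stand-in for the principle the abstract describes (a low-dimensional spectral subspace makes few measurements per pixel sufficient), and not the actual HYCA algorithm, which additionally exploits spatial structure and FFT-based solvers, a per-pixel sketch could be:

```python
# Simplified compressive-sensing sketch: pixels lie in a low-dimensional
# spectral subspace E, so m << n_bands random measurements suffice to recover
# them by least squares. Illustrates the principle only, not HYCA itself.
import numpy as np

rng = np.random.default_rng(0)
n_bands, n_endmembers, m, n_pixels = 200, 5, 20, 1000

E = rng.standard_normal((n_bands, n_endmembers))      # assumed known spectral subspace
A = rng.random((n_endmembers, n_pixels))              # abundances (synthetic)
X = E @ A                                             # hyperspectral pixels, one per column

Phi = rng.standard_normal((m, n_bands)) / np.sqrt(m)  # random measurement matrix
Y = Phi @ X                                           # compressed measurements

# Reconstruction: solve min_z ||Phi E z - y||_2 per pixel, then X_hat = E z.
Z, *_ = np.linalg.lstsq(Phi @ E, Y, rcond=None)
X_hat = E @ Z
print(np.linalg.norm(X - X_hat) / np.linalg.norm(X))  # ~0 in this noise-free case
```

The per-pixel least-squares solves are independent, which is the property that the GPU implementations exploit.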
Abstract:
Many hyperspectral imagery applications require a response in real time or near-real time. To meet this requirement, this paper proposes a parallel unmixing method developed for graphics processing units (GPU). This method is based on the vertex component analysis (VCA), a geometry-based method that is highly parallelizable. VCA is a very fast and accurate method that extracts endmember signatures from large hyperspectral datasets without the use of any a priori knowledge about the constituent spectra. Experimental results obtained for simulated and real hyperspectral datasets reveal considerable acceleration factors, up to 24 times.
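To give an idea of why VCA parallelizes well, the sketch below captures only the core projection loop (omitting the SNR-dependent dimensionality reduction of the full algorithm): each iteration projects all pixels onto a direction orthogonal to the endmembers found so far and keeps the extreme pixel.

```python
# Simplified core of the VCA endmember search: at each step, project the data
# onto a direction orthogonal to the subspace of endmembers already found and
# take the pixel with the largest |projection|. The per-pixel projections are
# independent, which is what maps well onto a GPU. Preprocessing steps of the
# full algorithm are omitted here.
import numpy as np

def vca_core(Y, p, rng=np.random.default_rng(1)):
    """Y: (bands, pixels) data matrix; p: number of endmembers to extract."""
    bands, pixels = Y.shape
    indices = []
    E = np.zeros((bands, p))
    for i in range(p):
        w = rng.standard_normal(bands)
        # Direction orthogonal to the endmembers selected so far.
        f = w - E[:, :i] @ np.linalg.pinv(E[:, :i]) @ w if i else w
        v = f @ Y                       # projection of every pixel (parallel on a GPU)
        k = int(np.argmax(np.abs(v)))   # extreme pixel becomes the next endmember
        indices.append(k)
        E[:, i] = Y[:, k]
    return E, indices

# Toy usage: mixtures of 3 synthetic endmembers.
rng = np.random.default_rng(0)
M = rng.random((50, 3))
A = rng.dirichlet(np.ones(3), size=2000).T
E, idx = vca_core(M @ A, 3)
print(idx)
```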