18 resultados para HPC parallel computer architecture queues fault tolerance programmability ADAM
em Repositório Científico do Instituto Politécnico de Lisboa - Portugal
Resumo:
Hyperspectral imaging has become one of the main topics in remote sensing applications, which comprise hundreds of spectral bands at different (almost contiguous) wavelength channels over the same area generating large data volumes comprising several GBs per flight. This high spectral resolution can be used for object detection and for discriminate between different objects based on their spectral characteristics. One of the main problems involved in hyperspectral analysis is the presence of mixed pixels, which arise when the spacial resolution of the sensor is not able to separate spectrally distinct materials. Spectral unmixing is one of the most important task for hyperspectral data exploitation. However, the unmixing algorithms can be computationally very expensive, and even high power consuming, which compromises the use in applications under on-board constraints. In recent years, graphics processing units (GPUs) have evolved into highly parallel and programmable systems. Specifically, several hyperspectral imaging algorithms have shown to be able to benefit from this hardware taking advantage of the extremely high floating-point processing performance, compact size, huge memory bandwidth, and relatively low cost of these units, which make them appealing for onboard data processing. In this paper, we propose a parallel implementation of an augmented Lagragian based method for unsupervised hyperspectral linear unmixing on GPUs using CUDA. The method called simplex identification via split augmented Lagrangian (SISAL) aims to identify the endmembers of a scene, i.e., is able to unmix hyperspectral data sets in which the pure pixel assumption is violated. The efficient implementation of SISAL method presented in this work exploits the GPU architecture at low level, using shared memory and coalesced accesses to memory.
Resumo:
O trabalho apresentado por este documento aborda os problemas que advêm da necessidade de integração de aplicações, desenvolvidas em diferentes instantes no tempo, por diferentes equipas de trabalho, que para enriquecer os processos de negócio necessitam de comunicar entre si. A integração das aplicações tem de ser feita de forma opaca para estas, sendo disponibilizada por uma peça de software genérica, robusta e sem custos para as equipas desenvolvimento, na altura da integração. Esta integração tem de permitir que as aplicações comuniquem utilizando os protocolos que desejarem. Este trabalho propõe um middleware orientado a mensagens como solução para o problema identificado. A solução apresentada por este trabalho disponibiliza a comunicação entre aplicações que utilizam diferentes protocolos, permite ainda o desacoplamento temporal, espacial e de sincronismo na comunicação das aplicações. A implementação da solução tem base num sistema publish/subscribe orientado ao conteúdo e tem de lidar com as maiores exigências computacionais que este tipo de sistema acarta, sendo que a utilização deste se justifica com o enriquecimento da semântica de subscrição de eventos. Esta implementação utiliza uma arquitectura semi-distribuída, com o objectivo de aumentar a escalabilidade do sistema. A utilização da arquitectura semi-distribuída implica que a implementação da solução tem de lidar com o encaminhamento de eventos e divulgação das subscrições, pelos vários servidores de eventos. A implementação da solução disponibiliza garantias de persistência, processamento transaccional e tolerância a falhas, assim como transformação de eventos entre os diversos protocolos. A extensibilidade da solução é conseguida à custa de um sistema de pluggins que permite a adição de suporte a novos protocolos de comunicação. Os protocolos suportados pela implementação final do trabalho são RestMS e TCP.
Resumo:
O presente trabalho teve como principal objectivo o desenvolvimento de um analisador de vibrações de dois canais baseado em computador, para a realização de diagnóstico no âmbito do controlo de condição de máquinas. Foi desenvolvida uma aplicação num computador comum, no software LabVIEW, que através de transdutores de aceleração do tipo MEMS conectados via USB, faz a recolha de dados de vibração e procede ao seu processamento e apresentação ao utilizador. As ferramentas utilizadas para o processamento de dados são ferramentas comuns encontradas em vários analisadores de vibrações disponíveis no mercado. Estas podem ser: gráficos de espectro de frequência, sinal no tempo, cascata ou valores de nível global de vibração, entre outras. Apesar do analisador desenvolvido não apresentar inovação nas ferramentas de análise adoptadas, este pretende ser distinguido pelo baixo custo, simplicidade e carácter didáctico. Este trabalho vem evidenciar as vantagens, desvantagens e potencialidades de um analisador desta natureza. São tiradas algumas conclusões quanto à sua capacidade de diagnóstico de avarias, capacidades como ferramenta didáctica, sensores utilizados e linguagem de programação escolhida. Como conclusões principais, o trabalho revela que os sensores escolhidos não são os indicados para efectuar o diagnóstico de avarias em ambiente industrial, contudo são ideais para tornar este analisador numa boa ferramenta didáctica e de treino.
Resumo:
Relatório do Trabalho Final de Mestrado para obtenção do grau de Mestre em Engenharia de Electrónica e Telecomunicações
Resumo:
Trabalho Final de Mestrado para obtenção do grau de Mestre em Engenharia Mecânica
Resumo:
In this paper, we develop a fast implementation of an hyperspectral coded aperture (HYCA) algorithm on different platforms using OpenCL, an open standard for parallel programing on heterogeneous systems, which includes a wide variety of devices, from dense multicore systems from major manufactures such as Intel or ARM to new accelerators such as graphics processing units (GPUs), field programmable gate arrays (FPGAs), the Intel Xeon Phi and other custom devices. Our proposed implementation of HYCA significantly reduces its computational cost. Our experiments have been conducted using simulated data and reveal considerable acceleration factors. This kind of implementations with the same descriptive language on different architectures are very important in order to really calibrate the possibility of using heterogeneous platforms for efficient hyperspectral imaging processing in real remote sensing missions.
Resumo:
A novel high throughput and scalable unified architecture for the computation of the transform operations in video codecs for advanced standards is presented in this paper. This structure can be used as a hardware accelerator in modern embedded systems to efficiently compute all the two-dimensional 4 x 4 and 2 x 2 transforms of the H.264/AVC standard. Moreover, its highly flexible design and hardware efficiency allows it to be easily scaled in terms of performance and hardware cost to meet the specific requirements of any given video coding application. Experimental results obtained using a Xilinx Virtex-5 FPGA demonstrated the superior performance and hardware efficiency levels provided by the proposed structure, which presents a throughput per unit of area relatively higher than other similar recently published designs targeting the H.264/AVC standard. Such results also showed that, when integrated in a multi-core embedded system, this architecture provides speedup factors of about 120x concerning pure software implementations of the transform algorithms, therefore allowing the computation, in real-time, of all the above mentioned transforms for Ultra High Definition Video (UHDV) sequences (4,320 x 7,680 @ 30 fps).
Resumo:
This paper proposes an efficient scalable Residue Number System (RNS) architecture supporting moduli sets with an arbitrary number of channels, allowing to achieve larger dynamic range and a higher level of parallelism. The proposed architecture allows the forward and reverse RNS conversion, by reusing the arithmetic channel units. The arithmetic operations supported at the channel level include addition, subtraction, and multiplication with accumulation capability. For the reverse conversion two algorithms are considered, one based on the Chinese Remainder Theorem and the other one on Mixed-Radix-Conversion, leading to implementations optimized for delay and required circuit area. With the proposed architecture a complete and compact RNS platform is achieved. Experimental results suggest gains of 17 % in the delay in the arithmetic operations, with an area reduction of 23 % regarding the RNS state of the art. When compared with a binary system the proposed architecture allows to perform the same computation 20 times faster alongside with only 10 % of the circuit area resources.
Resumo:
This paper proposes a multifunctional architecture to implement field-programmable gate array (FPGA) controllers for power converters and presents a prototype for a pulsed power generator based on a solid-state Marx topology. The massively parallel nature of reconfigurable hardware platforms provides very high processing power and fast response times allowing the implementation of many subsystems in the same device. The prototype includes the controller, a failure detection system, an interface with a safety/emergency subsystem, a graphical user interface, and a virtual oscilloscope to visualize the generated pulse waveforms, using a single FPGA. The proposed architecture employs a modular design that can be easily adapted to other power converter topologies.
Resumo:
With very few exceptions, M > 4 tectonic earthquakes in the Azores show normal fault solution and occur away from the islands. Exceptionally, the 1998 shock was pure strike-slip and occurred within the northern edge of the Pico-Faial Ridge. Fault plane solutions show two possible planes of rupture striking ENE-WSW (dextral) and NNW-SSE (sinistral). The former has not been recognised in the Azores, but is parallel to the transform direction related to the relative motion between the Eurasia and Nubia plates. Therefore, the main question we address in the present study is: do transform faults related to the Eurasia/Nubia plate boundary exist in the Azores? Knowing that the main source of strain is related to plate kinematics, we conclude that the sinistral strike-slip NNW-SSE fault plane solution is not consistent with either the fault dip (ca. 65, which is typical of a normal fault) or the ca. ENE-WSW direction of maximum extension; both are consistent with a normal fault, as observed in most major earthquakes on faults striking around NNW-SSE in the Azores. In contrast, the dextral strike-slip ENE-WSW fault plane solution is consistent with the transform direction related to the anticlockwise rotation of Nubia relative to Eurasia. Altogether, tectonic data, measured ground motion, observed destruction, and modelling are consistent with a dextral strike-slip source fault striking ENE-WSW. Furthermore, the bulk clockwise rotation measured by GPS is typical of bookshelf block rotations observed at the termination of such master strike-slip faults. Therefore, we suggest that the 1998 earthquake can be related to the WSW termination of a transform (ENE-WSW fault plane solution) associated with the Nubia-Eurasia diffuse plate boundary. (C) 2014 Elsevier B.V. All rights reserved.
Resumo:
Single processor architectures are unable to provide the required performance of high performance embedded systems. Parallel processing based on general-purpose processors can achieve these performances with a considerable increase of required resources. However, in many cases, simplified optimized parallel cores can be used instead of general-purpose processors achieving better performance at lower resource utilization. In this paper, we propose a configurable many-core architecture to serve as a co-processor for high-performance embedded computing on Field-Programmable Gate Arrays. The architecture consists of an array of configurable simple cores with support for floating-point operations interconnected with a configurable interconnection network. For each core it is possible to configure the size of the internal memory, the supported operations and number of interfacing ports. The architecture was tested in a ZYNQ-7020 FPGA in the execution of several parallel algorithms. The results show that the proposed many-core architecture achieves better performance than that achieved with a parallel generalpurpose processor and that up to 32 floating-point cores can be implemented in a ZYNQ-7020 SoC FPGA.
Resumo:
Sparse matrix-vector multiplication (SMVM) is a fundamental operation in many scientific and engineering applications. In many cases sparse matrices have thousands of rows and columns where most of the entries are zero, while non-zero data is spread over the matrix. This sparsity of data locality reduces the effectiveness of data cache in general-purpose processors quite reducing their performance efficiency when compared to what is achieved with dense matrix multiplication. In this paper, we propose a parallel processing solution for SMVM in a many-core architecture. The architecture is tested with known benchmarks using a ZYNQ-7020 FPGA. The architecture is scalable in the number of core elements and limited only by the available memory bandwidth. It achieves performance efficiencies up to almost 70% and better performances than previous FPGA designs.
Resumo:
Remote hyperspectral sensors collect large amounts of data per flight usually with low spatial resolution. It is known that the bandwidth connection between the satellite/airborne platform and the ground station is reduced, thus a compression onboard method is desirable to reduce the amount of data to be transmitted. This paper presents a parallel implementation of an compressive sensing method, called parallel hyperspectral coded aperture (P-HYCA), for graphics processing units (GPU) using the compute unified device architecture (CUDA). This method takes into account two main properties of hyperspectral dataset, namely the high correlation existing among the spectral bands and the generally low number of endmembers needed to explain the data, which largely reduces the number of measurements necessary to correctly reconstruct the original data. Experimental results conducted using synthetic and real hyperspectral datasets on two different GPU architectures by NVIDIA: GeForce GTX 590 and GeForce GTX TITAN, reveal that the use of GPUs can provide real-time compressive sensing performance. The achieved speedup is up to 20 times when compared with the processing time of HYCA running on one core of the Intel i7-2600 CPU (3.4GHz), with 16 Gbyte memory.
Resumo:
The application of compressive sensing (CS) to hyperspectral images is an active area of research over the past few years, both in terms of the hardware and the signal processing algorithms. However, CS algorithms can be computationally very expensive due to the extremely large volumes of data collected by imaging spectrometers, a fact that compromises their use in applications under real-time constraints. This paper proposes four efficient implementations of hyperspectral coded aperture (HYCA) for CS, two of them termed P-HYCA and P-HYCA-FAST and two additional implementations for its constrained version (CHYCA), termed P-CHYCA and P-CHYCA-FAST on commodity graphics processing units (GPUs). HYCA algorithm exploits the high correlation existing among the spectral bands of the hyperspectral data sets and the generally low number of endmembers needed to explain the data, which largely reduces the number of measurements necessary to correctly reconstruct the original data. The proposed P-HYCA and P-CHYCA implementations have been developed using the compute unified device architecture (CUDA) and the cuFFT library. Moreover, this library has been replaced by a fast iterative method in the P-HYCA-FAST and P-CHYCA-FAST implementations that leads to very significant speedup factors in order to achieve real-time requirements. The proposed algorithms are evaluated not only in terms of reconstruction error for different compressions ratios but also in terms of computational performance using two different GPU architectures by NVIDIA: 1) GeForce GTX 590; and 2) GeForce GTX TITAN. Experiments are conducted using both simulated and real data revealing considerable acceleration factors and obtaining good results in the task of compressing remotely sensed hyperspectral data sets.
Resumo:
One of the main problems of hyperspectral data analysis is the presence of mixed pixels due to the low spatial resolution of such images. Linear spectral unmixing aims at inferring pure spectral signatures and their fractions at each pixel of the scene. The huge data volumes acquired by hyperspectral sensors put stringent requirements on processing and unmixing methods. This letter proposes an efficient implementation of the method called simplex identification via split augmented Lagrangian (SISAL) which exploits the graphics processing unit (GPU) architecture at low level using Compute Unified Device Architecture. SISAL aims to identify the endmembers of a scene, i.e., is able to unmix hyperspectral data sets in which the pure pixel assumption is violated. The proposed implementation is performed in a pixel-by-pixel fashion using coalesced accesses to memory and exploiting shared memory to store temporary data. Furthermore, the kernels have been optimized to minimize the threads divergence, therefore achieving high GPU occupancy. The experimental results obtained for the simulated and real hyperspectral data sets reveal speedups up to 49 times, which demonstrates that the GPU implementation can significantly accelerate the method's execution over big data sets while maintaining the methods accuracy.