25 results for Parallel computing
in Repositório Científico do Instituto Politécnico de Lisboa - Portugal
Abstract:
Single-processor architectures are unable to provide the performance required by high-performance embedded systems. Parallel processing based on general-purpose processors can achieve this performance, but with a considerable increase in required resources. However, in many cases, simplified optimized parallel cores can be used instead of general-purpose processors, achieving better performance at lower resource utilization. In this paper, we propose a configurable many-core architecture to serve as a co-processor for high-performance embedded computing on Field-Programmable Gate Arrays. The architecture consists of an array of configurable simple cores with support for floating-point operations, interconnected by a configurable interconnection network. For each core it is possible to configure the size of the internal memory, the supported operations and the number of interfacing ports. The architecture was tested on a ZYNQ-7020 FPGA in the execution of several parallel algorithms. The results show that the proposed many-core architecture achieves better performance than a parallel general-purpose processor and that up to 32 floating-point cores can be implemented in a ZYNQ-7020 SoC FPGA.
Abstract:
International Conference with Peer Review 2012 IEEE International Conference in Geoscience and Remote Sensing Symposium (IGARSS), 22-27 July 2012, Munich, Germany
Abstract:
This letter presents a new parallel method for hyperspectral unmixing composed of the efficient combination of two popular methods: vertex component analysis (VCA) and sparse unmixing by variable splitting and augmented Lagrangian (SUNSAL). First, VCA extracts the endmember signatures, and then SUNSAL is used to estimate the abundance fractions. Both techniques are highly parallelizable, which significantly reduces the computing time. A design of the two methods for commodity graphics processing units is presented and evaluated. Experimental results obtained for simulated and real hyperspectral data sets reveal speedups of up to 100 times, which provides the real-time response required by many remotely sensed hyperspectral applications.
Abstract:
Physical computing has spawned a true global revolution in the way in which the digital interfaces with the real world. From bicycle jackets with turn signal lights to Twitter-controlled Christmas trees, the Do-it-Yourself (DiY) hardware movement has been driving endless innovations and stimulating an age of creative engineering. This ongoing (r)evolution has been led by popular electronics platforms such as the Arduino, the Lilypad, or the Raspberry Pi; however, these are not designed with the specific requirements of biosignal acquisition in mind. To date, the physiological computing community has been severely lacking a parallel to that found in the DiY electronics realm, especially where suitable hardware frameworks are concerned. In this paper, we build on previous work developed within our group, focusing on an all-in-one, low-cost, and modular biosignal acquisition hardware platform that makes it quicker and easier to build biomedical devices. We describe the main design considerations, experimental evaluation and circuit characterization results, together with the results from a usability study performed with volunteers from multiple target user groups, namely health sciences and electrical, biomedical, and computer engineering. Copyright © 2014 SCITEPRESS - Science and Technology Publications. All rights reserved.
Abstract:
Floating-point computing with more than one TFLOP of peak performance is already a reality in recent Field-Programmable Gate Arrays (FPGA). General-Purpose Graphics Processing Units (GPGPU) and recent many-core CPUs have also taken advantage of recent technological innovations in integrated circuit (IC) design and have also dramatically improved their peak performance. In this paper, we compare the trends of these computing architectures for high-performance computing and survey these platforms in the execution of algorithms belonging to different scientific application domains. Trends in peak performance, power consumption and sustained performance for particular applications show that the gap between FPGAs and both GPUs and many-core CPUs is increasing, moving FPGAs away from high-performance computing with intensive floating-point calculations. FPGAs become competitive for custom floating-point or fixed-point representations, for smaller input sizes of certain algorithms, for combinational logic problems and for parallel map-reduce problems. © 2014 Technical University of Munich (TUM).
Abstract:
Remote hyperspectral sensors collect large amounts of data per flight, usually with low spatial resolution. The bandwidth of the connection between the satellite/airborne platform and the ground station is known to be limited, so an onboard compression method is desirable to reduce the amount of data to be transmitted. This paper presents a parallel implementation of a compressive sensing method, called parallel hyperspectral coded aperture (P-HYCA), for graphics processing units (GPU) using the compute unified device architecture (CUDA). This method takes into account two main properties of hyperspectral datasets, namely the high correlation existing among the spectral bands and the generally low number of endmembers needed to explain the data, which largely reduces the number of measurements necessary to correctly reconstruct the original data. Experimental results obtained using synthetic and real hyperspectral datasets on two different GPU architectures by NVIDIA, GeForce GTX 590 and GeForce GTX TITAN, reveal that the use of GPUs can provide real-time compressive sensing performance. The achieved speedup is up to 20 times when compared with the processing time of HYCA running on one core of the Intel i7-2600 CPU (3.4 GHz), with 16 GB of memory.
Abstract:
The application of compressive sensing (CS) to hyperspectral images has been an active area of research over the past few years, both in terms of the hardware and the signal processing algorithms. However, CS algorithms can be computationally very expensive due to the extremely large volumes of data collected by imaging spectrometers, a fact that compromises their use in applications under real-time constraints. This paper proposes four efficient implementations of hyperspectral coded aperture (HYCA) for CS on commodity graphics processing units (GPUs): two of them termed P-HYCA and P-HYCA-FAST, and two additional implementations for its constrained version (CHYCA), termed P-CHYCA and P-CHYCA-FAST. The HYCA algorithm exploits the high correlation existing among the spectral bands of the hyperspectral data sets and the generally low number of endmembers needed to explain the data, which largely reduces the number of measurements necessary to correctly reconstruct the original data. The proposed P-HYCA and P-CHYCA implementations have been developed using the compute unified device architecture (CUDA) and the cuFFT library. Moreover, this library has been replaced by a fast iterative method in the P-HYCA-FAST and P-CHYCA-FAST implementations, which leads to very significant speedup factors in order to meet real-time requirements. The proposed algorithms are evaluated not only in terms of reconstruction error for different compression ratios but also in terms of computational performance using two different GPU architectures by NVIDIA: 1) GeForce GTX 590; and 2) GeForce GTX TITAN. Experiments are conducted using both simulated and real data, revealing considerable acceleration factors and obtaining good results in the task of compressing remotely sensed hyperspectral data sets.
Abstract:
The parallel hyperspectral unmixing problem is considered in this paper. A semisupervised approach is developed under the linear mixture model, where the physical constraints on the abundances are taken into account. The proposed approach relies on the increasing availability of spectral libraries of materials measured on the ground instead of resorting to endmember extraction methods. Since libraries are potentially very large and hyperspectral datasets are of high dimensionality, a parallel implementation in a pixel-by-pixel fashion is derived to properly exploit the graphics processing unit (GPU) architecture at a low level, thus taking full advantage of the computational power of GPUs. Experimental results obtained for real hyperspectral datasets reveal significant speedup factors, up to 164 times, with regard to an optimized serial implementation.
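To illustrate why a pixel-by-pixel formulation parallelizes well, the following is a minimal Python sketch, not the method of the paper above. It assumes the linear mixture model with the usual non-negativity and sum-to-one abundance constraints, and it swaps in plain projected gradient descent as the per-pixel solver. The names `project_simplex`, `unmix_pixel` and `unmix_image` are hypothetical, and a thread pool stands in for the GPU threads.

```python
from concurrent.futures import ThreadPoolExecutor


def project_simplex(v):
    """Euclidean projection of v onto {x : x >= 0, sum(x) = 1}."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for k, uk in enumerate(u, start=1):
        cumulative += uk
        t = (cumulative - 1.0) / k
        if uk - t > 0:  # holds for a prefix of k; keep the last shift found
            theta = t
    return [max(x - theta, 0.0) for x in v]


def unmix_pixel(pixel, library, iterations=500):
    """Estimate abundances for one pixel (assumed solver: projected gradient).

    library[b][m] is the reflectance of library material m in band b.
    """
    bands, materials = len(library), len(library[0])
    # Squared Frobenius norm bounds the Lipschitz constant of the gradient,
    # so this step size is conservative but safe.
    step = 1.0 / sum(a * a for row in library for a in row)
    x = [1.0 / materials] * materials
    for _ in range(iterations):
        residual = [
            sum(library[b][m] * x[m] for m in range(materials)) - pixel[b]
            for b in range(bands)
        ]
        grad = [
            sum(library[b][m] * residual[b] for b in range(bands))
            for m in range(materials)
        ]
        x = project_simplex([x[m] - step * grad[m] for m in range(materials)])
    return x


def unmix_image(pixels, library, workers=4):
    """Each pixel is an independent problem, so the map needs no coordination."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda p: unmix_pixel(p, library), pixels))
```

Because no pixel depends on any other, the same per-pixel routine could in principle be assigned to one GPU thread per pixel; the thread pool here only demonstrates the independence and does not speed up pure Python code.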
Abstract:
In this paper, we develop a fast implementation of a hyperspectral coded aperture (HYCA) algorithm on different platforms using OpenCL, an open standard for parallel programming on heterogeneous systems, which includes a wide variety of devices, from dense multicore systems from major manufacturers such as Intel or ARM to new accelerators such as graphics processing units (GPUs), field programmable gate arrays (FPGAs), the Intel Xeon Phi and other custom devices. Our proposed implementation of HYCA significantly reduces its computational cost. Our experiments have been conducted using simulated data and reveal considerable acceleration factors. Implementations of this kind, written in the same descriptive language for different architectures, are very important for properly assessing the possibility of using heterogeneous platforms for efficient hyperspectral image processing in real remote sensing missions.
Abstract:
A previously developed model is used to numerically simulate real clinical cases of the surgical correction of scoliosis. This model consists of one-dimensional finite elements with spatial deformation in which (i) the column is represented by its axis; (ii) the vertebrae are assumed to be rigid; and (iii) the deformability of the column is concentrated in springs that connect the successive rigid elements. The metallic rods used for the surgical correction are modeled by beam elements with linear elastic behavior. To obtain the forces at the connections between the metallic rods and the vertebrae, geometrically non-linear finite element analyses are performed. The tightening sequence determines the magnitude of the forces applied to the patient's column, and it is desirable to keep those forces as small as possible. In this study, a Genetic Algorithm optimization is applied to this model in order to determine the sequence that minimizes the corrective forces applied during the surgery. This amounts to finding the optimal permutation of the integers 1, ..., n, n being the number of vertebrae involved. As such, we are faced with a combinatorial optimization problem isomorphic to the Traveling Salesman Problem. The fitness evaluation requires one computationally intensive Finite Element Analysis per candidate solution and, thus, a parallel implementation of the Genetic Algorithm is developed.
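The permutation search with parallel fitness evaluation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation from the paper: `toy_cost` is a hypothetical stand-in for the finite element analysis, order crossover and swap mutation are assumed operators (the abstract does not name any), and `run_ga` with all its parameters is an invented name.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def toy_cost(order):
    """Hypothetical cost; a real run would call a finite element solver here."""
    return sum(abs(a - b) for a, b in zip(order, order[1:]))


def order_crossover(p1, p2, rng):
    """Keep a random slice of p1 in place, fill the rest in p2's order."""
    n = len(p1)
    i, j = sorted(rng.sample(range(n + 1), 2))
    middle = p1[i:j]
    rest = [gene for gene in p2 if gene not in middle]
    return rest[:i] + middle + rest[i:]


def swap_mutation(perm, rng, rate=0.2):
    """With probability `rate`, exchange two positions of a copy of perm."""
    perm = perm[:]
    if rng.random() < rate:
        a, b = rng.sample(range(len(perm)), 2)
        perm[a], perm[b] = perm[b], perm[a]
    return perm


def run_ga(n, cost, generations=50, pop_size=30, seed=0, workers=4):
    """Search permutations of 1..n; one cost evaluation per candidate."""
    rng = random.Random(seed)
    base = list(range(1, n + 1))
    population = [rng.sample(base, n) for _ in range(pop_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(generations):
            # Candidates are independent, so their costs are mapped in parallel.
            costs = list(pool.map(cost, population))
            ranked = [p for _, p in sorted(zip(costs, population))]
            elite = ranked[: pop_size // 2]
            children = [
                swap_mutation(order_crossover(*rng.sample(elite, 2), rng), rng)
                for _ in range(pop_size - len(elite))
            ]
            population = elite + children
        costs = list(pool.map(cost, population))
    best_cost, best = min(zip(costs, population))
    return best, best_cost
```

A thread pool is used only to keep the sketch self-contained. With a CPU-bound Python cost function it gives no real speedup, so a process pool or an external solver launched per candidate would be the more plausible choice when each evaluation is an expensive analysis.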
Abstract:
The need for computational power is growing across many areas of human activity, in industry as well as in academic settings. Grid Computing connects dispersed computational resources so that they can be used more effectively, giving users simplified access to the computational power of multiple systems. The first Grid Computing projects involved connecting parallel machines or high-performance, high-cost clusters that were available only at a few institutions. In contrast to the high cost of supercomputers, personal computers and the Internet have evolved significantly in recent years. Computers dispersed across a WAN can constitute a very attractive environment for high-performance processing. Grid systems make it possible to use a set of personal computers to provide computation from resources that would otherwise go unused. This work consists of a study of Grid Computing at the conceptual and architectural level, together with an analysis of its current state. As a complement, a component was developed that enables more effective development of services for Grids (Grid Services) than the service support model currently in use. This component is made available as a plug-in for the Eclipse IDE platform.
Abstract:
We investigate the phase behaviour of 2D mixtures of bi-functional and three-functional patchy particles and 3D mixtures of bi-functional and tetra-functional patchy particles by means of Monte Carlo simulations and Wertheim theory. We start by computing the critical points of the pure systems and then we investigate how the critical parameters change upon lowering the temperature. We extend the successive umbrella sampling method to mixtures to make it possible to extract information about the phase behaviour of the system at a fixed temperature for the whole range of densities and compositions of interest. (C) 2013 AIP Publishing LLC.
Abstract:
In this contribution, we investigate the low-temperature, low-density behaviour of dipolar hard-sphere (DHS) particles, i.e., hard spheres with dipoles embedded in their centre. We aim at describing the DHS fluid in terms of a network of chains and rings (the fundamental clusters) held together by branching points (defects) of different nature. We first introduce a systematic way of classifying inter-cluster connections according to their topology, and then employ this classification to analyse the geometric and thermodynamic properties of each class of defects, as extracted from state-of-the-art equilibrium Monte Carlo simulations. By computing the average density and energetic cost of each defect class, we find that the relevant contribution to inter-cluster interactions is indeed provided by (rare) three-way junctions and by four-way junctions arising from parallel or anti-parallel locally linear aggregates. All other (numerous) defects are either intra-cluster or associated with low cluster-cluster interaction energies, suggesting that these defects do not play a significant part in the thermodynamic description of the self-assembly processes of dipolar hard spheres. (C) 2013 AIP Publishing LLC.
Abstract:
Thesis submitted in fulfilment of the requirements for the Degree of Master in Electronic and Telecommunications Engineering
Abstract:
Workflows have been successfully applied to express the decomposition of complex scientific applications. This has motivated many initiatives that have been developing scientific workflow tools. However, the existing tools still lack adequate support for important aspects, namely decoupling the enactment engine from the specification of workflow tasks, decentralizing the control of workflow activities, and allowing their tasks to run autonomously on distributed infrastructures, for instance on Clouds. Furthermore, many workflow tools only support the execution of Directed Acyclic Graphs (DAG) without the concept of iterations, in which activities are executed for millions of iterations over long periods of time, and without support for dynamic workflow reconfiguration after a certain iteration. We present the AWARD (Autonomic Workflow Activities Reconfigurable and Dynamic) model of computation, based on the Process Networks model, where the workflow activities (AWA) are autonomic processes with independent control that can run in parallel on distributed infrastructures, e.g. on Clouds. Each AWA executes a Task developed as a Java class that implements a generic interface, allowing end-users to code their applications without concern for low-level details. The data-driven coordination of AWA interactions is based on a shared tuple space that also enables support for dynamic workflow reconfiguration and monitoring of the execution of workflows. We describe how AWARD supports dynamic reconfiguration and discuss typical workflow reconfiguration scenarios. For evaluation, we describe experimental results of AWARD workflow executions in several application scenarios, mapped to a small dedicated cluster and the Amazon (Elastic Computing EC2) Cloud.
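The idea of independently controlled activities coordinated through a shared tuple space can be sketched in a few lines. This is a hedged illustration only: it is written in Python rather than Java, it does not reproduce the AWARD API, and `TupleSpace`, `Activity`, `run_pipeline` and the channel names are hypothetical. Threads in one process stand in for activities running on distributed infrastructure.

```python
import threading
from collections import defaultdict, deque


class TupleSpace:
    """Minimal shared space holding (channel, value) tuples."""

    def __init__(self):
        self._items = defaultdict(deque)
        self._cond = threading.Condition()

    def put(self, channel, value):
        with self._cond:
            self._items[channel].append(value)
            self._cond.notify_all()

    def take(self, channel):
        # Block until a tuple for this channel exists, then remove it.
        with self._cond:
            while not self._items[channel]:
                self._cond.wait()
            return self._items[channel].popleft()


class Activity(threading.Thread):
    """An activity with its own control loop; it only knows the tuple space."""

    def __init__(self, space, task, source, sink, iterations):
        super().__init__()
        self.space, self.task = space, task
        self.source, self.sink, self.iterations = source, sink, iterations

    def run(self):
        for _ in range(self.iterations):
            value = self.space.take(self.source)  # data-driven: wait for input
            self.space.put(self.sink, self.task(value))


def run_pipeline(inputs):
    """Two-stage workflow: add one, then double, coordinated via the space."""
    space = TupleSpace()
    stages = [
        Activity(space, lambda x: x + 1, "in", "mid", len(inputs)),
        Activity(space, lambda x: x * 2, "mid", "out", len(inputs)),
    ]
    for stage in stages:
        stage.start()
    for item in inputs:
        space.put("in", item)
    for stage in stages:
        stage.join()
    return [space.take("out") for _ in inputs]
```

Since the activities never reference each other directly, a stage could be replaced or rerouted by changing which channels it reads and writes, which is one way to picture how a shared space makes reconfiguration between iterations possible.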