949 resultados para Parallel programming (computer)
Resumo:
Even though Software Transactional Memory (STM) is one of the most promising approaches to simplify concurrent programming, current STM implementations incur significant overheads that render them impractical for many real-sized programs. The key insight of this work is that we do not need to use the same costly barriers for all the memory managed by a real-sized application, if only a small fraction of the memory is under contention lightweight barriers may be used in this case. In this work, we propose a new solution based on an approach of adaptive object metadata (AOM) to promote the use of a fast path to access objects that are not under contention. We show that this approach is able to make the performance of an STM competitive with the best fine-grained lock-based approaches in some of the more challenging benchmarks. (C) 2015 Elsevier Inc. All rights reserved.
Resumo:
The application of mathematical methods and computer algorithms in the analysis of economic and financial data series aims to give empirical descriptions of the hidden relations between many complex or unknown variables and systems. This strategy overcomes the requirement for building models based on a set of fundamental laws, which is the paradigm for studying phenomena usual in physics and engineering. In spite of this shortcut, the fact is that financial series demonstrate to be hard to tackle, involving complex memory effects and a apparently chaotic behaviour. Several measures for describing these objects were adopted by market agents, but, due to their simplicity, they are not capable to cope with the diversity and complexity embedded in the data. Therefore, it is important to propose new measures that, on one hand, are highly interpretable by standard personal but, on the other hand, are capable of capturing a significant part of the dynamical effects.
Resumo:
Single processor architectures are unable to provide the required performance of high performance embedded systems. Parallel processing based on general-purpose processors can achieve these performances with a considerable increase of required resources. However, in many cases, simplified optimized parallel cores can be used instead of general-purpose processors achieving better performance at lower resource utilization. In this paper, we propose a configurable many-core architecture to serve as a co-processor for high-performance embedded computing on Field-Programmable Gate Arrays. The architecture consists of an array of configurable simple cores with support for floating-point operations interconnected with a configurable interconnection network. For each core it is possible to configure the size of the internal memory, the supported operations and number of interfacing ports. The architecture was tested in a ZYNQ-7020 FPGA in the execution of several parallel algorithms. The results show that the proposed many-core architecture achieves better performance than that achieved with a parallel generalpurpose processor and that up to 32 floating-point cores can be implemented in a ZYNQ-7020 SoC FPGA.
Resumo:
Trabalho apresentado no mbito do Mestrado em Engenharia Informtica, como requisito parcial para obteno do grau de Mestre em Engenharia Informtica
Resumo:
Thesis presented in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the subject of Electrical and Computer Engineering
Resumo:
Remote hyperspectral sensors collect large amounts of data per flight usually with low spatial resolution. It is known that the bandwidth connection between the satellite/airborne platform and the ground station is reduced, thus a compression onboard method is desirable to reduce the amount of data to be transmitted. This paper presents a parallel implementation of an compressive sensing method, called parallel hyperspectral coded aperture (P-HYCA), for graphics processing units (GPU) using the compute unified device architecture (CUDA). This method takes into account two main properties of hyperspectral dataset, namely the high correlation existing among the spectral bands and the generally low number of endmembers needed to explain the data, which largely reduces the number of measurements necessary to correctly reconstruct the original data. Experimental results conducted using synthetic and real hyperspectral datasets on two different GPU architectures by NVIDIA: GeForce GTX 590 and GeForce GTX TITAN, reveal that the use of GPUs can provide real-time compressive sensing performance. The achieved speedup is up to 20 times when compared with the processing time of HYCA running on one core of the Intel i7-2600 CPU (3.4GHz), with 16 Gbyte memory.
Resumo:
The application of compressive sensing (CS) to hyperspectral images is an active area of research over the past few years, both in terms of the hardware and the signal processing algorithms. However, CS algorithms can be computationally very expensive due to the extremely large volumes of data collected by imaging spectrometers, a fact that compromises their use in applications under real-time constraints. This paper proposes four efficient implementations of hyperspectral coded aperture (HYCA) for CS, two of them termed P-HYCA and P-HYCA-FAST and two additional implementations for its constrained version (CHYCA), termed P-CHYCA and P-CHYCA-FAST on commodity graphics processing units (GPUs). HYCA algorithm exploits the high correlation existing among the spectral bands of the hyperspectral data sets and the generally low number of endmembers needed to explain the data, which largely reduces the number of measurements necessary to correctly reconstruct the original data. The proposed P-HYCA and P-CHYCA implementations have been developed using the compute unified device architecture (CUDA) and the cuFFT library. Moreover, this library has been replaced by a fast iterative method in the P-HYCA-FAST and P-CHYCA-FAST implementations that leads to very significant speedup factors in order to achieve real-time requirements. The proposed algorithms are evaluated not only in terms of reconstruction error for different compressions ratios but also in terms of computational performance using two different GPU architectures by NVIDIA: 1) GeForce GTX 590; and 2) GeForce GTX TITAN. Experiments are conducted using both simulated and real data revealing considerable acceleration factors and obtaining good results in the task of compressing remotely sensed hyperspectral data sets.
Resumo:
Mestrado em Engenharia Electrotcnica e de Computadores - rea de Especializao em Automao e Sistemas
Resumo:
One of the main problems of hyperspectral data analysis is the presence of mixed pixels due to the low spatial resolution of such images. Linear spectral unmixing aims at inferring pure spectral signatures and their fractions at each pixel of the scene. The huge data volumes acquired by hyperspectral sensors put stringent requirements on processing and unmixing methods. This letter proposes an efficient implementation of the method called simplex identification via split augmented Lagrangian (SISAL) which exploits the graphics processing unit (GPU) architecture at low level using Compute Unified Device Architecture. SISAL aims to identify the endmembers of a scene, i.e., is able to unmix hyperspectral data sets in which the pure pixel assumption is violated. The proposed implementation is performed in a pixel-by-pixel fashion using coalesced accesses to memory and exploiting shared memory to store temporary data. Furthermore, the kernels have been optimized to minimize the threads divergence, therefore achieving high GPU occupancy. The experimental results obtained for the simulated and real hyperspectral data sets reveal speedups up to 49 times, which demonstrates that the GPU implementation can significantly accelerate the method's execution over big data sets while maintaining the methods accuracy.
Resumo:
Parallel hyperspectral unmixing problem is considered in this paper. A semisupervised approach is developed under the linear mixture model, where the abundance's physical constraints are taken into account. The proposed approach relies on the increasing availability of spectral libraries of materials measured on the ground instead of resorting to endmember extraction methods. Since Libraries are potentially very large and hyperspectral datasets are of high dimensionality a parallel implementation in a pixel-by-pixel fashion is derived to properly exploits the graphics processing units (GPU) architecture at low level, thus taking full advantage of the computational power of GPUs. Experimental results obtained for real hyperspectral datasets reveal significant speedup factors, up to 164 times, with regards to optimized serial implementation.
Resumo:
Many Hyperspectral imagery applications require a response in real time or near-real time. To meet this requirement this paper proposes a parallel unmixing method developed for graphics processing units (GPU). This method is based on the vertex component analysis (VCA), which is a geometrical based method highly parallelizable. VCA is a very fast and accurate method that extracts endmember signatures from large hyperspectral datasets without the use of any a priori knowledge about the constituent spectra. Experimental results obtained for simulated and real hyperspectral datasets reveal considerable acceleration factors, up to 24 times.
Resumo:
In this paper, a new parallel method for sparse spectral unmixing of remotely sensed hyperspectral data on commodity graphics processing units (GPUs) is presented. A semi-supervised approach is adopted, which relies on the increasing availability of spectral libraries of materials measured on the ground instead of resorting to endmember extraction methods. This method is based on the spectral unmixing by splitting and augmented Lagrangian (SUNSAL) that estimates the material's abundance fractions. The parallel method is performed in a pixel-by-pixel fashion and its implementation properly exploits the GPU architecture at low level, thus taking full advantage of the computational power of GPUs. Experimental results obtained for simulated and real hyperspectral datasets reveal significant speedup factors, up to 1 64 times, with regards to optimized serial implementation.
Resumo:
Dissertao apresentada para obteno do Grau de Doutor em Engenharia Informtica, pela Universidade Nova de Lisboa, Faculdade de Cincias e Tecnologia
Resumo:
Nos ltimos anos tem-se verificado um acentuado aumento na utilizao de dispositivos moveis a nvel internacional, pelo que as aplicaes desenvolvidas para este tipo especfico de dispositivos, conhecidas por apps, tem vindo a ganhar uma enorme popularidade. So cada vez mais as empresas que procuram estar presentes nos mais diversos sistemas operativos mveis, com o objectivo de suportar e desenvolver o seu negcio, alargando o seu leque de possveis consumidores. Neste sentido surgiram diversas ferramentas com a funo de facilitar o desenvolvimento de aplicaes mveis, denominadas frameworks multi-plataforma. Estas frameworks conduziram ao aparecimento de plataformas web, que permitem criar aplicaes multi-plataforma sem ser obrigatrio ter conhecimentos em programao. Assim, e a partir da anlise de vrios criadores online de aplicaes mveis identificados e das diferentes estratgias de desenvolvimento de aplicaes mveis existentes, foi proposta a implementao de uma plataforma web capaz de criar aplicaes nativas Android e iOS, dois dos sistemas operativos mais utilizados na actualidade. Apos desenvolvida a plataforma web, designada MobileAppBuilder, foi avaliada a sua Qualidade e as aplicaes criadas pela mesma, atravs do preenchimento de um questionrio por parte de 10 indivduos com formao em Engenharia Informtica, resultando numa classificao geral de excelente. De modo a analisar o desempenho das aplicaes produzidas pela plataforma desenvolvida, foram realizados testes comparativos entre uma aplicao da MobileAppBuilder e duas homologas de dois dos criadores online estudados, nomeadamente Andromo e Como. Os resultados destes testes revelaram que a MobileAppBuilder gera aplicaes menos pesadas, mais rpidas e mais eficientes em alguns aspetos, nomeadamente no arranque.
Resumo:
We derived a framework in integer programming, based on the properties of a linear ordering of the vertices in interval graphs, that acts as an edge completion model for obtaining interval graphs. This model can be applied to problems of sequencing cutting patterns, namely the minimization of open stacks problem (MOSP). By making small modifications in the objective function and using only some of the inequalities, the MOSP model is applied to another pattern sequencing problem that aims to minimize, not only the number of stacks, but also the order spread (the minimization of the stack occupation problem), and the model is tested.