993 resultados para Graphical processing units
Resumo:
The application of compressive sensing (CS) to hyperspectral images is an active area of research over the past few years, both in terms of the hardware and the signal processing algorithms. However, CS algorithms can be computationally very expensive due to the extremely large volumes of data collected by imaging spectrometers, a fact that compromises their use in applications under real-time constraints. This paper proposes four efficient implementations of hyperspectral coded aperture (HYCA) for CS, two of them termed P-HYCA and P-HYCA-FAST and two additional implementations for its constrained version (CHYCA), termed P-CHYCA and P-CHYCA-FAST on commodity graphics processing units (GPUs). HYCA algorithm exploits the high correlation existing among the spectral bands of the hyperspectral data sets and the generally low number of endmembers needed to explain the data, which largely reduces the number of measurements necessary to correctly reconstruct the original data. The proposed P-HYCA and P-CHYCA implementations have been developed using the compute unified device architecture (CUDA) and the cuFFT library. Moreover, this library has been replaced by a fast iterative method in the P-HYCA-FAST and P-CHYCA-FAST implementations that leads to very significant speedup factors in order to achieve real-time requirements. The proposed algorithms are evaluated not only in terms of reconstruction error for different compressions ratios but also in terms of computational performance using two different GPU architectures by NVIDIA: 1) GeForce GTX 590; and 2) GeForce GTX TITAN. Experiments are conducted using both simulated and real data revealing considerable acceleration factors and obtaining good results in the task of compressing remotely sensed hyperspectral data sets.
Resumo:
Endmember extraction (EE) is a fundamental and crucial task in hyperspectral unmixing. Among other methods vertex component analysis ( VCA) has become a very popular and useful tool to unmix hyperspectral data. VCA is a geometrical based method that extracts endmember signatures from large hyperspectral datasets without the use of any a priori knowledge about the constituent spectra. Many Hyperspectral imagery applications require a response in real time or near-real time. Thus, to met this requirement this paper proposes a parallel implementation of VCA developed for graphics processing units. The impact on the complexity and on the accuracy of the proposed parallel implementation of VCA is examined using both simulated and real hyperspectral datasets.
Resumo:
One of the main problems of hyperspectral data analysis is the presence of mixed pixels due to the low spatial resolution of such images. Linear spectral unmixing aims at inferring pure spectral signatures and their fractions at each pixel of the scene. The huge data volumes acquired by hyperspectral sensors put stringent requirements on processing and unmixing methods. This letter proposes an efficient implementation of the method called simplex identification via split augmented Lagrangian (SISAL) which exploits the graphics processing unit (GPU) architecture at low level using Compute Unified Device Architecture. SISAL aims to identify the endmembers of a scene, i.e., is able to unmix hyperspectral data sets in which the pure pixel assumption is violated. The proposed implementation is performed in a pixel-by-pixel fashion using coalesced accesses to memory and exploiting shared memory to store temporary data. Furthermore, the kernels have been optimized to minimize the threads divergence, therefore achieving high GPU occupancy. The experimental results obtained for the simulated and real hyperspectral data sets reveal speedups up to 49 times, which demonstrates that the GPU implementation can significantly accelerate the method's execution over big data sets while maintaining the methods accuracy.
Resumo:
Parallel hyperspectral unmixing problem is considered in this paper. A semisupervised approach is developed under the linear mixture model, where the abundance's physical constraints are taken into account. The proposed approach relies on the increasing availability of spectral libraries of materials measured on the ground instead of resorting to endmember extraction methods. Since Libraries are potentially very large and hyperspectral datasets are of high dimensionality a parallel implementation in a pixel-by-pixel fashion is derived to properly exploits the graphics processing units (GPU) architecture at low level, thus taking full advantage of the computational power of GPUs. Experimental results obtained for real hyperspectral datasets reveal significant speedup factors, up to 164 times, with regards to optimized serial implementation.
Resumo:
Many Hyperspectral imagery applications require a response in real time or near-real time. To meet this requirement this paper proposes a parallel unmixing method developed for graphics processing units (GPU). This method is based on the vertex component analysis (VCA), which is a geometrical based method highly parallelizable. VCA is a very fast and accurate method that extracts endmember signatures from large hyperspectral datasets without the use of any a priori knowledge about the constituent spectra. Experimental results obtained for simulated and real hyperspectral datasets reveal considerable acceleration factors, up to 24 times.
Resumo:
In this paper, a new parallel method for sparse spectral unmixing of remotely sensed hyperspectral data on commodity graphics processing units (GPUs) is presented. A semi-supervised approach is adopted, which relies on the increasing availability of spectral libraries of materials measured on the ground instead of resorting to endmember extraction methods. This method is based on the spectral unmixing by splitting and augmented Lagrangian (SUNSAL) that estimates the material's abundance fractions. The parallel method is performed in a pixel-by-pixel fashion and its implementation properly exploits the GPU architecture at low level, thus taking full advantage of the computational power of GPUs. Experimental results obtained for simulated and real hyperspectral datasets reveal significant speedup factors, up to 1 64 times, with regards to optimized serial implementation.
Resumo:
Dissertação para obtenção do Grau de Mestre em Engenharia Informática
Resumo:
Dissertação para obtenção do Grau de Mestre em Engenharia Biomédica
Resumo:
Dissertação para obtenção do Grau de Mestre em Engenharia Informática
Resumo:
Breast cancer is the most common cancer among women, being a major public health problem. Worldwide, X-ray mammography is the current gold-standard for medical imaging of breast cancer. However, it has associated some well-known limitations. The false-negative rates, up to 66% in symptomatic women, and the false-positive rates, up to 60%, are a continued source of concern and debate. These drawbacks prompt the development of other imaging techniques for breast cancer detection, in which Digital Breast Tomosynthesis (DBT) is included. DBT is a 3D radiographic technique that reduces the obscuring effect of tissue overlap and appears to address both issues of false-negative and false-positive rates. The 3D images in DBT are only achieved through image reconstruction methods. These methods play an important role in a clinical setting since there is a need to implement a reconstruction process that is both accurate and fast. This dissertation deals with the optimization of iterative algorithms, with parallel computing through an implementation on Graphics Processing Units (GPUs) to make the 3D reconstruction faster using Compute Unified Device Architecture (CUDA). Iterative algorithms have shown to produce the highest quality DBT images, but since they are computationally intensive, their clinical use is currently rejected. These algorithms have the potential to reduce patient dose in DBT scans. A method of integrating CUDA in Interactive Data Language (IDL) is proposed in order to accelerate the DBT image reconstructions. This method has never been attempted before for DBT. In this work the system matrix calculation, the most computationally expensive part of iterative algorithms, is accelerated. A speedup of 1.6 is achieved proving the fact that GPUs can accelerate the IDL implementation.
Resumo:
The Intel R Xeon PhiTM is the first processor based on Intel’s MIC (Many Integrated Cores) architecture. It is a co-processor specially tailored for data-parallel computations, whose basic architectural design is similar to the ones of GPUs (Graphics Processing Units), leveraging the use of many integrated low computational cores to perform parallel computations. The main novelty of the MIC architecture, relatively to GPUs, is its compatibility with the Intel x86 architecture. This enables the use of many of the tools commonly available for the parallel programming of x86-based architectures, which may lead to a smaller learning curve. However, programming the Xeon Phi still entails aspects intrinsic to accelerator-based computing, in general, and to the MIC architecture, in particular. In this thesis we advocate the use of algorithmic skeletons for programming the Xeon Phi. Algorithmic skeletons abstract the complexity inherent to parallel programming, hiding details such as resource management, parallel decomposition, inter-execution flow communication, thus removing these concerns from the programmer’s mind. In this context, the goal of the thesis is to lay the foundations for the development of a simple but powerful and efficient skeleton framework for the programming of the Xeon Phi processor. For this purpose we build upon Marrow, an existing framework for the orchestration of OpenCLTM computations in multi-GPU and CPU environments. We extend Marrow to execute both OpenCL and C++ parallel computations on the Xeon Phi. We evaluate the newly developed framework, several well-known benchmarks, like Saxpy and N-Body, will be used to compare, not only its performance to the existing framework when executing on the co-processor, but also to assess the performance on the Xeon Phi versus a multi-GPU environment.
Resumo:
In the present work the benefits of using graphics processing units (GPU) to aid the design of complex geometry profile extrusion dies, are studied. For that purpose, a3Dfinite volume based code that employs unstructured meshes to solve and couple the continuity, momentum and energy conservation equations governing the fluid flow, together with aconstitutive equation, was used. To evaluate the possibility of reducing the calculation time spent on the numerical calculations, the numerical code was parallelized in the GPU, using asimple programing approach without complex memory manipulations. For verificationpurposes, simulations were performed for three benchmark problems: Poiseuille flow, lid-driven cavity flow and flow around acylinder. Subsequently, the code was used on the design of two real life extrusion dies for the production of a medical catheter and a wood plastic composite decking profile. To evaluate the benefits, the results obtained with the GPU parallelized code were compared, in terms of speedup, with a serial implementation of the same code, that traditionally runs on the central processing unit (CPU). The results obtained show that, even with the simple parallelization approach employed, it was possible to obtain a significant reduction of the computation times.
Resumo:
Pig meat production was valued at €290 (£198) million at farm gate in Republic of Ireland (ROI) in 2007. In Northern Ireland (NI) in 2006, pig meat was estimated to account for almost seven percent of gross turnover in the food and drinks processing sector at £190 (€280) million. Whilst researching for this report it emerged that comparable figures for the value of the pig meat industry on ROI and NI are not available. This report showed that pig production on the IOI has changed from a small-scale enterprise carried out by a large number of mixed farmers to a modern industry comprised of a small number of specialist producers operating large-scale units. Most products for retailers are prepared and packed in specialised cutting and processing units which may or may not be integrated in the slaughter plant. For some pork products, various additives such as salt, herbs and flavour enhancers are added. Pork products are then stored and transported, frozen or chilled to wholesale, retail and catering facilities for ultimate sale to consumers.
Resumo:
A graphical processing unit (GPU) is a hardware device normally used to manipulate computer memory for the display of images. GPU computing is the practice of using a GPU device for scientific or general purpose computations that are not necessarily related to the display of images. Many problems in econometrics have a structure that allows for successful use of GPU computing. We explore two examples. The first is simple: repeated evaluation of a likelihood function at different parameter values. The second is a more complicated estimator that involves simulation and nonparametric fitting. We find speedups from 1.5 up to 55.4 times, compared to computations done on a single CPU core. These speedups can be obtained with very little expense, energy consumption, and time dedicated to system maintenance, compared to equivalent performance solutions using CPUs. Code for the examples is provided.
Resumo:
Purpose: The objective of this study is to investigate the feasibility of detecting and quantifying 3D cerebrovascular wall motion from a single 3D rotational x-ray angiography (3DRA) acquisition within a clinically acceptable time and computing from the estimated motion field for the further biomechanical modeling of the cerebrovascular wall. Methods: The whole motion cycle of the cerebral vasculature is modeled using a 4D B-spline transformation, which is estimated from a 4D to 2D + t image registration framework. The registration is performed by optimizing a single similarity metric between the entire 2D + t measured projection sequence and the corresponding forward projections of the deformed volume at their exact time instants. The joint use of two acceleration strategies, together with their implementation on graphics processing units, is also proposed so as to reach computation times close to clinical requirements. For further characterizing vessel wall properties, an approximation of the wall thickness changes is obtained through a strain calculation. Results: Evaluation on in silico and in vitro pulsating phantom aneurysms demonstrated an accurate estimation of wall motion curves. In general, the error was below 10% of the maximum pulsation, even in the situation when substantial inhomogeneous intensity pattern was present. Experiments on in vivo data provided realistic aneurysm and vessel wall motion estimates, whereas in regions where motion was neither visible nor anatomically possible, no motion was detected. The use of the acceleration strategies enabled completing the estimation process for one entire cycle in 5-10 min without degrading the overall performance. The strain map extracted from our motion estimation provided a realistic deformation measure of the vessel wall. Conclusions: The authors' technique has demonstrated that it can provide accurate and robust 4D estimates of cerebrovascular wall motion within a clinically acceptable time, although it has to be applied to a larger patient population prior to possible wide application to routine endovascular procedures. In particular, for the first time, this feasibility study has shown that in vivo cerebrovascular motion can be obtained intraprocedurally from a 3DRA acquisition. Results have also shown the potential of performing strain analysis using this imaging modality, thus making possible for the future modeling of biomechanical properties of the vascular wall.