963 results for GPU acceleration
Abstract:
This thesis explores the capabilities of heterogeneous multi-core systems based on multiple Graphics Processing Units (GPUs) in a standard desktop framework. Multi-GPU accelerated deskside computers are an appealing alternative to other high performance computing (HPC) systems: being composed of commodity hardware components fabricated in large quantities, their price-performance ratio is unparalleled in the world of high performance computing. Essentially bringing “supercomputing to the masses”, this opens up new possibilities for application fields where investing in HPC resources had previously been considered infeasible. One of these is the field of bioelectrical imaging, a class of medical imaging technologies that occupy a low-cost niche next to million-dollar systems like functional Magnetic Resonance Imaging (fMRI). In the scope of this work, several computational challenges encountered in bioelectrical imaging are tackled with this new kind of computing resource, striving to help these methods approach their true potential. Specifically, the following main contributions were made: Firstly, a novel dual-GPU implementation of parallel triangular matrix inversion (TMI) is presented, addressing a crucial kernel in the computation of multi-mesh head models for electroencephalographic (EEG) source localization. This includes not only a highly efficient implementation of the routine itself, achieving excellent speedups versus an optimized CPU implementation, but also a novel GPU-friendly compressed storage scheme for triangular matrices. Secondly, a scalable multi-GPU solver for non-Hermitian linear systems was implemented. It is integrated into a simulation environment for electrical impedance tomography (EIT) that requires frequent solution of complex systems with millions of unknowns, a task that this solution can perform within seconds. In terms of computational throughput, it outperforms not only a highly optimized multi-CPU reference, but related GPU-based work as well.
Finally, a GPU-accelerated graphical software package for real-time EEG source localization was implemented. Thanks to acceleration, it can meet real-time requirements in unprecedented anatomical detail while running more complex localization algorithms. Additionally, a novel implementation to extract anatomical priors from static Magnetic Resonance (MR) scans has been included.
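The abstract above names two ideas without detailing them: compressed storage for triangular matrices and triangular matrix inversion. The following is a minimal CPU-side sketch in Python/NumPy, not the thesis's GPU implementation. It assumes conventional row-major packed storage (only the lower triangle is kept) and plain forward substitution; all function names are hypothetical.

```python
import numpy as np

def pack_lower(L):
    """Keep only the lower triangle of L, row by row, in a 1-D array."""
    n = L.shape[0]
    return L[np.tril_indices(n)]

def packed_index(i, j):
    """Offset of element (i, j), with i >= j, in row-major packed storage."""
    return i * (i + 1) // 2 + j

def invert_lower_packed(p, n):
    """Invert a packed lower-triangular matrix; the inverse is returned packed."""
    inv = np.zeros_like(p, dtype=float)
    for j in range(n):  # columns of the inverse are mutually independent
        inv[packed_index(j, j)] = 1.0 / p[packed_index(j, j)]
        for i in range(j + 1, n):
            s = 0.0
            for k in range(j, i):
                s += p[packed_index(i, k)] * inv[packed_index(k, j)]
            inv[packed_index(i, j)] = -s / p[packed_index(i, i)]
    return inv

def unpack_lower(p, n):
    """Expand packed storage back into a dense n x n lower-triangular matrix."""
    L = np.zeros((n, n))
    L[np.tril_indices(n)] = p
    return L
```

Packed storage needs n(n+1)/2 entries instead of n², and because each column of the inverse depends only on the input matrix and on earlier entries of the same column, the outer loop is the natural place to parallelize.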
Abstract:
Laser-driven ion acceleration is a burgeoning field of research that has attracted a growing number of scientists since the first results, reported in 2000, were obtained by irradiating thin solid foils with high-power laser pulses. The growing interest is driven by the peculiar characteristics of the produced bunches, the compactness of the whole accelerating system and the very short accelerating length of these all-optical accelerators. Intense theoretical and experimental work has been carried out since then. An important part of the theoretical study is done by means of numerical simulations, and the most widely used technique exploits PIC (“Particle-In-Cell”) codes. In this thesis the PIC code AlaDyn, developed by our research group using innovative algorithms, is described. My work has been devoted to the development of the code and the investigation of laser-driven ion acceleration for different target configurations. Two target configurations for proton acceleration are presented together with the results of the 2D and 3D numerical investigation. One target configuration consists of a solid foil with a low-density layer attached on the irradiated side. The nearly critical plasma of the foam layer allows very high energy absorption by the target and an increase of the proton energy by up to a factor of 3 compared to the “pure” target normal sheath acceleration (TNSA) configuration. The differences of this regime with respect to standard TNSA are described. The case of nearly critical density targets has been investigated with 3D simulations. In this case the laser travels through the plasma and exits on the rear side. During the propagation, the laser drills a channel and induces a magnetic vortex that, expanding on the rear side of the target, is the source of a very intense electric field. The protons of the plasma are strongly accelerated, up to energies of 100 MeV, using a 200 PW laser.
Abstract:
The evolution of embedded electronics applications forces electronic system designers to match ever-increasing requirements. This evolution drives up the computational power demanded of digital signal processing systems and, because such applications are increasingly mobile, tightens the constraints on the energy required to accomplish the computations. Current approaches to meeting these requirements rely on the adoption of application-specific signal processors. Such devices exploit powerful accelerators, which are able to match both performance and energy requirements. On the other hand, the excessive specificity of such accelerators often results in a lack of flexibility, which affects non-recurring engineering costs, time to market, and market volumes. The state of the art mainly proposes two solutions to overcome these issues with the ambition of delivering reasonable performance and energy efficiency: reconfigurable computing and multi-processor computing. Both solutions benefit from post-fabrication programmability, which ultimately results in increased flexibility. Nevertheless, the gap between these approaches and dedicated hardware is still too large for many application domains, especially when targeting the mobile world. In this scenario, flexible and energy-efficient acceleration can be achieved by merging these two computational paradigms in order to address all the constraints introduced above. This thesis focuses on the exploration of the design and application spectrum of reconfigurable computing, exploited as application-specific accelerators for multi-processor systems on chip. More specifically, it introduces a reconfigurable digital signal processor featuring a heterogeneous set of reconfigurable engines, and a homogeneous multi-core system, exploiting three different flavours of reconfigurable and mask-programmable technologies as the implementation platform for application-specific accelerators.
In this work, the various trade-offs concerning the utilization of multi-core platforms and the different configuration technologies are explored, characterizing the design space of the proposed approach in terms of programmability, performance, energy efficiency and manufacturing costs.
Abstract:
The efficient emulation of a many-core architecture is a challenging task. Each core could be emulated through a dedicated thread, and such threads would be interleaved on either a single-core or a multi-core processor, but the high number of context switches results in unacceptable performance. To support this kind of application, the computational power of the GPU is exploited by scheduling the emulation threads on the GPU cores. This presents a non-trivial divergence issue, since GPU computational power is offered through SIMD processing elements, which are forced to synchronously execute the same instruction on different memory portions. Thus, a new emulation technique is introduced in order to overcome this limitation: instead of providing a routine for each ISA opcode, the emulator mimics the behavior of the microarchitecture level, where instructions are data that a single routine takes as input. Our new technique has been implemented and compared with the classic emulation approach, in order to investigate the feasibility of a hybrid solution.
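The idea of treating instructions as data for a single routine can be illustrated with a small sketch. This is not the thesis's emulator: it is a NumPy toy that runs on the CPU and assumes a hypothetical four-opcode ISA. It shows only the structure, in which every emulated core follows the same control flow even when the cores hold different opcodes, which is what avoids divergence on SIMD hardware.

```python
import numpy as np

# Hypothetical toy ISA (an assumption for illustration only)
ADD, SUB, AND, OR = range(4)

# One-hot control word per opcode, playing the role of select signals
CONTROL = np.eye(4, dtype=np.int64)

def step(regs, instr):
    """Advance every emulated core by one instruction without per-opcode branches.

    regs:  (n_cores, n_regs) integer register files
    instr: (n_cores, 4) rows of (opcode, rd, rs1, rs2), treated purely as data
    """
    cores = np.arange(regs.shape[0])
    op, rd, rs1, rs2 = instr.T
    a = regs[cores, rs1]
    b = regs[cores, rs2]
    # Every functional unit computes; the control word selects one result
    candidates = np.stack([a + b, a - b, a & b, a | b], axis=1)
    result = (candidates * CONTROL[op]).sum(axis=1)
    out = regs.copy()
    out[cores, rd] = result
    return out
```

For example, three cores executing ADD, SUB and AND in the same step all pass through the same lines of `step`; the cost is that every candidate result is computed and all but one are discarded.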
Abstract:
In the race to obtain protons with higher energies while using more compact systems, laser-driven plasma accelerators are becoming an interesting possibility. For now, however, only beams with extremely broad energy spectra and high divergence have been produced. The guiding aim of this PhD thesis was the study and design of a compact system to extract a high-quality beam from the initial bunch of protons produced by the interaction of a laser pulse with a thin solid target, using experimentally reliable technologies so that such a system can be tested as soon as possible. In this thesis, different transport lines are analyzed. The first is based on a high-field pulsed solenoid, some collimators and, for perfect filtering and post-acceleration, a high-field, high-frequency compact linear accelerator, originally designed to accelerate a 30 MeV beam extracted from a cyclotron. The second is based on a quadruplet of permanent-magnet quadrupoles: thanks to its greater simplicity and reliability, it is of great interest for experiments, but it is less effective than the solenoid-based line; in fact, the final beam intensity drops by an order of magnitude. A further appreciable decrease in intensity is observed in the third case, where energy selection is achieved using a chicane, because of its very low efficiency for off-axis protons. The proposed schemes have all been analyzed with 3D simulations, and all the significant results are presented. Future experimental work based on the outcome of this thesis can be planned and is currently under discussion.
Abstract:
Image-guided radiotherapy (IGRT), thanks to repeated verification of the patient's position and of the localization of the target volume, has recently established itself as a new paradigm in radiotherapy, having radically improved the accuracy of therapeutic dose delivery. A promising technique in the field of IGRT is cone-beam computed tomography (CBCT). Kilovoltage CBCT provides an accurate three-dimensional map of the patient's anatomy, both at the treatment planning stage and at each treatment fraction. However, the imaging dose attributable to the repeated scans has in recent years become a matter of growing concern in the clinical setting. The aim of this work is to quantitatively evaluate the additional dose delivered by kilovoltage CBCT, with reference to three typical scan protocols for Varian OnBoard Imaging Systems (OBI, Palo Alto, California). To this end, Monte Carlo dose calculation simulations were carried out using the gCTD package, developed on the graphics card architecture. The use of GPUs in compute server systems made it possible to reach high computational efficiency, accelerating the Monte Carlo simulations to computation times of ~1 min for a typical case. Initially, experimental dose measurements were performed on a water phantom. The parameters needed to model the X-ray source in the gCTD code were obtained through a code validation process aimed at matching the simulated dose values in water with the measurements in the phantom. The study focuses on fifty patients undergoing courses of intensity-modulated radiotherapy (IMRT).
Twenty-five patients with brain tumors are used to study the dose in the standard-dose head protocol, and twenty-five patients with prostate cancer are selected to study the dose in the pelvis and pelvis spotlight protocols. The mean dose to each organ is calculated. The mean dose to the 2% of voxels with the highest dose values is also computed for each organ, in order to characterize the spatial homogeneity of the distribution.
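The two per-organ quantities described above (the mean dose, and the mean dose over the 2% of voxels receiving the highest dose) can be sketched as follows. This is only an illustration in Python/NumPy, assuming the dose grid and the organ segmentation are available as arrays of the same shape; the function and parameter names are hypothetical and not part of gCTD.

```python
import numpy as np

def organ_dose_metrics(dose, mask, top_fraction=0.02):
    """Return (mean dose, mean dose over the hottest voxels) for one organ.

    dose: array of per-voxel dose values
    mask: boolean array of the same shape, True inside the organ
    top_fraction: share of the organ's voxels counted as "hottest"
    """
    values = np.sort(dose[mask])[::-1]  # organ voxels, highest dose first
    n_top = max(1, int(round(top_fraction * values.size)))
    return values.mean(), values[:n_top].mean()
```

A large gap between the two numbers indicates that the dose within the organ is spatially inhomogeneous, which is the purpose the abstract gives for the second metric.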
Abstract:
In keeping with the Software Defined Radio philosophy, a software LDPC decoder was designed that uses a GPU to achieve better performance. The work, which also includes the encoder and an AWGN channel simulator, can be used both to run simulations and to process data in real time. The LDPC codes of the DVB-S2 standard were considered as a case study.
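As a rough illustration of the channel-simulator stage mentioned above, the sketch below passes bits through an AWGN channel and produces the log-likelihood ratios that a soft-decision LDPC decoder typically takes as input. It is not the thesis's code: BPSK mapping, unit symbol energy, and the function and parameter names are all assumptions made for the example, and DVB-S2 itself uses higher-order modulations.

```python
import numpy as np

def awgn_bpsk_llr(bits, ebn0_db, rate, rng):
    """Send bits over BPSK/AWGN and return per-bit log-likelihood ratios.

    bits:    array of 0/1 coded bits
    ebn0_db: Eb/N0 in decibels
    rate:    code rate (information bits per coded bit)
    rng:     a numpy.random.Generator
    """
    symbols = 1.0 - 2.0 * bits  # map 0 -> +1, 1 -> -1
    ebn0 = 10.0 ** (ebn0_db / 10.0)
    sigma2 = 1.0 / (2.0 * rate * ebn0)  # noise variance for symbol energy 1
    received = symbols + rng.normal(0.0, np.sqrt(sigma2), size=bits.shape)
    return 2.0 * received / sigma2  # log of P(bit = 0 | y) / P(bit = 1 | y)
```

A positive LLR favors bit 0 and a negative one favors bit 1; the magnitude expresses how reliable the channel observation is.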
Abstract:
The aim of this work is to present various aspects of the numerical simulation of particle and radiation transport for industrial and environmental protection applications, to enable the analysis of complex physical processes in a fast, reliable, and efficient way. In the first part we deal with speeding up the numerical simulation of neutron transport for nuclear reactor core analysis. The convergence properties of the source iteration scheme of the Method of Characteristics applied to heterogeneous structured geometries have been enhanced by means of Boundary Projection Acceleration, enabling the study of 2D and 3D geometries with transport theory without spatial homogenization. The computational performance has been verified with the C5G7 2D and 3D benchmarks, showing an appreciable reduction in iterations and CPU time. The second part is devoted to the study of temperature-dependent elastic scattering of neutrons for heavy isotopes near the thermal zone. A numerical computation of the Doppler convolution of the elastic scattering kernel based on the gas model is presented, for a general energy-dependent cross section and scattering law in the center-of-mass system. The range of integration has been optimized by employing a numerical cutoff, allowing a faster numerical evaluation of the convolution integral. Legendre moments of the transfer kernel are subsequently obtained by direct quadrature, and a numerical analysis of the convergence is presented. In the third part we focus our attention on remote sensing applications of radiative transfer employed to investigate the Earth's cryosphere. The photon transport equation is applied to simulate the reflectivity of glaciers while varying the age of the layer of snow or ice, its thickness, the presence or absence of other underlying layers, and the degree of dust included in the snow, creating a framework able to decipher spectral signals collected by orbiting detectors.
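The step of obtaining Legendre moments by direct quadrature can be sketched in a few lines. The example below is an illustration only: it assumes the kernel has already been reduced to a function of the scattering cosine mu on [-1, 1], whereas the transfer kernel in the thesis also depends on energy and temperature. The function name and the choice of Gauss-Legendre nodes are assumptions, not the author's method.

```python
import numpy as np
from numpy.polynomial import legendre

def legendre_moments(kernel, max_order, n_nodes=64):
    """Moments f_l = integral over [-1, 1] of kernel(mu) * P_l(mu) dmu."""
    mu, w = legendre.leggauss(n_nodes)  # Gauss-Legendre nodes and weights
    f = kernel(mu)
    moments = []
    for l in range(max_order + 1):
        coeffs = np.zeros(l + 1)
        coeffs[l] = 1.0  # coefficient vector that selects P_l
        moments.append(np.sum(w * f * legendre.legval(mu, coeffs)))
    return np.array(moments)
```

Convergence can be checked in the spirit of the abstract by repeating the call with increasing `n_nodes` and watching the moments stabilize.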
Abstract:
Theories and numerical modeling are fundamental tools for understanding, optimizing and designing present and future laser-plasma accelerators (LPAs). Laser evolution and plasma wave excitation in an LPA driven by a weakly relativistically intense, short-pulse laser propagating in a preformed parabolic plasma channel are studied analytically in 3D, including the effects of pulse steepening and energy depletion. At higher laser intensities, the process of electron self-injection in the nonlinear bubble wake regime is studied by means of fully self-consistent Particle-in-Cell simulations. Considering a non-evolving laser driver propagating with a prescribed velocity, the geometrical properties of the non-evolving bubble wake are studied. For a range of parameters of interest for laser-plasma acceleration, the dependence of the threshold for self-injection in the non-evolving wake on laser intensity and wake velocity is characterized. Due to the nonlinear and complex nature of the physics involved, computationally challenging numerical simulations are required to model laser-plasma accelerators operating at relativistic laser intensities. The numerical and computational optimizations that, combined in the codes INF&RNO and INF&RNO/quasi-static, make it possible to accurately model multi-GeV laser wakefield acceleration stages with present supercomputing architectures are discussed. The PIC code jasmine, capable of efficiently running laser-plasma simulations on Graphics Processing Unit (GPU) clusters, is presented. GPUs deliver exceptional performance to PIC codes, but the core algorithms had to be redesigned to satisfy the constraints imposed by the intrinsic parallelism of the architecture. The simulation campaigns run with the code jasmine to model the recent LPA experiments with the INFN-FLAME and CNR-ILIL laser systems are also presented.
Abstract:
ROTEM® is considered a helpful point-of-care device to monitor blood coagulation in emergency situations. Centrally performed analysis is desirable but rapid transport of blood samples is an important prerequisite. The effect of acceleration forces on sample transport through a pneumatic tube system on ROTEM® should be tested at each institution to exclude a pre-analytical influence. The aims of the present work were: (i) to investigate the effect of pneumatic tube transport on ROTEM® parameters; (ii) to compare blood sample transport via pneumatic tube vs. manual transportation; and (iii) to determine the effect of acceleration forces on ROTEM® parameters.