181 resultados para GPU


Relevância:

10.00% 10.00%

Publicador:

Resumo:

This work focuses on the creation and applications of a dynamic simulation software in order to study the hard metal structure (WC-Co). The technological ground used to increase the GPU hardware capacity was Geforce 9600 GT along with the PhysX chip created to make games more realistic. The software simulates the three-dimensional carbide structure to the shape of a cubic box where tungsten carbide (WC) are modeled as triangular prisms and truncated triangular prisms. The program was proven effective regarding checking testes, ranging from calculations of parameter measures such as the capacity to increase the number of particles simulated dynamically. It was possible to make an investigation of both the mean parameters and distributions stereological parameters used to characterize the carbide structure through cutting plans. Grounded on the cutting plans concerning the analyzed structures, we have investigated the linear intercepts, the intercepts to the area, and the perimeter section of the intercepted grains as well as the binder phase to the structure by calculating the mean value and distribution of the free path. As literature shows almost consensually that the distribution of the linear intercepts is lognormal, this suggests that the grain distribution is also lognormal. Thus, a routine was developed regarding the program which made possible a more detailed research on this issue. We have observed that it is possible, under certain values for the parameters which define the shape and size of the Prismatic grain to find out the distribution to the linear intercepts that approach the lognormal shape. Regarding a number of developed simulations, we have observed that the distribution curves of the linear and area intercepts as well as the perimeter section are consistent with studies on static computer simulation to these parameters.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The visualization of three-dimensional(3D)images is increasigly being sed in the area of medicine, helping physicians diagnose desease. the advances achived in scaners esed for acquisition of these 3d exames, such as computerized tumography(CT) and Magnetic Resonance imaging (MRI), enable the generation of images with higher resolutions, thus, generating files with much larger sizes. Currently, the images of computationally expensive one, and demanding the use of a righ and computer for such task. The direct remote acess of these images thruogh the internet is not efficient also, since all images have to be trasferred to the user´s equipment before the 3D visualization process ca start. with these problems in mind, this work proposes and analyses a solution for the remote redering of 3D medical images, called Remote Rendering (RR3D). In RR3D, the whole hedering process is pefomed a server or a cluster of servers, with high computational power, and only the resulting image is tranferred to the client, still allowing the client to peform operations such as rotations, zoom, etc. the solution was developed using web services written in java and an architecture that uses the scientific visualization packcage paraview, the framework paraviewWeb and the PACS server DCM4CHEE.The solution was tested with two scenarios where the rendering process was performed by a sever with graphics hadwere (GPU) and by a server without GPUs. In the scenarios without GPUs, the soluction was executed in parallel with several number of cores (processing units)dedicated to it. In order to compare our solution to order medical visualization application, a third scenario was esed in the rendering process, was done locally. In all tree scenarios, the solution was tested for different network speeds. The solution solved satisfactorily the problem with the delay in the transfer of the DICOM files, while alowing the use of low and computers as client for visualizing the exams even, tablets and smart phones

Relevância:

10.00% 10.00%

Publicador:

Resumo:

SAFT techniques are based on the sequential activation, in emission and reception, of the array elements and the post-processing of all the received signals to compose the image. Thus, the image generation can be divided into two stages: (1) the excitation and acquisition stage, where the signals received by each element or group of elements are stored; and (2) the beamforming stage, where the signals are combined together to obtain the image pixels. The use of Graphics Processing Units (GPUs), which are programmable devices with a high level of parallelism, can accelerate the computations of the beamforming process, that usually includes different functions such as dynamic focusing, band-pass filtering, spatial filtering or envelope detection. This work shows that using GPU technology can accelerate, in more than one order of magnitude with respect to CPU implementations, the beamforming and post-processing algorithms in SAFT imaging. ©2009 IEEE.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Huge image collections are becoming available lately. In this scenario, the use of Content-Based Image Retrieval (CBIR) systems has emerged as a promising approach to support image searches. The objective of CBIR systems is to retrieve the most similar images in a collection, given a query image, by taking into account image visual properties such as texture, color, and shape. In these systems, the effectiveness of the retrieval process depends heavily on the accuracy of ranking approaches. Recently, re-ranking approaches have been proposed to improve the effectiveness of CBIR systems by taking into account the relationships among images. The re-ranking approaches consider the relationships among all images in a given dataset. These approaches typically demands a huge amount of computational power, which hampers its use in practical situations. On the other hand, these methods can be massively parallelized. In this paper, we propose to speedup the computation of the RL-Sim algorithm, a recently proposed image re-ranking approach, by using the computational power of Graphics Processing Units (GPU). GPUs are emerging as relatively inexpensive parallel processors that are becoming available on a wide range of computer systems. We address the image re-ranking performance challenges by proposing a parallel solution designed to fit the computational model of GPUs. We conducted an experimental evaluation considering different implementations and devices. Experimental results demonstrate that significant performance gains can be obtained. Our approach achieves speedups of 7x from serial implementation considering the overall algorithm and up to 36x on its core steps.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Communities are present on physical, chemical and biological systems and their identification is fundamental for the comprehension of the behavior of these systems. Recently, available data related to complex networks have grown exponentially, demanding more computational power. The Graphical Processing Unit (GPU) is a cost effective alternative suitable for this purpose. We investigate the convenience of this for network science by proposing a GPU based implementation of Newman community detection algorithm. We showed that the processing time of matrix multiplications of GPUs grow slower than CPUs in relation to the matrix size. It was proven, thus, that GPU processing power is a viable solution for community dentification simulation that demand high computational power. Our implementation was tested on an integrated biological network for the bacterium Escherichia coli

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Identify opportunities for software parallelism is a task that takes a lot of human time, but once some code patterns for parallelism are identified, a software could quickly accomplish this task. Thus, automating this process brings many benefits such as saving time and reducing errors caused by the programmer [1]. This work aims at developing a software environment that identifies opportunities for parallelism in a source code written in C language, and generates a program with the same behavior, but with higher degree of parallelism, compatible with a graphics processor compatible with CUDA architecture.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

[ES] En este proyecto fin de carrera se trata de paralelizar un algoritmo de desenredo y suavizado de mallas de tetraedros en un computador gráfico de tipo GPU. Las mallas de tetraedros son elementos que se suelen usar mucho en simulaciones de sistemas físicos, los cuales necesitan elementos de calidad. Algunos generadores de mallas generan mallas válidas pero de poca calidad. Es por esto que se necesita un algoritmo que sea lo más rápido y eficiente posible para hacer posible este propósito. Con este fin, se intenta implementar dicho algoritmo para aprovechar al máximo los recursos que nos ofrecen los procesadores gráficos de tipo GPU.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

This thesis explores the capabilities of heterogeneous multi-core systems, based on multiple Graphics Processing Units (GPUs) in a standard desktop framework. Multi-GPU accelerated desk side computers are an appealing alternative to other high performance computing (HPC) systems: being composed of commodity hardware components fabricated in large quantities, their price-performance ratio is unparalleled in the world of high performance computing. Essentially bringing “supercomputing to the masses”, this opens up new possibilities for application fields where investing in HPC resources had been considered unfeasible before. One of these is the field of bioelectrical imaging, a class of medical imaging technologies that occupy a low-cost niche next to million-dollar systems like functional Magnetic Resonance Imaging (fMRI). In the scope of this work, several computational challenges encountered in bioelectrical imaging are tackled with this new kind of computing resource, striving to help these methods approach their true potential. Specifically, the following main contributions were made: Firstly, a novel dual-GPU implementation of parallel triangular matrix inversion (TMI) is presented, addressing an crucial kernel in computation of multi-mesh head models of encephalographic (EEG) source localization. This includes not only a highly efficient implementation of the routine itself achieving excellent speedups versus an optimized CPU implementation, but also a novel GPU-friendly compressed storage scheme for triangular matrices. Secondly, a scalable multi-GPU solver for non-hermitian linear systems was implemented. It is integrated into a simulation environment for electrical impedance tomography (EIT) that requires frequent solution of complex systems with millions of unknowns, a task that this solution can perform within seconds. In terms of computational throughput, it outperforms not only an highly optimized multi-CPU reference, but related GPU-based work as well. Finally, a GPU-accelerated graphical EEG real-time source localization software was implemented. Thanks to acceleration, it can meet real-time requirements in unpreceeded anatomical detail running more complex localization algorithms. Additionally, a novel implementation to extract anatomical priors from static Magnetic Resonance (MR) scansions has been included.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Microprocessori basati su singolo processore (CPU), hanno visto una rapida crescita di performances ed un abbattimento dei costi per circa venti anni. Questi microprocessori hanno portato una potenza di calcolo nell’ordine del GFLOPS (Giga Floating Point Operation per Second) sui PC Desktop e centinaia di GFLOPS su clusters di server. Questa ascesa ha portato nuove funzionalità nei programmi, migliori interfacce utente e tanti altri vantaggi. Tuttavia questa crescita ha subito un brusco rallentamento nel 2003 a causa di consumi energetici sempre più elevati e problemi di dissipazione termica, che hanno impedito incrementi di frequenza di clock. I limiti fisici del silicio erano sempre più vicini. Per ovviare al problema i produttori di CPU (Central Processing Unit) hanno iniziato a progettare microprocessori multicore, scelta che ha avuto un impatto notevole sulla comunità degli sviluppatori, abituati a considerare il software come una serie di comandi sequenziali. Quindi i programmi che avevano sempre giovato di miglioramenti di prestazioni ad ogni nuova generazione di CPU, non hanno avuto incrementi di performance, in quanto essendo eseguiti su un solo core, non beneficiavano dell’intera potenza della CPU. Per sfruttare appieno la potenza delle nuove CPU la programmazione concorrente, precedentemente utilizzata solo su sistemi costosi o supercomputers, è diventata una pratica sempre più utilizzata dagli sviluppatori. Allo stesso tempo, l’industria videoludica ha conquistato una fetta di mercato notevole: solo nel 2013 verranno spesi quasi 100 miliardi di dollari fra hardware e software dedicati al gaming. Le software houses impegnate nello sviluppo di videogames, per rendere i loro titoli più accattivanti, puntano su motori grafici sempre più potenti e spesso scarsamente ottimizzati, rendendoli estremamente esosi in termini di performance. Per questo motivo i produttori di GPU (Graphic Processing Unit), specialmente nell’ultimo decennio, hanno dato vita ad una vera e propria rincorsa alle performances che li ha portati ad ottenere dei prodotti con capacità di calcolo vertiginose. Ma al contrario delle CPU che agli inizi del 2000 intrapresero la strada del multicore per continuare a favorire programmi sequenziali, le GPU sono diventate manycore, ovvero con centinaia e centinaia di piccoli cores che eseguono calcoli in parallelo. Questa immensa capacità di calcolo può essere utilizzata in altri campi applicativi? La risposta è si e l’obiettivo di questa tesi è proprio quello di constatare allo stato attuale, in che modo e con quale efficienza pùo un software generico, avvalersi dell’utilizzo della GPU invece della CPU.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Ultrasound imaging is widely used in medical diagnostics as it is the fastest, least invasive, and least expensive imaging modality. However, ultrasound images are intrinsically difficult to be interpreted. In this scenario, Computer Aided Detection (CAD) systems can be used to support physicians during diagnosis providing them a second opinion. This thesis discusses efficient ultrasound processing techniques for computer aided medical diagnostics, focusing on two major topics: (i) Ultrasound Tissue Characterization (UTC), aimed at characterizing and differentiating between healthy and diseased tissue; (ii) Ultrasound Image Segmentation (UIS), aimed at detecting the boundaries of anatomical structures to automatically measure organ dimensions and compute clinically relevant functional indices. Research on UTC produced a CAD tool for Prostate Cancer detection to improve the biopsy protocol. In particular, this thesis contributes with: (i) the development of a robust classification system; (ii) the exploitation of parallel computing on GPU for real-time performance; (iii) the introduction of both an innovative Semi-Supervised Learning algorithm and a novel supervised/semi-supervised learning scheme for CAD system training that improve system performance reducing data collection effort and avoiding collected data wasting. The tool provides physicians a risk map highlighting suspect tissue areas, allowing them to perform a lesion-directed biopsy. Clinical validation demonstrated the system validity as a diagnostic support tool and its effectiveness at reducing the number of biopsy cores requested for an accurate diagnosis. For UIS the research developed a heart disease diagnostic tool based on Real-Time 3D Echocardiography. Thesis contributions to this application are: (i) the development of an automated GPU based level-set segmentation framework for 3D images; (ii) the application of this framework to the myocardium segmentation. Experimental results showed the high efficiency and flexibility of the proposed framework. Its effectiveness as a tool for quantitative analysis of 3D cardiac morphology and function was demonstrated through clinical validation.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The aim of my thesis is to parallelize the Weighting Histogram Analysis Method (WHAM), which is a popular algorithm used to calculate the Free Energy of a molucular system in Molecular Dynamics simulations. WHAM works in post processing in cooperation with another algorithm called Umbrella Sampling. Umbrella Sampling has the purpose to add a biasing in the potential energy of the system in order to force the system to sample a specific region in the configurational space. Several N independent simulations are performed in order to sample all the region of interest. Subsequently, the WHAM algorithm is used to estimate the original system energy starting from the N atomic trajectories. The parallelization of WHAM has been performed through CUDA, a language that allows to work in GPUs of NVIDIA graphic cards, which have a parallel achitecture. The parallel implementation may sensibly speed up the WHAM execution compared to previous serial CPU imlementations. However, the WHAM CPU code presents some temporal criticalities to very high numbers of interactions. The algorithm has been written in C++ and executed in UNIX systems provided with NVIDIA graphic cards. The results were satisfying obtaining an increase of performances when the model was executed on graphics cards with compute capability greater. Nonetheless, the GPUs used to test the algorithm is quite old and not designated for scientific calculations. It is likely that a further performance increase will be obtained if the algorithm would be executed in clusters of GPU at high level of computational efficiency. The thesis is organized in the following way: I will first describe the mathematical formulation of Umbrella Sampling and WHAM algorithm with their apllications in the study of ionic channels and in Molecular Docking (Chapter 1); then, I will present the CUDA architectures used to implement the model (Chapter 2); and finally, the results obtained on model systems will be presented (Chapter 3).

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Die Entstehung eines Marktpreises für einen Vermögenswert kann als Superposition der einzelnen Aktionen der Marktteilnehmer aufgefasst werden, die damit kumulativ Angebot und Nachfrage erzeugen. Dies ist in der statistischen Physik mit der Entstehung makroskopischer Eigenschaften vergleichbar, die von mikroskopischen Wechselwirkungen zwischen den beteiligten Systemkomponenten hervorgerufen werden. Die Verteilung der Preisänderungen an Finanzmärkten unterscheidet sich deutlich von einer Gaußverteilung. Dies führt zu empirischen Besonderheiten des Preisprozesses, zu denen neben dem Skalierungsverhalten nicht-triviale Korrelationsfunktionen und zeitlich gehäufte Volatilität zählen. In der vorliegenden Arbeit liegt der Fokus auf der Analyse von Finanzmarktzeitreihen und den darin enthaltenen Korrelationen. Es wird ein neues Verfahren zur Quantifizierung von Muster-basierten komplexen Korrelationen einer Zeitreihe entwickelt. Mit dieser Methodik werden signifikante Anzeichen dafür gefunden, dass sich typische Verhaltensmuster von Finanzmarktteilnehmern auf kurzen Zeitskalen manifestieren, dass also die Reaktion auf einen gegebenen Preisverlauf nicht rein zufällig ist, sondern vielmehr ähnliche Preisverläufe auch ähnliche Reaktionen hervorrufen. Ausgehend von der Untersuchung der komplexen Korrelationen in Finanzmarktzeitreihen wird die Frage behandelt, welche Eigenschaften sich beim Wechsel von einem positiven Trend zu einem negativen Trend verändern. Eine empirische Quantifizierung mittels Reskalierung liefert das Resultat, dass unabhängig von der betrachteten Zeitskala neue Preisextrema mit einem Anstieg des Transaktionsvolumens und einer Reduktion der Zeitintervalle zwischen Transaktionen einhergehen. Diese Abhängigkeiten weisen Charakteristika auf, die man auch in anderen komplexen Systemen in der Natur und speziell in physikalischen Systemen vorfindet. Über 9 Größenordnungen in der Zeit sind diese Eigenschaften auch unabhängig vom analysierten Markt - Trends, die nur für Sekunden bestehen, zeigen die gleiche Charakteristik wie Trends auf Zeitskalen von Monaten. Dies eröffnet die Möglichkeit, mehr über Finanzmarktblasen und deren Zusammenbrüche zu lernen, da Trends auf kleinen Zeitskalen viel häufiger auftreten. Zusätzlich wird eine Monte Carlo-basierte Simulation des Finanzmarktes analysiert und erweitert, um die empirischen Eigenschaften zu reproduzieren und Einblicke in deren Ursachen zu erhalten, die zum einen in der Finanzmarktmikrostruktur und andererseits in der Risikoaversion der Handelsteilnehmer zu suchen sind. Für die rechenzeitintensiven Verfahren kann mittels Parallelisierung auf einer Graphikkartenarchitektur eine deutliche Rechenzeitreduktion erreicht werden. Um das weite Spektrum an Einsatzbereichen von Graphikkarten zu aufzuzeigen, wird auch ein Standardmodell der statistischen Physik - das Ising-Modell - auf die Graphikkarte mit signifikanten Laufzeitvorteilen portiert. Teilresultate der Arbeit sind publiziert in [PGPS07, PPS08, Pre11, PVPS09b, PVPS09a, PS09, PS10a, SBF+10, BVP10, Pre10, PS10b, PSS10, SBF+11, PB10].

Relevância:

10.00% 10.00%

Publicador:

Resumo:

This thesis deals with heterogeneous architectures in standard workstations. Heterogeneous architectures represent an appealing alternative to traditional supercomputers because they are based on commodity components fabricated in large quantities. Hence their price-performance ratio is unparalleled in the world of high performance computing (HPC). In particular, different aspects related to the performance and consumption of heterogeneous architectures have been explored. The thesis initially focuses on an efficient implementation of a parallel application, where the execution time is dominated by an high number of floating point instructions. Then the thesis touches the central problem of efficient management of power peaks in heterogeneous computing systems. Finally it discusses a memory-bounded problem, where the execution time is dominated by the memory latency. Specifically, the following main contributions have been carried out: A novel framework for the design and analysis of solar field for Central Receiver Systems (CRS) has been developed. The implementation based on desktop workstation equipped with multiple Graphics Processing Units (GPUs) is motivated by the need to have an accurate and fast simulation environment for studying mirror imperfection and non-planar geometries. Secondly, a power-aware scheduling algorithm on heterogeneous CPU-GPU architectures, based on an efficient distribution of the computing workload to the resources, has been realized. The scheduler manages the resources of several computing nodes with a view to reducing the peak power. The two main contributions of this work follow: the approach reduces the supply cost due to high peak power whilst having negligible impact on the parallelism of computational nodes. from another point of view the developed model allows designer to increase the number of cores without increasing the capacity of the power supply unit. Finally, an implementation for efficient graph exploration on reconfigurable architectures is presented. The purpose is to accelerate graph exploration, reducing the number of random memory accesses.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

This thesis investigates interactive scene reconstruction and understanding using RGB-D data only. Indeed, we believe that depth cameras will still be in the near future a cheap and low-power 3D sensing alternative suitable for mobile devices too. Therefore, our contributions build on top of state-of-the-art approaches to achieve advances in three main challenging scenarios, namely mobile mapping, large scale surface reconstruction and semantic modeling. First, we will describe an effective approach dealing with Simultaneous Localization And Mapping (SLAM) on platforms with limited resources, such as a tablet device. Unlike previous methods, dense reconstruction is achieved by reprojection of RGB-D frames, while local consistency is maintained by deploying relative bundle adjustment principles. We will show quantitative results comparing our technique to the state-of-the-art as well as detailed reconstruction of various environments ranging from rooms to small apartments. Then, we will address large scale surface modeling from depth maps exploiting parallel GPU computing. We will develop a real-time camera tracking method based on the popular KinectFusion system and an online surface alignment technique capable of counteracting drift errors and closing small loops. We will show very high quality meshes outperforming existing methods on publicly available datasets as well as on data recorded with our RGB-D camera even in complete darkness. Finally, we will move to our Semantic Bundle Adjustment framework to effectively combine object detection and SLAM in a unified system. Though the mathematical framework we will describe does not restrict to a particular sensing technology, in the experimental section we will refer, again, only to RGB-D sensing. We will discuss successful implementations of our algorithm showing the benefit of a joint object detection, camera tracking and environment mapping.