106 results for NVIDIA CUDA
Abstract:
Analysis of parallel CUDA development in the Java and Python languages, using JCuda, RootBeer, PyCuda, and Anaconda Accelerate.
Abstract:
X-ray computed tomography of logs has long been applied for qualitative reconstructions. In most cases, a series of consecutive slices of the timber is scanned to estimate a 3D image reconstruction of the entire log. However, unexpected movement of the timber under study degrades the quality of the image reconstruction, since the position and orientation of some scanned slices can be incorrectly estimated. In addition, reconstruction time remains a significant challenge for practical applications. The present study investigates the possibility of employing modern physics engines for the problem of estimating the position of a moving rigid body and its scanned slices under X-ray computed tomography. The work includes implementations of the extended Kalman filter and an algebraic reconstruction method for fan-beam computed tomography; modern technologies such as NVIDIA PhysX and CUDA are also used. As a result, it is shown numerically that the extended Kalman filter can be applied together with a real-time physics engine, PhysX, to determine the position of a moving object, and that the position of the rigid body can be determined based only on reconstructions of its slices. However, the simulation of the body's movement is sometimes subject to error during Kalman filter operation, as PhysX is not always able to continue simulating the movement properly when given an incorrect state estimate.
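The filter loop described above can be made concrete with a small sketch. The following is a minimal extended Kalman filter for a toy two-state model (log position and velocity) with a scalar position measurement, such as one derived from a reconstructed slice; with a linear motion model the EKF reduces to the standard Kalman equations shown here, and all dimensions and noise values are illustrative, not taken from the paper.

    // Minimal EKF sketch (host C++): 2-state model [position, velocity],
    // scalar position measurement. Matrices are hand-unrolled; the model
    // and noise values are illustrative placeholders.
    struct EKF2 {
        double x[2];      // state estimate: position, velocity
        double P[2][2];   // estimate covariance
    };

    // Predict with constant-velocity model x' = F x, F = [[1,dt],[0,1]],
    // process noise Q = q*I.
    void ekf_predict(EKF2& f, double dt, double q) {
        f.x[0] += dt * f.x[1];
        double p00 = f.P[0][0] + dt*(f.P[1][0] + f.P[0][1]) + dt*dt*f.P[1][1] + q;
        double p01 = f.P[0][1] + dt*f.P[1][1];
        double p10 = f.P[1][0] + dt*f.P[1][1];
        double p11 = f.P[1][1] + q;
        f.P[0][0]=p00; f.P[0][1]=p01; f.P[1][0]=p10; f.P[1][1]=p11;
    }

    // Update with scalar measurement z = H x + v, H = [1, 0], var(v) = r.
    void ekf_update(EKF2& f, double z, double r) {
        double s  = f.P[0][0] + r;        // innovation covariance
        double k0 = f.P[0][0] / s;        // Kalman gain
        double k1 = f.P[1][0] / s;
        double y  = z - f.x[0];           // innovation
        f.x[0] += k0 * y;
        f.x[1] += k1 * y;
        // P = (I - K H) P
        double p00 = (1 - k0) * f.P[0][0];
        double p01 = (1 - k0) * f.P[0][1];
        double p10 = f.P[1][0] - k1 * f.P[0][0];
        double p11 = f.P[1][1] - k1 * f.P[0][1];
        f.P[0][0]=p00; f.P[0][1]=p01; f.P[1][0]=p10; f.P[1][1]=p11;
    }

In the setting the abstract describes, the predicted state would feed the physics engine's simulation step, and each new slice reconstruction would supply the measurement z.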
Abstract:
In this article we explore the computational power of NVIDIA graphics processing units (GPUs) for cryptography using CUDA (Compute Unified Device Architecture) technology. CUDA makes general-purpose computing straightforward by exposing the parallel processing present in GPUs. To this end, we present the NVIDIA GPU architectures and CUDA, along with cryptography concepts. Furthermore, we compare CPU versions of the Advanced Encryption Standard (AES) and Message-Digest Algorithm 5 (MD5) cryptographic algorithms with parallel versions written in CUDA. © 2011 AISTI.
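A full AES implementation is too long to reproduce here, but the parallelization pattern such a comparison relies on is compact: in ECB-like modes every 16-byte block is independent, so one CUDA thread can encrypt one block. The sketch below illustrates only that thread-per-block mapping, with a toy XOR round standing in for the real AES round pipeline; it is not the paper's implementation.

    #include <cuda_runtime.h>
    #include <cstdint>

    // Toy stand-in for the AES round pipeline: XOR each byte with the key.
    // A real implementation would run the 10/12/14 AES rounds here.
    __device__ void encrypt_block(uint8_t* block, const uint8_t* key) {
        for (int i = 0; i < 16; ++i) block[i] ^= key[i];
    }

    // One thread per independent 16-byte block (ECB-style parallelism).
    __global__ void encrypt_ecb(uint8_t* data, const uint8_t* key,
                                size_t nblocks) {
        size_t b = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
        if (b < nblocks) encrypt_block(data + 16 * b, key);
    }

    void encrypt_on_gpu(uint8_t* d_data, const uint8_t* d_key, size_t nblocks) {
        int threads = 256;
        int blocks  = (int)((nblocks + threads - 1) / threads);
        encrypt_ecb<<<blocks, threads>>>(d_data, d_key, nblocks);
        cudaDeviceSynchronize();
    }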
Abstract:
Image-composition techniques that extract objects to assemble a final scene are widely used in applications ranging from photo montages to cinematographic productions. These techniques are called digital matting. With them it is possible to reduce production costs, because the actor does not need to be filmed at the location where the final scene takes place. This also favors their use in digital television programming, which demands high image quality. Many digital matting algorithms use markings made on the images to delimit the foreground, the background, and the uncertainty areas. This marking is called a trimap, a triple map containing these three pieces of information. The trimap is typically produced from manual markings. In this project, methods were created that can be used in digital matting algorithms under time constraints and without human interaction, that is, an algorithm that generates the trimap automatically. The trimap can be generated from the difference between an arbitrary background color and the foreground, or by using a depth map. A matting method was also created, based on Geodesic Matting (BAI; SAPIRO, 2009), with a lower processing time than the original. To improve the performance of the applications that generate the trimap and of the algorithms that generate the alpha map (a map that associates a transparency value with each pixel of the image), allowing their use in applications with time constraints, the CUDA architecture was used, thus taking advantage of the computational power and features of the massively parallel GPGPU.
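A plausible sketch of the color-difference trimap generation described above: one CUDA thread per pixel computes the squared RGB distance to a reference background color, and two thresholds split the pixels into background, unknown, and foreground. Names and threshold values are illustrative, not taken from the project.

    #include <cuda_runtime.h>
    #include <cstdint>

    // Trimap labels.
    enum { TRIMAP_BG = 0, TRIMAP_UNKNOWN = 128, TRIMAP_FG = 255 };

    // One thread per pixel: squared RGB distance to a reference background
    // color, then two thresholds split background / unknown / foreground.
    __global__ void make_trimap(const uchar3* img, uint8_t* trimap,
                                int w, int h, uchar3 bg,
                                int lowThresh, int highThresh) {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x >= w || y >= h) return;
        uchar3 p = img[y * w + x];
        int dr = p.x - bg.x, dg = p.y - bg.y, db = p.z - bg.z;
        int d2 = dr*dr + dg*dg + db*db;
        uint8_t label = TRIMAP_UNKNOWN;
        if (d2 < lowThresh)  label = TRIMAP_BG;   // close to background color
        if (d2 > highThresh) label = TRIMAP_FG;   // clearly different: foreground
        trimap[y * w + x] = label;
    }

The depth-map variant mentioned in the abstract would follow the same structure, thresholding depth instead of color distance.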
Abstract:
This thesis describes work carried out at the INFN-CNAF institute, consisting of the development of a parallel application and its use on a low-power architecture, with the aim of evaluating its behavior and comparing it with that of high-performance computing architectures. The low-power architecture used is a system-on-chip borrowed from the mobile and embedded world, containing a quad-core ARM CPU and an NVIDIA GPU, while the high-performance architecture is an x86_64 system with a server-class NVIDIA GPU. The application was developed in C++ in two versions: the first using the OpenMP extension and the second using CUDA. These two versions made it possible to evaluate the behavior of the low-power architecture from several points of view, using either the CPU or the GPU as the main processing unit in the different versions of the application.
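The two-version structure is easy to illustrate. Below is a minimal sketch of the same loop written once with OpenMP for the CPU and once as a CUDA kernel for the GPU, using a SAXPY-style operation as a stand-in for the real workload (compile the .cu file with, e.g., nvcc -Xcompiler -fopenmp).

    #include <cuda_runtime.h>
    #include <omp.h>

    // CPU version: OpenMP spreads loop iterations across the ARM/x86 cores.
    void saxpy_omp(int n, float a, const float* x, float* y) {
        #pragma omp parallel for
        for (int i = 0; i < n; ++i) y[i] = a * x[i] + y[i];
    }

    // GPU version: one CUDA thread per array element.
    __global__ void saxpy_cuda(int n, float a, const float* x, float* y) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) y[i] = a * x[i] + y[i];
    }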
Abstract:
The mechanical action of the heart is made possible in response to electrical events that involve the cardiac cells, a property that classifies the heart tissue between the excitable tissues. At the cellular level, the electrical event is the signal that triggers the mechanical contraction, inducing a transient increase in intracellular calcium which, in turn, carries the message of contraction to the contractile proteins of the cell. The primary goal of my project was to implement in CUDA (Compute Unified Device Architecture, an hardware architecture for parallel processing created by NVIDIA) a tissue model of the rabbit sinoatrial node to evaluate the heterogeneity of its structure and how that variability influences the behavior of the cells. In particular, each cell has an intrinsic discharge frequency, thus different from that of every other cell of the tissue and it is interesting to study the process of synchronization of the cells and look at the value of the last discharge frequency if they synchronized.
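The actual sinoatrial cell model is far more detailed than anything shown here, but the synchronization experiment can be sketched with a stand-in: a grid of Kuramoto-style phase oscillators, each with its own intrinsic frequency, coupled to its four neighbors. A double-buffered CUDA update step might look like this (all names and parameters are illustrative):

    #include <cuda_runtime.h>
    #include <math.h>

    // Kuramoto-style stand-in for the SAN tissue: each cell i has its own
    // intrinsic frequency omega[i]; sine coupling to the four grid
    // neighbors pulls the phases together over repeated steps, so the
    // synchronization of heterogeneous frequencies can be observed.
    // Double-buffered (phase -> phaseNext) to avoid read/write races.
    __global__ void step_cells(const float* phase, float* phaseNext,
                               const float* omega, int w, int h,
                               float k, float dt) {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x >= w || y >= h) return;
        int i = y * w + x;
        float p = phase[i], c = 0.0f;
        if (x > 0)     c += sinf(phase[i - 1] - p);
        if (x < w - 1) c += sinf(phase[i + 1] - p);
        if (y > 0)     c += sinf(phase[i - w] - p);
        if (y < h - 1) c += sinf(phase[i + w] - p);
        phaseNext[i] = p + dt * (omega[i] + k * c);
    }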
Abstract:
This document describes the development of a project to improve a signal-analysis program; to that end, it makes use of software optimization techniques and acceleration technologies, exploiting the parallelism in the program. It also analyzes the use of two technologies based on different parallel programming paradigms: one using multiple threads with shared memory, and the other using GPUs as co-processing devices.
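The GPU-as-coprocessor paradigm mentioned above follows a standard staging pattern: copy a chunk of samples to the device, launch a kernel, and copy the results back, all on a CUDA stream so the host thread stays free. A minimal sketch, with a trivial scaling kernel standing in for the real signal-analysis stage:

    #include <cuda_runtime.h>

    // Illustrative kernel: scale a chunk of samples.
    __global__ void scale(float* s, int n, float g) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) s[i] *= g;
    }

    // GPU-as-coprocessor pattern: stage a chunk onto the device, process,
    // bring it back, all on one stream. For the async copies to actually
    // overlap with host work, h_chunk should be pinned (cudaHostAlloc).
    void process_chunk(float* h_chunk, float* d_chunk, int n, cudaStream_t st) {
        cudaMemcpyAsync(d_chunk, h_chunk, n * sizeof(float),
                        cudaMemcpyHostToDevice, st);
        scale<<<(n + 255) / 256, 256, 0, st>>>(d_chunk, n, 0.5f);
        cudaMemcpyAsync(h_chunk, d_chunk, n * sizeof(float),
                        cudaMemcpyDeviceToHost, st);
        // Host thread is free here; wait only when the result is needed.
        cudaStreamSynchronize(st);
    }

The shared-memory multithreading alternative would replace all of this with an OpenMP-style parallel loop over the same samples.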
Abstract:
A spatial-color-based non-parametric background-foreground modeling strategy, implemented on a GPGPU using CUDA, is proposed. This strategy is suitable for augmented-reality applications, providing real-time, high-quality results in a wide variety of scenarios.
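The abstract does not spell out the model, but a common non-parametric formulation keeps a set of recent color samples per pixel and declares a pixel background when enough samples fall within a fixed radius of the current color (a ViBe/KDE-style test). A minimal per-pixel classification kernel along those lines, with illustrative constants:

    #include <cuda_runtime.h>
    #include <cstdint>

    // Non-parametric per-pixel model: N stored color samples per pixel.
    // A pixel matches the background if at least MIN_MATCHES samples lie
    // within RADIUS of the current color (no Gaussian assumption).
    #define N_SAMPLES   20
    #define MIN_MATCHES 2
    #define RADIUS2     (20 * 20)

    __global__ void classify(const uchar3* frame, const uchar3* samples,
                             uint8_t* fgMask, int w, int h) {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x >= w || y >= h) return;
        int i = y * w + x;
        uchar3 c = frame[i];
        int matches = 0;
        for (int s = 0; s < N_SAMPLES && matches < MIN_MATCHES; ++s) {
            uchar3 m = samples[s * w * h + i];   // sample s of this pixel
            int dr = c.x - m.x, dg = c.y - m.y, db = c.z - m.z;
            if (dr*dr + dg*dg + db*db < RADIUS2) ++matches;
        }
        fgMask[i] = (matches >= MIN_MATCHES) ? 0 : 255;  // 255 = foreground
    }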
Abstract:
The aim of this project is to evaluate the performance improvement provided by the parallelization of image processing algorithms to be executed on a graphics card. To this end, once the algorithms under study were selected, each was developed in C++ under the sequential paradigm. Then, based on these implementations, the algorithms were parallelized using the Compute Unified Device Architecture (CUDA) programming model developed by NVIDIA. After that, a graphical user interface (GUI) was developed in Visual C# to make the tool easier to use. Finally, the performance of each algorithm was measured, in terms of parallel execution time and speedup, by processing a set of images of different sizes.
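Measurements of this kind are typically taken with CUDA events on the GPU side, with speedup computed as the ratio of measured CPU time to GPU time. A minimal sketch of such a timing helper (the launch callback and the CPU-side timing are assumed to exist elsewhere):

    #include <cuda_runtime.h>

    // Time a kernel launch with CUDA events; returns elapsed milliseconds.
    float time_kernel_ms(void (*launch)()) {
        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);
        cudaEventRecord(start);
        launch();                      // launches the kernel under test
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);    // wait until the kernel has finished
        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        cudaEventDestroy(start);
        cudaEventDestroy(stop);
        return ms;
    }

Given a CPU time cpuMs measured for the sequential C++ version (e.g., with std::chrono), the reported speedup is cpuMs / time_kernel_ms(...).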
Abstract:
Partial differential equation (PDE) solvers are commonly employed to study and characterize the parameter space for reaction-diffusion (RD) systems while investigating biological pattern formation. Increasingly, biologists wish to perform such studies with arbitrary surfaces representing ‘real’ 3D geometries for better insights. In this paper, we present a highly optimized CUDA-based solver for RD equations on triangulated meshes in 3D. We demonstrate our solver using a chemotactic model that can be used to study snakeskin pigmentation, for example. We employ a finite element based approach to perform explicit Euler time integrations. We compare our approach to a naive GPU implementation and provide an in-depth performance analysis, demonstrating the significant speedup afforded by our optimizations. The optimization strategies that we exploit could be generalized to other mesh based processing applications with PDE simulations.
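A minimal sketch of the per-vertex explicit Euler update such a solver performs, assuming the mesh Laplacian has been precomputed in CSR form (for example with cotangent weights) and using a logistic term as a placeholder reaction; none of these names come from the paper:

    #include <cuda_runtime.h>

    // Placeholder reaction term (logistic growth stand-in).
    __device__ float reaction(float u) { return u * (1.0f - u); }

    // One explicit Euler step of a reaction-diffusion equation on a
    // triangulated mesh: the Laplacian is stored in CSR form
    // (rowPtr/colIdx/weight), one thread per vertex, double-buffered.
    __global__ void rd_step(const float* u, float* u_next,
                            const int* rowPtr, const int* colIdx,
                            const float* weight, int nVerts,
                            float D, float dt) {
        int v = blockIdx.x * blockDim.x + threadIdx.x;
        if (v >= nVerts) return;
        float lap = 0.0f;
        for (int e = rowPtr[v]; e < rowPtr[v + 1]; ++e)
            lap += weight[e] * (u[colIdx[e]] - u[v]);  // weighted neighbor diffs
        u_next[v] = u[v] + dt * (D * lap + reaction(u[v]));
    }

The paper's optimizations (memory layout, reordering, and so on) go beyond this naive one-thread-per-vertex baseline, which corresponds to the unoptimized GPU implementation it compares against.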
Abstract:
Femtosecond laser microfabrication has emerged over the last decade as a flexible 3D technology in photonics. Numerical simulations provide important insight into spatial and temporal beam and pulse shaping during extremely intricate nonlinear propagation (see e.g. [1,2]). The electromagnetics of such propagation is typically described by the generalized nonlinear Schrödinger equation (NLSE) coupled with a Drude model for the plasma [3]. In this paper we consider a multi-threaded parallel numerical solution for a specific model which describes femtosecond laser pulse propagation in transparent media [4,5]; however, our approach can be extended to similar models. The numerical code is implemented on an NVIDIA graphics processing unit (GPU), which provides an efficient hardware platform for multi-threaded computing. We compare the performance of the parallel GPU code described below, implemented using the CUDA programming interface [3], with the serial CPU version used in our previous papers [4,5]. © 2011 IEEE.
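The paper's scheme couples the NLSE with a Drude plasma model, which is beyond a short sketch, but a common GPU building block for such solvers is the split-step Fourier method. Below is a minimal CUDA/cuFFT sketch with only a Kerr nonlinearity and a quadratic dispersion phase, both illustrative:

    #include <cufft.h>
    #include <cuda_runtime.h>
    #include <math.h>

    // Dispersion half of the split step, applied in the spectral domain:
    // multiply each component by exp(i * beta2/2 * k^2 * dz). Also divides
    // by n to undo cuFFT's unnormalized forward+inverse transform pair.
    __global__ void linear_step(cufftComplex* A, const float* k, int n,
                                float beta2, float dz) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;
        float phi = 0.5f * beta2 * k[i] * k[i] * dz;
        float c = cosf(phi) / n, s = sinf(phi) / n;
        cufftComplex a = A[i];
        A[i].x = a.x * c - a.y * s;
        A[i].y = a.x * s + a.y * c;
    }

    // Kerr nonlinear step: phase rotation proportional to |A|^2.
    __global__ void nonlinear_step(cufftComplex* A, int n,
                                   float gamma, float dz) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;
        cufftComplex a = A[i];
        float phi = gamma * (a.x * a.x + a.y * a.y) * dz;
        float c = cosf(phi), s = sinf(phi);
        A[i].x = a.x * c - a.y * s;
        A[i].y = a.x * s + a.y * c;
    }

    // One split-step: FFT -> dispersion -> inverse FFT -> nonlinearity.
    // plan was created with cufftPlan1d(&plan, n, CUFFT_C2C, 1).
    void split_step(cufftHandle plan, cufftComplex* d_A, const float* d_k,
                    int n, float beta2, float gamma, float dz) {
        int t = 256, b = (n + t - 1) / t;
        cufftExecC2C(plan, d_A, d_A, CUFFT_FORWARD);
        linear_step<<<b, t>>>(d_A, d_k, n, beta2, dz);
        cufftExecC2C(plan, d_A, d_A, CUFFT_INVERSE);
        nonlinear_step<<<b, t>>>(d_A, n, gamma, dz);
    }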
Abstract:
Nowadays, data coding cuts across many types of engineering because of its great importance. With the exponential increase in the creation of digital data, the field of data compression has gained great visibility in this area. Compression algorithms are constantly being developed and improved so as to obtain the greatest possible data compression, whether lossy or lossless, sustaining the rapid and constant growth of such data. One of the big problems of this type of algorithm is the large computational power that is sometimes necessary to obtain a good compression ratio while maintaining the quality of the data when decompressed. This document describes a strategy to reduce the impact of the computational power required for image coding by using a heterogeneous implementation. The goal is to parallelize the sections that require high computational power, thereby reducing the time needed for data compression. The document is based on the implementation of this strategy for the MMP-Intra image coding algorithm. Starting with a theoretical analysis, we show that parallelizing the algorithm is feasible and that large performance gains are possible. In order to prove that the MMP-Intra algorithm was parallelizable and to identify the real gains, an initial prototype was developed, which performed far worse than the original algorithm, needing much more time to obtain the same results. Through an iterative optimization process, the prototype went through several refinement stages. The final refined prototype obtained results far superior to those of the sequential algorithm on which it was based, reaching performance up to four times that of the original.
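The costly inner loop in MMP-style coders is the matching of image blocks against a large adaptive dictionary, which is a natural fit for the GPU: one thread can score one dictionary entry. A minimal sketch of such a kernel, with illustrative names (the actual MMP-Intra cost function also includes rate terms not shown here):

    #include <cuda_runtime.h>

    // One thread per dictionary entry: sum of squared differences between
    // the current image block and that entry. The best match is then
    // selected on the host, or with a follow-up reduction kernel.
    __global__ void block_ssd(const float* block,   // blockSize target values
                              const float* dict,    // nEntries * blockSize
                              float* cost, int nEntries, int blockSize) {
        int e = blockIdx.x * blockDim.x + threadIdx.x;
        if (e >= nEntries) return;
        const float* entry = dict + (size_t)e * blockSize;
        float ssd = 0.0f;
        for (int i = 0; i < blockSize; ++i) {
            float d = block[i] - entry[i];
            ssd += d * d;
        }
        cost[e] = ssd;
    }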
Abstract:
Eschewing costly high-tech approaches, this paper examines the experience of using low-tech approaches to game design assignments as a problem-based learning and assessment tool over a number of years in undergraduate teaching. General game design concepts are discussed, along with learning outcomes and assessment rubrics aligned with Bloom's Taxonomy, based on evidence from students who had no prior experience of serious game play or design. Approaches to creating game-design-based assessments are offered.
Abstract:
The astonishing development of diverse hardware platforms is twofold: on one side, the push for exascale performance for big data processing and management; on the other, mobile and embedded devices for data collection and human-machine interaction. This has driven a highly hierarchical evolution of programming models. GVirtuS is a general virtualization system developed in 2009 and first introduced in 2010, enabling a completely transparent layer between GPUs and VMs. This paper presents the latest achievements and developments of GVirtuS, which now supports CUDA 6.5, memory management, and scheduling. Thanks to its new and improved remoting capabilities, GVirtuS now enables GPU sharing among physical and virtual machines based on x86 and ARM CPUs, on local workstations, computing clusters, and distributed cloud appliances.