91 resultados para CUDA
Resumo:
En aquest projecte final de carrera es decriurà el procés realitzat per tal d'aconseguir que l'aplicació reacTICision aprofiti les capacitats de la computació paral·lela en GPU mitjançant la tecnologia CUDA de NVIDIA. Amb aquest objectiu es realitzarà un estudi de la tecnologia CUDA i el funcionament de reacTIVision així com un anàlisi dels resultats obtinguts
Resumo:
Análisis de desarrollo paralelo CUDA en lenguajes Java y Python, utilizando JCuda, RootBeer, PyCuda y Anaconda Accelerate. Cómo desarrollar, pros y contras de las herramientas analizadas.
Resumo:
Análisis de desarrollo paralelo CUDA en lenguajes Java y Python, utilizando JCuda, RootBeer, PyCuda y Anaconda Accelerate.
Resumo:
Genetic Programming (GP) is a widely used methodology for solving various computational problems. GP's problem solving ability is usually hindered by its long execution times. In this thesis, GP is applied toward real-time computer vision. In particular, object classification and tracking using a parallel GP system is discussed. First, a study of suitable GP languages for object classification is presented. Two main GP approaches for visual pattern classification, namely the block-classifiers and the pixel-classifiers, were studied. Results showed that the pixel-classifiers generally performed better. Using these results, a suitable language was selected for the real-time implementation. Synthetic video data was used in the experiments. The goal of the experiments was to evolve a unique classifier for each texture pattern that existed in the video. The experiments revealed that the system was capable of correctly tracking the textures in the video. The performance of the system was on-par with real-time requirements.
Resumo:
Techniques of image combination, with extraction of objects to set a final scene, are very used in applications from photos montages to cinematographic productions. These techniques are called digital matting. With them is possible to decrease the cost of productions, because it is not necessary for the actor to be filmed in the location where the final scene occurs. This feature also favors its use in programs made to digital television, which demands a high quality image. Many digital matting algorithms use markings done on the images, to demarcate what is the foreground, the background and the uncertainty areas. This marking is called trimap, which is a triple map containing these three informations. The trimap is done, typically, from manual markings. In this project, methods were created that can be used in digital matting algorithms, with restriction of time and without human interaction, that is, the creation of an algorithm that generates the trimap automatically. This last one can be generated from the difference between a color of an arbitrary background and the foreground, or by using a depth map. It was also created a matting method, based on the Geodesic Matting (BAI; SAPIRO, 2009), which has an inferior processing time then the original one. Aiming to improve the performance of the applications that generates the trimap and of the algorithms that generates the alphamap (map that associates a value to the transparency of each pixel of the image), allowing its use in applications with time restrictions, it was used the CUDA architecture. Taking advantage, this way, of the computational power and the features of the GPGPU, which is massively parallel
Resumo:
The aim of my thesis is to parallelize the Weighting Histogram Analysis Method (WHAM), which is a popular algorithm used to calculate the Free Energy of a molucular system in Molecular Dynamics simulations. WHAM works in post processing in cooperation with another algorithm called Umbrella Sampling. Umbrella Sampling has the purpose to add a biasing in the potential energy of the system in order to force the system to sample a specific region in the configurational space. Several N independent simulations are performed in order to sample all the region of interest. Subsequently, the WHAM algorithm is used to estimate the original system energy starting from the N atomic trajectories. The parallelization of WHAM has been performed through CUDA, a language that allows to work in GPUs of NVIDIA graphic cards, which have a parallel achitecture. The parallel implementation may sensibly speed up the WHAM execution compared to previous serial CPU imlementations. However, the WHAM CPU code presents some temporal criticalities to very high numbers of interactions. The algorithm has been written in C++ and executed in UNIX systems provided with NVIDIA graphic cards. The results were satisfying obtaining an increase of performances when the model was executed on graphics cards with compute capability greater. Nonetheless, the GPUs used to test the algorithm is quite old and not designated for scientific calculations. It is likely that a further performance increase will be obtained if the algorithm would be executed in clusters of GPU at high level of computational efficiency. The thesis is organized in the following way: I will first describe the mathematical formulation of Umbrella Sampling and WHAM algorithm with their apllications in the study of ionic channels and in Molecular Docking (Chapter 1); then, I will present the CUDA architectures used to implement the model (Chapter 2); and finally, the results obtained on model systems will be presented (Chapter 3).
Resumo:
En el presente documento se hablará acerca del desarrollo de un proyecto para la mejora de un programa de análisis de señales; con ese fin, se hará uso de técnicas de optimización del software y de tecnologías de aceleración, mediante el aprovechamiento del paralelismo del programa. Además se hará un análisis de acerca del uso de dos tecnologías basadas en diferentes paradigmas de programación paralela; una mediante múltiples hilos con memoria compartida y la otra mediante el uso de GPUs como dispositivos de coprocesamiento. This paper will talk about the development of a Project to improve a program that does signals analysis; to that end, it will make use of software optimization techniques and acceleration technologies by exploiting parallelism in the program. In Addition will be done an analysis on the use of two technologies based on two different paradigms; one using multiple threads with shared memory and the other using GPU as co-processing devices.
Resumo:
A spatial-color-based non-parametric background-foreground modeling strategy in a GPGPU by using CUDA is proposed. This strategy is suitable for augmented-reality applications, providing real-time high-quality results in a great variety of scenarios.
Resumo:
Partial differential equation (PDE) solvers are commonly employed to study and characterize the parameter space for reaction-diffusion (RD) systems while investigating biological pattern formation. Increasingly, biologists wish to perform such studies with arbitrary surfaces representing ‘real’ 3D geometries for better insights. In this paper, we present a highly optimized CUDA-based solver for RD equations on triangulated meshes in 3D. We demonstrate our solver using a chemotactic model that can be used to study snakeskin pigmentation, for example. We employ a finite element based approach to perform explicit Euler time integrations. We compare our approach to a naive GPU implementation and provide an in-depth performance analysis, demonstrating the significant speedup afforded by our optimizations. The optimization strategies that we exploit could be generalized to other mesh based processing applications with PDE simulations.
Resumo:
Hoje em dia, a área de codificação de dados é transversal a diversos tipos de engenharias devido à sua grande importância. Com o aumento exponencial na criação de dados digitais, o campo da compressão de dados ganhou uma grande visibilidade nesta área. São constantemente desenvolvidos e melhorados algoritmos de compressão por forma a obter a maior compressão de dados possível seja com ou sem perda de dados, permitindo sustentar o rápido e constante crescimento dos mesmos. Um dos grandes problemas deste tipo de algoritmos deve-se ao grande poder computacional que por vezes é necessário para obter uma boa taxa de compressão mantendo a qualidade dos dados quando descompactados. Este documento descreve uma estratégia para tentar reduzir o impacto do poder computacional necessário à codificação de imagens utilizando uma implementação heterogénea. O objetivo é tentar efetuar a paralelização das secções que requerem elevado poder computacional reduzindo assim o tempo necessário à compressão de dados. Este documento baseia-se na implementação desta estratégia para o algoritmo de codificação de imagens MMP-Intra. Utilizando inicialmente uma análise teórica, demonstramos que é viável efetuar a paralelização do algoritmo, sendo possível obter elevados ganhos de desempenho. Por forma a provar que o algoritmo MMP-Intra era paralelizavel e identificar os ganhos reais foi desenvolvido um protótipo inicial, o qual obteve um desempenho muito inferiore ao do algoritmo original, necessitando de muito mais tempo para obter os mesmo resultados. Utilizando um processo de otimização iterativo o protótipo passou por várias etapas de refinação. O protótipo refinado final obteve resultados muito superiores ao algoritmo sequencial no qual o mesmo foi baseado chegando a obter desempenhos quatro vezes superior ao original.
Resumo:
The astonishing development of diverse and different hardware platforms is twofold: on one side, the challenge for the exascale performance for big data processing and management; on the other side, the mobile and embedded devices for data collection and human machine interaction. This drove to a highly hierarchical evolution of programming models. GVirtuS is the general virtualization system developed in 2009 and firstly introduced in 2010 enabling a completely transparent layer among GPUs and VMs. This paper shows the latest achievements and developments of GVirtuS, now supporting CUDA 6.5, memory management and scheduling. Thanks to the new and improved remoting capabilities, GVirtus now enables GPU sharing among physical and virtual machines based on x86 and ARM CPUs on local workstations,computing clusters and distributed cloud appliances.
Resumo:
Recent advances in the massively parallel computational abilities of graphical processing units (GPUs) have increased their use for general purpose computation, as companies look to take advantage of big data processing techniques. This has given rise to the potential for malicious software targeting GPUs, which is of interest to forensic investigators examining the operation of software. The ability to carry out reverse-engineering of software is of great importance within the security and forensics elds, particularly when investigating malicious software or carrying out forensic analysis following a successful security breach. Due to the complexity of the Nvidia CUDA (Compute Uni ed Device Architecture) framework, it is not clear how best to approach the reverse engineering of a piece of CUDA software. We carry out a review of the di erent binary output formats which may be encountered from the CUDA compiler, and their implications on reverse engineering. We then demonstrate the process of carrying out disassembly of an example CUDA application, to establish the various techniques available to forensic investigators carrying out black-box disassembly and reverse engineering of CUDA binaries. We show that the Nvidia compiler, using default settings, leaks useful information. Finally, we demonstrate techniques to better protect intellectual property in CUDA algorithm implementations from reverse engineering.
Resumo:
The Nha Trang Bay (latitude 12°15'N) and central areas of Vietnam present strong ecological differences from other parts of the country.
Resumo:
En este documento se expondrá una implementación del problema del viajante de comercio usando una implementación personalizada de un mapa auto-organizado basándose en soluciones anteriores y adaptándolas a la arquitectura CUDA, haciendo a la vez una comparativa de la implementación eficiente en CUDA C/C++ con la implementación de las funciones de GPU incluidas en el Parallel Computing Toolbox de Matlab. La solución que se da reduce en casi un cuarto las iteraciones necesarias para llegar a una solución buena del problema mencionado, además de la mejora inminente del uso de las arquitecturas paralelas. En esta solución se estudia la mejora en tiempo que se consigue con el uso específico de la memoria compartida, siendo esta una de las herramientas más potentes para mejorar el rendimiento. En lo referente a los tiempos de ejecución, se llega a concluir que la mejor solución es el lanzamiento de un kernel de CUDA desde Matlab a través de la funcionalidad incluida en el Parallel Computing Toolbox.