106 resultados para NVIDIA CUDA
Resumo:
A constante evolução da tecnologia disponibilizou, atualmente, ferramentas computacionais que eram apenas expectativas há 10 anos atrás. O aumento do potencial computacional aplicado a modelos numéricos que simulam a atmosfera permitiu ampliar o estudo de fenômenos atmosféricos, através do uso de ferramentas de computação de alto desempenho. O trabalho propôs o desenvolvimento de algoritmos com base em arquiteturas SIMT e aplicação de técnicas de paralelismo com uso da ferramenta OpenACC para processamento de dados de previsão numérica do modelo Weather Research and Forecast. Esta proposta tem forte conotação interdisciplinar, buscando a interação entre as áreas de modelagem atmosférica e computação científica. Foram testadas a influência da computação do cálculo de microfísica de nuvens na degradação temporal do modelo. Como a entrada de dados para execução na GPU não era suficientemente grande, o tempo necessário para transferir dados da CPU para a GPU foi maior do que a execução da computação na CPU. Outro fator determinante foi a adição de código CUDA dentro de um contexto MPI, causando assim condições de disputa de recursos entre os processadores, mais uma vez degradando o tempo de execução. A proposta do uso de diretivas para aplicar computação de alto desempenho em uma estrutura CUDA parece muito promissora, mas ainda precisa ser utilizada com muita cautela a fim de produzir bons resultados. A construção de um híbrido MPI + CUDA foi testada, mas os resultados não foram conclusivos.
Resumo:
The high performance computing community has traditionally focused uniquely on the reduction of execution time, though in the last years, the optimization of energy consumption has become a main issue. A reduction of energy usage without a degradation of performance requires the adoption of energy-efficient hardware platforms accompanied by the development of energy-aware algorithms and computational kernels. The solution of linear systems is a key operation for many scientific and engineering problems. Its relevance has motivated an important amount of work, and consequently, it is possible to find high performance solvers for a wide variety of hardware platforms. In this work, we aim to develop a high performance and energy-efficient linear system solver. In particular, we develop two solvers for a low-power CPU-GPU platform, the NVIDIA Jetson TK1. These solvers implement the Gauss-Huard algorithm yielding an efficient usage of the target hardware as well as an efficient memory access. The experimental evaluation shows that the novel proposal reports important savings in both time and energy-consumption when compared with the state-of-the-art solvers of the platform.
Resumo:
The recently reported Monte Carlo Random Path Sampling method (RPS) is here improved and its application is expanded to the study of the 2D and 3D Ising and discrete Heisenberg models. The methodology was implemented to allow use in both CPU-based high-performance computing infrastructures (C/MPI) and GPU-based (CUDA) parallel computation, with significant computational performance gains. Convergence is discussed, both in terms of free energy and magnetization dependence on field/temperature. From the calculated magnetization-energy joint density of states, fast calculations of field and temperature dependent thermodynamic properties are performed, including the effects of anisotropy on coercivity, and the magnetocaloric effect. The emergence of first-order magneto-volume transitions in the compressible Ising model is interpreted using the Landau theory of phase transitions. Using metallic Gadolinium as a real-world example, the possibility of using RPS as a tool for computational magnetic materials design is discussed. Experimental magnetic and structural properties of a Gadolinium single crystal are compared to RPS-based calculations using microscopic parameters obtained from Density Functional Theory.
Resumo:
Heterogeneous computing systems have become common in modern processor architectures. These systems, such as those released by AMD, Intel, and Nvidia, include both CPU and GPU cores on a single die available with reduced communication overhead compared to their discrete predecessors. Currently, discrete CPU/GPU systems are limited, requiring larger, regular, highly-parallel workloads to overcome the communication costs of the system. Without the traditional communication delay assumed between GPUs and CPUs, we believe non-traditional workloads could be targeted for GPU execution. Specifically, this thesis focuses on the execution model of nested parallel workloads on heterogeneous systems. We have designed a simulation flow which utilizes widely used CPU and GPU simulators to model heterogeneous computing architectures. We then applied this simulator to non-traditional GPU workloads using different execution models. We also have proposed a new execution model for nested parallelism allowing users to exploit these heterogeneous systems to reduce execution time.
Resumo:
Abstract: Medical image processing in general and brain image processing in particular are computationally intensive tasks. Luckily, their use can be liberalized by means of techniques such as GPU programming. In this article we study NiftyReg, a brain image processing library with a GPU implementation using CUDA, and analyse different possible ways of further optimising the existing codes. We will focus on fully using the memory hierarchy and on exploiting the computational power of the CPU. The ideas that lead us towards the different attempts to change and optimize the code will be shown as hypotheses, which we will then test empirically using the results obtained from running the application. Finally, for each set of related optimizations we will study the validity of the obtained results in terms of both performance and the accuracy of the resulting images.
Resumo:
The research described in this thesis was motivated by the need of a robust model capable of representing 3D data obtained with 3D sensors, which are inherently noisy. In addition, time constraints have to be considered as these sensors are capable of providing a 3D data stream in real time. This thesis proposed the use of Self-Organizing Maps (SOMs) as a 3D representation model. In particular, we proposed the use of the Growing Neural Gas (GNG) network, which has been successfully used for clustering, pattern recognition and topology representation of multi-dimensional data. Until now, Self-Organizing Maps have been primarily computed offline and their application in 3D data has mainly focused on free noise models, without considering time constraints. It is proposed a hardware implementation leveraging the computing power of modern GPUs, which takes advantage of a new paradigm coined as General-Purpose Computing on Graphics Processing Units (GPGPU). The proposed methods were applied to different problem and applications in the area of computer vision such as the recognition and localization of objects, visual surveillance or 3D reconstruction.
Resumo:
FEA simulation of thermal metal cutting is central to interactive design and manufacturing. It is therefore relevant to assess the applicability of FEA open software to simulate 2D heat transfer in metal sheet laser cuts. Application of open source code (e.g. FreeFem++, FEniCS, MOOSE) makes possible additional scenarios (e.g. parallel, CUDA, etc.), with lower costs. However, a precise assessment is required on the scenarios in which open software can be a sound alternative to a commercial one. This article contributes in this regard, by presenting a comparison of the aforementioned freeware FEM software for the simulation of heat transfer in thin (i.e. 2D) sheets, subject to a gliding laser point source. We use the commercial ABAQUS software as the reference to compare such open software. A convective linear thin sheet heat transfer model, with and without material removal is used. This article does not intend a full design of computer experiments. Our partial assessment shows that the thin sheet approximation turns to be adequate in terms of the relative error for linear alumina sheets. Under mesh resolutions better than 10e−5 m , the open and reference software temperature differ in at most 1 % of the temperature prediction. Ongoing work includes adaptive re-meshing, nonlinearities, sheet stress analysis and Mach (also called ‘relativistic’) effects.
Resumo:
Nowadays, new computers generation provides a high performance that enables to build computationally expensive computer vision applications applied to mobile robotics. Building a map of the environment is a common task of a robot and is an essential part to allow the robots to move through these environments. Traditionally, mobile robots used a combination of several sensors from different technologies. Lasers, sonars and contact sensors have been typically used in any mobile robotic architecture, however color cameras are an important sensor due to we want the robots to use the same information that humans to sense and move through the different environments. Color cameras are cheap and flexible but a lot of work need to be done to give robots enough visual understanding of the scenes. Computer vision algorithms are computational complex problems but nowadays robots have access to different and powerful architectures that can be used for mobile robotics purposes. The advent of low-cost RGB-D sensors like Microsoft Kinect which provide 3D colored point clouds at high frame rates made the computer vision even more relevant in the mobile robotics field. The combination of visual and 3D data allows the systems to use both computer vision and 3D processing and therefore to be aware of more details of the surrounding environment. The research described in this thesis was motivated by the need of scene mapping. Being aware of the surrounding environment is a key feature in many mobile robotics applications from simple robotic navigation to complex surveillance applications. In addition, the acquisition of a 3D model of the scenes is useful in many areas as video games scene modeling where well-known places are reconstructed and added to game systems or advertising where once you get the 3D model of one room the system can add furniture pieces using augmented reality techniques. In this thesis we perform an experimental study of the state-of-the-art registration methods to find which one fits better to our scene mapping purposes. Different methods are tested and analyzed on different scene distributions of visual and geometry appearance. In addition, this thesis proposes two methods for 3d data compression and representation of 3D maps. Our 3D representation proposal is based on the use of Growing Neural Gas (GNG) method. This Self-Organizing Maps (SOMs) has been successfully used for clustering, pattern recognition and topology representation of various kind of data. Until now, Self-Organizing Maps have been primarily computed offline and their application in 3D data has mainly focused on free noise models without considering time constraints. Self-organising neural models have the ability to provide a good representation of the input space. In particular, the Growing Neural Gas (GNG) is a suitable model because of its flexibility, rapid adaptation and excellent quality of representation. However, this type of learning is time consuming, specially for high-dimensional input data. Since real applications often work under time constraints, it is necessary to adapt the learning process in order to complete it in a predefined time. This thesis proposes a hardware implementation leveraging the computing power of modern GPUs which takes advantage of a new paradigm coined as General-Purpose Computing on Graphics Processing Units (GPGPU). Our proposed geometrical 3D compression method seeks to reduce the 3D information using plane detection as basic structure to compress the data. This is due to our target environments are man-made and therefore there are a lot of points that belong to a plane surface. Our proposed method is able to get good compression results in those man-made scenarios. The detected and compressed planes can be also used in other applications as surface reconstruction or plane-based registration algorithms. Finally, we have also demonstrated the goodness of the GPU technologies getting a high performance implementation of a CAD/CAM common technique called Virtual Digitizing.
Resumo:
Dissertação (mestrado)–Universidade de Brasília, Universidade UnB de Planaltina, Programa de Pós-Graduação em Ciência de Materiais, 2015.
Resumo:
Visualization of vector fields plays an important role in research activities nowadays -- Web applications allow a fast, multi-platform and multi-device access to data, which results in the need of optimized applications to be implemented in both high-performance and low-performance devices -- Point trajectory calculation procedures usually perform repeated calculations due to the fact that several points might lie over the same trajectory -- This paper presents a new methodology to calculate point trajectories over highly-dense and uniformly-distributed grid of points in which the trajectories are forced to lie over the points in the grid -- Its advantages rely on a highly parallel computing architecture implementation and in the reduction of the computational effort to calculate the stream paths since unnecessary calculations are avoided, reusing data through iterations -- As case study, the visualization of oceanic currents through in the web platform is presented and analyzed, using WebGL as the parallel computing architecture and the rendering Application Programming Interface
Resumo:
En el campo de la medicina clínica es crucial poder determinar la seguridad y la eficacia de los fármacos actuales y además acelerar el descubrimiento de nuevos compuestos activos. Para ello se llevan a cabo ensayos de laboratorio, que son métodos muy costosos y que requieren mucho tiempo. Sin embargo, la bioinformática puede facilitar enormemente la investigación clínica para los fines mencionados, ya que proporciona la predicción de la toxicidad de los fármacos y su actividad en enfermedades nuevas, así como la evolución de los compuestos activos descubiertos en ensayos clínicos. Esto se puede lograr gracias a la disponibilidad de herramientas de bioinformática y métodos de cribado virtual por ordenador (CV) que permitan probar todas las hipótesis necesarias antes de realizar los ensayos clínicos, tales como el docking estructural, mediante el programa BINDSURF. Sin embargo, la precisión de la mayoría de los métodos de CV se ve muy restringida a causa de las limitaciones presentes en las funciones de afinidad o scoring que describen las interacciones biomoleculares, e incluso hoy en día estas incertidumbres no se conocen completamente. En este trabajo abordamos este problema, proponiendo un nuevo enfoque en el que las redes neuronales se entrenan con información relativa a bases de datos de compuestos conocidos (proteínas diana y fármacos), y se aprovecha después el método para incrementar la precisión de las predicciones de afinidad del método de CV BINDSURF.
Resumo:
This paper presents the implementation of a high quality real-time 3D video system intended for 3D videoconferencing -- Basically, the system is able to extract depth information from a pair of images coming from a short-baseline camera setup -- The system is based on the use of a variant of the adaptive support-weight algorithm to be applied on GPU-based architectures -- The reason to do it is to get real-time results without compromising accuracy and also to reduce costs by using commodity hardware -- The complete system runs over the GStreamer multimedia software platform to make it even more flexible -- Moreover, an autoestereoscopic display has been used as the end-up terminal for 3D content visualization
Resumo:
En esta tesis doctoral se exponen los fundamentos teóricos necesarios en el diseño de esquemas numéricos de volúmenes finitos para sistemas hiperbólicos no conservativos de una y dos dimensiones. Para el caso unidimensional se repasan los conceptos de esquema camino-conservativo y esquema bien equilibrado, así como la extensión de los esquemas numéricos a alto orden, basados en la reconstrucción de estados. En particular, se presentan los esquemas de tipo PVM (Polynomial Viscosity Matrix), así como diversos esquemas de limitadores de flujo que resultan de la extensión natural del método WAF, utilizando como base algunos esquemas de tipo PVM. Para el caso bidimensional se aborda el diseño de esquemas numéricos camino-conservativos y bien equilibrados de volúmenes finitos para sistemas hiperbólicos no conservativos y su extensión a alto orden, en particular se presenta una reconstrucción de estados de tercer orden compacta y que resulta de la combinación WENO de paraboloides y planos. Se presenta además el desarrollo de métodos numéricos para el sistema de aguas someras bidimensional de una capa. En particular se definen esquemas de primer orden de tipo HLL y FORCE y su extensión a alto orden, un método de limitadores de flujo basado en el esquema HLL-WAF, así como su implementación en arquitecturas de tipo GPU, usando el entorno de programación CUDA. A continuación, se presenta un esquema numérico de orden uno para el sistema de aguas someras de una capa bidimensional en coordenadas esféricas (longitud/latitud), así como la extensión natural del método de limitadores de flujo presentado en el Capítulo 3 a este sistema. Finalmente, se presenta la validación del esquema de limitadores de flujo mediante la simulación de tsunamis reales, y la comparación con datos de campo.
Resumo:
With the growth of energy consumption worldwide, conventional reservoirs, the reservoirs called "easy exploration and production" are not meeting the global energy demand. This has led many researchers to develop projects that will address these needs, companies in the oil sector has invested in techniques that helping in locating and drilling wells. One of the techniques employed in oil exploration process is the reverse time migration (RTM), in English, Reverse Time Migration, which is a method of seismic imaging that produces excellent image of the subsurface. It is algorithm based in calculation on the wave equation. RTM is considered one of the most advanced seismic imaging techniques. The economic value of the oil reserves that require RTM to be localized is very high, this means that the development of these algorithms becomes a competitive differentiator for companies seismic processing. But, it requires great computational power, that it still somehow harms its practical success. The objective of this work is to explore the implementation of this algorithm in unconventional architectures, specifically GPUs using the CUDA by making an analysis of the difficulties in developing the same, as well as the performance of the algorithm in the sequential and parallel version
Resumo:
Image and video compression play a major role in the world today, allowing the storage and transmission of large multimedia content volumes. However, the processing of this information requires high computational resources, hence the improvement of the computational performance of these compression algorithms is very important. The Multidimensional Multiscale Parser (MMP) is a pattern-matching-based compression algorithm for multimedia contents, namely images, achieving high compression ratios, maintaining good image quality, Rodrigues et al. [2008]. However, in comparison with other existing algorithms, this algorithm takes some time to execute. Therefore, two parallel implementations for GPUs were proposed by Ribeiro [2016] and Silva [2015] in CUDA and OpenCL-GPU, respectively. In this dissertation, to complement the referred work, we propose two parallel versions that run the MMP algorithm in CPU: one resorting to OpenMP and another that converts the existing OpenCL-GPU into OpenCL-CPU. The proposed solutions are able to improve the computational performance of MMP by 3 and 2:7 , respectively. The High Efficiency Video Coding (HEVC/H.265) is the most recent standard for compression of image and video. Its impressive compression performance, makes it a target for many adaptations, particularly for holoscopic image/video processing (or light field). Some of the proposed modifications to encode this new multimedia content are based on geometry-based disparity compensations (SS), developed by Conti et al. [2014], and a Geometric Transformations (GT) module, proposed by Monteiro et al. [2015]. These compression algorithms for holoscopic images based on HEVC present an implementation of specific search for similar micro-images that is more efficient than the one performed by HEVC, but its implementation is considerably slower than HEVC. In order to enable better execution times, we choose to use the OpenCL API as the GPU enabling language in order to increase the module performance. With its most costly setting, we are able to reduce the GT module execution time from 6.9 days to less then 4 hours, effectively attaining a speedup of 45 .