91 results for CUDA
Abstract:
X-ray computed log tomography has always been applied for qualitative reconstructions. In most cases, a series of consecutive slices of the timber are scanned to estimate the 3D image reconstruction of the entire log. However, unexpected movement of the timber under study degrades the quality of the image reconstruction, since the position and orientation of some scanned slices can be incorrectly estimated. In addition, reconstruction time remains a significant challenge for practical applications. The present study investigates the possibility of employing modern physics engines to estimate the position of a moving rigid body and of its scanned slices, which are subject to X-ray computed tomography. The current work includes implementations of the extended Kalman filter and of an algebraic reconstruction method for fan-beam computed tomography. In addition, modern technologies such as NVIDIA PhysX and CUDA are used in the current study. As a result, it is shown numerically that the extended Kalman filter can be applied together with a real-time physics engine, PhysX, to determine the position of a moving object. It is shown that the position of the rigid body can be determined based only on reconstructions of its slices. However, the simulation of the body movement is sometimes subject to error when the Kalman filter is applied, as PhysX is not always able to continue simulating the movement properly because of incorrect state estimation.
Abstract:
This article documents the addition of 512 microsatellite marker loci and nine pairs of Single Nucleotide Polymorphism (SNP) sequencing primers to the Molecular Ecology Resources Database. Loci were developed for the following species: Alcippe morrisonia morrisonia, Bashania fangiana, Bashania fargesii, Chaetodon vagabundus, Colletes floralis, Coluber constrictor flaviventris, Coptotermes gestroi, Crotophaga major, Cyprinella lutrensis, Danaus plexippus, Fagus grandifolia, Falco tinnunculus, Fletcherimyia fletcheri, Hydrilla verticillata, Laterallus jamaicensis coturniculus, Leavenworthia alabamica, Marmosops incanus, Miichthys miiuy, Nasua nasua, Noturus exilis, Odontesthes bonariensis, Quadrula fragosa, Pinctada maxima, Pseudaletia separata, Pseudoperonospora cubensis, Podocarpus elatus, Portunus trituberculatus, Rhagoletis cerasi, Rhinella schneideri, Sarracenia alata, Skeletonema marinoi, Sminthurus viridis, Syngnathus abaster, Uroteuthis (Photololigo) chinensis, Verticillium dahliae, Wasmannia auropunctata, and Zygochlamys patagonica. These loci were cross-tested on the following species: Chaetodon baronessa, Falco columbarius, Falco eleonorae, Falco naumanni, Falco peregrinus, Falco subbuteo, Didelphis aurita, Gracilinanus microtarsus, Marmosops paulensis, Monodelphis americana, Odontesthes hatcheri, Podocarpus grayi, Podocarpus lawrencei, Podocarpus smithii, Portunus pelagicus, Syngnathus acus, Syngnathus typhle, Uroteuthis (Photololigo) edulis, Uroteuthis (Photololigo) duvauceli and Verticillium albo-atrum. This article also documents the addition of nine sequencing primer pairs and sixteen allele-specific primers or probes for Oncorhynchus mykiss and Oncorhynchus tshawytscha; these primers and assays were cross-tested in both species.
Abstract:
Large-scale simulations of parts of the brain using detailed neuronal models to improve our understanding of brain functions are becoming a reality with the use of supercomputers and large clusters. However, the high acquisition and maintenance cost of these computers, including the physical space, air conditioning, and electrical power, limits the number of simulations of this kind that scientists can perform. Modern commodity graphics cards, based on the CUDA platform, contain graphics processing units (GPUs) composed of hundreds of processors that can simultaneously execute thousands of threads and thus constitute a low-cost solution for many high-performance computing applications. In this work, we present a CUDA algorithm that enables the execution, on multiple GPUs, of simulations of large-scale networks composed of biologically realistic Hodgkin-Huxley neurons. The algorithm represents each neuron as a CUDA thread, which solves the set of coupled differential equations that model that neuron. Communication among neurons located on different GPUs is coordinated by the CPU. We obtained speedups of 40 for the simulation of 200k neurons that received random external input and speedups of 9 for a network with 200k neurons and 20M neuronal connections, in a single computer with two graphics boards with two GPUs each, when compared with a modern quad-core CPU. Copyright (C) 2010 John Wiley & Sons, Ltd.
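The one-neuron-per-thread idea in the abstract above can be illustrated with a minimal sequential sketch in Python. It is not the authors' code: the constants are the usual textbook Hodgkin-Huxley values, the forward-Euler integrator and the names `gate_rates`, `step_neuron`, and `simulate` are assumptions made for illustration, and synaptic coupling between neurons is left out. The inner loop over `idx` stands in for what each GPU thread would do for its own neuron.

```python
import math

# Textbook Hodgkin-Huxley constants (assumed; the abstract gives none).
C_M = 1.0                               # membrane capacitance, uF/cm^2
G_NA, G_K, G_L = 120.0, 36.0, 0.3       # peak conductances, mS/cm^2
E_NA, E_K, E_L = 50.0, -77.0, -54.387   # reversal potentials, mV


def _ratio(x, y):
    """Return x / (1 - exp(-x / y)), using its limit y when x is ~0."""
    if abs(x) < 1e-7:
        return y
    return x / (1.0 - math.exp(-x / y))


def gate_rates(v):
    """Opening (a_*) and closing (b_*) rates of the m, h, n gates, in 1/ms."""
    a_m = 0.1 * _ratio(v + 40.0, 10.0)
    b_m = 4.0 * math.exp(-(v + 65.0) / 18.0)
    a_h = 0.07 * math.exp(-(v + 65.0) / 20.0)
    b_h = 1.0 / (1.0 + math.exp(-(v + 35.0) / 10.0))
    a_n = 0.01 * _ratio(v + 55.0, 10.0)
    b_n = 0.125 * math.exp(-(v + 65.0) / 80.0)
    return a_m, b_m, a_h, b_h, a_n, b_n


def resting_state(v=-65.0):
    """State (v, m, h, n) with every gate at its steady-state value for v."""
    a_m, b_m, a_h, b_h, a_n, b_n = gate_rates(v)
    return (v, a_m / (a_m + b_m), a_h / (a_h + b_h), a_n / (a_n + b_n))


def step_neuron(state, i_ext, dt):
    """One forward-Euler step of one neuron: the per-thread work."""
    v, m, h, n = state
    a_m, b_m, a_h, b_h, a_n, b_n = gate_rates(v)
    i_ion = (G_NA * m ** 3 * h * (v - E_NA)
             + G_K * n ** 4 * (v - E_K)
             + G_L * (v - E_L))
    return (v + dt * (i_ext - i_ion) / C_M,
            m + dt * (a_m * (1.0 - m) - b_m * m),
            h + dt * (a_h * (1.0 - h) - b_h * h),
            n + dt * (a_n * (1.0 - n) - b_n * n))


def simulate(currents, t_ms=50.0, dt=0.01):
    """Advance uncoupled neurons and return each neuron's peak voltage (mV)."""
    states = [resting_state() for _ in currents]
    peaks = [s[0] for s in states]
    for _ in range(int(round(t_ms / dt))):
        for idx, i_ext in enumerate(currents):   # on a GPU: idx = thread index
            states[idx] = step_neuron(states[idx], i_ext, dt)
            peaks[idx] = max(peaks[idx], states[idx][0])
    return peaks
```

Calling `simulate([0.0, 10.0])` drives one neuron with no input and one with a constant 10 uA/cm^2 current; because the neurons are independent here, each update touches only its own state, which is what makes the per-thread mapping straightforward.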
Abstract:
With the growth of energy consumption worldwide, conventional reservoirs, the so-called "easy exploration and production" reservoirs, are no longer meeting global energy demand. This has led many researchers to develop projects that address these needs, and companies in the oil sector have invested in techniques that help in locating and drilling wells. One of the techniques employed in oil exploration is reverse time migration (RTM), a seismic imaging method that produces excellent images of the subsurface. The algorithm is based on solving the wave equation. RTM is considered one of the most advanced seismic imaging techniques. The economic value of the oil reserves that require RTM to be located is very high, which means that the development of these algorithms becomes a competitive differentiator for seismic processing companies. However, RTM requires great computational power, which still somewhat limits its practical success. The objective of this work is to explore the implementation of this algorithm on unconventional architectures, specifically GPUs using CUDA, by analyzing the difficulties of developing it as well as the performance of the algorithm in its sequential and parallel versions.
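As a rough illustration of why wave-equation methods such as RTM are computationally heavy, the sketch below shows one explicit finite-difference time step of the 1D acoustic wave equation in Python. It is only an assumed, simplified kernel: the function names `wave_step` and `propagate`, the 1D grid, the second-order stencil, and the fixed zero boundaries are illustrative choices, and the parts that make up a full RTM (back-propagating the recorded data and applying an imaging condition) are not shown.

```python
def wave_step(u_prev, u_curr, courant):
    """Return the next wavefield for u_tt = c^2 u_xx on a uniform grid.

    courant is c * dt / dx; the two end points are held at zero.
    """
    r2 = courant * courant
    u_next = [0.0] * len(u_curr)
    for i in range(1, len(u_curr) - 1):   # on a GPU: one thread per grid point
        laplacian = u_curr[i - 1] - 2.0 * u_curr[i] + u_curr[i + 1]
        u_next[i] = 2.0 * u_curr[i] - u_prev[i] + r2 * laplacian
    return u_next


def propagate(u0, steps, courant=0.5):
    """Advance an initial displacement u0 by the given number of time steps.

    The previous field is initialised to u0, a crude zero-velocity start.
    """
    u_prev, u_curr = list(u0), list(u0)
    for _ in range(steps):
        u_prev, u_curr = u_curr, wave_step(u_prev, u_curr, courant)
    return u_curr
```

Every grid point is updated from its neighbours at every time step, and each update within a step is independent of the others, which is why this kind of stencil is a natural candidate for a GPU. In 2D or 3D surveys the same update runs over far more points and for many time steps.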
Abstract:
Vascular segmentation is important in diagnosing vascular diseases such as stroke, and it is hampered by noise in the image and by very thin vessels that can pass unnoticed. One way to accomplish the segmentation is to extract the centerline of the vessel with height ridges, which use the intensity as a feature for segmentation. This process can take from seconds to minutes, depending on the technology employed. In order to accelerate the segmentation method proposed by Aylward [Aylward & Bullitt 2002], we have adapted it to run in parallel using the CUDA architecture. The performance of the segmentation method running on the GPU is compared both to the same method running on the CPU and to Aylward's original method, also running on the CPU. The improvement of the new method over the original one is twofold: the starting point for the segmentation process is not a single point in the blood vessel but a volume, which makes it easier for the user to segment a region of interest; and, overall, the new method was 873 times faster running on the GPU and 150 times faster running on the CPU than Aylward's original CPU implementation.
Abstract:
In this article we explore the computational power of NVIDIA graphics processing units (GPUs) in cryptography using CUDA (Compute Unified Device Architecture) technology. CUDA makes general-purpose computing easier by exploiting the parallel processing present in GPUs. To this end, the NVIDIA GPU architectures and CUDA are presented, along with cryptography concepts. Furthermore, we compare CPU versions of the cryptographic algorithms Advanced Encryption Standard (AES) and Message-Digest Algorithm 5 (MD5) with parallel versions written in CUDA. © 2011 AISTI.
Abstract:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Abstract:
Graduate Program in Computer Science - IBILCE
Abstract:
Identifying opportunities for software parallelism is a task that takes a lot of human time, but once some code patterns for parallelism are identified, software can accomplish this task quickly. Thus, automating this process brings many benefits, such as saving time and reducing errors caused by the programmer [1]. This work aims at developing a software environment that identifies opportunities for parallelism in source code written in the C language and generates a program with the same behavior but a higher degree of parallelism, able to run on a graphics processor compatible with the CUDA architecture.
Abstract:
Graduate Program in Materials Science and Technology - FC
Abstract:
Pattern recognition techniques aim mainly to classify a set of samples, with the learning process being the most time-consuming phase. The problem can become worse in interactive classification tools, where long training times may be unacceptable for large datasets. One example of a classifier is the one based on Optimum-Path Forest [8] (OPF). Given that many works have focused on implementing pattern recognition algorithms in General Purpose Graphics Processing Unit (GPGPU) environments, the present study aimed to implement the training stage of the Optimum-Path Forest classifier in CUDA in order to increase its efficiency. The CUDA-optimized classifier showed a faster training phase than the original version.
Abstract:
Modern GPUs are well suited for computationally intensive tasks and massively parallel computation. Sparse matrix multiplication and the linear triangular solver are among the most important and heavily used kernels in scientific computation, and several challenges in developing a high-performance kernel with these two modules are investigated. The main interest is to solve linear systems derived from elliptic equations discretized with triangular elements. The resulting linear system has a symmetric positive definite matrix. The sparse matrix is stored in the compressed sparse row (CSR) format. A CUDA algorithm is proposed to execute the matrix-vector multiplication directly on the CSR format. A dependence tree algorithm is used to determine which variables the linear triangular solver can compute in parallel. To increase the number of parallel threads, a graph coloring algorithm is implemented to reorder the mesh numbering in a pre-processing phase. The proposed method is compared with available parallel and serial libraries. The results show that the proposed method reduces the computational cost of the matrix-vector multiplication. The pre-processing associated with the triangular solver needs to be executed just once in the proposed method. The conjugate gradient method was implemented and showed a similar convergence rate for all the compared methods. The proposed method showed significantly smaller execution time.
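For readers unfamiliar with the CSR format mentioned above, the following Python sketch shows a matrix-vector product computed directly from the three CSR arrays. It is a plain sequential illustration, not the proposed CUDA algorithm: the function name `csr_matvec` and the array names `values`, `col_idx`, and `row_ptr` are assumptions, and the one-thread-per-row mapping noted in the comment is a common choice that the abstract does not confirm.

```python
def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR form.

    values  : nonzero entries, row by row
    col_idx : column index of each entry in values
    row_ptr : row i occupies values[row_ptr[i]:row_ptr[i + 1]]
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):   # on a GPU: typically one thread (or warp) per row
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# Small symmetric positive definite example:
# [[ 4, -1,  0],
#  [-1,  4, -1],
#  [ 0, -1,  4]]
values = [4.0, -1.0, -1.0, 4.0, -1.0, -1.0, 4.0]
col_idx = [0, 1, 0, 1, 2, 1, 2]
row_ptr = [0, 2, 5, 7]
y = csr_matvec(values, col_idx, row_ptr, [1.0, 2.0, 3.0])   # [2.0, 4.0, 10.0]
```

Each row's sum reads only its own slice of `values` and `col_idx`, so rows can be processed independently. The triangular solve is harder to parallelize because each unknown depends on earlier ones, which is the reason the abstract introduces a dependence tree and graph coloring.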
Abstract:
University Master's Degree in Intelligent Systems and Numerical Applications in Engineering (SIANI)
Abstract:
University Master's Degree in Intelligent Systems and Numerical Applications in Engineering (SIANI)