246 results for virtualised GPU


Relevance:

10.00% 10.00%

Publisher:

Abstract:

Imaging by computed tomography has revolutionized the diagnosis of disease in medicine and is widely used in different areas of scientific research. As part of the process of obtaining three-dimensional tomographic images, a set of radiographs is processed by a computational algorithm; the most widely used today is the Feldkamp, Davis and Kress (FDK) algorithm. Parallel processing, using the different technologies available on the market, has proven useful for reducing the processing time of computational algorithms. This work presents the parallelization of the FDK three-dimensional image reconstruction algorithm using graphics processing units (GPUs) and the CUDA-C language. GPUs are presented as a viable option for parallel computing, and the introductory concepts associated with computed tomography, GPUs, CUDA-C and parallel processing are covered. The parallel version of the FDK algorithm running on the GPU is compared with a serial version of the same algorithm and shows higher processing speed. Performance tests were carried out on two GPUs of different capabilities: the NVIDIA GeForce 9400GT (16 cores) and the NVIDIA Quadro 2000 (192 cores).

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Particle Swarm Optimization (PSO) is an optimization technique that has been used to solve a variety of problems in different fields of knowledge. However, most implementations are sequential. The optimization process requires a large number of evaluations of the objective function, especially in complex problems involving many particles and dimensions. Consequently, the algorithm can become inefficient in terms of performance, response time and even the quality of the expected result. To overcome these difficulties, high-performance computing can be used to parallelize the algorithm according to the characteristics of the architecture, aiming to increase performance, minimize response time and improve the quality of the final result. In this dissertation, the PSO algorithm is parallelized using three strategies that address different granularities of the problem and also divide the optimization work among several cooperative sub-swarms. One of the parallel algorithms developed, called PPSO, is implemented directly in hardware using an FPGA. All the proposed strategies, PPSO (Parallel PSO), PDPSO (Parallel Dimension PSO) and CPPSO (Cooperative Parallel PSO), are implemented for parallel architectures based on multiprocessors, multicomputers and GPUs. The tests show that, for problems with a larger number of particles and dimensions and with a finer-grained strategy (PDPSO and CPPSO), the GPU obtained the best results, whereas with a coarser-grained strategy (PPSO), the multicomputer implementation obtained the best results.
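
For orientation, the following is a minimal sequential sketch of the standard global-best PSO update, not the dissertation's PPSO, PDPSO or CPPSO implementations. The function names (`sphere`, `pso`), the test objective and the parameter values (`w`, `c1`, `c2`, bounds, swarm size) are illustrative assumptions. The outer loop over particles is the kind of work a coarse-grained strategy could distribute, and the inner loop over dimensions is the kind a finer-grained strategy could distribute.

```python
import random


def sphere(x):
    """Illustrative objective (assumption): sum of squares, minimum 0 at the origin."""
    return sum(v * v for v in x)


def pso(objective, dim, n_particles=20, iters=100, lo=-5.0, hi=5.0,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Sequential global-best PSO; returns (best_position, best_value)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                  # personal best positions
    pbest_val = [objective(p) for p in pos]      # personal best values
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(iters):
        for i in range(n_particles):             # particle loop (coarse grain)
            for d in range(dim):                 # dimension loop (fine grain)
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp the new position to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])              # one objective evaluation per particle
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

Calling `pso(sphere, dim=3)` returns the best position found and its objective value. Each iteration costs one objective evaluation per particle, which is why the evaluation count grows quickly with swarm size.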

Relevance:

10.00% 10.00%

Publisher:

Abstract:

A new three-dimensional Navier-Stokes solver for flows in turbomachines has been developed. The new solver is based on the latest version of the Denton codes, but has been implemented to run on Graphics Processing Units (GPUs) instead of the traditional Central Processing Unit (CPU). The change in processor enables an order-of-magnitude reduction in run-time due to the higher performance of the GPU. Scaling results for a 16-node GPU cluster are also presented, showing almost linear scaling for typical turbomachinery cases. For validation purposes, a test case consisting of a three-stage turbine with complete hub and casing leakage paths is described. Good agreement is obtained with previously published experimental results. The simulation runs in less than 10 minutes on a cluster with four GPUs. Copyright © 2009 by ASME.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

We present a new software framework for the implementation of applications that use stencil computations on block-structured grids to solve partial differential equations. A key feature of the framework is the extensive use of automatic source code generation which is used to achieve high performance on a range of leading multi-core processors. Results are presented for a simple model stencil running on Intel and AMD CPUs as well as the NVIDIA GT200 GPU. The generality of the framework is demonstrated through the implementation of a complete application consisting of many different stencil computations, taken from the field of computational fluid dynamics. © 2010 IEEE.
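
To make the term concrete, here is a hand-written sketch of one stencil computation: a Jacobi sweep of the 5-point Laplace stencil on a single 2D block. It is not the framework's API or its generated code; the function names `jacobi_step` and `solve` and the choice of stencil are assumptions made for illustration.

```python
def jacobi_step(u):
    """One Jacobi sweep of the 5-point stencil; boundary values stay fixed.

    `u` is a list of rows (a 2D grid). A new grid is returned and the
    input grid is left unchanged.
    """
    n, m = len(u), len(u[0])
    new = [row[:] for row in u]
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            # Each interior point becomes the average of its four neighbours.
            new[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                + u[i][j - 1] + u[i][j + 1])
    return new


def solve(u, sweeps):
    """Apply `sweeps` Jacobi sweeps in sequence and return the final grid."""
    for _ in range(sweeps):
        u = jacobi_step(u)
    return u
```

Every interior point is updated from a fixed neighbourhood of the previous grid, so all updates within a sweep are independent. That independence is what makes stencil codes a natural fit for multi-core CPUs and GPUs.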

Relevance:

10.00% 10.00%

Publisher:

Abstract:

High-throughput DNA sequencing (HTS) instruments today are capable of generating millions of sequencing reads in a short period of time, and this represents a serious challenge to current bioinformatics pipelines in processing such an enormous amount of data in a fast and economical fashion. Modern graphics cards are powerful processing units that consist of hundreds of scalar processors in parallel in order to handle the rendering of high-definition graphics in real time. It is this computational capability that we propose to harness in order to accelerate some of the time-consuming steps in analyzing data generated by the HTS instruments. We have developed BarraCUDA, a novel sequence mapping software that utilizes the parallelism of NVIDIA CUDA graphics cards to map sequencing reads to a particular location on a reference genome. While delivering a mapping fidelity similar to that of other mainstream programs, BarraCUDA is an order of magnitude faster in mapping throughput than its CPU counterparts. The software is also capable of supporting multiple CUDA devices in parallel to further accelerate the mapping throughput. BarraCUDA is designed to take advantage of the parallelism of the GPU to accelerate the mapping of millions of sequencing reads generated by HTS instruments. By doing this, we could, at least in part, streamline the current bioinformatics pipeline such that the wider scientific community could benefit from the sequencing technology. BarraCUDA is currently available at http://seqbarracuda.sf.net

Relevance:

10.00% 10.00%

Publisher:

Abstract:

BACKGROUND: With the maturation of next-generation DNA sequencing (NGS) technologies, the throughput of DNA sequencing reads has soared to over 600 gigabases from a single instrument run. General purpose computing on graphics processing units (GPGPU) extracts the computing power from hundreds of parallel stream processors within graphics processing cores and provides a cost-effective and energy-efficient alternative to traditional high-performance computing (HPC) clusters. In this article, we describe the implementation of BarraCUDA, a GPGPU sequence alignment software that is based on BWA, to accelerate the alignment of sequencing reads generated by these instruments to a reference DNA sequence. FINDINGS: Using the NVIDIA Compute Unified Device Architecture (CUDA) software development environment, we ported the most computationally intensive alignment component of BWA to the GPU to take advantage of the massive parallelism. As a result, BarraCUDA offers an order-of-magnitude performance boost in alignment throughput when compared to a CPU core while delivering the same level of alignment fidelity. The software is also capable of supporting multiple CUDA devices in parallel to further accelerate the alignment throughput. CONCLUSIONS: BarraCUDA is designed to take advantage of the parallelism of the GPU to accelerate the alignment of millions of sequencing reads generated by NGS instruments. By doing this, we could, at least in part, streamline the current bioinformatics pipeline such that the wider scientific community could benefit from the sequencing technology. BarraCUDA is currently available from http://seqbarracuda.sf.net.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

We present a video-based system which interactively captures the geometry of a 3D object in the form of a point cloud, then recognizes and registers known objects in this point cloud in a matter of seconds (fig. 1). In order to achieve interactive speed, we exploit both efficient inference algorithms and parallel computation, often on a GPU. The system can be broken down into two distinct phases: geometry capture, and object inference. We now discuss these in further detail. © 2011 IEEE.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

In Multiplexed MPC, the control variables of a MIMO plant are moved asynchronously, following a pre-planned periodic sequence. The advantage of Multiplexed MPC lies in its reduced computational complexity, leading to faster response to disturbances, which may result in improved performance despite finding a sub-optimal solution to the original problem. This paper extends the original Multiplexed MPC so that the control inputs are no longer restricted to a pre-planned periodic sequence. Instead, the most appropriate control input channel is optimised and selected to counter the disturbances, hence the name 'Channel-Hopping'. In addition, the proposed algorithm is suitable for execution on modern computing platforms such as FPGA or GPU, and exploits multi-core, parallel and pipeline computing techniques. The algorithm for the proposed Channel-Hopping MPC (CH-MPC) is described and its stability established. Illustrative examples are given to demonstrate the behaviour of the proposed Channel-Hopping MPC algorithm. © 2011 IFAC.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

The change in acoustic characteristics in personal computers to console gaming and home entertainment systems with the change in the Graphics Processing Unit (GPU) is presented. The tests are carried out using identical configurations of the software and system hardware. The main hardware components used in the project are the central processing unit, motherboard, hard disc drive, memory, power supply, optical drive, and additional cooling system. The results from the measurements taken for each GPU tested are analyzed and compared. The test results are obtained using a photo tachometer and reflective tape adhered to one particular fan blade. Loudness is a psychoacoustic metric developed by Zwicker and Fastl that aims to quantify how loud a sound is perceived to be compared with a standard sound. The acoustic experiment reveals that the inherent noise generation mechanism increases with the complexity of the cooling solution.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

This paper presents a heterogeneous reconfigurable system for real-time applications applying particle filters. The system consists of an FPGA and a multi-threaded CPU. We propose a method to adapt the number of particles dynamically and utilise the run-time reconfigurability of the FPGA for reduced power and energy consumption. An application is developed which involves simultaneous mobile robot localisation and people tracking. It shows that the proposed adaptive particle filter can reduce computation time by up to 99%. Using run-time reconfiguration, we achieve a 34% reduction in idle power and save 26-34% of system energy. Our proposed system is up to 7.39 times faster and 3.65 times more energy efficient than the Intel Xeon X5650 CPU with 12 threads, and 1.3 times faster and 2.13 times more energy efficient than an NVIDIA Tesla C2070 GPU. © 2013 Springer-Verlag.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

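In recent years, with the rapid development of computer hardware, large-scale parallel cluster systems have been used increasingly in scientific research and related applications. As multi-core CPU technology has matured, the processing capability of multi-core clusters for scientific computing has risen to unprecedented levels. How to perform efficient parallel computation on the massive data sets of scientific computing, and how to evaluate the factors that affect algorithm performance, have become important research directions. The fast Fourier transform (FFT), recognized as one of the most important fundamental algorithms of the last century, is widely applied in many fields, including large-scale scientific computing, digital signal processing, and graphics and image simulation. This thesis combines the Dawning 5000A (曙光5000A), China's fastest supercomputer in 2008, with an FFT algorithm for large-scale irregular domains, and studies in depth the scalability of the algorithm in a very-large-scale multi-core parallel environment and the factors that affect its performance. The test results show that the algorithm performs well in existing very-large-scale parallel environments: on the Dawning 5000A it reaches a speedup of 277 on 8192 cores. The other part of this work explores the parallelization of the existing HFFT algorithm on GPGPU. GPUs have clear advantages over CPUs in processing power and memory bandwidth without a large penalty in cost or power consumption, and thus offer a new solution for data-parallel processing problems. Because graphics rendering is highly parallel, GPUs can raise processing power and memory bandwidth by adding parallel processing units and memory control units. In practice, NVIDIA's CUDA is a parallel development environment for GPU computing and a new hardware and software architecture that allows GPUs to be used to solve complex computational problems in commerce, industry and science. CUDA is a complete GPGPU solution that provides an interface for direct access to the hardware. Since GPUs have been studied widely in scientific research, and in order to exploit their data-parallel processing capability, this thesis explores a way to speed up the existing HFFT algorithm through GPU computing. The CUDA parallel algorithm was then tested in practice; the experimental results show that the GPU achieves a 20% speedup on the parallel FFT part, and that the speedup of the program is 34.4 times once I/O transfer is excluded.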

Relevance:

10.00% 10.00%

Publisher:

Abstract:

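Very-large-scale terrain scenes contain large amounts of geometry and texture data, cannot be loaded into memory all at once, and have extremely high complexity, so they cannot be rendered in real time. A high-performance out-of-core technique for real-time terrain walkthrough is proposed. The method uses discrete level of detail combined with view-dependent dynamic selection of, and transitions between, continuous levels of detail. For simplifying the terrain surface, the algorithm proposes a new error metric based on constrained normal cones, so that model simplification preserves silhouettes and lighting. When the generated meshes contain a comparable number of triangles, the method matches visual perception during walkthrough better than traditional simplification based on geometric error. The potential silhouette features extracted during scene simplification can be used to build an incremental horizon along the viewing direction during walkthrough, which keeps the visibility information of different parts of the scene up to date; this is used to avoid loading unneeded data pages and rendering invisible parts of the scene, increasing rendering speed. The walkthrough system uses multithreading so that the CPU, GPU and I/O are all used efficiently, and it can generate walkthrough images with lighting and shadow effects in real time.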

Relevance:

10.00% 10.00%

Publisher:

Abstract:

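A binary Lattice Boltzmann Model (LBM) is introduced to simulate mixed flows composed of two liquids. Unlike other similar models, it treats the viscosity and the diffusion properties of the fluids separately, and can easily simulate a variety of miscible and immiscible mixed-flow phenomena. In addition, because most LBM operations are linear and local, the model is easy to accelerate on a programmable Graphics Processing Unit (GPU), enabling real-time simulation. Several simulation results for binary mixed flows are given.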

Relevance:

10.00% 10.00%

Publisher:

Abstract:

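Colorization and texture synthesis are two fundamental research topics in graphics and image processing. The former automatically colors black-and-white photographs, films or cartoons from color strokes defined by the user; the latter computes, from a sample texture supplied by the user, a result texture that is visually similar to the sample. Both have broad applications. Colorization is often used to color classic black-and-white films or photographs automatically, addressing the heavy manual interaction required by current coloring workflows; texture synthesis is often used to generate textures such as terrain, fabric and hair automatically for films and video games. Both kinds of problem require analyzing texture features and depend on the accuracy of that analysis. Gabor wavelet filters closely match the receptive fields of human vision, and the texture analysis results they give are relatively accurate. This thesis therefore applies Gabor wavelets to colorization and texture synthesis. For colorization, neighborhood relations are redefined with feature vectors based on Gabor wavelets, and an optimization method is then used to color photographs and cartoons iteratively. Compared with previous algorithms, this one requires little user interaction, gives good results, and is simple and robust; it also lets the user add color detail progressively. For texture synthesis, feature vectors based on Gabor wavelets are used to precompute the K-Coherence candidate sets, which improves the accuracy of the K-Coherence algorithm and hence the final quality of the synthesized texture. The proposed algorithms are naturally parallel and can therefore be accelerated on the GPU to run in real time.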
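
As a rough illustration of the Gabor filters this abstract relies on, the sketch below samples the real part of the standard 2D Gabor function (a Gaussian envelope multiplied by a cosine carrier) at several orientations. It is not the thesis's feature extractor: the function names `gabor_kernel` and `filter_bank` and all parameter values are assumptions, since the abstract does not give the filter settings used.

```python
import math


def gabor_kernel(size, wavelength, theta, sigma, gamma=0.5, psi=0.0):
    """Real part of a 2D Gabor filter sampled on a size x size grid (size odd).

    theta is the orientation in radians, sigma the Gaussian width,
    gamma the spatial aspect ratio and psi the phase offset.
    """
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate the sample coordinates by theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr * xr + (gamma * yr) ** 2) / (2.0 * sigma ** 2))
            carrier = math.cos(2.0 * math.pi * xr / wavelength + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


def filter_bank(size=9, wavelength=4.0, sigma=2.0, n_orientations=4):
    """Kernels at evenly spaced orientations in [0, pi) (illustrative values)."""
    return [gabor_kernel(size, wavelength, k * math.pi / n_orientations, sigma)
            for k in range(n_orientations)]
```

Convolving an image with each kernel in such a bank and collecting the responses at a pixel gives one possible per-pixel feature vector. Each pixel's responses can be computed independently of the others, which is consistent with the abstract's remark that the approach parallelizes naturally.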