Digital Library

13 results for virtualised GPU

in Cambridge University Engineering Department Publications Database

Computer generated hologram with geometric occlusion using GPU-accelerated depth buffer rasterization for three-dimensional display

Relevance: 20.00%

Publisher:

Abstract:

We present a method of rapidly producing computer-generated holograms that exhibit geometric occlusion in the reconstructed image. Conceptually, a bundle of rays is shot from every hologram sample into the object volume. We use z buffering to find the nearest intersecting object point for every ray and add its complex field contribution to the corresponding hologram sample. Each hologram sample belongs to an independent operation, allowing us to exploit the parallel computing capability of modern programmable graphics processing units (GPUs). Unlike algorithms that use points or planar segments as the basis for constructing the hologram, our algorithm's complexity is dependent on fixed system parameters, such as the number of ray-casting operations, and can therefore handle complicated models more efficiently. The finite number of hologram pixels is, in effect, a windowing function, and from analyzing the Wigner distribution function of the windowed free-space transfer function we find an upper limit on the cone angle of the ray bundle. Experimentally, we found that an angular sampling distance of 0.01° for a 2.66° cone angle produces acceptable reconstruction quality. © 2009 Optical Society of America.
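The per-sample ray casting described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a 2D geometry (hologram samples on the line z = 0, object points at z > 0), uses the distance along the ray as the depth key, and models each visible point as a spherical wave `amp * exp(i k r) / r`. The function name `hologram_with_occlusion` and all parameters are hypothetical.

```python
import cmath
import math

def hologram_with_occlusion(points, sample_xs, wavelength, cone_half_angle, n_rays):
    """points: list of (x, z, amplitude) with z > 0; hologram lies on z = 0.
    Returns one complex field value per hologram sample (assumed model)."""
    k = 2.0 * math.pi / wavelength
    bin_width = 2.0 * cone_half_angle / n_rays
    field = []
    for sx in sample_xs:
        # Depth buffer: one slot per ray in the bundle, holding (distance, amplitude).
        zbuffer = [None] * n_rays
        for px, pz, amp in points:
            angle = math.atan2(px - sx, pz)      # angle measured from the hologram normal
            if abs(angle) >= cone_half_angle:
                continue                         # point lies outside the ray bundle
            ray = min(int((angle + cone_half_angle) / bin_width), n_rays - 1)
            r = math.hypot(px - sx, pz)
            # Keep only the nearest point on each ray, so farther points are occluded.
            if zbuffer[ray] is None or r < zbuffer[ray][0]:
                zbuffer[ray] = (r, amp)
        total = 0j
        for hit in zbuffer:
            if hit is not None:
                r, amp = hit
                total += amp * cmath.exp(1j * k * r) / r
        field.append(total)
    return field
```

Each iteration of the outer loop touches only its own depth buffer and output value, which is the independence between hologram samples that the abstract says makes the method suitable for GPUs.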

Acceleration of an unstructured hybrid mesh RANS solver by porting to GPU architectures

Relevance: 20.00%

Publisher:

Abstract:

The modern CFD process consists of mesh generation, flow solving and post-processing integrated into an automated workflow. During the last several years we have developed and published research aimed at producing a meshing and geometry editing system, implemented in an end-to-end parallel, scalable manner and capable of automatic handling of large-scale, real-world applications. The particular focus of this paper is the associated unstructured mesh RANS flow solver and the porting of it to GPU architectures. After briefly describing the solver itself, the special issues associated with porting codes using unstructured data structures are discussed, followed by some application examples. Copyright © 2011 by W.N. Dawes.

Large-Scale Gas Turbine Simulations on GPU Clusters

Relevance: 20.00%

Publisher:

GPU-accelerated optimization of fuel treatments for mitigating wildfire hazard

Relevance: 20.00%

Publisher:

Abstract:

Fuel treatment is considered a suitable way to mitigate the hazard related to potential wildfires on a landscape. However, designing an optimal spatial layout of treatment units represents a difficult optimization problem. In fact, budget constraints, the probabilistic nature of fire spread and interactions among the different area units composing the whole treatment give rise to challenging search spaces on typical landscapes. In this paper we formulate this optimization problem with the objective of minimizing the extent of land characterized by high fire hazard. Then, we propose a computational approach that leads to a spatially-optimized treatment layout exploiting Tabu Search and General-Purpose computing on Graphics Processing Units (GPGPU). Using an application example, we also show that the proposed methodology can provide high-quality design solutions in low computing time. © 2013 The Authors. Published by Elsevier B.V.
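As an illustration of the Tabu Search idea mentioned in this abstract, the following is a minimal sketch on a toy grid. The hazard model is an assumption made only for the example (an untreated cell counts as high hazard when at least two of its four neighbours are also untreated), not the fire-spread model of the paper, and the names `high_hazard_count` and `tabu_search` are hypothetical. The budget is modelled as a fixed number of treated cells.

```python
from collections import deque

def high_hazard_count(treated, width, height):
    """Toy objective (an assumption): an untreated cell is high hazard when
    at least two of its 4-neighbours are also untreated."""
    count = 0
    for y in range(height):
        for x in range(width):
            if (x, y) in treated:
                continue
            untreated_neighbours = 0
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nx, ny = x + dx, y + dy
                if 0 <= nx < width and 0 <= ny < height and (nx, ny) not in treated:
                    untreated_neighbours += 1
            if untreated_neighbours >= 2:
                count += 1
    return count

def tabu_search(width, height, budget, iterations=50, tenure=5):
    cells = [(x, y) for y in range(height) for x in range(width)]
    current = set(cells[:budget])              # initial layout: first `budget` cells
    best, best_cost = set(current), high_hazard_count(current, width, height)
    tabu = deque(maxlen=tenure)                # cells moved recently
    for _ in range(iterations):
        candidate, candidate_cost, moved = None, None, None
        # Neighbourhood: swap one treated cell for one untreated cell.
        for out_cell in sorted(current):
            for in_cell in cells:
                if in_cell in current:
                    continue
                neighbour = (current - {out_cell}) | {in_cell}
                cost = high_hazard_count(neighbour, width, height)
                is_tabu = out_cell in tabu or in_cell in tabu
                # Aspiration: a tabu move is allowed only if it beats the best so far.
                if is_tabu and cost >= best_cost:
                    continue
                if candidate_cost is None or cost < candidate_cost:
                    candidate, candidate_cost, moved = neighbour, cost, (out_cell, in_cell)
        if candidate is None:
            break
        current = candidate                    # accept the best admissible move, even if worse
        tabu.extend(moved)
        if candidate_cost < best_cost:
            best, best_cost = set(candidate), candidate_cost
    return best, best_cost
```

The cost evaluations of the candidate layouts within one iteration are independent of each other, which is the kind of work a GPGPU implementation could run in parallel; how the authors map the search onto the GPU is not stated in the abstract.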

SBLOCK: a framework for efficient stencil-based PDE solvers on multi-core platforms

Relevance: 10.00%

Publisher:

An accelerated 3D Navier-Stokes solver for flows in turbomachines

Relevance: 10.00%

Publisher:

Abstract:

A new three-dimensional Navier-Stokes solver for flows in turbomachines has been developed. The new solver is based on the latest version of the Denton codes, but has been implemented to run on Graphics Processing Units (GPUs) instead of the traditional Central Processing Unit (CPU). The change in processor enables an order-of-magnitude reduction in run-time due to the higher performance of the GPU. Scaling results for a 16-node GPU cluster are also presented, showing almost linear scaling for typical turbomachinery cases. For validation purposes, a test case consisting of a three-stage turbine with complete hub and casing leakage paths is described. Good agreement is obtained with previously published experimental results. The simulation runs in less than 10 minutes on a cluster with four GPUs. Copyright © 2009 by ASME.

SBLOCK: A framework for efficient stencil-based PDE solvers on multi-core platforms

Relevance: 10.00%

Publisher:

Abstract:

We present a new software framework for the implementation of applications that use stencil computations on block-structured grids to solve partial differential equations. A key feature of the framework is the extensive use of automatic source code generation which is used to achieve high performance on a range of leading multi-core processors. Results are presented for a simple model stencil running on Intel and AMD CPUs as well as the NVIDIA GT200 GPU. The generality of the framework is demonstrated through the implementation of a complete application consisting of many different stencil computations, taken from the field of computational fluid dynamics. © 2010 IEEE.
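For readers unfamiliar with the term, a stencil computation updates every grid point from a fixed pattern of neighbouring points. The sketch below shows one sweep of a simple 5-point averaging stencil on a single 2D block. It is only an illustration of the concept: it does not use SBLOCK or its code generation, and the function name `jacobi_step` is hypothetical.

```python
def jacobi_step(grid):
    """Apply one 5-point averaging stencil sweep to the interior of a 2D grid.
    `grid` is a list of equal-length rows; boundary values are left unchanged."""
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]             # copy so every update reads old values
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            new[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                + grid[i][j - 1] + grid[i][j + 1])
    return new
```

Because each output point depends only on values from the previous sweep, all interior points can be updated independently, which is what makes stencil codes a natural fit for multi-core CPUs and GPUs.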

BarraCUDA – a Fast Sequence Mapping Software using Graphics Processing Units

Relevance: 10.00%

Publisher:

Abstract:

High-throughput DNA sequencing (HTS) instruments today are capable of generating millions of sequencing reads in a short period of time, and this represents a serious challenge to current bioinformatics pipelines in processing such an enormous amount of data in a fast and economical fashion. Modern graphics cards are powerful processing units that consist of hundreds of scalar processors in parallel in order to handle the rendering of high-definition graphics in real-time. It is this computational capability that we propose to harness in order to accelerate some of the time-consuming steps in analyzing data generated by the HTS instruments. We have developed BarraCUDA, a novel sequence mapping software that utilizes the parallelism of NVIDIA CUDA graphics cards to map sequencing reads to a particular location on a reference genome. While delivering a similar mapping fidelity to other mainstream programs, BarraCUDA is an order of magnitude faster in mapping throughput compared to its CPU counterparts. The software is also capable of supporting multiple CUDA devices in parallel to further accelerate the mapping throughput. BarraCUDA is designed to take advantage of the parallelism of GPU to accelerate the mapping of millions of sequencing reads generated by HTS instruments. By doing this, we could, at least in part, streamline the current bioinformatics pipeline such that the wider scientific community could benefit from the sequencing technology. BarraCUDA is currently available at http://seqbarracuda.sf.net

BarraCUDA - a fast short read sequence aligner using graphics processing units.

Relevance: 10.00%

Publisher:

Abstract:

BACKGROUND: With the maturation of next-generation DNA sequencing (NGS) technologies, the throughput of DNA sequencing reads has soared to over 600 gigabases from a single instrument run. General purpose computing on graphics processing units (GPGPU) extracts the computing power from hundreds of parallel stream processors within graphics processing cores and provides a cost-effective and energy efficient alternative to traditional high-performance computing (HPC) clusters. In this article, we describe the implementation of BarraCUDA, a GPGPU sequence alignment software that is based on BWA, to accelerate the alignment of sequencing reads generated by these instruments to a reference DNA sequence. FINDINGS: Using the NVIDIA Compute Unified Device Architecture (CUDA) software development environment, we ported the most computationally intensive alignment component of BWA to GPU to take advantage of the massive parallelism. As a result, BarraCUDA offers an order-of-magnitude performance boost in alignment throughput when compared to a CPU core while delivering the same level of alignment fidelity. The software is also capable of supporting multiple CUDA devices in parallel to further accelerate the alignment throughput. CONCLUSIONS: BarraCUDA is designed to take advantage of the parallelism of GPU to accelerate the alignment of millions of sequencing reads generated by NGS instruments. By doing this, we could, at least in part, streamline the current bioinformatics pipeline such that the wider scientific community could benefit from the sequencing technology. BarraCUDA is currently available from http://seqbarracuda.sf.net.
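BWA, on which BarraCUDA is based, aligns reads using an index built from the Burrows-Wheeler transform of the reference. The sketch below shows the core idea, backward search, for exact matches only. It is a simplified illustration, not BarraCUDA or BWA code: real aligners also handle mismatches and gaps and use compressed index structures, and the names `build_fm_index` and `find_exact` are hypothetical.

```python
def build_fm_index(reference):
    """Build a naive (uncompressed) index from the Burrows-Wheeler transform."""
    text = reference + "$"                     # sentinel sorts before A, C, G, T
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in suffix_array)
    alphabet = sorted(set(text))
    # first[c]: number of characters in the text strictly smaller than c.
    first, total = {}, 0
    for c in alphabet:
        first[c] = total
        total += text.count(c)
    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return suffix_array, first, occ

def find_exact(read, index):
    """Return the sorted start positions of exact matches of `read`."""
    suffix_array, first, occ = index
    lo, hi = 0, len(suffix_array)              # half-open suffix-array interval
    for c in reversed(read):                   # extend the match one base to the left
        if c not in first:
            return []
        lo = first[c] + occ[c][lo]
        hi = first[c] + occ[c][hi]
        if lo >= hi:
            return []
    return sorted(suffix_array[i] for i in range(lo, hi))
```

For example, `find_exact("ATTA", build_fm_index("GATTACAGATTA"))` returns `[1, 8]`. Each read is searched independently against the same read-only index, which is the per-read parallelism a GPU port can exploit.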

Live 3D shape reconstruction, recognition and registration

Relevance: 10.00%

Publisher:

Abstract:

We present a video-based system which interactively captures the geometry of a 3D object in the form of a point cloud, then recognizes and registers known objects in this point cloud in a matter of seconds (fig. 1). In order to achieve interactive speed, we exploit both efficient inference algorithms and parallel computation, often on a GPU. The system can be broken down into two distinct phases: geometry capture, and object inference. We now discuss these in further detail. © 2011 IEEE.

Channel-hopping model predictive control

Relevance: 10.00%

Publisher:

Abstract:

In Multiplexed MPC, the control variables of a MIMO plant are moved asynchronously, following a pre-planned periodic sequence. The advantage of Multiplexed MPC lies in its reduced computational complexity, leading to faster response to disturbances, which may result in improved performance, despite finding a sub-optimal solution to the original problem. This paper extends the original Multiplexed MPC in a way such that the control inputs are no longer restricted to a pre-planned periodic sequence. Instead, the most appropriate control input channel is optimised and selected to counter the disturbances, hence the name 'Channel-Hopping'. In addition, the proposed algorithm is suitable for execution on modern computing platforms such as FPGA or GPU, and exploits multi-core, parallel and pipeline computing techniques. The algorithm for the proposed Channel-hopping MPC (CH-MPC) will be described and its stability established. Illustrative examples are given to demonstrate the behaviour of the proposed Channel-Hopping MPC algorithm. © 2011 IFAC.

Graphics processing unit cooling solutions: Acoustic characteristics

Relevance: 10.00%

Publisher:

Abstract:

The change in acoustic characteristics in personal computers to console gaming and home entertainment systems with the change in the Graphics Processing Unit (GPU) is presented. The tests are carried out using identical configurations of the software and system hardware. The prime components of the hardware used in the project are the central processing unit, motherboard, hard disc drive, memory, power supply, optical drive, and additional cooling system. The results from the measurements taken for each GPU tested are analyzed and compared. The test results are obtained using a photo tachometer and reflective tape adhered to one particular fan blade. Loudness, a psychoacoustic metric developed by Zwicker and Fastl, aims to quantify how loud a sound is perceived to be compared with a standard sound. The acoustic experiment reveals that the inherent noise generation mechanism increases with the complexity of the cooling solution.

Heterogeneous reconfigurable system for adaptive particle filters in real-time applications

Relevance: 10.00%

Publisher:

Abstract:

This paper presents a heterogeneous reconfigurable system for real-time applications applying particle filters. The system consists of an FPGA and a multi-threaded CPU. We propose a method to adapt the number of particles dynamically and utilise the run-time reconfigurability of the FPGA for reduced power and energy consumption. An application is developed which involves simultaneous mobile robot localisation and people tracking. It shows that the proposed adaptive particle filter can reduce computation time by up to 99%. Using run-time reconfiguration, we achieve a 34% reduction in idle power and save 26-34% of system energy. Our proposed system is up to 7.39 times faster and 3.65 times more energy efficient than the Intel Xeon X5650 CPU with 12 threads, and 1.3 times faster and 2.13 times more energy efficient than an NVIDIA Tesla C2070 GPU. © 2013 Springer-Verlag.
