976 results for GPU acceleration


Relevance: 100.00%

Abstract:

The nonlinear problem of steady free-surface flow past a submerged source is considered as a case study for three-dimensional ship wave problems. Of particular interest is the distinctive wedge-shaped wave pattern that forms on the surface of the fluid. By reformulating the governing equations with a standard boundary-integral method, we derive a system of nonlinear algebraic equations that enforce a singular integro-differential equation at each midpoint on a two-dimensional mesh. Our contribution is to solve the system of equations with a Jacobian-free Newton-Krylov method together with a banded preconditioner that is carefully constructed with entries taken from the Jacobian of the linearised problem. Further, we are able to utilise graphics processing unit acceleration to significantly increase the grid refinement and decrease the run-time of our solutions in comparison to schemes that are presently employed in the literature. Our approach provides opportunities to explore the nonlinear features of three-dimensional ship wave patterns, such as the shape of steep waves close to their limiting configuration, in a manner that has been possible in the two-dimensional analogue for some time.
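To make the solution strategy concrete, the sketch below shows a generic Jacobian-free Newton-Krylov iteration in SciPy, preconditioned by a user-supplied banded approximation to the Jacobian. It is a minimal illustration under assumed interfaces (the residual F, the banded Jacobian callable and the initial guess are placeholders for the discretised boundary-integral equations), not the authors' GPU-accelerated code.

```python
# Minimal JFNK sketch (not the authors' code): matrix-free Jacobian-vector
# products plus a banded preconditioner taken from a linearised Jacobian.
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres, splu

def jfnk_solve(F, banded_jac, x0, tol=1e-8, max_newton=30):
    """F(x): residual vector; banded_jac(x): sparse banded Jacobian approximation."""
    x = x0.copy()
    n = x.size
    for _ in range(max_newton):
        r = F(x)
        if np.linalg.norm(r) < tol:
            break
        eps = 1e-7 * (1.0 + np.linalg.norm(x))
        # Matrix-free Jacobian-vector product via a first-order finite difference.
        J = LinearOperator((n, n), matvec=lambda v: (F(x + eps * v) - r) / eps)
        # Banded preconditioner: direct factorisation of the cheap approximate Jacobian.
        M = LinearOperator((n, n), matvec=splu(banded_jac(x).tocsc()).solve)
        dx, _ = gmres(J, -r, M=M)
        x = x + dx
    return x
```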

Relevance: 100.00%

Abstract:

Differential equations are often directly solvable by analytical means only in their one-dimensional version. Partial differential equations are generally not solvable by analytical means in two and three dimensions, with the exception of a few special cases. In all other cases, numerical approximation methods need to be utilized. One of the most popular methods is the finite element method. The main areas of focus here are the Poisson heat equation and the plate bending equation. The purpose of this paper is to provide a quick walkthrough of the various approaches that the authors followed in pursuit of creating optimal solvers, accelerated with the use of graphics processing units, and comparing them in terms of accuracy and time efficiency with existing or self-made non-accelerated solvers.
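As a rough illustration of the kind of solver being compared, the sketch below sets up a 2-D Poisson problem with a 5-point finite-difference Laplacian (the paper itself uses finite elements) and solves it with conjugate gradients; the function name and right-hand side are assumptions. One hedged route to GPU acceleration is swapping the NumPy/SciPy arrays and solver for their CuPy equivalents.

```python
# Hedged sketch: 2-D Poisson problem on the unit square with zero Dirichlet
# boundary conditions, discretised with a 5-point Laplacian and solved by CG.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import cg

def poisson_2d(n, f):
    """Solve -Laplace(u) = f on an n-by-n interior grid of the unit square."""
    h = 1.0 / (n + 1)
    main = 4.0 * np.ones(n)
    off = -1.0 * np.ones(n - 1)
    T = sp.diags([off, main, off], [-1, 0, 1])
    S = sp.diags([off, off], [-1, 1])
    A = (sp.kron(sp.identity(n), T) + sp.kron(S, sp.identity(n))) / h**2
    x = np.linspace(h, 1 - h, n)
    X, Y = np.meshgrid(x, x, indexing="ij")
    b = f(X, Y).ravel()
    u, _ = cg(A.tocsr(), b)
    return u.reshape(n, n)

u = poisson_2d(64, lambda x, y: np.ones_like(x))  # constant source term
```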

Relevance: 70.00%

Abstract:

How can GPU acceleration be obtained as a service in a cluster? This question has become increasingly significant due to the inefficiency of installing GPUs on all nodes of a cluster. The research reported in this paper addresses the above question by employing rCUDA (remote CUDA), a framework that facilitates Acceleration-as-a-Service (AaaS), such that the nodes of a cluster can request the acceleration of a set of remote GPUs on demand. The rCUDA framework exploits virtualisation and ensures that multiple nodes can share the same GPU. In this paper we test the feasibility of the rCUDA framework on a real-world application employed in the financial risk industry that can benefit from AaaS in the production setting. The results confirm the feasibility of rCUDA and highlight that rCUDA achieves similar performance compared to CUDA, provides consistent results and, more importantly, allows a single application to benefit from all the GPUs available in the cluster without losing efficiency.

Relevance: 40.00%

Abstract:

The modern CFD process consists of mesh generation, flow solving and post-processing integrated into an automated workflow. During the last several years we have developed and published research aimed at producing a meshing and geometry editing system, implemented in an end-to-end parallel, scalable manner and capable of automatically handling large-scale, real-world applications. The particular focus of this paper is the associated unstructured-mesh RANS flow solver and its porting to GPU architectures. After briefly describing the solver itself, the special issues associated with porting codes that use unstructured data structures are discussed, followed by some application examples.
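The porting difficulty alluded to above comes largely from indirect addressing. The hedged sketch below shows a serial edge-based residual accumulation with gather/scatter access; the array names are illustrative, not the solver's data structures. On a GPU the scatter must be made race-free (atomics or edge colouring), for which `np.add.at` is the serial analogue.

```python
# Illustrative edge-based loop for an unstructured mesh: fluxes are gathered
# from the two cells sharing each edge and scattered back into the residual.
import numpy as np

def accumulate_residual(edges, state, face_area):
    """edges[k] = (left_cell, right_cell); state and residual are per-cell arrays."""
    res = np.zeros_like(state)
    left, right = edges[:, 0], edges[:, 1]              # indirect addressing
    flux = 0.5 * (state[left] + state[right]) * face_area
    np.add.at(res, left, -flux)                          # serial stand-in for an atomic add
    np.add.at(res, right, flux)
    return res
```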

Relevance: 30.00%

Abstract:

The design cycle for complex special-purpose computing systems is extremely costly and time-consuming. It involves a multiparametric design space exploration for optimization, followed by design verification. Designers of special purpose VLSI implementations often need to explore parameters, such as optimal bitwidth and data representation, through time-consuming Monte Carlo simulations. A prominent example of this simulation-based exploration process is the design of decoders for error correcting systems, such as the Low-Density Parity-Check (LDPC) codes adopted by modern communication standards, which involves thousands of Monte Carlo runs for each design point. Currently, high-performance computing offers a wide set of acceleration options that range from multicore CPUs to Graphics Processing Units (GPUs) and Field Programmable Gate Arrays (FPGAs). The exploitation of diverse target architectures is typically associated with developing multiple code versions, often using distinct programming paradigms. In this context, we evaluate the concept of retargeting a single OpenCL program to multiple platforms, thereby significantly reducing design time. A single OpenCL-based parallel kernel is used without modifications or code tuning on multicore CPUs, GPUs, and FPGAs. We use SOpenCL (Silicon to OpenCL), a tool that automatically converts OpenCL kernels to RTL, in order to introduce FPGAs as a potential platform to efficiently execute simulations coded in OpenCL. We use LDPC decoding simulations as a case study. Experimental results were obtained by testing a variety of regular and irregular LDPC codes that range from short/medium (e.g., 8,000 bit) to long length (e.g., 64,800 bit) DVB-S2 codes. We observe that, depending on the design parameters to be simulated and on the dimension and phase of the design, either the GPU or the FPGA may be the more convenient platform, thus providing different acceleration factors over conventional multicore CPUs.
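As a toy illustration of the single-source idea (not the paper's LDPC decoder), the sketch below runs one OpenCL kernel, a hard-decision bit-error count for an assumed all-zeros codeword over an AWGN-like channel, on whichever device PyOpenCL selects at run time; retargeting to an FPGA would additionally require a flow such as SOpenCL or a vendor SDK.

```python
# Minimal single-source OpenCL sketch: the same kernel source runs on a CPU
# or GPU depending on the context chosen at run time.
import numpy as np
import pyopencl as cl

kernel_src = """
__kernel void hard_decision(__global const float *llr, __global int *err) {
    int gid = get_global_id(0);
    /* assumed all-zeros codeword: a negative LLR is a raw bit error */
    err[gid] = (llr[gid] < 0.0f) ? 1 : 0;
}
"""

ctx = cl.create_some_context()            # platform/device chosen at run time
queue = cl.CommandQueue(ctx)
prg = cl.Program(ctx, kernel_src).build()

n = 1 << 20
llr = (1.0 + 0.8 * np.random.randn(n)).astype(np.float32)   # toy channel LLRs
err = np.empty(n, dtype=np.int32)

mf = cl.mem_flags
llr_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=llr)
err_buf = cl.Buffer(ctx, mf.WRITE_ONLY, err.nbytes)

prg.hard_decision(queue, (n,), None, llr_buf, err_buf)
cl.enqueue_copy(queue, err, err_buf)
print("raw bit error rate:", err.sum() / n)
```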

Relevance: 30.00%

Abstract:

SAFT techniques are based on the sequential activation, in emission and reception, of the array elements and the post-processing of all the received signals to compose the image. Thus, the image generation can be divided into two stages: (1) the excitation and acquisition stage, where the signals received by each element or group of elements are stored; and (2) the beamforming stage, where the signals are combined together to obtain the image pixels. The use of Graphics Processing Units (GPUs), which are programmable devices with a high level of parallelism, can accelerate the computation of the beamforming process, which usually includes functions such as dynamic focusing, band-pass filtering, spatial filtering and envelope detection. This work shows that using GPU technology can accelerate the beamforming and post-processing algorithms in SAFT imaging by more than one order of magnitude with respect to CPU implementations.
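A minimal sketch of the beamforming stage is given below: a pulse-echo delay-and-sum SAFT beamformer over pre-stored per-element signals. The array geometry, sampling rate and `rf` data are assumed placeholders; a GPU version would evaluate the pixel loop in parallel (e.g. one thread per pixel).

```python
# Delay-and-sum SAFT sketch: for each image pixel, sum each element's stored
# signal at the sample corresponding to the round-trip travel time.
import numpy as np

def saft_das(rf, elem_x, fs, c, img_x, img_z):
    """rf[e, t]: pulse-echo signal of element e at position elem_x[e]."""
    img = np.zeros((img_z.size, img_x.size))
    n_elem, n_samp = rf.shape
    for iz, z in enumerate(img_z):
        for ix, x in enumerate(img_x):
            d = np.sqrt((elem_x - x) ** 2 + z ** 2)     # element-to-pixel distance
            tau = 2.0 * d / c                            # round-trip delay
            idx = np.clip(np.round(tau * fs).astype(int), 0, n_samp - 1)
            img[iz, ix] = np.sum(rf[np.arange(n_elem), idx])
    return img
```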

Relevance: 30.00%

Abstract:

Huge image collections are becoming available lately. In this scenario, the use of Content-Based Image Retrieval (CBIR) systems has emerged as a promising approach to support image searches. The objective of CBIR systems is to retrieve the most similar images in a collection, given a query image, by taking into account image visual properties such as texture, color, and shape. In these systems, the effectiveness of the retrieval process depends heavily on the accuracy of ranking approaches. Recently, re-ranking approaches have been proposed to improve the effectiveness of CBIR systems by taking into account the relationships among images. The re-ranking approaches consider the relationships among all images in a given dataset. These approaches typically demand a huge amount of computational power, which hampers their use in practical situations. On the other hand, these methods can be massively parallelized. In this paper, we propose to speed up the computation of the RL-Sim algorithm, a recently proposed image re-ranking approach, by using the computational power of Graphics Processing Units (GPUs). GPUs are emerging as relatively inexpensive parallel processors that are becoming available on a wide range of computer systems. We address the image re-ranking performance challenges by proposing a parallel solution designed to fit the computational model of GPUs. We conducted an experimental evaluation considering different implementations and devices. Experimental results demonstrate that significant performance gains can be obtained. Our approach achieves speedups of 7× over the serial implementation for the overall algorithm, and of up to 36× on its core steps.
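The parallelisable core of ranked-list re-ranking can be sketched as below, in the spirit of, but not identical to, the RL-Sim algorithm: the updated distance between two images is derived from the overlap of their top-k ranked lists, and every (i, j) pair is independent, which is what maps well to a GPU. The distance update rule here is an assumption for illustration.

```python
# Simplified ranked-list re-ranking step: more overlap between two images'
# top-k neighbour lists means a smaller updated distance.
import numpy as np

def rerank_once(dist, k=20):
    """dist: (n, n) distance matrix; returns an updated matrix."""
    n = dist.shape[0]
    topk = np.argsort(dist, axis=1)[:, :k]       # ranked lists truncated at depth k
    new = np.empty_like(dist)
    for i in range(n):                           # each (i, j) pair is independent
        for j in range(n):
            overlap = np.intersect1d(topk[i], topk[j]).size
            new[i, j] = 1.0 - overlap / k
    return new
```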

Relevance: 30.00%

Abstract:

Interactive ray tracing of non-trivial scenes is just becoming feasible on a single graphics processing unit (GPU). Recent work in this area focuses on building effective acceleration structures, which work well under the constraints of current GPUs. Most approaches are targeted at static scenes and only allow navigation in the virtual scene. So far, support for dynamic scenes has not been considered for GPU implementations. We have developed a GPU-based ray tracing system for dynamic scenes consisting of a set of individual objects. Each object may independently move around, but its geometry and topology are static.
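A hedged sketch of the two-level idea follows: each object keeps a static acceleration structure (reduced here to a single axis-aligned bounding box) while only its rigid transform changes per frame, so rays are transformed into object space before traversal. The dictionary layout and function names are assumptions, not the system's actual data structures.

```python
# Two-level traversal sketch: transform the ray into each object's local
# space so the per-object acceleration structure never needs rebuilding.
import numpy as np

def ray_aabb(origin, direction, box_min, box_max):
    """Slab test: True if the ray hits the axis-aligned box."""
    inv = 1.0 / np.where(direction == 0.0, 1e-12, direction)
    t0 = (box_min - origin) * inv
    t1 = (box_max - origin) * inv
    tmin = np.minimum(t0, t1).max()
    tmax = np.maximum(t0, t1).min()
    return tmax >= max(tmin, 0.0)

def trace(origin, direction, objects):
    """objects: dicts with a static 'bbox' (min, max) and a per-frame 4x4 'world_to_local'."""
    hits = []
    for obj in objects:
        M = obj["world_to_local"]
        o = (M @ np.append(origin, 1.0))[:3]       # ray origin in object space
        d = (M @ np.append(direction, 0.0))[:3]    # ray direction in object space
        if ray_aabb(o, d, *obj["bbox"]):
            hits.append(obj)                       # full per-object traversal would go here
    return hits
```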

Relevance: 30.00%

Abstract:

In this work, we propose the use of the neural gas (NG), a neural network that uses an unsupervised Competitive Hebbian Learning (CHL) rule, to develop a reverse engineering process. This is a simple and accurate method to reconstruct objects from point clouds obtained from multiple overlapping views using low-cost sensors. In contrast to other methods that may need several stages, including downsampling, noise filtering and many other tasks, the NG automatically obtains the 3D model of the scanned objects. To demonstrate the validity of our proposal we tested our method with several models and performed a study of the neural network parameterization, computing the quality of representation and also comparing results with other neural methods, like growing neural gas and Kohonen maps, or classical methods like Voxel Grid. We also reconstructed models acquired by low-cost sensors that can be used in virtual and augmented reality environments for redesign or manipulation purposes. Since the NG algorithm has a high computational cost, we propose its acceleration. We have redesigned and implemented the NG learning algorithm to fit it onto Graphics Processing Units using CUDA. A speed-up of 180× is obtained compared to the sequential CPU version.
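For reference, one neural gas training step looks like the sketch below: units are ranked by distance to the input sample and each is pulled towards it with a weight that decays exponentially with rank. This is a plain NumPy analogue with assumed parameter values; the paper's CUDA version distributes the ranking and update across GPU threads.

```python
# Single neural gas (NG) adaptation step for one input point.
import numpy as np

def ng_step(weights, x, eps=0.05, lam=10.0):
    """weights: (n_units, dim) codebook; x: (dim,) input sample from the point cloud."""
    dists = np.linalg.norm(weights - x, axis=1)
    ranks = np.argsort(np.argsort(dists))        # rank 0 = closest unit
    h = np.exp(-ranks / lam)[:, None]            # neighbourhood function over ranks
    return weights + eps * h * (x - weights)
```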

Relevance: 20.00%

Abstract:

Motor vehicle emission factors are generally derived from driving tests mimicking steady-state conditions or transient drive cycles. However, neither of these test conditions completely represents real-world driving conditions. In particular, they fail to determine emissions generated during the accelerating phase – a condition in which urban buses spend much of their time. In this study we analyse and compare the results of time-dependent emission measurements conducted on diesel and compressed natural gas (CNG) buses during an urban driving cycle on a chassis dynamometer, and we derive power-law expressions relating carbon dioxide (CO2) emission factors to the instantaneous speed while accelerating from rest. Emissions during acceleration are compared with those during steady-speed operation. These results have important implications for emission modelling, particularly under congested traffic conditions.
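A power-law relation of the kind mentioned above, EF(v) = a * v**b, can be fitted by ordinary least squares in log-log space, as in the sketch below; the data arrays are placeholders, not the study's measurements, and the actual coefficients are reported in the paper.

```python
# Fit emission_factor ≈ a * speed**b from paired measurements via a
# log-log linear least-squares fit.
import numpy as np

def fit_power_law(speed, emission_factor):
    """Returns (a, b) for the model emission_factor = a * speed**b."""
    b, log_a = np.polyfit(np.log(speed), np.log(emission_factor), 1)
    return np.exp(log_a), b
```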

Relevance: 20.00%

Abstract:

The use of graphics processing unit (GPU) parallel processing is becoming a part of mainstream statistical practice. The reliance of Bayesian statistics on Markov Chain Monte Carlo (MCMC) methods makes the applicability of parallel processing not immediately obvious. It is illustrated that there are substantial gains in computational time for MCMC and other methods of evaluation by computing the likelihood using GPU parallel processing. Examples use data from the Global Terrorism Database to model terrorist activity in Colombia from 2000 through 2010 and a likelihood based on the explicit convolution of two negative-binomial processes. Results show decreases in computational time by a factor of over 200. Factors influencing these improvements and guidelines for programming parallel implementations of the likelihood are discussed.
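The likelihood kernel described above can be sketched as follows: the probability mass of the sum of two independent negative-binomial counts, obtained by explicit convolution. Parameter names and values are assumptions; on a GPU the convolution sum for every observation can be evaluated in parallel (e.g. with CuPy in place of NumPy/SciPy).

```python
# Log-likelihood of counts modelled as the sum of two independent
# negative-binomial processes, computed by explicit convolution.
import numpy as np
from scipy.stats import nbinom

def conv_nbinom_logpmf(k, r1, p1, r2, p2):
    """log P(X1 + X2 = k) with X1 ~ NB(r1, p1) and X2 ~ NB(r2, p2)."""
    i = np.arange(k + 1)
    terms = nbinom.pmf(i, r1, p1) * nbinom.pmf(k - i, r2, p2)
    return np.log(terms.sum())

def loglik(data, r1, p1, r2, p2):
    return sum(conv_nbinom_logpmf(k, r1, p1, r2, p2) for k in data)
```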

Relevance: 20.00%

Abstract:

The authors have collaboratively used a graphical language to describe their shared knowledge of a small domain of mathematics, which has in turn scaffolded their re-development of a related curriculum for mathematics acceleration. This collaborative use of the graphical language is reported as a simple descriptive case study. This leads to an evaluation of the graphical language’s usefulness as a tool to support the articulation of the structure of mathematics knowledge. In turn, implications are drawn for how the graphical language may be utilised as the detail of the curriculum is further elaborated and communicated to teachers.