6 resultados para GPU computing

em Repositorio Institucional de la Universidad de Málaga


Relevância:

30.00% 30.00%

Publicador:

Resumo:

In the multi-core CPU world, transactional memory (TM)has emerged as an alternative to lock-based programming for thread synchronization. Recent research proposes the use of TM in GPU architectures, where a high number of computing threads, organized in SIMT fashion, requires an effective synchronization method. In contrast to CPUs, GPUs offer two memory spaces: global memory and local memory. The local memory space serves as a shared scratch-pad for a subset of the computing threads, and it is used by programmers to speed-up their applications thanks to its low latency. Prior work from the authors proposed a lightweight hardware TM (HTM) support based in the local memory, modifying the SIMT execution model and adding a conflict detection mechanism. An efficient implementation of these features is key in order to provide an effective synchronization mechanism at the local memory level. After a quick description of the main features of our HTM design for GPU local memory, in this work we gather together a number of proposals designed with the aim of improving those mechanisms with high impact on performance. Firstly, the SIMT execution model is modified to increase the parallelism of the application when transactions must be serialized in order to make forward progress. Secondly, the conflict detection mechanism is optimized depending on application characteristics, such us the read/write sets, the probability of conflict between transactions and the existence of read-only transactions. As these features can be present in hardware simultaneously, it is a task of the compiler and runtime to determine which ones are more important for a given application. This work includes a discussion on the analysis to be done in order to choose the best configuration solution.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Abstract: As time has passed, the general purpose programming paradigm has evolved, producing different hardware architectures whose characteristics differ widely. In this work, we are going to demonstrate, through different applications belonging to the field of Image Processing, the existing difference between three Nvidia hardware platforms: two of them belong to the GeForce graphics cards series, the GTX 480 and the GTX 980 and one of the low consumption platforms which purpose is to allow the execution of embedded applications as well as providing an extreme efficiency: the Jetson TK1. With respect to the test applications we will use five examples from Nvidia CUDA Samples. These applications are directly related to Image Processing, as the algorithms they use are similar to those from the field of medical image registration. After the tests, it will be proven that GTX 980 is both the device with the highest computational power and the one that has greater consumption, it will be seen that Jetson TK1 is the most efficient platform, it will be shown that GTX 480 produces more heat than the others and we will learn other effects produced by the existing difference between the architecture of the devices.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

En este documento se expondrá una implementación del problema del viajante de comercio usando una implementación personalizada de un mapa auto-organizado basándose en soluciones anteriores y adaptándolas a la arquitectura CUDA, haciendo a la vez una comparativa de la implementación eficiente en CUDA C/C++ con la implementación de las funciones de GPU incluidas en el Parallel Computing Toolbox de Matlab. La solución que se da reduce en casi un cuarto las iteraciones necesarias para llegar a una solución buena del problema mencionado, además de la mejora inminente del uso de las arquitecturas paralelas. En esta solución se estudia la mejora en tiempo que se consigue con el uso específico de la memoria compartida, siendo esta una de las herramientas más potentes para mejorar el rendimiento. En lo referente a los tiempos de ejecución, se llega a concluir que la mejor solución es el lanzamiento de un kernel de CUDA desde Matlab a través de la funcionalidad incluida en el Parallel Computing Toolbox.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This talk, which is based on our newest findings and experiences from research and industrial projects, addresses one of the most relevant challenges for a decade to come: How to integrate the Internet of Things with software, people, and processes, considering modern Cloud Computing and Elasticity principles. Elasticity is seen as one of the main characteristics of Cloud Computing today. Is elasticity simply scalability on steroids? This talk addresses the main principles of elasticity, presents a fresh look at this problem, and examines how to integrate people, software services, and things into one composite system, which can be modeled, programmed, and deployed on a large scale in an elastic way. This novel paradigm has major consequences on how we view, build, design, and deploy ultra-large scale distributed systems.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

There are diferent applications in Engineering that require to compute improper integrals of the first kind (integrals defined on an unbounded domain) such as: the work required to move an object from the surface of the earth to in nity (Kynetic Energy), the electric potential created by a charged sphere, the probability density function or the cumulative distribution function in Probability Theory, the values of the Gamma Functions(wich useful to compute the Beta Function used to compute trigonometrical integrals), Laplace and Fourier Transforms (very useful, for example in Differential Equations).

Relevância:

20.00% 20.00%

Publicador:

Resumo:

After a decade evolving in the High Performance Computing arena, GPU-equipped supercomputers have con- quered the top500 and green500 lists, providing us unprecedented levels of computational power and memory bandwidth. This year, major vendors have introduced new accelerators based on 3D memory, like Xeon Phi Knights Landing by Intel and Pascal architecture by Nvidia. This paper reviews hardware features of those new HPC accelerators and unveils potential performance for scientific applications, with an emphasis on Hybrid Memory Cube (HMC) and High Bandwidth Memory (HBM) used by commercial products according to roadmaps already announced.