29 resultados para OpenCL


Relevância:

20.00% 20.00%

Publicador:

Resumo:

Este trabalho apresenta a proposta de um middleware, chamado DistributedCL, que torna transparente o processamento paralelo em GPUs distribuídas. Com o suporte do middleware DistributedCL uma aplicação, preparada para utilizar a API OpenCL, pode executar de forma distribuída, utilizando GPUs remotas, de forma transparente e sem necessidade de alteração ou nova compilação do seu código. A arquitetura proposta para o middleware DistributedCL é modular, com camadas bem definidas e um protótipo foi construído de acordo com a arquitetura, onde foram empregados vários pontos de otimização, incluindo o envio de dados em lotes, comunicação assíncrona via rede e chamada assíncrona da API OpenCL. O protótipo do middleware DistributedCL foi avaliado com o uso de benchmarks disponíveis e também foi desenvolvido o benchmark CLBench, para avaliação de acordo com a quantidade dos dados. O desempenho do protótipo se mostrou bom, superior às propostas semelhantes, tendo alguns resultados próximos do ideal, sendo o tamanho dos dados para transmissão através da rede o maior fator limitante.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Hardware designers and engineers typically need to explore a multi-parametric design space in order to find the best configuration for their designs using simulations that can take weeks to months to complete. For example, designers of special purpose chips need to explore parameters such as the optimal bitwidth and data representation. This is the case for the development of complex algorithms such as Low-Density Parity-Check (LDPC) decoders used in modern communication systems. Currently, high-performance computing offers a wide set of acceleration options, that range from multicore CPUs to graphics processing units (GPUs) and FPGAs. Depending on the simulation requirements, the ideal architecture to use can vary. In this paper we propose a new design flow based on OpenCL, a unified multiplatform programming model, which accelerates LDPC decoding simulations, thereby significantly reducing architectural exploration and design time. OpenCL-based parallel kernels are used without modifications or code tuning on multicore CPUs, GPUs and FPGAs. We use SOpenCL (Silicon to OpenCL), a tool that automatically converts OpenCL kernels to RTL for mapping the simulations into FPGAs. To the best of our knowledge, this is the first time that a single, unmodified OpenCL code is used to target those three different platforms. We show that, depending on the design parameters to be explored in the simulation, on the dimension and phase of the design, the GPU or the FPGA may suit different purposes more conveniently, providing different acceleration factors. For example, although simulations can typically execute more than 3x faster on FPGAs than on GPUs, the overhead of circuit synthesis often outweighs the benefits of FPGA-accelerated execution.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

clRNG et clProbdist sont deux interfaces de programmation (APIs) que nous avons développées pour la génération de nombres aléatoires uniformes et non uniformes sur des dispositifs de calculs parallèles en utilisant l’environnement OpenCL. La première interface permet de créer au niveau d’un ordinateur central (hôte) des objets de type stream considérés comme des générateurs virtuels parallèles qui peuvent être utilisés aussi bien sur l’hôte que sur les dispositifs parallèles (unités de traitement graphique, CPU multinoyaux, etc.) pour la génération de séquences de nombres aléatoires. La seconde interface permet aussi de générer au niveau de ces unités des variables aléatoires selon différentes lois de probabilité continues et discrètes. Dans ce mémoire, nous allons rappeler des notions de base sur les générateurs de nombres aléatoires, décrire les systèmes hétérogènes ainsi que les techniques de génération parallèle de nombres aléatoires. Nous présenterons aussi les différents modèles composant l’architecture de l’environnement OpenCL et détaillerons les structures des APIs développées. Nous distinguons pour clRNG les fonctions qui permettent la création des streams, les fonctions qui génèrent les variables aléatoires uniformes ainsi que celles qui manipulent les états des streams. clProbDist contient les fonctions de génération de variables aléatoires non uniformes selon la technique d’inversion ainsi que les fonctions qui permettent de retourner différentes statistiques des lois de distribution implémentées. Nous évaluerons ces interfaces de programmation avec deux simulations qui implémentent un exemple simplifié d’un modèle d’inventaire et un exemple d’une option financière. Enfin, nous fournirons les résultats d’expérimentation sur les performances des générateurs implémentés.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Brief overview for the motivation and architecture of OpenCL

Relevância:

20.00% 20.00%

Publicador:

Relevância:

20.00% 20.00%

Publicador:

Resumo:

En el presente trabajo se propone dar solución a uno de los problemas principales surgido en el campo del análisis de imágenes hiperespectrales. En las últimas décadas este campo está siendo muy activo, por lo que es de vital importancia tratar su problema principal: mezcla espectral. Muchos algoritmos han tratado de solucionar este problema, pero que a través de este trabajo se propone una cadena nueva de desmezclado en paralelo, para ser acelerados bajo el paradigma de programación paralela de OpenCl. Este paradigma nos aporta el modelo de programación unificada para acelerar algoritmos en sistemas heterogéneos. Podemos dividir el proceso de desmezclado espectral en tres etapas. La primera tiene la tarea de encontrar el número de píxeles puros, llamaremos endmembers a los píxeles formados por una única firma espectral, utilizaremos el algoritmo conocido como Geometry-based Estimation of number of endmembers, GENE. La segunda etapa se encarga de identificar los píxel endmembers y extraerlos junto con todas sus bandas espectrales, para esta etapa se utilizará el algoritmo conocido por Simplex Growing Algorithm, SGA. En la última etapa se crean los mapas de abundancia para cada uno de los endmembers encontrados, de esta etapa será encargado el algoritmo conocido por, Sum-to-one Constrained Linear Spectral Unmixing, SCLSU. Las plataformas utilizadas en este proyecto han sido tres: CPU, Intel Xeon E5-2695 v3, GPU, NVidia GeForce GTX 980, Acelerador, Intel Xeon Phi 31S1P. La idea de este proyecto se basa en realizar un análisis exhaustivo de los resultados obtenidos en las diferentes plataformas, con el fin de evaluar cuál se ajusta mejor a nuestras necesidades.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

El flujo óptico y la estimación de movimiento es área de conocimiento muy importante usado en otros campos del conocimiento como el de la seguridad o el de la bioinformática. En estos sectores, se demandan aplicaciones de flujo óptico que realicen actividades muy importantes con tiempos de ejecución lo más bajos posibles, llegando a tiempo real si es posible. Debido a la gran complejidad de cálculos que siguen a este tipo de algoritmos como se observará en la sección de resultados, la aceleración de estos es una parte vital para dar soporte y conseguir ese tiempo real tan buscado. Por lo que planteamos como objetivo para este TFG la aceleración de este tipo de algoritmos mediante diversos tipos de aceleradores usando OpenCL y de paso demostrar que OpenCL es una buena herramienta que permite códigos paralelizados con un gran Speedup a la par que funcionar en toda una diversa gama de dispositivos tan distintos como un GPU y una FPGA. Para lo anteriormente mencionado trataremos de desarrollar un código para cada algoritmo y optimizarlo de forma no especifica a una plataforma para posteriormente ejecutarlo sobre las diversas plataformas y medir tiempos y error para cada algoritmo. Para el desarrollo de este proyecto partimos de la teoría de dos algoritmos ya existentes: Lucas&Kanade monoescala y el Horn&Schunck. Además, usaremos estímulos para estos algoritmos muy aceptados por la comunidad como pueden ser el RubberWhale o los Grove, los cuales nos ayudarán a establecer la corrección de estos algoritmos y analizar su precisión, dando así un estudio referencia para saber cual escoger.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper, we develop a fast implementation of an hyperspectral coded aperture (HYCA) algorithm on different platforms using OpenCL, an open standard for parallel programing on heterogeneous systems, which includes a wide variety of devices, from dense multicore systems from major manufactures such as Intel or ARM to new accelerators such as graphics processing units (GPUs), field programmable gate arrays (FPGAs), the Intel Xeon Phi and other custom devices. Our proposed implementation of HYCA significantly reduces its computational cost. Our experiments have been conducted using simulated data and reveal considerable acceleration factors. This kind of implementations with the same descriptive language on different architectures are very important in order to really calibrate the possibility of using heterogeneous platforms for efficient hyperspectral imaging processing in real remote sensing missions.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Safety concerns in the operation of autonomous aerial systems require safe-landing protocols be followed during situations where the a mission should be aborted due to mechanical or other failure. On-board cameras provide information that can be used in the determination of potential landing sites, which are continually updated and ranked to prevent injury and minimize damage. Pulse Coupled Neural Networks have been used for the detection of features in images that assist in the classification of vegetation and can be used to minimize damage to the aerial vehicle. However, a significant drawback in the use of PCNNs is that they are computationally expensive and have been more suited to off-line applications on conventional computing architectures. As heterogeneous computing architectures are becoming more common, an OpenCL implementation of a PCNN feature generator is presented and its performance is compared across OpenCL kernels designed for CPU, GPU and FPGA platforms. This comparison examines the compute times required for network convergence under a variety of images obtained during unmanned aerial vehicle trials to determine the plausibility for real-time feature detection.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Reconfigurable computing devices can increase the performance of compute intensive algorithms by implementing application specific co-processor architectures. The power cost for this performance gain is often an order of magnitude less than that of modern CPUs and GPUs. Exploiting the potential of reconfigurable devices such as Field-Programmable Gate Arrays (FPGAs) is typically a complex and tedious hardware engineering task. Re- cently the major FPGA vendors (Altera, and Xilinx) have released their own high-level design tools, which have great potential for rapid development of FPGA based custom accelerators. In this paper, we will evaluate Altera’s OpenCL Software Development Kit, and Xilinx’s Vivado High Level Sythesis tool. These tools will be compared for their per- formance, logic utilisation, and ease of development for the test case of a Tri-diagonal linear system solver.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Safety concerns in the operation of autonomous aerial systems require safe-landing protocols be followed during situations where the mission should be aborted due to mechanical or other failure. This article presents a pulse-coupled neural network (PCNN) to assist in the vegetation classification in a vision-based landing site detection system for an unmanned aircraft. We propose a heterogeneous computing architecture and an OpenCL implementation of a PCNN feature generator. Its performance is compared across OpenCL kernels designed for CPU, GPU, and FPGA platforms. This comparison examines the compute times required for network convergence under a variety of images to determine the plausibility for real-time feature detection.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Tridiagonal diagonally dominant linear systems arise in many scientific and engineering applications. The standard Thomas algorithm for solving such systems is inherently serial forming a bottleneck in computation. Algorithms such as cyclic reduction and SPIKE reduce a single large tridiagonal system into multiple small independent systems which can be solved in parallel. We have developed portable cyclic reduction and SPIKE algorithm OpenCL implementations with the intent to target a range of co-processors in a heterogeneous computing environment including Field Programmable Gate Arrays (FPGAs), Graphics Processing Units (GPUs) and other multi-core processors. In this paper, we evaluate these designs in the context of solver performance, resource efficiency and numerical accuracy.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Stochastic volatility models are of fundamental importance to the pricing of derivatives. One of the most commonly used models of stochastic volatility is the Heston Model in which the price and volatility of an asset evolve as a pair of coupled stochastic differential equations. The computation of asset prices and volatilities involves the simulation of many sample trajectories with conditioning. The problem is treated using the method of particle filtering. While the simulation of a shower of particles is computationally expensive, each particle behaves independently making such simulations ideal for massively parallel heterogeneous computing platforms. In this paper, we present our portable Opencl implementation of the Heston model and discuss its performance and efficiency characteristics on a range of architectures including Intel cpus, Nvidia gpus, and Intel Many-Integrated-Core (mic) accelerators.