Digital Library

246 results for virtualised GPU

Using GPU to exploit parallelism on cryptography

Abstract:

In this article we explore the computational power of NVIDIA graphics processing units (GPUs) for cryptography using CUDA (Compute Unified Device Architecture) technology. CUDA simplifies general-purpose computing by exposing the parallel processing present in GPUs. To this end, we present the NVIDIA GPU architectures and CUDA, along with the relevant cryptography concepts. Furthermore, we compare CPU versions of the Advanced Encryption Standard (AES) and Message-Digest Algorithm 5 (MD5) cryptographic algorithms with parallel versions written in CUDA. © 2011 AISTI.

Implementação do algoritmo de treinamento do classificador Floresta de Caminhos Ótimos em GPU [Implementation of the training algorithm of the Optimum-Path Forest classifier on GPU]

Abstract:

Graduate Program in Computer Science - IBILCE

Implementação do algoritmo de treinamento do classificador floresta de caminhos ótimos em GPU [Implementation of the training algorithm of the Optimum-Path Forest classifier on GPU]

Abstract:

Pattern recognition techniques aim primarily to classify a set of samples, with the learning process being the most time-consuming phase. The problem can worsen in interactive classification tools, which can be unacceptable for large databases. One example is the classifier based on Optimum-Path Forest (OPF) [8]. Given that many works have targeted the implementation of pattern recognition algorithms in General Purpose Graphics Processing Unit (GPGPU) environments, the present study aimed to implement the training stage of the Optimum-Path Forest classifier in CUDA in order to increase its efficiency. The CUDA-optimized classifier showed a faster training phase than the original version.

Hyperspectral image compression onboard next-generation satellites: implementation solutions on GPU and FPGAs

Abstract:

Doctoral program: Advanced Telecommunication Engineering

Analisi di immagini con trasformata Ranklet: ottimizzazioni computazionali su CPU e GPU [Image analysis with the Ranklet transform: computational optimizations on CPU and GPU]

GPU-based many-core architecture emulation: A double level approach

Abstract:

The efficient emulation of a many-core architecture is a challenging task. Each core could be emulated by a dedicated thread, with such threads interleaved on either a single-core or a multi-core processor, but the high number of context switches would result in unacceptable performance. To support this kind of application, the computational power of the GPU is exploited by scheduling the emulation threads on the GPU cores. This raises a non-trivial divergence issue, since GPU computational power is offered through SIMD processing elements, which are forced to synchronously execute the same instruction on different portions of memory. A new emulation technique is therefore introduced to overcome this limitation: instead of providing a routine for each ISA opcode, the emulator mimics the behavior of the micro-architecture level, where instructions are data that a single routine takes as input. Our new technique has been implemented and compared with the classic emulation approach in order to investigate the feasibility of a hybrid solution.
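
The idea of treating instructions as data for a single routine can be illustrated with a minimal Python sketch. It is not the authors' emulator: the four-operation toy ISA, the tuple encoding, and the names `step` and `run` are assumptions made purely for illustration. Every instruction follows the same control path (compute all candidate results, then select one by index), which is the property that avoids divergence on SIMD hardware.

```python
# Toy illustration only: the ISA, encoding, and function names are hypothetical.
# Each instruction is plain data: (op, dst, src_a, src_b).
ADD, SUB, AND, OR = range(4)


def step(regs, instr):
    """Execute one instruction with a single routine and no per-opcode branch."""
    op, dst, a, b = instr
    x, y = regs[a], regs[b]
    # Compute every candidate result, then select by opcode index.
    # All instructions take the same control path, so SIMD lanes
    # emulating different opcodes would not diverge.
    candidates = (x + y, x - y, x & y, x | y)
    regs[dst] = candidates[op]


def run(regs, program):
    """Run a straight-line program and return the final register file."""
    regs = list(regs)  # do not mutate the caller's registers
    for instr in program:
        step(regs, instr)
    return regs


program = [
    (ADD, 2, 0, 1),  # r2 = r0 + r1
    (SUB, 3, 2, 0),  # r3 = r2 - r0
    (AND, 0, 2, 3),  # r0 = r2 & r3
]
final_regs = run([6, 3, 0, 0], program)
```

The trade-off in this style is redundant work: each step evaluates all four operations and discards three results, in exchange for uniform control flow.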

Analysis of cone-beam CT dose in image-guided radiation therapy for brain and prostate cancer patients via GPU-based Monte Carlo simulations

Abstract:

Image-guided radiation therapy (IGRT), thanks to repeated verification of the patient's position and of the target volume's location, has recently established itself as a new paradigm in radiotherapy, having radically improved the accuracy of therapeutic dose delivery. A promising technique in the IGRT field is cone-beam computed tomography (CBCT). Kilovoltage CBCT provides an accurate three-dimensional map of the patient's anatomy at treatment planning and at every fraction of the treatment. However, the imaging dose attributable to the repeated scans has in recent years become a growing concern in the clinical setting. The aim of this work is to quantitatively evaluate the additional dose delivered by kilovoltage CBCT, with reference to three typical scan protocols for Varian OnBoard Imaging Systems (OBI, Palo Alto, California). To this end, Monte Carlo dose calculations were carried out using the gCTD package, which was developed on the graphics card architecture. Using GPUs in compute servers yielded high computational efficiency, accelerating the Monte Carlo simulations to computation times of ~1 min for a typical case. Initially, experimental dose measurements were performed on a water phantom. The parameters needed to model the X-ray source in the gCTD code were obtained through a code validation process that matched the simulated dose values in water to the phantom measurements. The study focuses on fifty patients undergoing intensity-modulated radiation therapy (IMRT). Twenty-five patients with brain tumors are used to study the dose in the standard-dose head protocol, and twenty-five patients with prostate cancer are selected to study the dose in the pelvis and pelvis spotlight protocols. The mean dose to each organ is calculated. The mean dose to the 2% of voxels with the highest dose values is also computed for each organ, in order to characterize the spatial homogeneity of the distribution.
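
The two per-organ metrics named at the end of the abstract (mean dose, and mean dose over the 2% of voxels with the highest dose) can be sketched as follows. This is an illustrative sketch, not code from the study: the function names, the flat list of voxel doses, and the choice to round the voxel count up with `math.ceil` are all assumptions.

```python
import math


def mean_dose(doses):
    """Mean dose over all voxels of an organ."""
    return sum(doses) / len(doses)


def mean_dose_hottest(doses, fraction=0.02):
    """Mean dose over the hottest `fraction` of voxels (at least one voxel).

    Rounding the voxel count up is an assumption of this sketch.
    """
    k = max(1, math.ceil(fraction * len(doses)))
    hottest = sorted(doses, reverse=True)[:k]
    return sum(hottest) / k


# Hypothetical organ with 100 voxels: 98 at a uniform dose and two hot spots.
organ = [1.0] * 98 + [5.0, 7.0]
organ_mean = mean_dose(organ)
organ_hot_mean = mean_dose_hottest(organ)
```

A large gap between the two values, as in this made-up organ, indicates a spatially inhomogeneous dose distribution.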

Progetto e implementazione di un decodificatore LDPC su architettura GPU [Design and implementation of an LDPC decoder on a GPU architecture]

Abstract:

In keeping with the Software Defined Radio philosophy, a software LDPC decoder was designed that uses a GPU to achieve better performance. The work, which also includes the encoder and an AWGN channel simulator, can be used both to run simulations and to process data in real time. The LDPC codes of the DVB-S2 standard were considered as a case study.

Hybrid ray tracing - ray tracing using GPU-accelerated image-space methods

GPU-based Ray Tracing of Dynamic Scenes

Abstract:

Interactive ray tracing of non-trivial scenes is just becoming feasible on single graphics processing units (GPU). Recent work in this area focuses on building effective acceleration structures, which work well under the constraints of current GPUs. Most approaches are targeted at static scenes and only allow navigation in the virtual scene. So far support for dynamic scenes has not been considered for GPU implementations. We have developed a GPU-based ray tracing system for dynamic scenes consisting of a set of individual objects. Each object may independently move around, but its geometry and topology are static.

OCTAVIS: Optimization Techniques for Multi-GPU Multi-View Rendering

Abstract:

We present a high-performance yet low-cost system for multi-view rendering in virtual reality (VR) applications. In contrast to complex CAVE installations, which are typically driven by one render client per view, we arrange eight displays in an octagon around the viewer to provide a full 360° projection, and we drive these eight displays with a single PC equipped with multiple graphics processing units (GPUs). In this paper we describe the hardware and software setup, as well as the low-level and high-level optimizations needed to exploit the parallelism of this multi-GPU multi-view VR system.

Collision Detection: Broad Phase Adaptation from Multi-Core to Multi-GPU Architecture

Abstract:

In this paper we present several contributions to collision detection optimization centered on hardware performance. We focus on the broad phase, which is the first step of the collision detection process, and propose three new ways of parallelizing the well-known Sweep and Prune algorithm. We first developed a multi-core model that takes into account the number of available cores. The multi-core architecture enables us to distribute geometric computations using multi-threading. Critical write sections and thread idling have been minimized by introducing new data structures for each thread. Programming with directives, as in OpenMP, appears to be a good compromise for code portability. We then proposed a new GPU-based algorithm, also based on Sweep and Prune, that has been adapted to multi-GPU architectures. Our technique is based on a spatial subdivision method used to distribute computations among GPUs. Results show that a significant speed-up can be obtained by going from 1 to 4 GPUs in a large-scale environment.
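
For readers unfamiliar with the Sweep and Prune broad phase that the abstract builds on, here is a minimal sequential sketch in Python. It is the textbook single-axis variant on 2D axis-aligned bounding boxes, not the paper's multi-core or multi-GPU algorithm; the box representation and the function name `sweep_and_prune` are assumptions.

```python
def sweep_and_prune(boxes):
    """Return sorted pairs (i, j), i < j, of boxes whose AABBs overlap.

    boxes: list of ((min_x, min_y), (max_x, max_y)); touching counts as overlap.
    """
    # Sort box indices by their lower bound on the sweep axis (x).
    order = sorted(range(len(boxes)), key=lambda i: boxes[i][0][0])
    active = []  # boxes whose x-interval may still overlap the current one
    pairs = []
    for i in order:
        (min_x, min_y), (_, max_y) = boxes[i]
        # Prune boxes that end before the current box starts.
        active = [j for j in active if boxes[j][1][0] >= min_x]
        for j in active:
            (_, other_min_y), (_, other_max_y) = boxes[j]
            # The x-intervals overlap by construction; confirm on the y axis.
            if other_min_y <= max_y and min_y <= other_max_y:
                pairs.append((min(i, j), max(i, j)))
        active.append(i)
    return sorted(pairs)


boxes = [
    ((0.0, 0.0), (2.0, 2.0)),
    ((1.0, 1.0), (3.0, 3.0)),
    ((4.0, 0.0), (5.0, 1.0)),
    ((2.5, 2.5), (4.5, 3.5)),
]
overlapping = sweep_and_prune(boxes)
```

Sorting along one axis lets the sweep skip most non-overlapping pairs, which is why the broad phase scales better than testing every pair of boxes.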

Particle methods parallel implementations by GP-GPU strategies

Abstract:

This paper outlines the problems found in the parallelization of SPH (Smoothed Particle Hydrodynamics) algorithms using graphics processing units. Results from several parallel GPU implementations are shown in terms of speed-up and scalability compared to sequential CPU codes. The most problematic stage in GPU-SPH algorithms is the one responsible for locating neighboring particles and building the vectors where this information is stored, since these algorithms raise many difficulties for data-level parallelization. Because neighbor location using linked lists does not expose enough data-level parallelism, two new approaches have been proposed to minimize bank conflicts in the writing and subsequent reading of the neighbor lists. The first strategy proposes an efficient coordination between CPU and GPU, using GPU algorithms for the stages that allow a straightforward parallelization and sequential CPU algorithms for the instructions that involve some kind of vector reduction. This coordination provides a relatively orderly reading of the neighbor lists in the interactions stage, achieving a speed-up factor of x47 in this stage. However, since the construction of the neighbor lists is quite expensive, the overall speed-up achieved is x41. The second strategy seeks to maximize the use of the GPU in the neighbor location process by executing a specific vector sorting algorithm that allows some data-level parallelism. Although this strategy has succeeded in improving the speed-up of the neighbor location stage, the global speed-up of the interactions stage falls due to inefficient reading of the neighbor vectors. Some changes to these strategies are proposed, aimed at maximizing the computational load of the GPU and using the GPU texture units, in order to reach the maximum speed-up for such codes. Different practical applications have been added to the mentioned GPU codes. First, the classical dam-break problem is studied. Second, the wave impact of the sloshing fluid contained in LNG vessel tanks is also simulated as a practical example of particle methods.
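
The abstract does not say which sorting algorithm the second strategy uses. As a generic illustration of sort-based neighbor location, the following Python sketch bins particles into a uniform grid, sorts particle indices by cell so that each cell occupies a contiguous range, and then searches only the surrounding 3x3 block of cells. The 2D setting, the cell size equal to the search radius `h`, and the name `neighbor_lists` are assumptions of this sketch, not details taken from the paper.

```python
from itertools import product


def neighbor_lists(points, h):
    """Neighbor lists (distance <= h) using a uniform grid with cell size h."""

    def cell(p):
        return (int(p[0] // h), int(p[1] // h))

    # Sort particle indices by cell key so each cell's particles are contiguous.
    order = sorted(range(len(points)), key=lambda i: cell(points[i]))
    # Record the [start, end) range of each cell in the sorted order.
    ranges = {}
    for pos, i in enumerate(order):
        c = cell(points[i])
        start, _ = ranges.get(c, (pos, pos))
        ranges[c] = (start, pos + 1)

    neighbors = [[] for _ in points]
    for i, p in enumerate(points):
        cx, cy = cell(p)
        # With cell size h, neighbors can only lie in the 3x3 block of cells.
        for dx, dy in product((-1, 0, 1), repeat=2):
            start, end = ranges.get((cx + dx, cy + dy), (0, 0))
            for j in order[start:end]:
                q = points[j]
                if j != i and (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 <= h * h:
                    neighbors[i].append(j)
        neighbors[i].sort()
    return neighbors


points = [(0.1, 0.1), (0.4, 0.1), (1.5, 1.5), (0.1, 0.55), (1.2, 1.6)]
lists = neighbor_lists(points, 0.5)
```

The sort and the range lookup are the data-parallel-friendly parts; on a GPU the per-particle loop would typically map to one thread per particle.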

Region-based moving object detection using spatially conditioned nonparametric models in a GPU

Abstract:

A novel GPU-based nonparametric moving object detection strategy for computer vision tools requiring real-time processing is proposed. An alternative and efficient Bayesian classifier to combine nonparametric background and foreground models allows increasing correct detections while avoiding false detections. Additionally, an efficient region of interest analysis significantly reduces the computational cost of the detections.

Optimal Piecewise Linear Function Approximation for GPU-based Applications

Abstract:

Many computer vision and human-computer interaction applications developed in recent years need to evaluate complex and continuous mathematical functions as an essential step toward proper operation. However, rigorous evaluation of such functions often implies a very high computational cost, which is unacceptable in real-time applications. To alleviate this problem, functions are commonly approximated by simpler piecewise-polynomial representations. Following this idea, we propose a novel, efficient, and practical technique to evaluate complex and continuous functions using a nearly optimal design of two types of piecewise linear approximations in the case of a large budget of evaluation subintervals. To this end, we develop a thorough error analysis that yields asymptotically tight bounds to accurately quantify the approximation performance of both representations. It improves upon previous error estimates and allows the user to control the trade-off between the approximation error and the number of evaluation subintervals. To guarantee real-time operation, the method is suitable for, but not limited to, an efficient implementation on modern Graphics Processing Units (GPUs), where it outperforms previous alternative approaches by exploiting the fixed-function interpolation routines present in their texture units. The proposed technique is a perfect match for any application requiring the evaluation of continuous functions. We have measured in detail its quality and efficiency on several functions, in particular the Gaussian function, because it is extensively used in many areas of computer vision and cybernetics and is expensive to evaluate.
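
To make the trade-off between approximation error and number of subintervals concrete, here is a minimal Python sketch of the simplest case: a lookup table with equally spaced nodes and linear interpolation between them, which is the operation GPU texture units perform in hardware. It is a plain uniform interpolant, not the nearly optimal design proposed in the paper, and the names `build_table` and `eval_pwl` are assumptions. For a twice-differentiable function, the classical bound on the linear interpolation error is h² · max|f''| / 8, where h is the node spacing; for the unit Gaussian exp(-x²/2), max|f''| = 1.

```python
import math


def gaussian(x):
    """Unit Gaussian without normalization: exp(-x^2 / 2)."""
    return math.exp(-0.5 * x * x)


def build_table(f, lo, hi, n):
    """Sample f at n + 1 equally spaced nodes on [lo, hi]; return (table, h)."""
    h = (hi - lo) / n
    return [f(lo + k * h) for k in range(n + 1)], h


def eval_pwl(table, lo, h, x):
    """Linearly interpolate between the two nodes that bracket x.

    Assumes lo <= x <= lo + h * (len(table) - 1).
    """
    t = (x - lo) / h
    k = min(int(t), len(table) - 2)  # clamp so the upper endpoint is valid
    w = t - k
    return (1.0 - w) * table[k] + w * table[k + 1]


lo, hi, n = -4.0, 4.0, 64
table, h = build_table(gaussian, lo, hi, n)
# Measure the worst error on a dense grid and compare it with h^2 / 8.
samples = [lo + (hi - lo) * s / 10000 for s in range(10001)]
max_error = max(abs(eval_pwl(table, lo, h, x) - gaussian(x)) for x in samples)
error_bound = h * h / 8.0
```

Doubling `n` halves `h` and so divides the bound by four, which is the kind of error-versus-budget control the abstract refers to.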
