38 results for NVIDIA


Relevance:

10.00%

Abstract:

Recent advances in the massively parallel computational abilities of graphical processing units (GPUs) have increased their use for general purpose computation, as companies look to take advantage of big data processing techniques. This has given rise to the potential for malicious software targeting GPUs, which is of interest to forensic investigators examining the operation of software. The ability to carry out reverse-engineering of software is of great importance within the security and forensics fields, particularly when investigating malicious software or carrying out forensic analysis following a successful security breach. Due to the complexity of the Nvidia CUDA (Compute Unified Device Architecture) framework, it is not clear how best to approach the reverse engineering of a piece of CUDA software. We carry out a review of the different binary output formats which may be encountered from the CUDA compiler, and their implications on reverse engineering. We then demonstrate the process of carrying out disassembly of an example CUDA application, to establish the various techniques available to forensic investigators carrying out black-box disassembly and reverse engineering of CUDA binaries. We show that the Nvidia compiler, using default settings, leaks useful information. Finally, we demonstrate techniques to better protect intellectual property in CUDA algorithm implementations from reverse engineering.

Relevance:

10.00%

Abstract:

The high performance computing community has traditionally focused exclusively on reducing execution time, though in recent years the optimization of energy consumption has become a major concern. Reducing energy usage without degrading performance requires the adoption of energy-efficient hardware platforms, accompanied by the development of energy-aware algorithms and computational kernels. The solution of linear systems is a key operation for many scientific and engineering problems. Its relevance has motivated a substantial body of work, and consequently high performance solvers are available for a wide variety of hardware platforms. In this work, we aim to develop a high performance and energy-efficient linear system solver. In particular, we develop two solvers for a low-power CPU-GPU platform, the NVIDIA Jetson TK1. These solvers implement the Gauss-Huard algorithm, yielding efficient usage of the target hardware as well as efficient memory access. The experimental evaluation shows that the novel proposal delivers significant savings in both time and energy consumption when compared with the state-of-the-art solvers of the platform.
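The abstract above names the Gauss-Huard algorithm without describing it. As a rough illustration only, the following pure-Python sketch shows the textbook form of the method for a dense system Ax = b: each step first updates the current row using the rows already processed, then scales it by its pivot, then eliminates the pivot column from the rows above. The function name `gauss_huard_solve` is hypothetical, and pivoting is omitted for brevity (an assumption; practical implementations normally add column pivoting). It is not the authors' Jetson TK1 solver, whose details the abstract does not give.

```python
def gauss_huard_solve(a, b):
    """Solve a x = b with a simplified Gauss-Huard sweep (no pivoting).

    `a` is a list of n rows of n numbers, `b` a list of n numbers.
    Hypothetical helper for illustration; assumes every pivot is nonzero.
    """
    n = len(a)
    # Work on an augmented copy [a | b] so the inputs are left untouched.
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]

    for k in range(n):
        # Row elimination: fold the already-processed rows 0..k-1 into row k.
        for c in range(k, n + 1):
            m[k][c] -= sum(m[k][j] * m[j][c] for j in range(k))

        pivot = m[k][k]
        if pivot == 0.0:
            raise ZeroDivisionError("zero pivot; this sketch does no pivoting")

        # Scale the remainder of row k so the pivot becomes an implicit 1.
        for c in range(k + 1, n + 1):
            m[k][c] /= pivot

        # Column elimination: clear column k from the rows above row k.
        for i in range(k):
            factor = m[i][k]
            for c in range(k + 1, n + 1):
                m[i][c] -= factor * m[k][c]

    # The last column of the augmented matrix now holds the solution.
    return [m[i][n] for i in range(n)]
```

For example, `gauss_huard_solve([[2, 1], [1, 3]], [3, 5])` returns values close to 0.8 and 1.4, up to floating-point rounding.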

Relevance:

10.00%

Abstract:

Heterogeneous computing systems have become common in modern processor architectures. These systems, such as those released by AMD, Intel, and Nvidia, include both CPU and GPU cores on a single die available with reduced communication overhead compared to their discrete predecessors. Currently, discrete CPU/GPU systems are limited, requiring larger, regular, highly-parallel workloads to overcome the communication costs of the system. Without the traditional communication delay assumed between GPUs and CPUs, we believe non-traditional workloads could be targeted for GPU execution. Specifically, this thesis focuses on the execution model of nested parallel workloads on heterogeneous systems. We have designed a simulation flow which utilizes widely used CPU and GPU simulators to model heterogeneous computing architectures. We then applied this simulator to non-traditional GPU workloads using different execution models. We also have proposed a new execution model for nested parallelism allowing users to exploit these heterogeneous systems to reduce execution time.

Relevance:

10.00%

Abstract:

As time has passed, the general-purpose programming paradigm has evolved, producing hardware architectures whose characteristics differ widely. In this work, we demonstrate, through several applications from the field of Image Processing, the differences between three Nvidia hardware platforms: two from the GeForce graphics card series, the GTX 480 and the GTX 980, and one low-power platform whose purpose is to run embedded applications with very high efficiency, the Jetson TK1. As test applications we use five examples from the Nvidia CUDA Samples. These applications are directly related to Image Processing, as the algorithms they use are similar to those from the field of medical image registration. The tests show that the GTX 980 is both the device with the highest computational power and the one with the highest power consumption, that the Jetson TK1 is the most efficient platform, and that the GTX 480 produces more heat than the others; they also reveal other effects of the architectural differences between the devices.

Relevance:

10.00%

Abstract:

In clinical medicine it is crucial to be able to determine the safety and efficacy of current drugs and, in addition, to accelerate the discovery of new active compounds. To this end, laboratory assays are carried out, which are very costly and time-consuming. However, bioinformatics can greatly facilitate clinical research for these purposes, since it provides predictions of drug toxicity and of drug activity in new diseases, as well as of the evolution of the active compounds discovered in clinical trials. This can be achieved thanks to the availability of bioinformatics tools and computer-based virtual screening (VS) methods that make it possible to test all the necessary hypotheses before clinical trials are carried out, such as structural docking with the BINDSURF program. However, the accuracy of most VS methods is severely restricted by the limitations of the affinity or scoring functions that describe biomolecular interactions, and even today these uncertainties are not fully understood. In this work we address this problem by proposing a new approach in which neural networks are trained with information from databases of known compounds (target proteins and drugs), and the method is then used to increase the accuracy of the affinity predictions of the VS method BINDSURF.

Relevance:

10.00%

Abstract:

After a decade evolving in the High Performance Computing arena, GPU-equipped supercomputers have conquered the TOP500 and Green500 lists, providing us unprecedented levels of computational power and memory bandwidth. This year, major vendors have introduced new accelerators based on 3D memory, like Xeon Phi Knights Landing by Intel and Pascal architecture by Nvidia. This paper reviews hardware features of those new HPC accelerators and unveils potential performance for scientific applications, with an emphasis on Hybrid Memory Cube (HMC) and High Bandwidth Memory (HBM) used by commercial products according to roadmaps already announced.

Relevance:

10.00%

Abstract:

In this thesis we discuss how a parallel program written in CUDA can be translated into OpenCL. We cover the technologies used to develop a parallel cardiac simulator and discuss in particular how to derive from them a version that can run on arbitrary graphics cards and processors. This version is then compared with the existing ones to analyze its performance and any structural changes to the code. This work was made possible largely by a wrapper called SimpleCL, designed to make OpenCL programming similar to programming in the CUDA environment. OpenCL allows working with compute units in a highly abstract way, loosely recalling the memory and processor abstractions of its NVIDIA counterpart. Sensibly, SimpleCL provides only an interface that resembles CUDA calls, keeping the underlying flow faithful to what OpenCL expects.

Relevance:

10.00%

Abstract:

Stereo Vision is a popular research topic in Computer Vision; it consists of using two images of the same scene, produced by two different cameras, to extract 3D information. The basic idea of Stereo Vision is to simulate human binocular vision: the two cameras are placed side by side horizontally to act as "eyes" looking at the 3D scene. By comparing the two resulting images, information about the positions of the objects in the scene can be obtained. In this report we present a Stereo Vision algorithm: a parallel algorithm whose goal is to trace the contour lines of a geographic area. The algorithm was originally implemented for the Connection Machine CM-2, a supercomputer developed in the 1980s, and was expressed in *Lisp, a language derived from Lisp and designed for that machine. This report also covers the translation and implementation of the algorithm in CUDA, a hardware architecture for parallel computing developed by NVIDIA that allows parallel code to run on GPUs. It also looks at the difficulties encountered in translating from *Lisp to CUDA.
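The abstract above says that comparing the two images yields position information but does not say how. One common way to do this, shown here purely as an assumed illustration and not as the contour-line algorithm of the report, is block matching along a scanline: for a pixel in the left image, search the right image for the horizontal shift (disparity) whose neighbourhood matches best under a sum of absolute differences. The function name `scanline_disparity` and its parameters are hypothetical.

```python
def scanline_disparity(left, right, x, half_window=1, max_disparity=4):
    """Estimate the disparity at column x of one rectified scanline.

    `left` and `right` are equal-length lists of pixel intensities from the
    same image row. A point at column x in the left image is assumed to
    appear at column x - d in the right image. Illustrative sketch only.
    """
    lo, hi = x - half_window, x + half_window
    if lo < 0 or hi >= len(left):
        raise ValueError("window around x falls outside the scanline")

    best_d, best_cost = 0, float("inf")
    for d in range(max_disparity + 1):
        if lo - d < 0:
            break  # the shifted window would leave the right image
        # Sum of absolute differences between the two windows.
        cost = sum(abs(left[i] - right[i - d]) for i in range(lo, hi + 1))
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d
```

Larger disparities correspond to objects closer to the cameras. In a GPU version each pixel's search is independent of the others, which is what makes this kind of computation a natural fit for a parallel implementation.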