802 resultados para GPU computing
Resumo:
Dissertação para obtenção do Grau de Mestre em Engenharia Informática
Resumo:
Large-scale simulations of parts of the brain using detailed neuronal models to improve our understanding of brain functions are becoming a reality with the usage of supercomputers and large clusters. However, the high acquisition and maintenance cost of these computers, including the physical space, air conditioning, and electrical power, limits the number of simulations of this kind that scientists can perform. Modern commodity graphical cards, based on the CUDA platform, contain graphical processing units (GPUs) composed of hundreds of processors that can simultaneously execute thousands of threads and thus constitute a low-cost solution for many high-performance computing applications. In this work, we present a CUDA algorithm that enables the execution, on multiple GPUs, of simulations of large-scale networks composed of biologically realistic Hodgkin-Huxley neurons. The algorithm represents each neuron as a CUDA thread, which solves the set of coupled differential equations that model each neuron. Communication among neurons located in different GPUs is coordinated by the CPU. We obtained speedups of 40 for the simulation of 200k neurons that received random external input and speedups of 9 for a network with 200k neurons and 20M neuronal connections, in a single computer with two graphic boards with two GPUs each, when compared with a modern quad-core CPU. Copyright (C) 2010 John Wiley & Sons, Ltd.
Resumo:
A parallel algorithm to remove impulsive noise in digital images using heterogeneous CPU/GPU computing is proposed. The parallel denoising algorithm is based on the peer group concept and uses an Euclidean metric. In order to identify the amount of pixels to be allocated in multi-core and GPUs, a performance analysis using large images is presented. A comparison of the parallel implementation in multi-core, GPUs and a combination of both is performed. Performance has been evaluated in terms of execution time and Megapixels/second. We present several optimization strategies especially effective for the multi-core environment, and demonstrate significant performance improvements. The main advantage of the proposed noise removal methodology is its computational speed, which enables efficient filtering of color images in real-time applications.
Resumo:
In this project, we propose the implementation of a 3D object recognition system which will be optimized to operate under demanding time constraints. The system must be robust so that objects can be recognized properly in poor light conditions and cluttered scenes with significant levels of occlusion. An important requirement must be met: the system must exhibit a reasonable performance running on a low power consumption mobile GPU computing platform (NVIDIA Jetson TK1) so that it can be integrated in mobile robotics systems, ambient intelligence or ambient assisted living applications. The acquisition system is based on the use of color and depth (RGB-D) data streams provided by low-cost 3D sensors like Microsoft Kinect or PrimeSense Carmine. The range of algorithms and applications to be implemented and integrated will be quite broad, ranging from the acquisition, outlier removal or filtering of the input data and the segmentation or characterization of regions of interest in the scene to the very object recognition and pose estimation. Furthermore, in order to validate the proposed system, we will create a 3D object dataset. It will be composed by a set of 3D models, reconstructed from common household objects, as well as a handful of test scenes in which those objects appear. The scenes will be characterized by different levels of occlusion, diverse distances from the elements to the sensor and variations on the pose of the target objects. The creation of this dataset implies the additional development of 3D data acquisition and 3D object reconstruction applications. The resulting system has many possible applications, ranging from mobile robot navigation and semantic scene labeling to human-computer interaction (HCI) systems based on visual information.
Resumo:
A graphical processing unit (GPU) is a hardware device normally used to manipulate computer memory for the display of images. GPU computing is the practice of using a GPU device for scientific or general purpose computations that are not necessarily related to the display of images. Many problems in econometrics have a structure that allows for successful use of GPU computing. We explore two examples. The first is simple: repeated evaluation of a likelihood function at different parameter values. The second is a more complicated estimator that involves simulation and nonparametric fitting. We find speedups from 1.5 up to 55.4 times, compared to computations done on a single CPU core. These speedups can be obtained with very little expense, energy consumption, and time dedicated to system maintenance, compared to equivalent performance solutions using CPUs. Code for the examples is provided.
Resumo:
Die Entstehung eines Marktpreises für einen Vermögenswert kann als Superposition der einzelnen Aktionen der Marktteilnehmer aufgefasst werden, die damit kumulativ Angebot und Nachfrage erzeugen. Dies ist in der statistischen Physik mit der Entstehung makroskopischer Eigenschaften vergleichbar, die von mikroskopischen Wechselwirkungen zwischen den beteiligten Systemkomponenten hervorgerufen werden. Die Verteilung der Preisänderungen an Finanzmärkten unterscheidet sich deutlich von einer Gaußverteilung. Dies führt zu empirischen Besonderheiten des Preisprozesses, zu denen neben dem Skalierungsverhalten nicht-triviale Korrelationsfunktionen und zeitlich gehäufte Volatilität zählen. In der vorliegenden Arbeit liegt der Fokus auf der Analyse von Finanzmarktzeitreihen und den darin enthaltenen Korrelationen. Es wird ein neues Verfahren zur Quantifizierung von Muster-basierten komplexen Korrelationen einer Zeitreihe entwickelt. Mit dieser Methodik werden signifikante Anzeichen dafür gefunden, dass sich typische Verhaltensmuster von Finanzmarktteilnehmern auf kurzen Zeitskalen manifestieren, dass also die Reaktion auf einen gegebenen Preisverlauf nicht rein zufällig ist, sondern vielmehr ähnliche Preisverläufe auch ähnliche Reaktionen hervorrufen. Ausgehend von der Untersuchung der komplexen Korrelationen in Finanzmarktzeitreihen wird die Frage behandelt, welche Eigenschaften sich beim Wechsel von einem positiven Trend zu einem negativen Trend verändern. Eine empirische Quantifizierung mittels Reskalierung liefert das Resultat, dass unabhängig von der betrachteten Zeitskala neue Preisextrema mit einem Anstieg des Transaktionsvolumens und einer Reduktion der Zeitintervalle zwischen Transaktionen einhergehen. Diese Abhängigkeiten weisen Charakteristika auf, die man auch in anderen komplexen Systemen in der Natur und speziell in physikalischen Systemen vorfindet. Über 9 Größenordnungen in der Zeit sind diese Eigenschaften auch unabhängig vom analysierten Markt - Trends, die nur für Sekunden bestehen, zeigen die gleiche Charakteristik wie Trends auf Zeitskalen von Monaten. Dies eröffnet die Möglichkeit, mehr über Finanzmarktblasen und deren Zusammenbrüche zu lernen, da Trends auf kleinen Zeitskalen viel häufiger auftreten. Zusätzlich wird eine Monte Carlo-basierte Simulation des Finanzmarktes analysiert und erweitert, um die empirischen Eigenschaften zu reproduzieren und Einblicke in deren Ursachen zu erhalten, die zum einen in der Finanzmarktmikrostruktur und andererseits in der Risikoaversion der Handelsteilnehmer zu suchen sind. Für die rechenzeitintensiven Verfahren kann mittels Parallelisierung auf einer Graphikkartenarchitektur eine deutliche Rechenzeitreduktion erreicht werden. Um das weite Spektrum an Einsatzbereichen von Graphikkarten zu aufzuzeigen, wird auch ein Standardmodell der statistischen Physik - das Ising-Modell - auf die Graphikkarte mit signifikanten Laufzeitvorteilen portiert. Teilresultate der Arbeit sind publiziert in [PGPS07, PPS08, Pre11, PVPS09b, PVPS09a, PS09, PS10a, SBF+10, BVP10, Pre10, PS10b, PSS10, SBF+11, PB10].
Resumo:
This thesis investigates interactive scene reconstruction and understanding using RGB-D data only. Indeed, we believe that depth cameras will still be in the near future a cheap and low-power 3D sensing alternative suitable for mobile devices too. Therefore, our contributions build on top of state-of-the-art approaches to achieve advances in three main challenging scenarios, namely mobile mapping, large scale surface reconstruction and semantic modeling. First, we will describe an effective approach dealing with Simultaneous Localization And Mapping (SLAM) on platforms with limited resources, such as a tablet device. Unlike previous methods, dense reconstruction is achieved by reprojection of RGB-D frames, while local consistency is maintained by deploying relative bundle adjustment principles. We will show quantitative results comparing our technique to the state-of-the-art as well as detailed reconstruction of various environments ranging from rooms to small apartments. Then, we will address large scale surface modeling from depth maps exploiting parallel GPU computing. We will develop a real-time camera tracking method based on the popular KinectFusion system and an online surface alignment technique capable of counteracting drift errors and closing small loops. We will show very high quality meshes outperforming existing methods on publicly available datasets as well as on data recorded with our RGB-D camera even in complete darkness. Finally, we will move to our Semantic Bundle Adjustment framework to effectively combine object detection and SLAM in a unified system. Though the mathematical framework we will describe does not restrict to a particular sensing technology, in the experimental section we will refer, again, only to RGB-D sensing. We will discuss successful implementations of our algorithm showing the benefit of a joint object detection, camera tracking and environment mapping.
Resumo:
Floating-point computing with more than one TFLOP of peak performance is already a reality in recent Field-Programmable Gate Arrays (FPGA). General-Purpose Graphics Processing Units (GPGPU) and recent many-core CPUs have also taken advantage of the recent technological innovations in integrated circuit (IC) design and had also dramatically improved their peak performances. In this paper, we compare the trends of these computing architectures for high-performance computing and survey these platforms in the execution of algorithms belonging to different scientific application domains. Trends in peak performance, power consumption and sustained performances, for particular applications, show that FPGAs are increasing the gap to GPUs and many-core CPUs moving them away from high-performance computing with intensive floating-point calculations. FPGAs become competitive for custom floating-point or fixed-point representations, for smaller input sizes of certain algorithms, for combinational logic problems and parallel map-reduce problems. © 2014 Technical University of Munich (TUM).
Resumo:
Graphics processors were originally developed for rendering graphics but have recently evolved towards being an architecture for general-purpose computations. They are also expected to become important parts of embedded systems hardware -- not just for graphics. However, this necessitates the development of appropriate timing analysis techniques which would be required because techniques developed for CPU scheduling are not applicable. The reason is that we are not interested in how long it takes for any given GPU thread to complete, but rather how long it takes for all of them to complete. We therefore develop a simple method for finding an upper bound on the makespan of a group of GPU threads executing the same program and competing for the resources of a single streaming multiprocessor (whose architecture is based on NVIDIA Fermi, with some simplifying assunptions). We then build upon this method to formulate the derivation of the exact worst-case makespan (and corresponding schedule) as an optimization problem. Addressing the issue of tractability, we also present a technique for efficiently computing a safe estimate of the worstcase makespan with minimal pessimism, which may be used when finding an exact value would take too long.
Resumo:
Hyperspectral imaging can be used for object detection and for discriminating between different objects based on their spectral characteristics. One of the main problems of hyperspectral data analysis is the presence of mixed pixels, due to the low spatial resolution of such images. This means that several spectrally pure signatures (endmembers) are combined into the same mixed pixel. Linear spectral unmixing follows an unsupervised approach which aims at inferring pure spectral signatures and their material fractions at each pixel of the scene. The huge data volumes acquired by such sensors put stringent requirements on processing and unmixing methods. This paper proposes an efficient implementation of a unsupervised linear unmixing method on GPUs using CUDA. The method finds the smallest simplex by solving a sequence of nonsmooth convex subproblems using variable splitting to obtain a constraint formulation, and then applying an augmented Lagrangian technique. The parallel implementation of SISAL presented in this work exploits the GPU architecture at low level, using shared memory and coalesced accesses to memory. The results herein presented indicate that the GPU implementation can significantly accelerate the method's execution over big datasets while maintaining the methods accuracy.
Resumo:
Remote hyperspectral sensors collect large amounts of data per flight usually with low spatial resolution. It is known that the bandwidth connection between the satellite/airborne platform and the ground station is reduced, thus a compression onboard method is desirable to reduce the amount of data to be transmitted. This paper presents a parallel implementation of an compressive sensing method, called parallel hyperspectral coded aperture (P-HYCA), for graphics processing units (GPU) using the compute unified device architecture (CUDA). This method takes into account two main properties of hyperspectral dataset, namely the high correlation existing among the spectral bands and the generally low number of endmembers needed to explain the data, which largely reduces the number of measurements necessary to correctly reconstruct the original data. Experimental results conducted using synthetic and real hyperspectral datasets on two different GPU architectures by NVIDIA: GeForce GTX 590 and GeForce GTX TITAN, reveal that the use of GPUs can provide real-time compressive sensing performance. The achieved speedup is up to 20 times when compared with the processing time of HYCA running on one core of the Intel i7-2600 CPU (3.4GHz), with 16 Gbyte memory.
Resumo:
Dissertação para obtenção do Grau de Mestre em Engenharia Biomédica
Resumo:
Dissertação para obtenção do Grau de Mestre em Engenharia Informática
Resumo:
The Graphics Processing Unit (GPU) is present in almost every modern day personal computer. Despite its specific purpose design, they have been increasingly used for general computations with very good results. Hence, there is a growing effort from the community to seamlessly integrate this kind of devices in everyday computing. However, to fully exploit the potential of a system comprising GPUs and CPUs, these devices should be presented to the programmer as a single platform. The efficient combination of the power of CPU and GPU devices is highly dependent on each device’s characteristics, resulting in platform specific applications that cannot be ported to different systems. Also, the most efficient work balance among devices is highly dependable on the computations to be performed and respective data sizes. In this work, we propose a solution for heterogeneous environments based on the abstraction level provided by algorithmic skeletons. Our goal is to take full advantage of the power of all CPU and GPU devices present in a system, without the need for different kernel implementations nor explicit work-distribution.To that end, we extended Marrow, an algorithmic skeleton framework for multi-GPUs, to support CPU computations and efficiently balance the work-load between devices. Our approach is based on an offline training execution that identifies the ideal work balance and platform configurations for a given application and input data size. The evaluation of this work shows that the combination of CPU and GPU devices can significantly boost the performance of our benchmarks in the tested environments, when compared to GPU-only executions.