Biblioteca Digital

995 resultados para graphics processing units

Logiciel de génération de nombres aléatoires dans OpenCL

Relevância:

100.00% 100.00%

Publicador:

Resumo:

clRNG et clProbdist sont deux interfaces de programmation (APIs) que nous avons développées pour la génération de nombres aléatoires uniformes et non uniformes sur des dispositifs de calculs parallèles en utilisant l’environnement OpenCL. La première interface permet de créer au niveau d’un ordinateur central (hôte) des objets de type stream considérés comme des générateurs virtuels parallèles qui peuvent être utilisés aussi bien sur l’hôte que sur les dispositifs parallèles (unités de traitement graphique, CPU multinoyaux, etc.) pour la génération de séquences de nombres aléatoires. La seconde interface permet aussi de générer au niveau de ces unités des variables aléatoires selon différentes lois de probabilité continues et discrètes. Dans ce mémoire, nous allons rappeler des notions de base sur les générateurs de nombres aléatoires, décrire les systèmes hétérogènes ainsi que les techniques de génération parallèle de nombres aléatoires. Nous présenterons aussi les différents modèles composant l’architecture de l’environnement OpenCL et détaillerons les structures des APIs développées. Nous distinguons pour clRNG les fonctions qui permettent la création des streams, les fonctions qui génèrent les variables aléatoires uniformes ainsi que celles qui manipulent les états des streams. clProbDist contient les fonctions de génération de variables aléatoires non uniformes selon la technique d’inversion ainsi que les fonctions qui permettent de retourner différentes statistiques des lois de distribution implémentées. Nous évaluerons ces interfaces de programmation avec deux simulations qui implémentent un exemple simplifié d’un modèle d’inventaire et un exemple d’une option financière. Enfin, nous fournirons les résultats d’expérimentation sur les performances des générateurs implémentés.

NeMo: a platform for neural modelling of spiking neurons using GPUs

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Simulating spiking neural networks is of great interest to scientists wanting to model the functioning of the brain. However, large-scale models are expensive to simulate due to the number and interconnectedness of neurons in the brain. Furthermore, where such simulations are used in an embodied setting, the simulation must be real-time in order to be useful. In this paper we present NeMo, a platform for such simulations which achieves high performance through the use of highly parallel commodity hardware in the form of graphics processing units (GPUs). NeMo makes use of the Izhikevich neuron model which provides a range of realistic spiking dynamics while being computationally efficient. Our GPU kernel can deliver up to 400 million spikes per second. This corresponds to a real-time simulation of around 40 000 neurons under biologically plausible conditions with 1000 synapses per neuron and a mean firing rate of 10 Hz.

Using GPUs for beamforming acceleration on SAFT imaging

Relevância:

100.00% 100.00%

Publicador:

Resumo:

SAFT techniques are based on the sequential activation, in emission and reception, of the array elements and the post-processing of all the received signals to compose the image. Thus, the image generation can be divided into two stages: (1) the excitation and acquisition stage, where the signals received by each element or group of elements are stored; and (2) the beamforming stage, where the signals are combined together to obtain the image pixels. The use of Graphics Processing Units (GPUs), which are programmable devices with a high level of parallelism, can accelerate the computations of the beamforming process, that usually includes different functions such as dynamic focusing, band-pass filtering, spatial filtering or envelope detection. This work shows that using GPU technology can accelerate, in more than one order of magnitude with respect to CPU implementations, the beamforming and post-processing algorithms in SAFT imaging. ©2009 IEEE.

A path- and label-cost propagation approach to speedup the training of the optimum-path forest classifier

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In general, pattern recognition techniques require a high computational burden for learning the discriminating functions that are responsible to separate samples from distinct classes. As such, there are several studies that make effort to employ machine learning algorithms in the context of big data classification problems. The research on this area ranges from Graphics Processing Units-based implementations to mathematical optimizations, being the main drawback of the former approaches to be dependent on the graphic video card. Here, we propose an architecture-independent optimization approach for the optimum-path forest (OPF) classifier, that is designed using a theoretical formulation that relates the minimum spanning tree with the minimum spanning forest generated by the OPF over the training dataset. The experiments have shown that the approach proposed can be faster than the traditional one in five public datasets, being also as accurate as the original OPF. (C) 2014 Elsevier B. V. All rights reserved.

Image Re-Ranking Acceleration on GPUs

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Huge image collections are becoming available lately. In this scenario, the use of Content-Based Image Retrieval (CBIR) systems has emerged as a promising approach to support image searches. The objective of CBIR systems is to retrieve the most similar images in a collection, given a query image, by taking into account image visual properties such as texture, color, and shape. In these systems, the effectiveness of the retrieval process depends heavily on the accuracy of ranking approaches. Recently, re-ranking approaches have been proposed to improve the effectiveness of CBIR systems by taking into account the relationships among images. The re-ranking approaches consider the relationships among all images in a given dataset. These approaches typically demands a huge amount of computational power, which hampers its use in practical situations. On the other hand, these methods can be massively parallelized. In this paper, we propose to speedup the computation of the RL-Sim algorithm, a recently proposed image re-ranking approach, by using the computational power of Graphics Processing Units (GPU). GPUs are emerging as relatively inexpensive parallel processors that are becoming available on a wide range of computer systems. We address the image re-ranking performance challenges by proposing a parallel solution designed to fit the computational model of GPUs. We conducted an experimental evaluation considering different implementations and devices. Experimental results demonstrate that significant performance gains can be obtained. Our approach achieves speedups of 7x from serial implementation considering the overall algorithm and up to 36x on its core steps.

Predição de sítios de ligação para a cisplatina e transplatina baseada em ligações de hidrogênio

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Pós-graduação em Biofísica Molecular - IBILCE

Heterogeneous Multi-core Architectures for High Performance Computing

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This thesis deals with heterogeneous architectures in standard workstations. Heterogeneous architectures represent an appealing alternative to traditional supercomputers because they are based on commodity components fabricated in large quantities. Hence their price-performance ratio is unparalleled in the world of high performance computing (HPC). In particular, different aspects related to the performance and consumption of heterogeneous architectures have been explored. The thesis initially focuses on an efficient implementation of a parallel application, where the execution time is dominated by an high number of floating point instructions. Then the thesis touches the central problem of efficient management of power peaks in heterogeneous computing systems. Finally it discusses a memory-bounded problem, where the execution time is dominated by the memory latency. Specifically, the following main contributions have been carried out: A novel framework for the design and analysis of solar field for Central Receiver Systems (CRS) has been developed. The implementation based on desktop workstation equipped with multiple Graphics Processing Units (GPUs) is motivated by the need to have an accurate and fast simulation environment for studying mirror imperfection and non-planar geometries. Secondly, a power-aware scheduling algorithm on heterogeneous CPU-GPU architectures, based on an efficient distribution of the computing workload to the resources, has been realized. The scheduler manages the resources of several computing nodes with a view to reducing the peak power. The two main contributions of this work follow: the approach reduces the supply cost due to high peak power whilst having negligible impact on the parallelism of computational nodes. from another point of view the developed model allows designer to increase the number of cores without increasing the capacity of the power supply unit. Finally, an implementation for efficient graph exploration on reconfigurable architectures is presented. The purpose is to accelerate graph exploration, reducing the number of random memory accesses.

Numerical and Analytical Methods for Laser-Plasma Acceleration Physics

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Theories and numerical modeling are fundamental tools for understanding, optimizing and designing present and future laser-plasma accelerators (LPAs). Laser evolution and plasma wave excitation in a LPA driven by a weakly relativistically intense, short-pulse laser propagating in a preformed parabolic plasma channel, is studied analytically in 3D including the effects of pulse steepening and energy depletion. At higher laser intensities, the process of electron self-injection in the nonlinear bubble wake regime is studied by means of fully self-consistent Particle-in-Cell simulations. Considering a non-evolving laser driver propagating with a prescribed velocity, the geometrical properties of the non-evolving bubble wake are studied. For a range of parameters of interest for laser plasma acceleration, The dependence of the threshold for self-injection in the non-evolving wake on laser intensity and wake velocity is characterized. Due to the nonlinear and complex nature of the Physics involved, computationally challenging numerical simulations are required to model laser-plasma accelerators operating at relativistic laser intensities. The numerical and computational optimizations, that combined in the codes INF&RNO and INF&RNO/quasi-static give the possibility to accurately model multi-GeV laser wakefield acceleration stages with present supercomputing architectures, are discussed. The PIC code jasmine, capable of efficiently running laser-plasma simulations on Graphics Processing Units (GPUs) clusters, is presented. GPUs deliver exceptional performance to PIC codes, but the core algorithms had to be redesigned for satisfying the constraints imposed by the intrinsic parallelism of the architecture. The simulation campaigns, run with the code jasmine for modeling the recent LPA experiments with the INFN-FLAME and CNR-ILIL laser systems, are also presented.

Many-Core Architectures: Hardware-Software Optimization and Modeling Techniques

Relevância:

100.00% 100.00%

Publicador:

Resumo:

During the last few decades an unprecedented technological growth has been at the center of the embedded systems design paramount, with Moore’s Law being the leading factor of this trend. Today in fact an ever increasing number of cores can be integrated on the same die, marking the transition from state-of-the-art multi-core chips to the new many-core design paradigm. Despite the extraordinarily high computing power, the complexity of many-core chips opens the door to several challenges. As a result of the increased silicon density of modern Systems-on-a-Chip (SoC), the design space exploration needed to find the best design has exploded and hardware designers are in fact facing the problem of a huge design space. Virtual Platforms have always been used to enable hardware-software co-design, but today they are facing with the huge complexity of both hardware and software systems. In this thesis two different research works on Virtual Platforms are presented: the first one is intended for the hardware developer, to easily allow complex cycle accurate simulations of many-core SoCs. The second work exploits the parallel computing power of off-the-shelf General Purpose Graphics Processing Units (GPGPUs), with the goal of an increased simulation speed. The term Virtualization can be used in the context of many-core systems not only to refer to the aforementioned hardware emulation tools (Virtual Platforms), but also for two other main purposes: 1) to help the programmer to achieve the maximum possible performance of an application, by hiding the complexity of the underlying hardware. 2) to efficiently exploit the high parallel hardware of many-core chips in environments with multiple active Virtual Machines. This thesis is focused on virtualization techniques with the goal to mitigate, and overtake when possible, some of the challenges introduced by the many-core design paradigm.

GPU-based Ray Tracing of Dynamic Scenes

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Interactive ray tracing of non-trivial scenes is just becoming feasible on single graphics processing units (GPU). Recent work in this area focuses on building effective acceleration structures, which work well under the constraints of current GPUs. Most approaches are targeted at static scenes and only allow navigation in the virtual scene. So far support for dynamic scenes has not been considered for GPU implementations. We have developed a GPU-based ray tracing system for dynamic scenes consisting of a set of individual objects. Each object may independently move around, but its geometry and topology are static.

Particle methods parallel implementations by GP-GPU strategies

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper outlines the problems found in the parallelization of SPH (Smoothed Particle Hydrodynamics) algorithms using Graphics Processing Units. Different results of some parallel GPU implementations in terms of the speed-up and the scalability compared to the CPU sequential codes are shown. The most problematic stage in the GPU-SPH algorithms is the one responsible for locating neighboring particles and building the vectors where this information is stored, since these specific algorithms raise many dificulties for a data-level parallelization. Because of the fact that the neighbor location using linked lists does not show enough data-level parallelism, two new approaches have been pro- posed to minimize bank conflicts in the writing and subsequent reading of the neighbor lists. The first strategy proposes an efficient coordination between CPU-GPU, using GPU algorithms for those stages that allow a straight forward parallelization, and sequential CPU algorithms for those instructions that involve some kind of vector reduction. This coordination provides a relatively orderly reading of the neighbor lists in the interactions stage, achieving a speed-up factor of x47 in this stage. However, since the construction of the neighbor lists is quite expensive, it is achieved an overall speed-up of x41. The second strategy seeks to maximize the use of the GPU in the neighbor's location process by executing a specific vector sorting algorithm that allows some data-level parallelism. Al- though this strategy has succeeded in improving the speed-up on the stage of neighboring location, the global speed-up on the interactions stage falls, due to inefficient reading of the neighbor vectors. Some changes to these strategies are proposed, aimed at maximizing the computational load of the GPU and using the GPU texture-units, in order to reach the maximum speed-up for such codes. Different practical applications have been added to the mentioned GPU codes. First, the classical dam-break problem is studied. Second, the wave impact of the sloshing fluid contained in LNG vessel tanks is also simulated as a practical example of particle methods

Optimal polygonal L1 linearization and fast interpolation of nonlinear systems

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The analysis of complex nonlinear systems is often carried out using simpler piecewise linear representations of them. A principled and practical technique is proposed to linearize and evaluate arbitrary continuous nonlinear functions using polygonal (continuous piecewise linear) models under the L1 norm. A thorough error analysis is developed to guide an optimal design of two kinds of polygonal approximations in the asymptotic case of a large budget of evaluation subintervals N. The method allows the user to obtain the level of linearization (N) for a target approximation error and vice versa. It is suitable for, but not limited to, an efficient implementation in modern Graphics Processing Units (GPUs), allowing real-time performance of computationally demanding applications. The quality and efficiency of the technique has been measured in detail on two nonlinear functions that are widely used in many areas of scientific computing and are expensive to evaluate.

Large scale multiphysics modeling of neurites

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Situado en el límite entre Ingeniería, Informática y Biología, la mecánica computacional de las neuronas aparece como un nuevo campo interdisciplinar que potencialmente puede ser capaz de abordar problemas clínicos desde una perspectiva diferente. Este campo es multiescala por naturaleza, yendo desde la nanoescala (como, por ejemplo, los dímeros de tubulina) a la macroescala (como, por ejemplo, el tejido cerebral), y tiene como objetivo abordar problemas que son complejos, y algunas veces imposibles, de estudiar con medios experimentales. La modelización computacional ha sido ampliamente empleada en aplicaciones Neurocientíficas tan diversas como el crecimiento neuronal o la propagación de los potenciales de acción compuestos. Sin embargo, en la mayoría de los enfoques de modelización hechos hasta ahora, la interacción entre la célula y el medio/estímulo que la rodea ha sido muy poco explorada. A pesar de la tremenda importancia de esa relación en algunos desafíos médicos—como, por ejemplo, lesiones traumáticas en el cerebro, cáncer, la enfermedad del Alzheimer—un puente que relacione las propiedades electrofisiológicas-químicas y mecánicas desde la escala molecular al nivel celular todavía no existe. Con ese objetivo, esta investigación propone un marco computacional multiescala particularizado para dos escenarios respresentativos: el crecimiento del axón y el acomplamiento electrofisiológicomecánico de las neuritas. En el primer caso, se explora la relación entre los constituyentes moleculares del axón durante su crecimiento y sus propiedades mecánicas resultantes, mientras que en el último, un estímulo mecánico provoca deficiencias funcionales a nivel celular como consecuencia de sus alteraciones electrofisiológicas-químicas. La modelización computacional empleada en este trabajo es el método de las diferencias finitas, y es implementada en un nuevo programa llamado Neurite. Aunque el método de los elementos finitos es también explorado en parte de esta investigación, el método de las diferencias finitas tiene la flexibilidad y versatilidad necesaria para implementar mode los biológicos, así como la simplicidad matemática para extenderlos a simulaciones a gran escala con un coste computacional bajo. Centrándose primero en el efecto de las propiedades electrofisiológicas-químicas sobre las propiedades mecánicas, una versión adaptada de Neurite es desarrollada para simular la polimerización de los microtúbulos en el crecimiento del axón y proporcionar las propiedades mecánicas como función de la ocupación de los microtúbulos. Después de calibrar el modelo de crecimiento del axón frente a resultados experimentales disponibles en la literatura, las características mecánicas pueden ser evaluadas durante la simulación. Las propiedades mecánicas del axón muestran variaciones dramáticas en la punta de éste, donde el cono de crecimiento soporta las señales químicas y mecánicas. Bansándose en el conocimiento ganado con el modelo de diferencias finitas, y con el objetivo de ir de 1D a 3D, este esquema preliminar pero de una naturaleza innovadora allana el camino a futuros estudios con el método de los elementos finitos. Centrándose finalmente en el efecto de las propiedades mecánicas sobre las propiedades electrofisiológicas- químicas, Neurite es empleado para relacionar las cargas mecánicas macroscópicas con las deformaciones y velocidades de deformación a escala microscópica, y simular la propagación de la señal eléctrica en las neuritas bajo carga mecánica. Las simulaciones fueron calibradas con resultados experimentales publicados en la literatura, proporcionando, por tanto, un modelo capaz de predecir las alteraciones de las funciones electrofisiológicas neuronales bajo cargas externas dañinas, y uniendo lesiones mecánicas con las correspondientes deficiencias funcionales. Para abordar simulaciones a gran escala, aunque otras arquitecturas avanzadas basadas en muchos núcleos integrados (MICs) fueron consideradas, los solvers explícito e implícito se implementaron en unidades de procesamiento central (CPU) y unidades de procesamiento gráfico (GPUs). Estudios de escalabilidad fueron llevados acabo para ambas implementaciones mostrando resultados prometedores para casos de simulaciones extremadamente grandes con GPUs. Esta tesis abre la vía para futuros modelos mecánicos con el objetivo de unir las propiedades electrofisiológicas-químicas con las propiedades mecánicas. El objetivo general es mejorar el conocimiento de las comunidades médicas y de bioingeniería sobre la mecánica de las neuronas y las deficiencias funcionales que aparecen de los daños producidos por traumatismos mecánicos, como lesiones traumáticas en el cerebro, o enfermedades neurodegenerativas como la enfermedad del Alzheimer. ABSTRACT Sitting at the interface between Engineering, Computer Science and Biology, Computational Neuron Mechanics appears as a new interdisciplinary field potentially able to tackle clinical problems from a new perspective. This field is multiscale by nature, ranging from the nanoscale (e.g., tubulin dimers) to the macroscale (e.g., brain tissue), and aims at tackling problems that are complex, and sometime impossible, to study through experimental means. Computational modeling has been widely used in different Neuroscience applications as diverse as neuronal growth or compound action potential propagation. However, in the majority of the modeling approaches done in this field to date, the interactions between the cell and its surrounding media/stimulus have been rarely explored. Despite of the tremendous importance of such relationship in several medical challenges—e.g., traumatic brain injury (TBI), cancer, Alzheimer’s disease (AD)—a bridge between electrophysiological-chemical and mechanical properties of neurons from the molecular scale to the cell level is still lacking. To this end, this research proposes a multiscale computational framework particularized for two representative scenarios: axon growth and electrophysiological-mechanical coupling of neurites. In the former case, the relation between the molecular constituents of the axon during its growth and its resulting mechanical properties is explored, whereas in the latter, a mechanical stimulus provokes functional deficits at cell level as a consequence of its electrophysiological-chemical alterations. The computational modeling approach chosen in this work is the finite difference method (FDM), and was implemented in a new program called Neurite. Although the finite element method (FEM) is also explored as part of this research, the FDM provides the necessary flexibility and versatility to implement biological models, as well as the mathematical simplicity to extend them to large scale simulations with a low computational cost. Focusing first on the effect of electrophysiological-chemical properties on the mechanical proper ties, an adaptation of Neurite was developed to simulate microtubule polymerization in axonal growth and provide the axon mechanical properties as a function of microtubule occupancy. After calibrating the axon growth model against experimental results available in the literature, the mechanical characteristics can be tracked during the simulation. The axon mechanical properties show dramatic variations at the tip of the axon, where the growth cone supports the chemical and mechanical signaling. Based on the knowledge gained from the FDM scheme, and in order to go from 1D to 3D, this preliminary yet novel scheme paves the road for future studies with FEM. Focusing then on the effect of mechanical properties on the electrophysiological-chemical properties, Neurite was used to relate macroscopic mechanical loading to microscopic strains and strain rates, and simulate the electrical signal propagation along neurites under mechanical loading. The simulations were calibrated against experimental results published in the literature, thus providing a model able to predict the alteration of neuronal electrophysiological function under external damaging load, and linking mechanical injuries to subsequent acute functional deficits. To undertake large scale simulations, although other state-of-the-art architectures based on many integrated cores (MICs) were considered, the explicit and implicit solvers were implemented for central processing units (CPUs) and graphics processing units (GPUs). Scalability studies were done for both implementations showing promising results for extremely large scale simulations with GPUs. This thesis opens the avenue for future mechanical modeling approaches aimed at linking electrophysiological- chemical properties to mechanical properties. Its overarching goal is to enhance the bioengineering and medical communities knowledge on neuronal mechanics and functional deficits arising from damages produced by direct mechanical insults, such as TBI, or neurodegenerative evolving illness, such as AD.

Neurite, a finite difference large scale parallel program for the simulation of the electrical signal propagation in neurites under mechanical loading

Relevância:

100.00% 100.00%

Publicador:

Resumo:

With the growing body of research on traumatic brain injury and spinal cord injury, computational neuroscience has recently focused its modeling efforts on neuronal functional deficits following mechanical loading. However, in most of these efforts, cell damage is generally only characterized by purely mechanistic criteria, function of quantities such as stress, strain or their corresponding rates. The modeling of functional deficits in neurites as a consequence of macroscopic mechanical insults has been rarely explored. In particular, a quantitative mechanically based model of electrophysiological impairment in neuronal cells has only very recently been proposed (Jerusalem et al., 2013). In this paper, we present the implementation details of Neurite: the finite difference parallel program used in this reference. Following the application of a macroscopic strain at a given strain rate produced by a mechanical insult, Neurite is able to simulate the resulting neuronal electrical signal propagation, and thus the corresponding functional deficits. The simulation of the coupled mechanical and electrophysiological behaviors requires computational expensive calculations that increase in complexity as the network of the simulated cells grows. The solvers implemented in Neurite-explicit and implicit-were therefore parallelized using graphics processing units in order to reduce the burden of the simulation costs of large scale scenarios. Cable Theory and Hodgkin-Huxley models were implemented to account for the electrophysiological passive and active regions of a neurite, respectively, whereas a coupled mechanical model accounting for the neurite mechanical behavior within its surrounding medium was adopted as a link between lectrophysiology and mechanics (Jerusalem et al., 2013). This paper provides the details of the parallel implementation of Neurite, along with three different application examples: a long myelinated axon, a segmented dendritic tree, and a damaged axon. The capabilities of the program to deal with large scale scenarios, segmented neuronal structures, and functional deficits under mechanical loading are specifically highlighted.

Optimal Piecewise Linear Function Approximation for GPU-based Applications

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Many computer vision and human-computer interaction applications developed in recent years need evaluating complex and continuous mathematical functions as an essential step toward proper operation. However, rigorous evaluation of this kind of functions often implies a very high computational cost, unacceptable in real-time applications. To alleviate this problem, functions are commonly approximated by simpler piecewise-polynomial representations. Following this idea, we propose a novel, efficient, and practical technique to evaluate complex and continuous functions using a nearly optimal design of two types of piecewise linear approximations in the case of a large budget of evaluation subintervals. To this end, we develop a thorough error analysis that yields asymptotically tight bounds to accurately quantify the approximation performance of both representations. It provides an improvement upon previous error estimates and allows the user to control the trade-off between the approximation error and the number of evaluation subintervals. To guarantee real-time operation, the method is suitable for, but not limited to, an efficient implementation in modern Graphics Processing Units (GPUs), where it outperforms previous alternative approaches by exploiting the fixed-function interpolation routines present in their texture units. The proposed technique is a perfect match for any application requiring the evaluation of continuous functions, we have measured in detail its quality and efficiency on several functions, and, in particular, the Gaussian function because it is extensively used in many areas of computer vision and cybernetics, and it is expensive to evaluate.

«
1
2
3
4
5
6
7
8
...
66
67
»