963 results for GPU acceleration
Abstract:
We report on the ion acceleration mechanisms that occur during the interaction of an intense and ultrashort laser pulse (Iλ² > 10¹⁸ W cm⁻² μm²) with an underdense helium plasma produced from an ionized gas jet target. In this unexplored regime, where the laser pulse duration is comparable to the inverse of the electron plasma frequency ωpe, reproducible non-thermal ion bunches have been measured in the radial direction. The two He ion charge states present energy distributions with cutoff energies between 150 and 200 keV, and a striking energy gap around 50 keV appearing consistently for all the shots in a given density range. Fully electromagnetic particle-in-cell simulations explain the experimental behaviors. The acceleration results from a combination of target normal sheath acceleration and Coulomb explosion of a filament formed around the laser pulse propagation axis.
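For reference, the regime described above can be stated with the textbook definition of the electron plasma frequency (standard plasma physics, not a result of this paper):

```latex
\omega_{pe} = \sqrt{\frac{n_e e^2}{\varepsilon_0 m_e}},
\qquad \text{regime: } \tau_L \sim \omega_{pe}^{-1},
```

where n_e is the electron density of the underdense plasma and τ_L the laser pulse duration.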
Abstract:
A novel GPU-based nonparametric moving object detection strategy for computer vision tools requiring real-time processing is proposed. An alternative and efficient Bayesian classifier that combines nonparametric background and foreground models increases correct detections while avoiding false detections. Additionally, an efficient region-of-interest analysis significantly reduces the computational cost of the detections.
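A minimal sketch of how such a per-pixel Bayesian combination of nonparametric (kernel density) background and foreground models can work; this is an illustration, not the authors' implementation, and the kernel bandwidth and foreground prior are assumptions:

```python
import numpy as np

def kernel_density(samples, value, bandwidth=10.0):
    """Nonparametric (KDE) likelihood of a pixel value given past samples
    (Gaussian kernel; bandwidth is an illustrative assumption)."""
    diffs = (samples - value) / bandwidth
    return np.mean(np.exp(-0.5 * diffs ** 2)) / (bandwidth * np.sqrt(2 * np.pi))

def classify_pixel(bg_samples, fg_samples, value, prior_fg=0.1):
    """Posterior probability that a pixel is foreground (prior is illustrative)."""
    p_bg = kernel_density(bg_samples, value)
    p_fg = kernel_density(fg_samples, value)
    num = prior_fg * p_fg
    return num / (num + (1.0 - prior_fg) * p_bg + 1e-12)

# Usage: grayscale history of one pixel vs. a new observation
bg = np.array([100, 102, 98, 101, 99], dtype=float)
fg = np.array([180, 175, 190], dtype=float)
print(classify_pixel(bg, fg, 182.0))  # close to 1 -> moving object
```

Because every pixel is classified independently, this scheme maps naturally onto one GPU thread per pixel.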
Abstract:
In this article, an approximate analytical solution for the two-body problem perturbed by a radial, low acceleration is obtained using a regularized formulation of the orbital motion and the method of multiple scales. The results reveal that the physics of the problem evolves on two fundamental scales of the true anomaly. The first one drives the oscillations of the orbital parameters along each orbit. The second one is responsible for the long-term variations in the amplitude and mean values of these oscillations. A good agreement is found with high-precision numerical solutions.
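As a generic illustration of the two-scale structure described above (not the paper's actual equations), a multiple-scales expansion in the true anomaly ν, with the small radial acceleration ε as perturbation parameter, separates a fast scale driving the per-orbit oscillations from a slow scale ν̃ = εν driving their long-term drift:

```latex
% Generic two-scale ansatz, slow scale \tilde{\nu} = \varepsilon\nu
x(\nu) = x_0(\nu, \tilde{\nu}) + \varepsilon\, x_1(\nu, \tilde{\nu}) + O(\varepsilon^2),
\qquad
\frac{d}{d\nu} = \frac{\partial}{\partial\nu} + \varepsilon\,\frac{\partial}{\partial\tilde{\nu}}
```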
Minimum volume stability limits for axisymmetric liquid bridges subject to steady axial acceleration
Abstract:
In this paper, the influence of axial microgravity on the minimum volume stability limit of axisymmetric liquid bridges between unequal disks is analyzed both theoretically and experimentally. The results presented here extend the knowledge of the static behaviour of liquid bridges to fluid configurations different from those studied up to now (almost equal disks). Experimental results, obtained by simulating microgravity conditions with the neutral buoyancy technique, are also presented and are shown to be in complete agreement with the theoretical ones.
Abstract:
Many computer vision and human-computer interaction applications developed in recent years need to evaluate complex and continuous mathematical functions as an essential step toward proper operation. However, rigorous evaluation of this kind of function often implies a very high computational cost, unacceptable in real-time applications. To alleviate this problem, functions are commonly approximated by simpler piecewise-polynomial representations. Following this idea, we propose a novel, efficient, and practical technique to evaluate complex and continuous functions using a nearly optimal design of two types of piecewise linear approximations in the case of a large budget of evaluation subintervals. To this end, we develop a thorough error analysis that yields asymptotically tight bounds to accurately quantify the approximation performance of both representations. It improves upon previous error estimates and allows the user to control the trade-off between the approximation error and the number of evaluation subintervals. To guarantee real-time operation, the method is suitable for, but not limited to, an efficient implementation in modern Graphics Processing Units (GPUs), where it outperforms previous alternative approaches by exploiting the fixed-function interpolation routines present in their texture units. The proposed technique is a perfect match for any application requiring the evaluation of continuous functions. We have measured its quality and efficiency in detail on several functions, in particular the Gaussian function, because it is extensively used in many areas of computer vision and cybernetics and is expensive to evaluate.
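A small sketch of the general idea, not the paper's nearly optimal design: a uniform piecewise linear interpolant of the Gaussian on a fixed budget of subintervals, with its maximum error measured empirically (the interval counts are illustrative). On a GPU, the `np.interp` step below is what the texture units perform in fixed-function hardware:

```python
import numpy as np

def gaussian(x, sigma=1.0):
    return np.exp(-0.5 * (x / sigma) ** 2)

def piecewise_linear_eval(knots_x, knots_y, x):
    """Evaluate the linear interpolant defined by uniform knots."""
    return np.interp(x, knots_x, knots_y)

a, b = -4.0, 4.0
xs = np.linspace(a, b, 100001)            # dense grid to estimate max error
for n in (16, 64, 256):                   # budgets of subintervals (illustrative)
    kx = np.linspace(a, b, n + 1)
    ky = gaussian(kx)
    err = np.max(np.abs(piecewise_linear_eval(kx, ky, xs) - gaussian(xs)))
    print(f"{n:4d} subintervals -> max error {err:.2e}")
# For smooth functions the max error of linear interpolation decays as O(1/n^2).
```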
Abstract:
The stability limit of minimum volume and the breaking dynamics of liquid bridges between nonequal, noncoaxial, circular supporting disks subject to a lateral acceleration were experimentally analyzed by working with liquid bridges of very small dimensions. Experimental results are compared with asymptotic theoretical predictions, with the agreement between the experimental and asymptotic results being satisfactory.
Abstract:
This work addresses the estimation of velocity and acceleration from digital position data. Several classic methods are reviewed and implemented with real position data from a low-cost digital sensor on a hydraulic linear actuator, and the results are analyzed and compared. It is shown that static methods have a limited application bandwidth, and that the performance of some methods can be enhanced by adapting their parameters according to the current state.
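A minimal sketch of two of the classic estimators such a review typically covers (the sampling rate and filter constant are illustrative assumptions, not the paper's settings): a backward finite difference, and the same difference smoothed by a first-order low-pass filter, trading bandwidth for noise rejection:

```python
import numpy as np

def backward_difference(pos, dt):
    """Velocity by backward difference: v[k] = (x[k] - x[k-1]) / dt."""
    v = np.zeros_like(pos)
    v[1:] = (pos[1:] - pos[:-1]) / dt
    return v

def filtered_derivative(pos, dt, tau=0.05):
    """Backward difference followed by a first-order low-pass filter
    (time constant tau, illustrative); less noise, lower bandwidth."""
    alpha = dt / (tau + dt)
    raw = backward_difference(pos, dt)
    v = np.zeros_like(raw)
    for k in range(1, len(raw)):
        v[k] = v[k - 1] + alpha * (raw[k] - v[k - 1])
    return v

dt = 0.001                                  # 1 kHz sampling (assumed)
t = np.arange(0, 1, dt)
pos = 0.1 * np.sin(2 * np.pi * t) + 1e-4 * np.random.randn(t.size)  # noisy sensor
vel = filtered_derivative(pos, dt)
acc = filtered_derivative(vel, dt)          # differentiate again for acceleration
```

Adapting `tau` to the current state (e.g. smaller during fast motion) is one way the reviewed methods improve their performance.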
Abstract:
Due to the growing size of the data handled by many current information systems, many of the algorithms that traverse these structures lose performance when searching them. Because these data are in many cases represented by node-vertex structures (graphs), the Graph500 challenge was created in 2009. Previously, other challenges such as Top500 measured system performance in terms of raw computing capacity, using LINPACK tests. In the case of Graph500, the measurement is performed by running a breadth-first search (BFS) algorithm on graphs. The BFS algorithm is one of the pillars of many other graph algorithms, such as SSSP, shortest path, or betweenness centrality, so an improvement to it would also improve the algorithms that build on it. Problem analysis: the BFS algorithm used in high-performance computing (HPC) systems is usually a distributed version of the original sequential algorithm. In this distributed version, execution starts by partitioning the graph; each distributed processor then computes its part and distributes its results to the other systems. Because the speed gap between the processing within each node and the data transfer over the interconnection network is very large (with the interconnection network at a disadvantage), many approaches have been taken to reduce the performance lost in transfers. Regarding the initial graph partitioning, the traditional approach (called 1D-partitioned graph) assigns each node a fixed set of vertices to process. To reduce data traffic, another partitioning (2D) was proposed, in which the distribution is based on the edges of the graph instead of the vertices; this partitioning reduces network traffic from O(N×M) to O(log(N)). Although there have been other approaches to reducing transfers, such as initial vertex reordering to add locality within nodes, or dynamic partitioning, the approach proposed in this work consists of applying recent compression techniques from large data systems, such as high-volume databases and internet search engines, to compress the data transferred between nodes.---ABSTRACT---The Breadth First Search (BFS) algorithm is the foundation and building block of many higher-level graph-based operations such as spanning trees, shortest paths, and betweenness centrality. The importance of this algorithm increases every day because it is a key requirement of many data structures that are becoming popular nowadays; these data structures turn out to be graph structures internally. When the BFS algorithm is parallelized and the data are distributed across several processors, some research shows a performance limitation introduced by the interconnection network [31]. Hence, improvements in the area of communications may benefit the global performance of this key algorithm. This work presents an alternative compression mechanism. It differs from existing methods in that it is aware of characteristics of the data that may benefit the compression. In addition, we perform another test to see how this algorithm (in a distributed scenario) benefits from traditional instruction-based optimizations. Finally, we review current supercomputing techniques and the related work being done in the area.
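A minimal, single-process sketch of the two building blocks involved (an illustration only; identifiers and encoding parameters are assumptions, not the work's actual mechanism): a level-synchronous BFS, and a delta-plus-varint encoding of a sorted frontier of vertex IDs, the kind of integer compression used by search engines and high-volume databases that could shrink inter-node transfers:

```python
from collections import deque

def bfs_levels(adj, root):
    """Level-synchronous BFS; returns the depth of every reachable vertex."""
    depth = {root: 0}
    frontier = deque([root])
    while frontier:
        u = frontier.popleft()
        for v in adj.get(u, ()):
            if v not in depth:
                depth[v] = depth[u] + 1
                frontier.append(v)
    return depth

def encode_frontier(vertices):
    """Delta + varint encoding of a sorted list of vertex IDs:
    small gaps between consecutive IDs compress to single bytes."""
    out = bytearray()
    prev = 0
    for v in sorted(vertices):
        gap = v - prev
        prev = v
        while gap >= 0x80:
            out.append((gap & 0x7F) | 0x80)
            gap >>= 7
        out.append(gap)
    return bytes(out)

adj = {0: [1, 2], 1: [3], 2: [3], 3: []}
print(bfs_levels(adj, 0))                 # {0: 0, 1: 1, 2: 1, 3: 2}
print(encode_frontier([5, 6, 9, 200]))    # small gaps -> few bytes on the wire
```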
Abstract:
The cyclooxygenase (COX) product, prostacyclin (PGI2), inhibits platelet activation and vascular smooth-muscle cell migration and proliferation. Biochemically selective inhibition of COX-2 reduces PGI2 biosynthesis substantially in humans. Because deletion of the PGI2 receptor accelerates atherogenesis in the fat-fed low density lipoprotein receptor knockout mouse, we wished to determine whether selective inhibition of COX-2 would accelerate atherogenesis in this model. To address this hypothesis, we used dosing with nimesulide, which inhibited COX-2 ex vivo and depressed urinary 2,3-dinor 6-keto PGF1α by approximately 60% but had no effect on thromboxane formation by platelets, which only express COX-1. By contrast, the isoform-nonspecific inhibitor, indomethacin, suppressed platelet function and thromboxane formation ex vivo and in vivo, coincident with effects on PGI2 biosynthesis indistinguishable from nimesulide. Indomethacin reduced the extent of atherosclerosis by 55 ± 4%, whereas nimesulide failed to increase the rate of atherogenesis. Despite their divergent effects on atherogenesis, both drugs depressed two indices of systemic inflammation, soluble intracellular adhesion molecule-1 and monocyte chemoattractant protein-1, to a similar but incomplete degree. Neither drug altered serum lipids or the marked increase in vascular expression of COX-2 during atherogenesis. Accelerated progression of atherosclerosis is unlikely during chronic intake of specific COX-2 inhibitors. Furthermore, evidence that COX-1-derived prostanoids contribute to atherogenesis suggests that controlled evaluation of the effects of nonsteroidal anti-inflammatory drugs and/or aspirin on plaque progression in humans is timely.
Abstract:
To investigate the relation between cell division and expansion in the regulation of organ growth rate, we used Arabidopsis thaliana primary roots grown vertically at 20°C with an elongation rate that increased steadily during the first 14 d after germination. We measured spatial profiles of longitudinal velocity and cell length and calculated parameters of cell expansion and division, including rates of local cell production (cells mm−1 h−1) and cell division (cells cell−1 h−1). Data were obtained for the root cortex and also for the two types of epidermal cell, trichoblasts and atrichoblasts. Accelerating root elongation was caused by an increasingly longer growth zone, while maximal strain rates remained unchanged. The enlargement of the growth zone and, hence, the accelerating root elongation rate, were accompanied by a nearly proportionally increased cell production. This increased production was caused by increasingly numerous dividing cells, whereas their rates of division remained approximately constant. Additionally, the spatial profile of cell division rate was essentially constant. The meristem was longer than generally assumed, extending well into the region where cells elongated rapidly. In the two epidermal cell types, meristem length and cell division rate were both very similar to that of cortical cells, and differences in cell length between the two epidermal cell types originated at the apex of the meristem. These results highlight the importance of controlling the number of dividing cells, both to generate tissues with different cell lengths and to regulate the rate of organ enlargement.
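A minimal sketch of the kinematic step underlying such growth analyses (the data below are synthetic, not the paper's measurements): the local relative elemental growth (strain) rate is the spatial derivative of the velocity profile along the root.

```python
import numpy as np

# Synthetic example: position along the root (mm) and local velocity (mm/h)
x = np.linspace(0.0, 2.0, 50)               # distance from the root tip
v = 0.4 / (1.0 + np.exp(-(x - 1.0) / 0.2))  # sigmoidal velocity profile (assumed)

strain_rate = np.gradient(v, x)             # local strain rate, 1/h
print(f"max strain rate {strain_rate.max():.2f} 1/h "
      f"at {x[np.argmax(strain_rate)]:.2f} mm from the tip")
```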
Abstract:
I will discuss several issues related to the acceleration, collimation, and propagation of jets from active galactic nuclei. Hydromagnetic stresses provide the best bet for both accelerating relativistic flows and providing a certain amount of initial collimation. However, there are limits to how much "self-collimation" can be achieved without the help of an external pressurized medium. Moreover, existing models, which postulate highly organized poloidal flux near the base of the flow, are probably unrealistic. Instead, a large fraction of the magnetic energy may reside in highly disorganized "chaotic" fields. Such a field can also accelerate the flow to relativistic speeds, in some cases with greater efficiency than highly organized fields, but at the expense of self-collimation. The observational interpretation of jet physics is still hampered by a dearth of unambiguous diagnostics. Propagating disturbances in flows, such as the oblique shocks that may constitute the kiloparsec-scale "knots" in the M87 jet, may provide a wide range of untapped diagnostics for jet properties.
Abstract:
Self-organising neural models have the ability to provide a good representation of the input space. In particular, the Growing Neural Gas (GNG) is a suitable model because of its flexibility, rapid adaptation and excellent quality of representation. However, this type of learning is time-consuming, especially for high-dimensional input data. Since real applications often work under time constraints, it is necessary to adapt the learning process to complete it within a predefined time. This paper proposes a Graphics Processing Unit (GPU) parallel implementation of the GNG with Compute Unified Device Architecture (CUDA). In contrast to existing algorithms, the proposed GPU implementation accelerates the learning process while keeping a good quality of representation. Comparative experiments using iterative, parallel and hybrid implementations are carried out to demonstrate the effectiveness of the CUDA implementation. The results show that GNG learning with the proposed implementation achieves a speed-up of 6× compared with the single-threaded CPU implementation. The GPU implementation has also been applied to a real application with time constraints, the acceleration of 3D scene reconstruction for egomotion, in order to validate the proposal.
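A minimal CPU sketch of the inner GNG adaptation step that such a GPU port parallelizes (learning rates are illustrative; edge aging, error accumulation, and unit insertion are omitted). The costly part, finding the two nearest units for every input, is what maps naturally onto CUDA threads:

```python
import numpy as np

def gng_step(units, edges, x, eps_winner=0.05, eps_neighbor=0.006):
    """One Growing Neural Gas adaptation step for a single input x.
    units: (n, d) array of reference vectors; edges: set of (i, j) pairs."""
    d2 = np.sum((units - x) ** 2, axis=1)    # distances to all units
    s1, s2 = np.argsort(d2)[:2]              # two nearest units (GPU-friendly)
    units[s1] += eps_winner * (x - units[s1])
    for i, j in edges:                       # drag topological neighbors along
        if s1 in (i, j):
            n = j if i == s1 else i
            units[n] += eps_neighbor * (x - units[n])
    edges.add((min(s1, s2), max(s1, s2)))    # connect the two winners
    return s1, s2

units = np.random.rand(4, 2)
edges = set()
for _ in range(100):
    gng_step(units, edges, np.random.rand(2))
```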
Abstract:
A parallel algorithm for image noise removal is proposed. The algorithm is based on the peer group concept and uses a fuzzy metric. An optimization study on the use of the CUDA platform to remove impulsive noise with this algorithm is presented, together with an implementation of the algorithm on multi-core platforms using OpenMP. Performance is evaluated in terms of execution time, and the implementations parallelized on multi-core CPUs, on GPUs, and on the combination of both are compared. A performance analysis with large images is conducted to identify the number of pixels to allocate to the CPU and the GPU. The observed times show that both devices should be given work, with most of it assigned to the GPU. The results show that parallel implementations of denoising filters on GPUs and multi-cores are very advisable, and they open the door to using such algorithms for real-time processing.
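A compact sketch of the peer group test at the heart of such filters, using a simple exponential fuzzy similarity in place of the paper's exact metric (the threshold, window size, and replacement rule are illustrative assumptions): a pixel with too few similar neighbors is declared impulsive and replaced.

```python
import numpy as np

def fuzzy_similarity(p, q, k=1024.0):
    """Fuzzy similarity in [0, 1]; 1 means identical RGB vectors (k assumed)."""
    return np.exp(-np.sum((p - q) ** 2) / k)

def peer_group_filter(img, sim_thresh=0.7, min_peers=3):
    """Replace pixels whose 3x3 peer group is too small (impulse noise)."""
    out = img.copy()
    h, w, _ = img.shape
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = img[y-1:y+2, x-1:x+2].reshape(9, 3).astype(float)
            center = img[y, x].astype(float)
            sims = np.array([fuzzy_similarity(center, q) for q in window])
            if np.sum(sims >= sim_thresh) - 1 < min_peers:   # exclude itself
                out[y, x] = np.median(window, axis=0)        # simple replacement
    return out
```

Every pixel's test is independent of the others, which is why one GPU thread per pixel (or an OpenMP loop over rows) parallelizes it so well.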
Abstract:
A parallel algorithm to remove impulsive noise in digital images using heterogeneous CPU/GPU computing is proposed. The parallel denoising algorithm is based on the peer group concept and uses a Euclidean metric. In order to identify the number of pixels to allocate to the multi-core CPU and the GPU, a performance analysis using large images is presented, and a comparison of the parallel implementation on multi-core CPUs, on GPUs, and on a combination of both is performed. Performance has been evaluated in terms of execution time and Megapixels/second. We present several optimization strategies that are especially effective in the multi-core environment and demonstrate significant performance improvements. The main advantage of the proposed noise removal methodology is its computational speed, which enables efficient filtering of color images in real-time applications.
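A tiny sketch of the kind of static work split such an analysis tunes (the ratio below is an assumption, not the paper's measured optimum): image rows are divided so that both devices stay busy, with the larger share going to the GPU.

```python
def split_rows(height, gpu_fraction=0.85):
    """Split image rows between GPU and CPU; both devices get work,
    with most of it going to the faster GPU (ratio is an assumption)."""
    gpu_rows = int(round(height * gpu_fraction))
    return (0, gpu_rows), (gpu_rows, height)   # [start, end) row ranges

gpu_range, cpu_range = split_rows(4320)        # e.g. an image 4320 rows tall
print(gpu_range, cpu_range)                    # (0, 3672) (3672, 4320)
```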
Abstract:
The requirements for edge protection systems on steeply sloped work surfaces (class C, according to the EN 13374-2013 code) in construction works are studied in this paper. The maximum deceleration suffered by a falling body and the maximum deflection of the protection system were analyzed through finite-element models and confirmed through full-scale experiments. The aim of this work is to determine which deflection value of the protection system ensures a safe deceleration for the human body; this value is compared with the requirements given by the current version of EN 13374-2013. An additional series of experiments was performed to determine the acceleration associated with the minimum deflection required by the code (200 mm) during the retention process. According to the results obtained, a modification of this value is recommended. Additionally, a simple design formula for this fall protection system is proposed as a quick tool for the initial steps of design.
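The trade-off these experiments quantify can be illustrated with a first-order energy balance (a simplification for intuition, not the paper's finite-element model): a body falling from height h, hence impacting at v = √(2gh), and brought to rest over a deflection d undergoes an average deceleration

```latex
\bar{a} \approx \frac{v^2}{2d} = \frac{g h}{d}
\qquad\Longrightarrow\qquad
d \gtrsim \frac{g h}{\bar{a}_{\max}}
```

so a larger allowed deflection implies a gentler deceleration, which is why the 200 mm minimum given by the code is re-examined here.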