15 resultados para GPU acceleration

em Universidad Politécnica de Madrid


Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper, we describe a complete development platform that features different innovative acceleration strategies, not included in any other current platform, that simplify and speed up the definition of the different elements required to design a spoken dialog service. The proposed accelerations are mainly based on using the information from the backend database schema and contents, as well as cumulative information produced throughout the different steps in the design. Thanks to these accelerations, the interaction between the designer and the platform is improved, and in most cases the design is reduced to simple confirmations of the “proposals” that the platform dynamically provides at each step. In addition, the platform provides several other accelerations such as configurable templates that can be used to define the different tasks in the service or the dialogs to obtain or show information to the user, automatic proposals for the best way to request slot contents from the user (i.e. using mixed-initiative forms or directed forms), an assistant that offers the set of more probable actions required to complete the definition of the different tasks in the application, or another assistant for solving specific modality details such as confirmations of user answers or how to present them the lists of retrieved results after querying the backend database. Additionally, the platform also allows the creation of speech grammars and prompts, database access functions, and the possibility of using mixed initiative and over-answering dialogs. In the paper we also describe in detail each assistant in the platform, emphasizing the different kind of methodologies followed to facilitate the design process at each one. Finally, we describe the results obtained in both a subjective and an objective evaluation with different designers that confirm the viability, usefulness, and functionality of the proposed accelerations. Thanks to the accelerations, the design time is reduced in more than 56% and the number of keystrokes by 84%.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper outlines the problems found in the parallelization of SPH (Smoothed Particle Hydrodynamics) algorithms using Graphics Processing Units. Different results of some parallel GPU implementations in terms of the speed-up and the scalability compared to the CPU sequential codes are shown. The most problematic stage in the GPU-SPH algorithms is the one responsible for locating neighboring particles and building the vectors where this information is stored, since these specific algorithms raise many dificulties for a data-level parallelization. Because of the fact that the neighbor location using linked lists does not show enough data-level parallelism, two new approaches have been pro- posed to minimize bank conflicts in the writing and subsequent reading of the neighbor lists. The first strategy proposes an efficient coordination between CPU-GPU, using GPU algorithms for those stages that allow a straight forward parallelization, and sequential CPU algorithms for those instructions that involve some kind of vector reduction. This coordination provides a relatively orderly reading of the neighbor lists in the interactions stage, achieving a speed-up factor of x47 in this stage. However, since the construction of the neighbor lists is quite expensive, it is achieved an overall speed-up of x41. The second strategy seeks to maximize the use of the GPU in the neighbor's location process by executing a specific vector sorting algorithm that allows some data-level parallelism. Al- though this strategy has succeeded in improving the speed-up on the stage of neighboring location, the global speed-up on the interactions stage falls, due to inefficient reading of the neighbor vectors. Some changes to these strategies are proposed, aimed at maximizing the computational load of the GPU and using the GPU texture-units, in order to reach the maximum speed-up for such codes. Different practical applications have been added to the mentioned GPU codes. First, the classical dam-break problem is studied. Second, the wave impact of the sloshing fluid contained in LNG vessel tanks is also simulated as a practical example of particle methods

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The simulation of interest rate derivatives is a powerful tool to face the current market fluctuations. However, the complexity of the financial models and the way they are processed require exorbitant computation times, what is in clear conflict with the need of a processing time as short as possible to operate in the financial market. To shorten the computation time of financial derivatives the use of hardware accelerators becomes a must.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

An analytical solution of the two body problem perturbed by a constant tangential acceleration is derived with the aid of perturbation theory. The solution, which is valid for circular and elliptic orbits with generic eccentricity, describes the instantaneous time variation of all orbital elements. A comparison with high-accuracy numerical results shows that the analytical method can be effectively applied to multiple-revolution low-thrust orbit transfer around planets and in interplanetary space with negligible error.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In laser-plasma experiments, we observed that ion acceleration from the Coulomb explosion of the plasma channel bored by the laser, is prevented when multiple plasma instabilities such as filamentation and hosing, and nonlinear coherent structures (vortices/post-solitons) appear in the wake of an ultrashort laser pulse. The tailoring of the longitudinal plasma density ramp allows us to control the onset of these insabilities. We deduced that the laser pulse is depleted into these structures in our conditions, when a plasma at about 10% of the critical density exhibits a gradient on the order of 250 {\mu}m (gaussian fit), thus hindering the acceleration. A promising experimental setup with a long pulse is demonstrated enabling the excitation of an isolated coherent structure for polarimetric measurements and, in further perspectives, parametric studies of ion plasma acceleration efficiency.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Este Proyecto Fin de Carrera (PFC) tiene como objetivo el análisis, diseño e implementación de un videojuego móvil multijugador, con un enfoque educativo, para la sensibilización sobre el Índice de Desarrollo Humano (IDH). El sistema resultante se ha desarrollado para la Plataforma Android, utilizando el Framework AndEngine, que utiliza aceleración hardware de la GPU para garantizar un buen rendimiento en terminales de gama baja, de modo que pueda utilizarse en un amplio número de terminales móviles disponibles en el mercado. La aplicación se presenta como un juego de cartas con los diferentes países y sus datos humanitarios, los jugadores deben conocer el peso de los índices de desarrollo (esperanza de vida, renta, educación) de los países en comparación con los países de los otros jugadores. El sistema de juego premia a los jugadores con mayores conocimientos sobre los datos humanos de los diferentes países del mundo, de ese modo los mejores jugadores serán los que tengan más conocimientos de estos datos. El juego permite jugar partidas en solitario utilizando jugadores manejados por la CPU, o multijugador mediante WIFI o 3G. La actualización de la información y de los datos de las partidas se realiza a través de la comunicación con un servidor web ya implementado de forma complementaria a la realización de este proyecto. El sistema ha sido integrado y validado satisfactoriamente con diferentes terminales móviles y usuarios de diferente perfil de edad y uso. El videojuego se puede descargar de la página web creada en un proyecto complementario a éste (pendiente de publicación web), y ya se encuentra también disponible en Google Play. https://play.google.com/store/apps/details?id=xnetcom.pro.cartas&hl=es_419 ABSTRACT. This Project End of Career (PFC) takes as an aim the analysis, design and implementation of a multiplayer mobile videogame, with an educational approach, for the awareness on the Human Development Index (HDI). The resultant system has been developed for the Platform Android, using the AndEngine Framework, which uses hardware acceleration of the GPU to ensure a good performance on low-end terminals, so that it can be used in a wide range of mobile handsets available in the market. The application is presented as a card game with the different countries and his humanitarian information, the players must know the weight of the indexes of development (life expectancy, revenue, education) of the countries in comparison with the countries of other players. The game system rewards players with more knowledge on human information of different countries, thus the best players will be those with more knowledge of these information. The game allows to play items in solitarily using players handled by the CPU, or multiplayer by means of WIFI or 3G. The update of the information and data of the online games is done through communication with a web server implemented as a complement to the realization of this project. The system has been built and successfully validated with different mobile terminals and users of different age and usage profile. The game can be downloaded from the website created in a complementary project to this (web publication pending), and is now also available on Google Play https://play.google.com/store/apps/details?id=xnetcom.pro.cartas&hl=es_419

Relevância:

20.00% 20.00%

Publicador:

Resumo:

En el presente documento se hablará acerca del desarrollo de un proyecto para la mejora de un programa de análisis de señales; con ese fin, se hará uso de técnicas de optimización del software y de tecnologías de aceleración, mediante el aprovechamiento del paralelismo del programa. Además se hará un análisis de acerca del uso de dos tecnologías basadas en diferentes paradigmas de programación paralela; una mediante múltiples hilos con memoria compartida y la otra mediante el uso de GPUs como dispositivos de coprocesamiento. This paper will talk about the development of a Project to improve a program that does signals analysis; to that end, it will make use of software optimization techniques and acceleration technologies by exploiting parallelism in the program. In Addition will be done an analysis on the use of two technologies based on two different paradigms; one using multiple threads with shared memory and the other using GPU as co-processing devices.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We report on the ion acceleration mechanisms that occur during the interaction of an intense and ultrashort laser pulse ( λ > μ I 2 1018 W cm−2 m2) with an underdense helium plasma produced from an ionized gas jet target. In this unexplored regime, where the laser pulse duration is comparable to the inverse of the electron plasma frequency ωpe, reproducible non-thermal ion bunches have been measured in the radial direction. The two He ion charge states present energy distributions with cutoff energies between 150 and 200 keV, and a striking energy gap around 50 keV appearing consistently for all the shots in a given density range. Fully electromagnetic particle-in-cell simulations explain the experimental behaviors. The acceleration results from a combination of target normal sheath acceleration and Coulomb explosion of a filament formed around the laser pulse propagation axis

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A novel GPU-based nonparametric moving object detection strategy for computer vision tools requiring real-time processing is proposed. An alternative and efficient Bayesian classifier to combine nonparametric background and foreground models allows increasing correct detections while avoiding false detections. Additionally, an efficient region of interest analysis significantly reduces the computational cost of the detections.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this article, an approximate analytical solution for the two body problem perturbed by a radial, low acceleration is obtained, using a regularized formulation of the orbital motion and the method of multiple scales. The results reveal that the physics of the problem evolve in two fundamental scales of the true anomaly. The first one drives the oscillations of the orbital parameters along each orbit. The second one is responsible of the long-term variations in the amplitude and mean values of these oscillations. A good agreement is found with high precision numerical solutions.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper the influence of an axial microgravity on the minimum volume stability limit of axisymmetric liquid bridges between unequal disks is analyzed both theoretically and experimentally. The results here presented extend the knowledge of the static behaviour of liquid bridges to fluid configurations different from those studied up to now (almost equal disks). Experimental results, obtained by simulating microgravity conditions by the neutral buoyancy technique, are also presented and are shown to be in complete agreement with theoretical ones.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Many computer vision and human-computer interaction applications developed in recent years need evaluating complex and continuous mathematical functions as an essential step toward proper operation. However, rigorous evaluation of this kind of functions often implies a very high computational cost, unacceptable in real-time applications. To alleviate this problem, functions are commonly approximated by simpler piecewise-polynomial representations. Following this idea, we propose a novel, efficient, and practical technique to evaluate complex and continuous functions using a nearly optimal design of two types of piecewise linear approximations in the case of a large budget of evaluation subintervals. To this end, we develop a thorough error analysis that yields asymptotically tight bounds to accurately quantify the approximation performance of both representations. It provides an improvement upon previous error estimates and allows the user to control the trade-off between the approximation error and the number of evaluation subintervals. To guarantee real-time operation, the method is suitable for, but not limited to, an efficient implementation in modern Graphics Processing Units (GPUs), where it outperforms previous alternative approaches by exploiting the fixed-function interpolation routines present in their texture units. The proposed technique is a perfect match for any application requiring the evaluation of continuous functions, we have measured in detail its quality and efficiency on several functions, and, in particular, the Gaussian function because it is extensively used in many areas of computer vision and cybernetics, and it is expensive to evaluate.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The stability limit of minimum volume and the breaking dynamics of liquid bridges between nonequal, noncoaxial, circular supporting disks subject to a lateral acceleration were experimentally analyzed by working with liquid bridges of very small dimensions. Experimental results are compared with asymptotic theoretical predictions, with the agreement between experimental results and asymptotic ones being satisfactory

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this work is addressed the topic of estimation of velocity and acceleration from digital position data. It is presented a review of several classic methods and implemented with real position data from a low cost digital sensor of a hydraulic linear actuator. The results are analyzed and compared. It is shown that static methods have a limited bandwidth application, and that the performance of some methods may be enhanced by adapting its parameters according to the current state.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Debido al creciente aumento del tamaño de los datos en muchos de los actuales sistemas de información, muchos de los algoritmos de recorrido de estas estructuras pierden rendimento para realizar búsquedas en estos. Debido a que la representacion de estos datos en muchos casos se realiza mediante estructuras nodo-vertice (Grafos), en el año 2009 se creó el reto Graph500. Con anterioridad, otros retos como Top500 servían para medir el rendimiento en base a la capacidad de cálculo de los sistemas, mediante tests LINPACK. En caso de Graph500 la medicion se realiza mediante la ejecución de un algoritmo de recorrido en anchura de grafos (BFS en inglés) aplicada a Grafos. El algoritmo BFS es uno de los pilares de otros muchos algoritmos utilizados en grafos como SSSP, shortest path o Betweeness centrality. Una mejora en este ayudaría a la mejora de los otros que lo utilizan. Analisis del Problema El algoritmos BFS utilizado en los sistemas de computación de alto rendimiento (HPC en ingles) es usualmente una version para sistemas distribuidos del algoritmo secuencial original. En esta versión distribuida se inicia la ejecución realizando un particionado del grafo y posteriormente cada uno de los procesadores distribuidos computará una parte y distribuirá sus resultados a los demás sistemas. Debido a que la diferencia de velocidad entre el procesamiento en cada uno de estos nodos y la transfencia de datos por la red de interconexión es muy alta (estando en desventaja la red de interconexion) han sido bastantes las aproximaciones tomadas para reducir la perdida de rendimiento al realizar transferencias. Respecto al particionado inicial del grafo, el enfoque tradicional (llamado 1D-partitioned graph en ingles) consiste en asignar a cada nodo unos vertices fijos que él procesará. Para disminuir el tráfico de datos se propuso otro particionado (2D) en el cual la distribución se haciá en base a las aristas del grafo, en vez de a los vertices. Este particionado reducía el trafico en la red en una proporcion O(NxM) a O(log(N)). Si bien han habido otros enfoques para reducir la transferecnia como: reordemaniento inicial de los vertices para añadir localidad en los nodos, o particionados dinámicos, el enfoque que se va a proponer en este trabajo va a consistir en aplicar técnicas recientes de compression de grandes sistemas de datos como Bases de datos de alto volume o motores de búsqueda en internet para comprimir los datos de las transferencias entre nodos.---ABSTRACT---The Breadth First Search (BFS) algorithm is the foundation and building block of many higher graph-based operations such as spanning trees, shortest paths and betweenness centrality. The importance of this algorithm increases each day due to it is a key requirement for many data structures which are becoming popular nowadays. These data structures turn out to be internally graph structures. When the BFS algorithm is parallelized and the data is distributed into several processors, some research shows a performance limitation introduced by the interconnection network [31]. Hence, improvements on the area of communications may benefit the global performance in this key algorithm. In this work it is presented an alternative compression mechanism. It differs with current existing methods in that it is aware of characteristics of the data which may benefit the compression. Apart from this, we will perform a other test to see how this algorithm (in a dis- tributed scenario) benefits from traditional instruction-based optimizations. Last, we will review the current supercomputing techniques and the related work being done in the area.