982 resultados para Hardware Acceleration


Relevância:

100.00% 100.00%

Publicador:

Resumo:

The evolution of the electronics embedded applications forces electronics systems designers to match their ever increasing requirements. This evolution pushes the computational power of digital signal processing systems, as well as the energy required to accomplish the computations, due to the increasing mobility of such applications. Current approaches used to match these requirements relies on the adoption of application specific signal processors. Such kind of devices exploits powerful accelerators, which are able to match both performance and energy requirements. On the other hand, the too high specificity of such accelerators often results in a lack of flexibility which affects non-recurrent engineering costs, time to market, and market volumes too. The state of the art mainly proposes two solutions to overcome these issues with the ambition of delivering reasonable performance and energy efficiency: reconfigurable computing and multi-processors computing. All of these solutions benefits from the post-fabrication programmability, that definitively results in an increased flexibility. Nevertheless, the gap between these approaches and dedicated hardware is still too high for many application domains, especially when targeting the mobile world. In this scenario, flexible and energy efficient acceleration can be achieved by merging these two computational paradigms, in order to address all the above introduced constraints. This thesis focuses on the exploration of the design and application spectrum of reconfigurable computing, exploited as application specific accelerators for multi-processors systems on chip. More specifically, it introduces a reconfigurable digital signal processor featuring a heterogeneous set of reconfigurable engines, and a homogeneous multi-core system, exploiting three different flavours of reconfigurable and mask-programmable technologies as implementation platform for applications specific accelerators. In this work, the various trade-offs concerning the utilization multi-core platforms and the different configuration technologies are explored, characterizing the design space of the proposed approach in terms of programmability, performance, energy efficiency and manufacturing costs.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Today, modern System-on-a-Chip (SoC) systems have grown rapidly due to the increased processing power, while maintaining the size of the hardware circuit. The number of transistors on a chip continues to increase, but current SoC designs may not be able to exploit the potential performance, especially with energy consumption and chip area becoming two major concerns. Traditional SoC designs usually separate software and hardware. Thus, the process of improving the system performance is a complicated task for both software and hardware designers. The aim of this research is to develop hardware acceleration workflow for software applications. Thus, system performance can be improved with constraints of energy consumption and on-chip resource costs. The characteristics of software applications can be identified by using profiling tools. Hardware acceleration can have significant performance improvement for highly mathematical calculations or repeated functions. The performance of SoC systems can then be improved, if the hardware acceleration method is used to accelerate the element that incurs performance overheads. The concepts mentioned in this study can be easily applied to a variety of sophisticated software applications. The contributions of SoC-based hardware acceleration in the hardware-software co-design platform include the following: (1) Software profiling methods are applied to H.264 Coder-Decoder (CODEC) core. The hotspot function of aimed application is identified by using critical attributes such as cycles per loop, loop rounds, etc. (2) Hardware acceleration method based on Field-Programmable Gate Array (FPGA) is used to resolve system bottlenecks and improve system performance. The identified hotspot function is then converted to a hardware accelerator and mapped onto the hardware platform. Two types of hardware acceleration methods – central bus design and co-processor design, are implemented for comparison in the proposed architecture. (3) System specifications, such as performance, energy consumption, and resource costs, are measured and analyzed. The trade-off of these three factors is compared and balanced. Different hardware accelerators are implemented and evaluated based on system requirements. 4) The system verification platform is designed based on Integrated Circuit (IC) workflow. Hardware optimization techniques are used for higher performance and less resource costs. Experimental results show that the proposed hardware acceleration workflow for software applications is an efficient technique. The system can reach 2.8X performance improvements and save 31.84% energy consumption by applying the Bus-IP design. The Co-processor design can have 7.9X performance and save 75.85% energy consumption.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Reconfigurable platforms are a promising technology that offers an interesting trade-off between flexibility and performance, which many recent embedded system applications demand, especially in fields such as multimedia processing. These applications typically involve multiple ad-hoc tasks for hardware acceleration, which are usually represented using formalisms such as Data Flow Diagrams (DFDs), Data Flow Graphs (DFGs), Control and Data Flow Graphs (CDFGs) or Petri Nets. However, none of these models is able to capture at the same time the pipeline behavior between tasks (that therefore can coexist in order to minimize the application execution time), their communication patterns, and their data dependencies. This paper proves that the knowledge of all this information can be effectively exploited to reduce the resource requirements and the timing performance of modern reconfigurable systems, where a set of hardware accelerators is used to support the computation. For this purpose, this paper proposes a novel task representation model, named Temporal Constrained Data Flow Diagram (TCDFD), which includes all this information. This paper also presents a mapping-scheduling algorithm that is able to take advantage of the new TCDFD model. It aims at minimizing the dynamic reconfiguration overhead while meeting the communication requirements among the tasks. Experimental results show that the presented approach achieves up to 75% of resources saving and up to 89% of reconfiguration overhead reduction with respect to other state-of-the-art techniques for reconfigurable platforms.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Today, modern System-on-a-Chip (SoC) systems have grown rapidly due to the increased processing power, while maintaining the size of the hardware circuit. The number of transistors on a chip continues to increase, but current SoC designs may not be able to exploit the potential performance, especially with energy consumption and chip area becoming two major concerns. Traditional SoC designs usually separate software and hardware. Thus, the process of improving the system performance is a complicated task for both software and hardware designers. The aim of this research is to develop hardware acceleration workflow for software applications. Thus, system performance can be improved with constraints of energy consumption and on-chip resource costs. The characteristics of software applications can be identified by using profiling tools. Hardware acceleration can have significant performance improvement for highly mathematical calculations or repeated functions. The performance of SoC systems can then be improved, if the hardware acceleration method is used to accelerate the element that incurs performance overheads. The concepts mentioned in this study can be easily applied to a variety of sophisticated software applications. The contributions of SoC-based hardware acceleration in the hardware-software co-design platform include the following: (1) Software profiling methods are applied to H.264 Coder-Decoder (CODEC) core. The hotspot function of aimed application is identified by using critical attributes such as cycles per loop, loop rounds, etc. (2) Hardware acceleration method based on Field-Programmable Gate Array (FPGA) is used to resolve system bottlenecks and improve system performance. The identified hotspot function is then converted to a hardware accelerator and mapped onto the hardware platform. Two types of hardware acceleration methods – central bus design and co-processor design, are implemented for comparison in the proposed architecture. (3) System specifications, such as performance, energy consumption, and resource costs, are measured and analyzed. The trade-off of these three factors is compared and balanced. Different hardware accelerators are implemented and evaluated based on system requirements. 4) The system verification platform is designed based on Integrated Circuit (IC) workflow. Hardware optimization techniques are used for higher performance and less resource costs. Experimental results show that the proposed hardware acceleration workflow for software applications is an efficient technique. The system can reach 2.8X performance improvements and save 31.84% energy consumption by applying the Bus-IP design. The Co-processor design can have 7.9X performance and save 75.85% energy consumption.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Many industrial applications need object recognition and tracking capabilities. The algorithms developed for those purposes are computationally expensive. Yet ,real time performance, high accuracy and small power consumption are essential measures of the system. When all these requirements are combined, hardware acceleration of these algorithms becomes a feasible solution. The purpose of this study is to analyze the current state of these hardware acceleration solutions, which algorithms have been implemented in hardware and what modifications have been done in order to adapt these algorithms to hardware.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Trabalho de Projeto para obtenção do grau de Mestre em Engenharia de Eletrónica e Telecomunicações

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Tese de Doutoramento Plano Doutoral em Engenharia Eletrónica e de Computadores.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This thesis explores system performance for reconfigurable distributed systems and provides an analytical model for determining throughput of theoretical systems based on the OpenSPARC FPGA Board and the SIRC Communication Framework. This model was developed by studying a small set of variables that together determine a system¿s throughput. The importance of this model is in assisting system designers to make decisions as to whether or not to commit to designing a reconfigurable distributed system based on the estimated performance and hardware costs. Because custom hardware design and distributed system design are both time consuming and costly, it is important for designers to make decisions regarding system feasibility early in the development cycle. Based on experimental data the model presented in this paper shows a close fit with less than 10% experimental error on average. The model is limited to a certain range of problems, but it can still be used given those limitations and also provides a foundation for further development of modeling reconfigurable distributed systems.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Mersenne Twister (MT) uniform random number generators are key cores for hardware acceleration of Monte Carlo simulations. In this work, two different architectures are studied: besides the classical table-based architecture, a different architecture based on a circular buffer and especially targeting FPGAs is proposed. A 30% performance improvement has been obtained when compared to the fastest previous work. The applicability of the proposed MT architectures has been proven in a high performance Gaussian RNG.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Este Proyecto Fin de Carrera (PFC) tiene como objetivo el análisis, diseño e implementación de un videojuego móvil multijugador, con un enfoque educativo, para la sensibilización sobre el Índice de Desarrollo Humano (IDH). El sistema resultante se ha desarrollado para la Plataforma Android, utilizando el Framework AndEngine, que utiliza aceleración hardware de la GPU para garantizar un buen rendimiento en terminales de gama baja, de modo que pueda utilizarse en un amplio número de terminales móviles disponibles en el mercado. La aplicación se presenta como un juego de cartas con los diferentes países y sus datos humanitarios, los jugadores deben conocer el peso de los índices de desarrollo (esperanza de vida, renta, educación) de los países en comparación con los países de los otros jugadores. El sistema de juego premia a los jugadores con mayores conocimientos sobre los datos humanos de los diferentes países del mundo, de ese modo los mejores jugadores serán los que tengan más conocimientos de estos datos. El juego permite jugar partidas en solitario utilizando jugadores manejados por la CPU, o multijugador mediante WIFI o 3G. La actualización de la información y de los datos de las partidas se realiza a través de la comunicación con un servidor web ya implementado de forma complementaria a la realización de este proyecto. El sistema ha sido integrado y validado satisfactoriamente con diferentes terminales móviles y usuarios de diferente perfil de edad y uso. El videojuego se puede descargar de la página web creada en un proyecto complementario a éste (pendiente de publicación web), y ya se encuentra también disponible en Google Play. https://play.google.com/store/apps/details?id=xnetcom.pro.cartas&hl=es_419 ABSTRACT. This Project End of Career (PFC) takes as an aim the analysis, design and implementation of a multiplayer mobile videogame, with an educational approach, for the awareness on the Human Development Index (HDI). The resultant system has been developed for the Platform Android, using the AndEngine Framework, which uses hardware acceleration of the GPU to ensure a good performance on low-end terminals, so that it can be used in a wide range of mobile handsets available in the market. The application is presented as a card game with the different countries and his humanitarian information, the players must know the weight of the indexes of development (life expectancy, revenue, education) of the countries in comparison with the countries of other players. The game system rewards players with more knowledge on human information of different countries, thus the best players will be those with more knowledge of these information. The game allows to play items in solitarily using players handled by the CPU, or multiplayer by means of WIFI or 3G. The update of the information and data of the online games is done through communication with a web server implemented as a complement to the realization of this project. The system has been built and successfully validated with different mobile terminals and users of different age and usage profile. The game can be downloaded from the website created in a complementary project to this (web publication pending), and is now also available on Google Play https://play.google.com/store/apps/details?id=xnetcom.pro.cartas&hl=es_419

Relevância:

60.00% 60.00%

Publicador:

Resumo:

En los últimos años el número de dispositivos móviles y smartphones ha aumentado drásticamente, así como el número de aplicaciones destinadas a estos. Los desarrolladores siempre se han visto frenados en la creación de estas aplicaciones debido a la complejidad que supone la diversidad de sistemas operativos (Android, iOS, Windows Phone, etc), que utilizan lenguajes de programación diferentes, haciendo que, para poder desarrollar una aplicación que funcione en estas plataformas, en verdad haya que implementar una aplicación independiente para cada una de las plataformas. Para solucionar este problema han surgido frameworks, como Appcelerator Titanium, que permiten escribir una sola vez la aplicación y compilarla para las diferentes plataformas móviles objetivo. Sin embargo, estos frameworks están aún en estado muy temprano de desarrollo, por lo que no resuelven toda la problemática ni dan una respuesta completa a los desarrolladores. El objetivo de este Trabajo de Fin de Grado ha sido contribuir a la evolución de estos frameworks mediante la creación de un módulo para Appcelerator Titanium que permita construir de manera ágil aplicaciones multiplataforma que hagan uso de visualizadores de información geográfica. Para ello se propone el desarrollo de un módulo de mapa con soporte para capas WMS, rutas y polígonos en WKT, KML y GeoJSON. Se facilitará además que estas aplicaciones puedan acceder a capacidades del hardware como la brújula y el GPS para realizar un seguimiento de la localización, a la vez que se hace uso de la aceleración por el hardware subyacente para mejorar la velocidad y fluidez de la información visualizada en el mapa. A partir de este módulo se ha creado una aplicación que hace uso de todas sus características y posteriormente se ha migrado a la plataforma Wirecloud4Tablet como componente nativo que puede integrarse con otros componentes web (widgets) mediante técnicas de mashup. Gracias a esto se ha podido fusionar por un lado todas las ventajas que ofrece Wirecloud para el rápido desarrollo de aplicaciones sin necesidad de tener conocimientos de programación, junto con las ventajas que ofrecen las aplicaciones nativas en cuanto a rendimiento y características extras. Usando los resultados de este proyecto, se pueden crear de manera ágil aplicaciones composicionales nativas multiplataforma que hagan uso de visualización de información geográfica; es decir, se pueden crear aplicaciones en pocos minutos y sin conocimientos de programación que pueden ejecutar diferentes componentes (como el mapa) de manera nativa en múltiples plataformas. Se facilita también la integración de componentes nativos (como es el mapa desarrollado) con otros componentes web (widgets) en un mashup que puede visualizarse en dispositivos móviles mediante la plataforma Wirecloud. ---ABSTRACT---In recent years the number of mobile devices and smartphones has increased dramatically as well as the number of applications targeted at them. Developers always have been slowed in the creation of these applications due to the complexity caused by the diversity of operating systems (Android, iOS, Windows Phone, etc), each of them using different programming languages, so that, in order to develop an application that works on these platforms, the developer really has to implement a different application for each platform. To solve this problem frameworks such as Appcelerator Titanium have emerged, allowing developers to write the application once and to compile it for different target mobile platforms. However, these frameworks are still in very early stage of development, so they do not solve all the difficulties nor give a complete solution to the developers. The objective of this final year dissertation is to contribute to the evolution of these frameworks by creating a module for Appcelerator Titanium that permits to nimbly build multi-platform applications that make use of geographical information visualization. To this end, the development of a map module with support for WMS layers, paths, and polygons in WKT, KML, and GeoJSON is proposed. This module will also facilitate these applications to access hardware capabilities such as GPS and compass to track the location, while it makes use of the underlying hardware acceleration to improve the speed and fluidity of the information displayed on the map. Based on this module, it has been created an application that makes use of all its features and subsequently it has been migrated to the platform Wirecloud4Tablet as a native component that can be integrated with other web components (widgets) using mashup techniques. As a result, it has been fused on one side all the advantages Wirecloud provides for fast application development without the need of programming skills, along with the advantages of native apps, such as performance and extra features. Using the results of this project, compositional platform native applications that make use of geographical information visualization can be created in an agile way; ie, in a few minutes and without having programming skills, a developer could create applications that can run different components (like the map) natively on multiple platforms. It also facilitates the integration of native components (like the map) with other web components (widgets) in a mashup that can be displayed on mobile devices through the Wirecloud platform.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Tool path generation is one of the most complex problems in Computer Aided Manufacturing. Although some efficient strategies have been developed, most of them are only useful for standard machining. However, the algorithms used for tool path computation demand a higher computation performance, which makes the implementation on many existing systems very slow or even impractical. Hardware acceleration is an incremental solution that can be cleanly added to these systems while keeping everything else intact. It is completely transparent to the user. The cost is much lower and the development time is much shorter than replacing the computers by faster ones. This paper presents an optimisation that uses a specific graphic hardware approach using the power of multi-core Graphic Processing Units (GPUs) in order to improve the tool path computation. This improvement is applied on a highly accurate and robust tool path generation algorithm. The paper presents, as a case of study, a fully implemented algorithm used for turning lathe machining of shoe lasts. A comparative study will show the gain achieved in terms of total computing time. The execution time is almost two orders of magnitude faster than modern PCs.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

A major impediment to developing real-time computer vision systems has been the computational power and level of skill required to process video streams in real-time. This has meant that many researchers have either analysed video streams off-line or used expensive dedicated hardware acceleration techniques. Recent software and hardware developments have greatly eased the development burden of realtime image analysis leading to the development of portable systems using cheap PC hardware and software exploiting the Multimedia Extension (MMX) instruction set of the Intel Pentium chip. This paper describes the implementation of a computationally efficient computer vision system for recognizing hand gestures using efficient coding and MMX-acceleration to achieve real-time performance on low cost hardware.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

El avance en la potencia de cómputo en nuestros días viene dado por la paralelización del procesamiento, dadas las características que disponen las nuevas arquitecturas de hardware. Utilizar convenientemente este hardware impacta en la aceleración de los algoritmos en ejecución (programas). Sin embargo, convertir de forma adecuada el algoritmo en su forma paralela es complejo, y a su vez, esta forma, es específica para cada tipo de hardware paralelo. En la actualidad los procesadores de uso general más comunes son los multicore, procesadores paralelos, también denominados Symmetric Multi-Processors (SMP). Hoy en día es difícil hallar un procesador para computadoras de escritorio que no tengan algún tipo de paralelismo del caracterizado por los SMP, siendo la tendencia de desarrollo, que cada día nos encontremos con procesadores con mayor numero de cores disponibles. Por otro lado, los dispositivos de procesamiento de video (Graphics Processor Units - GPU), a su vez, han ido desarrollando su potencia de cómputo por medio de disponer de múltiples unidades de procesamiento dentro de su composición electrónica, a tal punto que en la actualidad no es difícil encontrar placas de GPU con capacidad de 200 a 400 hilos de procesamiento paralelo. Estos procesadores son muy veloces y específicos para la tarea que fueron desarrollados, principalmente el procesamiento de video. Sin embargo, como este tipo de procesadores tiene muchos puntos en común con el procesamiento científico, estos dispositivos han ido reorientándose con el nombre de General Processing Graphics Processor Unit (GPGPU). A diferencia de los procesadores SMP señalados anteriormente, las GPGPU no son de propósito general y tienen sus complicaciones para uso general debido al límite en la cantidad de memoria que cada placa puede disponer y al tipo de procesamiento paralelo que debe realizar para poder ser productiva su utilización. Los dispositivos de lógica programable, FPGA, son dispositivos capaces de realizar grandes cantidades de operaciones en paralelo, por lo que pueden ser usados para la implementación de algoritmos específicos, aprovechando el paralelismo que estas ofrecen. Su inconveniente viene derivado de la complejidad para la programación y el testing del algoritmo instanciado en el dispositivo. Ante esta diversidad de procesadores paralelos, el objetivo de nuestro trabajo está enfocado en analizar las características especificas que cada uno de estos tienen, y su impacto en la estructura de los algoritmos para que su utilización pueda obtener rendimientos de procesamiento acordes al número de recursos utilizados y combinarlos de forma tal que su complementación sea benéfica. Específicamente, partiendo desde las características del hardware, determinar las propiedades que el algoritmo paralelo debe tener para poder ser acelerado. Las características de los algoritmos paralelos determinará a su vez cuál de estos nuevos tipos de hardware son los mas adecuados para su instanciación. En particular serán tenidos en cuenta el nivel de dependencia de datos, la necesidad de realizar sincronizaciones durante el procesamiento paralelo, el tamaño de datos a procesar y la complejidad de la programación paralela en cada tipo de hardware. Today´s advances in high-performance computing are driven by parallel processing capabilities of available hardware architectures. These architectures enable the acceleration of algorithms when thes ealgorithms are properly parallelized and exploit the specific processing power of the underneath architecture. Most current processors are targeted for general pruposes and integrate several processor cores on a single chip, resulting in what is known as a Symmetric Multiprocessing (SMP) unit. Nowadays even desktop computers make use of multicore processors. Meanwhile, the industry trend is to increase the number of integrated rocessor cores as technology matures. On the other hand, Graphics Processor Units (GPU), originally designed to handle only video processing, have emerged as interesting alternatives to implement algorithm acceleration. Current available GPUs are able to implement from 200 to 400 threads for parallel processing. Scientific computing can be implemented in these hardware thanks to the programability of new GPUs that have been denoted as General Processing Graphics Processor Units (GPGPU).However, GPGPU offer little memory with respect to that available for general-prupose processors; thus, the implementation of algorithms need to be addressed carefully. Finally, Field Programmable Gate Arrays (FPGA) are programmable devices which can implement hardware logic with low latency, high parallelism and deep pipelines. Thes devices can be used to implement specific algorithms that need to run at very high speeds. However, their programmability is harder that software approaches and debugging is typically time-consuming. In this context where several alternatives for speeding up algorithms are available, our work aims at determining the main features of thes architectures and developing the required know-how to accelerate algorithm execution on them. We look at identifying those algorithms that may fit better on a given architecture as well as compleme

Relevância:

30.00% 30.00%

Publicador:

Resumo:

PURPOSE: To evaluate the effects of recent advances in magnetic resonance imaging (MRI) radiofrequency (RF) coil and parallel imaging technology on brain volume measurement consistency. MATERIALS AND METHODS: In all, 103 whole-brain MRI volumes were acquired at a clinical 3T MRI, equipped with a 12- and 32-channel head coil, using the T1-weighted protocol as employed in the Alzheimer's Disease Neuroimaging Initiative study with parallel imaging accelerations ranging from 1 to 5. An experienced reader performed qualitative ratings of the images. For quantitative analysis, differences in composite width (CW, a measure of image similarity) and boundary shift integral (BSI, a measure of whole-brain atrophy) were calculated. RESULTS: Intra- and intersession comparisons of CW and BSI measures from scans with equal acceleration demonstrated excellent scan-rescan accuracy, even at the highest acceleration applied. Pairs-of-scans acquired with different accelerations exhibited poor scan-rescan consistency only when differences in the acceleration factor were maximized. A change in the coil hardware between compared scans was found to bias the BSI measure. CONCLUSION: The most important findings are that the accelerated acquisitions appear to be compatible with the assessment of high-quality quantitative information and that for highest scan-rescan accuracy in serial scans the acquisition protocol should be kept as consistent as possible over time. J. Magn. Reson. Imaging 2012;36:1234-1240. ©2012 Wiley Periodicals, Inc.