686 results for Processors
Abstract:
Due to the growing size of the data handled by many current information systems, many of the algorithms that traverse these structures lose performance when used to search them. Because in many cases these data are represented by node-edge structures (graphs), the Graph500 challenge was created in 2009. Earlier challenges such as Top500 measured performance based on the computing capacity of the systems, using LINPACK tests. In Graph500, the measurement is carried out by running a breadth-first search (BFS) algorithm over graphs. The BFS algorithm is one of the pillars of many other graph algorithms, such as single-source shortest paths (SSSP), shortest path, or betweenness centrality, so an improvement in BFS would help improve the algorithms that build on it. Problem analysis: the BFS algorithm used in high-performance computing (HPC) systems is usually a distributed version of the original sequential algorithm. In this distributed version, execution starts by partitioning the graph; each of the distributed processors then computes one part and distributes its results to the other systems. Because the gap between the processing speed of each node and the data-transfer speed of the interconnection network is very large (with the interconnection network at a disadvantage), many approaches have been proposed to reduce the performance lost on transfers. Regarding the initial partitioning of the graph, the traditional approach (called 1D graph partitioning) assigns to each node a fixed set of vertices that it will process. To reduce data traffic, another partitioning (2D) was proposed, in which the distribution is based on the edges of the graph instead of the vertices; this partitioning reduced network traffic from a proportion of O(N×M) to O(log(N)). Although there have been other approaches to reduce transfers, such as initial reordering of the vertices to add locality within nodes, or dynamic partitioning, the approach proposed in this work consists of applying recent compression techniques from large data systems, such as high-volume databases or Internet search engines, to compress the data transferred between nodes.---ABSTRACT---The Breadth-First Search (BFS) algorithm is the foundation and building block of many higher-level graph operations such as spanning trees, shortest paths and betweenness centrality. The importance of this algorithm increases every day because it is a key requirement of many data structures that are becoming popular nowadays and that turn out to be, internally, graph structures. When the BFS algorithm is parallelized and the data is distributed over several processors, some research shows a performance limitation introduced by the interconnection network [31]. Hence, improvements in the area of communications may benefit the global performance of this key algorithm. In this work an alternative compression mechanism is presented. It differs from existing methods in that it is aware of characteristics of the data which may benefit the compression. Apart from this, we will perform another test to see how this algorithm (in a distributed scenario) benefits from traditional instruction-based optimizations.
Lastly, we will review the current supercomputing techniques and the related work being done in the area. (A minimal level-synchronous BFS sketch follows this abstract.)
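Since the abstract above centers on BFS and its distributed 1D/2D-partitioned variants, here is a minimal sketch of the level-synchronous (frontier-based) BFS building block, written in C++ for illustration only; the distributed versions exchange, and in the proposed work compress, the frontier data between processors at every level, none of which is shown here.

```cpp
#include <vector>

// Minimal level-synchronous BFS over an adjacency-list graph.
// Returns the BFS level (distance in edges) of every vertex from `source`,
// or -1 for unreachable vertices. Illustrative sketch only: the distributed
// 1D/2D-partitioned variants discussed in the abstract exchange frontiers
// between processors at every level instead of keeping them local.
std::vector<int> bfs_levels(const std::vector<std::vector<int>>& adj, int source) {
    std::vector<int> level(adj.size(), -1);
    std::vector<int> frontier{source};
    level[source] = 0;
    for (int depth = 1; !frontier.empty(); ++depth) {
        std::vector<int> next;
        for (int u : frontier)
            for (int v : adj[u])
                if (level[v] == -1) {      // first visit
                    level[v] = depth;
                    next.push_back(v);     // becomes part of the next frontier
                }
        frontier.swap(next);
    }
    return level;
}
```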
Abstract:
A minimal hypothesis is proposed concerning the brain processes underlying effortful tasks. It distinguishes two main computational spaces: a unique global workspace composed of distributed and heavily interconnected neurons with long-range axons, and a set of specialized and modular perceptual, motor, memory, evaluative, and attentional processors. Workspace neurons are mobilized in effortful tasks for which the specialized processors do not suffice. They selectively mobilize or suppress, through descending connections, the contribution of specific processor neurons. In the course of task performance, workspace neurons become spontaneously coactivated, forming discrete though variable spatio-temporal patterns subject to modulation by vigilance signals and to selection by reward signals. A computer simulation of the Stroop task shows workspace activation to increase during acquisition of a novel task, effortful execution, and after errors. We outline predictions for spatio-temporal activation patterns during brain imaging, particularly about the contribution of dorsolateral prefrontal cortex and anterior cingulate to the workspace.
Abstract:
This paper describes the state of the art in applications of voice-processing technologies. In the first part, technologies concerning the implementation of speech recognition and synthesis algorithms are described. Hardware technologies such as microprocessors and DSPs (digital signal processors) are discussed. The software development environment, which is a key technology in developing application software ranging from DSP software to support software, is also described. In the second part, the state of the art of algorithms from the standpoint of applications is discussed. Several issues concerning evaluation of speech recognition/synthesis algorithms are covered, as well as issues concerning the robustness of algorithms in adverse conditions.
Abstract:
We describe Janus, a massively parallel FPGA-based computer optimized for the simulation of spin glasses, theoretical models for the behavior of glassy materials. FPGAs (as compared to GPUs or many-core processors) provide a complementary approach to massively parallel computing. In particular, our model problem is formulated in terms of binary variables, and floating-point operations can be (almost) completely avoided. The FPGA architecture allows us to run many independent threads with almost no latencies in memory access, thus updating up to 1024 spins per cycle. We describe Janus in detail and we summarize the physics results obtained in four years of operation of this machine; we discuss two types of physics applications: long simulations on very large systems (which try to mimic and provide understanding about the experimental non equilibrium dynamics), and low-temperature equilibrium simulations using an artificial parallel tempering dynamics. The time scale of our non-equilibrium simulations spans eleven orders of magnitude (from picoseconds to a tenth of a second). On the other hand, our equilibrium simulations are unprecedented both because of the low temperatures reached and for the large systems that we have brought to equilibrium. A finite-time scaling ansatz emerges from the detailed comparison of the two sets of simulations. Janus has made it possible to perform spin glass simulations that would take several decades on more conventional architectures. The paper ends with an assessment of the potential of possible future versions of the Janus architecture, based on state-of-the-art technology.
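The point made above about binary variables and the near-total absence of floating-point work can be illustrated in plain software. The sketch below (an assumption-laden illustration, not the Janus FPGA design) performs Metropolis updates for a 2D ±J spin model using bit-encoded spins and couplings, with the only floating-point operations confined to the one-off computation of integer acceptance thresholds.

```cpp
#include <cmath>
#include <cstdint>
#include <random>
#include <vector>

// Minimal sketch: Metropolis updates for a 2D +-J spin model in which spins
// and couplings are single bits and the inner loop is integer-only. The
// acceptance probabilities exp(-dE/T) are precomputed once as 32-bit
// thresholds (for a 4-neighbour lattice dE is in {-8,-4,0,4,8}; dE <= 0 is
// always accepted). Lattice geometry, RNG and parameters are illustrative.
struct SpinGlass2D {
    int L;                                  // lattice side
    std::vector<std::uint8_t> s, jr, jd;    // spins; right/down couplings (0/1 encodes -1/+1)
    std::uint32_t thr[3];                   // thresholds for dE = 0, 4, 8
    std::mt19937 rng{12345};

    SpinGlass2D(int side, double T)
        : L(side), s(side * side), jr(side * side), jd(side * side) {
        std::bernoulli_distribution coin(0.5);
        for (auto& b : s)  b = coin(rng);
        for (auto& b : jr) b = coin(rng);
        for (auto& b : jd) b = coin(rng);
        for (int k = 0; k < 3; ++k)         // the only floating point in the sketch
            thr[k] = static_cast<std::uint32_t>(std::exp(-4.0 * k / T) * 4294967295.0);
    }

    int spin(int i, int j) const { return 2 * s[((i + L) % L) * L + (j + L) % L] - 1; }
    int coup(const std::vector<std::uint8_t>& J, int i, int j) const {
        return 2 * J[((i + L) % L) * L + (j + L) % L] - 1;
    }

    // One Metropolis update of site (x, y): integer arithmetic only.
    void update(int x, int y) {
        int h = coup(jr, x, y) * spin(x, y + 1) + coup(jr, x, y - 1) * spin(x, y - 1)
              + coup(jd, x, y) * spin(x + 1, y) + coup(jd, x - 1, y) * spin(x - 1, y);
        int dE = 2 * spin(x, y) * h;                  // energy change if the spin flips
        if (dE <= 0 || rng() < thr[dE / 4])           // integer compare against threshold
            s[x * L + y] ^= 1;                        // flip the spin bit
    }
};
```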
Abstract:
This paper describes JANUS, a modular, massively parallel and reconfigurable FPGA-based computing system. Each JANUS module has a computational core and a host. The computational core is a 4x4 array of FPGA-based processing elements with nearest-neighbor data links. Processors are also directly connected to an I/O node attached to the JANUS host, a conventional PC. JANUS is tailored for, but not limited to, the requirements of a class of hard scientific applications characterized by regular code structure, unconventional data manipulation instructions and a not too large database size. We discuss the architecture of this configurable machine, and focus on its use for Monte Carlo simulations of statistical mechanics. On this class of application JANUS achieves impressive performance: in some cases one JANUS processing element outperforms high-end PCs by a factor of ≈1000. We also discuss the role of JANUS in other classes of scientific applications.
Abstract:
This work deals with the design and development of a low-power processor, in a simplified form, exploring microarchitecture techniques to achieve lower power consumption. A logical development sequence is presented, from basic concepts and structures up to more complex structures, finally showing the complete microarchitecture of the processor. This new processor model is compared with previous studies of three processors: the first a synchronous model, the second asynchronous, and the third an improved version of the first model that includes minimization of registers and circuits. A new methodology for creating microcontroller padrings, based on reusing information from previous designs, is presented. This new methodology was created for fast prototyping and to reduce possible errors in the generation of the padring code. Comparisons of power consumption and area are presented for the developed processor, and results obtained with the new padring generation methodology are also presented. For the processor, a model is presented in which multiple buses are used to minimize the number of machine cycles per instruction. Structures that can be optimized and circuits that can be reused to reduce the amount of circuitry needed in the implementation are also highlighted. Finally, the new implementation is compared with the three previous models; the performance gains obtained with the implementation of these structures were 18%, which, converted into power consumption, represent savings of 13% compared to the best case among the compared processors. The technology used in the development of the processors was TSMC 250 nm CMOS.
Abstract:
Due to the growing amount of data being processed and the increasing need for high-performance computing, significant changes are taking place in computer architecture design. There has been a migration from the sequential to the parallel paradigm, with hundreds or thousands of processing cores on the same chip. In this context, power management becomes increasingly important, especially in embedded systems, which are usually battery powered. According to Moore's Law, processor performance doubles every 18 months, while battery capacity doubles only every 10 years. This situation creates a huge gap, which can be mitigated by using heterogeneous multi-core architectures. A fundamental challenge that remains open for these architectures is integrating embedded-code development, scheduling, and hardware for power management. The general goal of this doctoral work is to investigate techniques for optimizing the performance/energy trade-off in single-ISA heterogeneous multi-core architectures implemented on FPGA. To this end, we sought solutions that achieve the best possible performance at an optimal energy consumption. This was done by combining data mining for the analysis of thread-based software with traditional power management techniques, such as dynamic way-shutdown, and a new heterogeneity-aware scheduling policy. The main contributions are the combination of power management techniques at several levels (hardware, scheduling, and compilation) and a scheduling policy integrated with a multi-core architecture that is heterogeneous with respect to L1 cache size.
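The heterogeneity-aware scheduling policy itself is not detailed in the abstract, so the block below is only a minimal sketch of the general idea for a single-ISA multi-core whose cores differ in L1 cache size: profile-derived, cache-hungry threads are mapped to the larger-cache cores. The metric, thresholds and names are assumptions for illustration, not the thesis's policy.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical profile of a thread: identifier plus an L1 miss rate
// obtained, for example, from performance counters or offline analysis.
struct ThreadProfile {
    std::string name;
    double l1_miss_rate;   // misses per kilo-instruction (assumed metric)
};

// Hypothetical core description for a single-ISA heterogeneous CPU:
// same instruction set, different L1 capacity.
struct Core {
    int id;
    int l1_kib;            // L1 data-cache size in KiB
};

// Sketch of a heterogeneity-aware assignment: threads with the highest miss
// rates go to the cores with the largest L1, the rest fill the smaller-cache
// cores. A real policy would also weigh energy and phase behavior.
std::vector<std::pair<ThreadProfile, Core>>
assign(std::vector<ThreadProfile> threads, std::vector<Core> cores) {
    std::sort(threads.begin(), threads.end(),
              [](const ThreadProfile& a, const ThreadProfile& b) {
                  return a.l1_miss_rate > b.l1_miss_rate;
              });
    std::sort(cores.begin(), cores.end(),
              [](const Core& a, const Core& b) { return a.l1_kib > b.l1_kib; });
    std::vector<std::pair<ThreadProfile, Core>> plan;
    for (std::size_t i = 0; i < threads.size() && i < cores.size(); ++i)
        plan.emplace_back(threads[i], cores[i]);
    return plan;
}
```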
Abstract:
In this paper we describe a hybrid algorithm for an even number of processors, based on an algorithm for two processors and the Overlapping Partition Method for tridiagonal systems. Moreover, we compare this hybrid method with Wang's partition method on a BSP computer. Finally, we compare the theoretical computation cost of both methods for a Cray T3D computer, using the cost model that the BSP model provides.
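Neither the two-processor algorithm nor the Overlapping Partition Method is reproduced in the abstract; as a point of reference for the sequential baseline that such partitioned methods are compared against, here is a minimal sketch of the standard Thomas algorithm for a single tridiagonal system (illustrative only).

```cpp
#include <cstddef>
#include <vector>

// Minimal sketch: Thomas algorithm for a tridiagonal system A x = d, where
// a is the sub-diagonal (a[0] unused), b the diagonal and c the super-diagonal
// (c[n-1] unused). Assumes a well-conditioned system (no pivoting).
std::vector<double> thomas(std::vector<double> a, std::vector<double> b,
                           std::vector<double> c, std::vector<double> d) {
    const std::size_t n = b.size();
    // Forward elimination: remove the sub-diagonal.
    for (std::size_t i = 1; i < n; ++i) {
        const double w = a[i] / b[i - 1];
        b[i] -= w * c[i - 1];
        d[i] -= w * d[i - 1];
    }
    // Back substitution.
    std::vector<double> x(n);
    x[n - 1] = d[n - 1] / b[n - 1];
    for (std::size_t i = n - 1; i-- > 0;)
        x[i] = (d[i] - c[i] * x[i + 1]) / b[i];
    return x;
}
```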
Abstract:
SLAM is a popular task used by robots and autonomous vehicles to build a map of an unknown environment and, at the same time, to determine their location within the map. This paper describes a SLAM-based, probabilistic robotic system able to learn the essential features of different parts of its environment. Some previous SLAM implementations had computational complexities ranging from O(N log N) to O(N²), where N is the number of map features. Unlike these methods, our approach reduces the computational complexity to O(N) by using a model to fuse the information from the sensors after applying the Bayesian paradigm. Once the training process is completed, the robot identifies and locates those areas that potentially match the sections that have been previously learned. After the training, the robot navigates and extracts a three-dimensional map of the environment using a single laser sensor. Thus, it perceives different sections of its world. In addition, in order to make our system able to be used in a low-cost robot, low-complexity algorithms that can be easily implemented on embedded processors or microcontrollers are used.
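The paper's sensor-fusion model is not given in the abstract, so the sketch below only illustrates why independent per-feature Bayesian updating costs O(N): every one of the N map features keeps its own log-odds belief and is updated in a single pass over the map. The feature structure, the inverse-sensor-model probabilities and all names are assumptions for illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical per-feature belief: log-odds that the feature really exists
// at its estimated location (0.0 corresponds to a 50/50 prior).
struct Feature {
    double log_odds = 0.0;
};

// One Bayesian update pass. p_hit / p_miss are assumed inverse-sensor-model
// probabilities that a feature is real given that the current laser scan did
// or did not re-observe it. Each feature is updated independently, so the
// cost per sensor update is O(N) in the number of map features.
void bayes_update(std::vector<Feature>& map,
                  const std::vector<bool>& observed,
                  double p_hit = 0.7, double p_miss = 0.4) {
    for (std::size_t i = 0; i < map.size(); ++i) {
        const double p = observed[i] ? p_hit : p_miss;
        map[i].log_odds += std::log(p / (1.0 - p));   // Bayes rule in log-odds form
    }
}
```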
Abstract:
Software-based techniques offer several advantages to increase the reliability of processor-based systems at very low cost, but they cause performance degradation and an increase in code size. To meet constraints in performance and memory, we propose SETA, a new control-flow software-only technique that uses assertions to detect errors affecting the program flow. SETA is an independent technique, but it was conceived to work together with previously proposed data-flow techniques that aim at reducing performance and memory overheads. Thus, SETA is combined with such data-flow techniques and submitted to a fault injection campaign. Simulation and neutron-induced SEE tests show high fault coverage at performance and memory overheads lower than the state of the art.
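SETA's specific assertions are not spelled out in the abstract; the block below is only a generic sketch of assertion-based control-flow checking (signature monitoring), not SETA itself. Each basic block is given a compile-time signature, and a runtime signature variable is checked on block entry and then updated; the signatures, block layout and single-predecessor simplification are all assumptions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical compile-time signatures, one per basic block.
enum BlockSig : std::uint32_t { SIG_ENTRY = 0xA1, SIG_BODY = 0xB2, SIG_EXIT = 0xC3 };

static std::uint32_t g_sig = 0;   // runtime signature register

// Check that control arrived from the (single) allowed predecessor block,
// then adopt this block's signature. A faulty jump that skips or repeats a
// block leaves g_sig with an unexpected value and trips the assertion.
inline void cf_check(std::uint32_t allowed_pred, std::uint32_t here) {
    assert(g_sig == allowed_pred && "illegal control-flow transfer detected");
    g_sig = here;
}

int process(int x) {
    g_sig = SIG_ENTRY;               // ENTRY block
    int y = x * 2;

    cf_check(SIG_ENTRY, SIG_BODY);   // BODY block: legal only when coming from ENTRY
    y += 1;

    cf_check(SIG_BODY, SIG_EXIT);    // EXIT block: legal only when coming from BODY
    return y;
}
```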
Abstract:
The increase in computing power nowadays comes from parallel processing, given the characteristics of the new hardware architectures. Using this hardware appropriately speeds up the algorithms being executed (programs). However, properly converting an algorithm into its parallel form is complex and, in turn, that form is specific to each type of parallel hardware. Today the most common general-purpose processors are multicores, parallel processors also called Symmetric Multi-Processors (SMP). It is currently hard to find a desktop processor without some kind of SMP-style parallelism, and the development trend is that we will see processors with an ever larger number of cores. On the other hand, video processing devices (Graphics Processor Units - GPU) have in turn increased their computing power by including multiple processing units in their electronics, to the point that today it is not hard to find GPU boards with 200 to 400 parallel processing threads. These processors are very fast and specific to the task for which they were developed, mainly video processing. However, since this kind of processing has much in common with scientific processing, these devices have been reoriented under the name General Processing Graphics Processor Unit (GPGPU). Unlike the SMP processors mentioned above, GPGPUs are not general purpose and have complications for general use, due to the limited amount of memory available on each board and to the type of parallel processing required for their use to be productive. Programmable logic devices (FPGA) can perform large numbers of operations in parallel, so they can be used to implement specific algorithms, taking advantage of the parallelism they offer. Their drawback stems from the complexity of programming and testing the algorithm instantiated on the device. Given this diversity of parallel processors, the objective of our work is to analyze the specific characteristics of each of them and their impact on the structure of the algorithms, so that their use can achieve processing performance commensurate with the number of resources used, and to combine them in such a way that they complement each other beneficially. Specifically, starting from the characteristics of the hardware, we determine the properties that a parallel algorithm must have in order to be accelerated. The characteristics of the parallel algorithms will in turn determine which of these new types of hardware are the most suitable for their instantiation. In particular, the degree of data dependency, the need for synchronization during parallel processing, the size of the data to be processed, and the complexity of parallel programming on each type of hardware will be taken into account.
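As a concrete (and deliberately simple) instance of the SMP case discussed above, the sketch below shows a data-parallel loop with no cross-iteration dependencies, the kind of structure that multicore hardware accelerates most readily; it is an illustration written for this summary, not code from the thesis.

```cpp
#include <cstddef>
#include <vector>

// y = a*x + y over large vectors. Each iteration touches independent
// elements, so iterations can be split across SMP cores with no
// synchronization; compile with -fopenmp to enable the pragma.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(x.size());
    #pragma omp parallel for
    for (std::ptrdiff_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```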
Abstract:
Viewing and interacting with 3D models has been possible for a long time. However, vision-based 3D modeling has only seen limited success in applications, as it faces many technical challenges. Hand-held mobile devices have changed the way we interact with virtual reality environments. Their high mobility and technical features, such as inertial sensors, cameras and fast processors, are especially attractive for advancing the state of the art in virtual reality systems. Also, their ubiquity and fast Internet connections open a path to distributed and collaborative development. However, such a path has not been fully explored in many domains. VR systems for real-world engineering contexts are still difficult to use, especially when geographically dispersed engineering teams need to collaboratively visualize and review 3D CAD models. Another challenge is the ability to render these environments at the required interactive rates and with high fidelity. This document presents a mobile virtual reality system for visualizing, navigating and reviewing large-scale 3D CAD models, developed under the CEDAR (Collaborative Engineering Design and Review) project. It focuses on interaction using different navigation modes. The system uses the mobile device's inertial sensors and camera to allow users to navigate through large-scale models. IT professionals, architects, civil engineers and oil industry experts were involved in a qualitative assessment of the CEDAR system, in the form of direct user interaction with the prototypes and audio-recorded interviews about the prototypes. The lessons learned are valuable and are presented in this document. Subsequently, a quantitative study of the different navigation modes was carried out to determine which mode is best suited to a given situation.
Abstract:
Shipping list no.: 2000-0347-P.
Abstract:
Thesis (Ph.D.)--University of Washington, 2016-06
Abstract:
In both Australia and Brazil there are rapid changes occurring in the macroenvironment of the dairy industry. These changes are sometimes not noticed in the microenvironment of the farm, due to the labour-intensive nature of family farms, and the traditionally weak links between production and marketing. Trends in the external environment need to be discussed in a cooperative framework, to plan integrated actions for the dairy community as a whole and to demand actions from research, development and extension (R, D & E). This paper reviews the evolution of R, D & E in terms of paradigms and approaches, the present strategies used to identify dairy industry needs in Australia and Brazil, and presents a participatory strategy to design R, D & E actions for both countries. The strategy incorporates an integration of the opinions of key industry actors (defined as members of the dairy and associated communities), especially farm suppliers (input market), farmers, R, D & E people, milk processors and credit providers. The strategy also uses case studies with farm stays, purposive sampling, snowball interviewing techniques, semi-structured interviews, content analysis, focus group meetings, and feedback analysis, to refine the priorities for R, D & E actions in the region.