7 results for hardware computing
at Universidad Complutense de Madrid
Abstract:
We describe Janus, a massively parallel FPGA-based computer optimized for the simulation of spin glasses, theoretical models for the behavior of glassy materials. FPGAs provide an approach to massively parallel computing that is complementary to that of GPUs or many-core processors. In particular, our model problem is formulated in terms of binary variables, so floating-point operations can be (almost) completely avoided. The FPGA architecture allows us to run many independent threads with almost no memory-access latency, updating up to 1024 spins per cycle. We describe Janus in detail and summarize the physics results obtained in four years of operation of this machine. We discuss two types of physics applications: long simulations on very large systems (which try to mimic and provide understanding of the experimental non-equilibrium dynamics), and low-temperature equilibrium simulations using an artificial parallel tempering dynamics. The time scale of our non-equilibrium simulations spans eleven orders of magnitude (from picoseconds to a tenth of a second). Our equilibrium simulations, on the other hand, are unprecedented both for the low temperatures reached and for the size of the systems we have brought to equilibrium. A finite-time scaling ansatz emerges from the detailed comparison of the two sets of simulations. Janus has made it possible to perform spin-glass simulations that would take several decades on more conventional architectures. The paper ends with an assessment of the potential of possible future versions of the Janus architecture based on state-of-the-art technology.
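The point that binary variables let the update avoid floating point can be illustrated with a minimal sketch. It assumes a 3D Edwards-Anderson model with ±1 spins and ±1 couplings, each stored as a single bit (bit 1 encodes -1), and a Metropolis update. A bond is unsatisfied exactly when the XOR of its coupling bit and its two spin bits is 1, so the energy change of a flip depends only on the number k of unsatisfied bonds, and the acceptance test reduces to comparing a 32-bit random integer with a precomputed integer threshold. The lattice size, the function names and the choice of Metropolis (rather than, say, heat bath) are assumptions of this sketch, not details of Janus.

```python
import math
import random

L = 4                                   # lattice side (hypothetical; far smaller than Janus runs)
N = L ** 3
DIRS = ((1, 0, 0), (0, 1, 0), (0, 0, 1))

def idx(x, y, z):
    """Flat index of site (x, y, z) with periodic boundary conditions."""
    return ((x % L) * L + (y % L)) * L + (z % L)

def acceptance_table(beta):
    """Metropolis thresholds scaled to 32-bit integers, indexed by the number k of
    unsatisfied bonds around a spin (0..6). A flip changes the energy by 12 - 4k,
    so the update loop itself needs only integer comparisons."""
    table = []
    for k in range(7):
        dE = 12 - 4 * k
        p = 1.0 if dE <= 0 else math.exp(-beta * dE)
        table.append(int(p * 2**32))
    return table

def unsatisfied_bonds(s, J, x, y, z):
    """Count bonds with J*sigma_x*sigma_y = -1, i.e. J ^ s_x ^ s_y == 1 in bit encoding."""
    i = idx(x, y, z)
    k = 0
    for d, (dx, dy, dz) in enumerate(DIRS):
        fwd = idx(x + dx, y + dy, z + dz)
        bwd = idx(x - dx, y - dy, z - dz)
        k += J[d][i] ^ s[i] ^ s[fwd]      # bond i -> fwd is stored at site i
        k += J[d][bwd] ^ s[i] ^ s[bwd]    # bond bwd -> i is stored at site bwd
    return k

def sweep(s, J, table, rng):
    """One sequential Metropolis sweep over all sites."""
    for x in range(L):
        for y in range(L):
            for z in range(L):
                k = unsatisfied_bonds(s, J, x, y, z)
                if rng.getrandbits(32) < table[k]:
                    s[idx(x, y, z)] ^= 1

def energy(s, J):
    """E = -sum_<xy> J sigma_x sigma_y = sum over bonds of (2*b - 1), b = J ^ s_x ^ s_y."""
    e = 0
    for x in range(L):
        for y in range(L):
            for z in range(L):
                i = idx(x, y, z)
                for d, (dx, dy, dz) in enumerate(DIRS):
                    e += 2 * (J[d][i] ^ s[i] ^ s[idx(x + dx, y + dy, z + dz)]) - 1
    return e

rng = random.Random(1)
J = [[rng.getrandbits(1) for _ in range(N)] for _ in DIRS]   # bit 1 encodes J = -1
s = [rng.getrandbits(1) for _ in range(N)]                   # bit 1 encodes sigma = -1
table = acceptance_table(beta=1.0)
for _ in range(10):
    sweep(s, J, table, rng)
print(energy(s, J) / N)
```

The sweep is sequential for clarity. Sites of the same checkerboard parity share no bonds, so hardware can update many of them in the same clock cycle.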
Abstract:
We present Tethered Monte Carlo, a simple, general-purpose method for computing the effective potential of the order parameter (Helmholtz free energy). The formalism is based on a new statistical ensemble, closely related to the micromagnetic one but with an extended configuration space (through Creutz-like demons). Canonical averages for arbitrary values of the external magnetic field are computed without additional simulations. The method is tested on the two-dimensional Ising model, where exact results enable high-precision checks. A rather peculiar feature of our implementation, which employs a local Metropolis algorithm, is the total absence, within errors, of critical slowing down for magnetic observables. Indeed, high-accuracy results are presented for lattices as large as L = 1024.
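A local Metropolis update in a tethered ensemble can be sketched as follows. The abstract does not give the weight. This sketch assumes that the magnetization density m is coupled to N Gaussian demons through m̂ = m + Σ_i η_i²/(2N), and that integrating the demons out at fixed m̂ leaves a configuration weight proportional to exp(-βU + N(m - m̂)) (m̂ - m)^((N-2)/2) θ(m̂ - m). Treat the exact form as an assumption. The sketch only samples the 2D Ising model (U = -Σ_<xy> s_x s_y) at a single value of m̂. Reconstructing the effective potential from a set of such runs is not shown, and the function and parameter names are hypothetical.

```python
import math
import random

def tethered_sweep(s, L, beta, m_hat, rng):
    """One local Metropolis sweep of the 2D Ising model in the assumed tethered ensemble.

    s is a flat list of +1/-1 spins on an L x L periodic lattice; the current
    magnetization density m must satisfy m < m_hat. Returns m after the sweep.
    """
    N = L * L
    M = sum(s)
    for i in range(N):
        x, y = divmod(i, L)
        nn = (s[((x + 1) % L) * L + y] + s[((x - 1) % L) * L + y]
              + s[x * L + (y + 1) % L] + s[x * L + (y - 1) % L])
        dU = 2 * s[i] * nn                 # energy change for U = -sum_<xy> s_x s_y
        m_old = M / N
        m_new = (M - 2 * s[i]) / N
        if m_new >= m_hat:                 # theta(m_hat - m) = 0: weight vanishes, reject
            continue
        # log of (new weight / old weight) for exp(-beta*U + N*(m - m_hat)) * (m_hat - m)**((N-2)/2)
        log_ratio = (-beta * dU + N * (m_new - m_old)
                     + 0.5 * (N - 2) * (math.log(m_hat - m_new) - math.log(m_hat - m_old)))
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            M -= 2 * s[i]
            s[i] = -s[i]
    return M / N

L = 16
s = [-1] * (L * L)                          # m = -1 < m_hat: admissible starting point
rng = random.Random(2)
for _ in range(50):
    m = tethered_sweep(s, L, beta=0.44, m_hat=0.3, rng=rng)
print(m)
```

Working with the logarithm of the weight ratio avoids overflow from the large exponent (N - 2)/2.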
Abstract:
This paper describes JANUS, a modular, massively parallel, reconfigurable FPGA-based computing system. Each JANUS module has a computational core and a host. The computational core is a 4×4 array of FPGA-based processing elements with nearest-neighbor data links. The processors are also directly connected to an I/O node attached to the JANUS host, a conventional PC. JANUS is tailored for, but not limited to, the requirements of a class of computationally demanding scientific applications characterized by regular code structure, unconventional data-manipulation instructions and a moderate database size. We discuss the architecture of this configurable machine and focus on its use for Monte Carlo simulations in statistical mechanics. On this class of applications JANUS performs impressively: in some cases a single JANUS processing element outperforms high-end PCs by a factor of ≈1000. We also discuss the role of JANUS in other classes of scientific applications.
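The core topology described above can be modeled in a few lines. It consists of a 4×4 array of processing elements (PEs) with nearest-neighbor links, and every PE is also connected to the host's I/O node. The abstract does not say whether the mesh wraps around at the edges, so the sketch exposes that as a hypothetical `periodic` flag. All names are illustrative only.

```python
GRID = 4  # the computational core is a 4 x 4 array of processing elements

def pe_neighbors(row, col, periodic=True):
    """Nearest-neighbor PEs of (row, col); periodic=True assumes toroidal wraparound (hypothetical)."""
    result = {}
    for name, (dr, dc) in (("north", (-1, 0)), ("south", (1, 0)),
                           ("west", (0, -1)), ("east", (0, 1))):
        r, c = row + dr, col + dc
        if periodic:
            r, c = r % GRID, c % GRID
        elif not (0 <= r < GRID and 0 <= c < GRID):
            continue                       # open mesh: edge PEs have fewer links
        result[name] = (r, c)
    return result

def core_links(periodic=True):
    """Undirected PE-PE links, plus one link from every PE to the I/O node."""
    pe_links = {frozenset({(r, c), nb})
                for r in range(GRID) for c in range(GRID)
                for nb in pe_neighbors(r, c, periodic).values()}
    io_links = {frozenset({(r, c), "IO"}) for r in range(GRID) for c in range(GRID)}
    return pe_links, io_links

print(pe_neighbors(0, 0))
# {'north': (3, 0), 'south': (1, 0), 'west': (0, 3), 'east': (0, 1)}
print(len(core_links()[0]), len(core_links(periodic=False)[0]))   # 32 24
```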
Abstract:
Efficient hardware implementations of arithmetic operations in Galois fields are highly desirable for applications such as coding theory, computer algebra and cryptography. Among these operations, multiplication is of special interest because it is considered the most important building block. High-speed algorithms and hardware architectures for multiplication are therefore in high demand. In this paper, we consider bit-parallel polynomial basis multipliers over the binary field GF(2^m) generated by type II irreducible pentanomials. The multiplier presented here has the lowest time complexity known to date among similar multipliers based on this type of irreducible pentanomial.
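A software reference for the hardware multiplier writes GF(2^m) multiplication in a polynomial basis as a bit-serial shift-and-add loop with on-the-fly reduction. This is a minimal model, not the bit-parallel architecture of the paper. It assumes the commonly used type II form f(x) = x^m + x^(n+2) + x^(n+1) + x^n + 1. With m = 8 and n = 2 this gives x^8 + x^4 + x^3 + x^2 + 1, which is irreducible. Elements are stored as integers whose bit i is the coefficient of x^i.

```python
def type2_pentanomial(m, n):
    """Bit mask of f(x) = x^m + x^(n+2) + x^(n+1) + x^n + 1 (assumed type II form)."""
    return (1 << m) | (1 << (n + 2)) | (1 << (n + 1)) | (1 << n) | 1

def gf2m_mul(a, b, m, f):
    """a(x) * b(x) mod f(x) over GF(2); a and b are integers below 2**m."""
    result = 0
    for _ in range(m):
        if b & 1:
            result ^= a        # addition of polynomials over GF(2) is XOR
        b >>= 1
        a <<= 1                # a(x) <- x * a(x)
        if a >> m:             # degree m reached: subtract (XOR) f(x) to reduce
            a ^= f
    return result

def gf2m_pow(a, e, m, f):
    """Square-and-multiply exponentiation in GF(2^m)."""
    result = 1
    while e:
        if e & 1:
            result = gf2m_mul(result, a, m, f)
        a = gf2m_mul(a, a, m, f)
        e >>= 1
    return result

F = type2_pentanomial(8, 2)              # x^8 + x^4 + x^3 + x^2 + 1 = 0x11D
print(hex(gf2m_mul(0x02, 0x80, 8, F)))   # 0x1d, i.e. x * x^7 = x^8 = x^4 + x^3 + x^2 + 1
```

Because f is irreducible, the nonzero elements form a multiplicative group of order 2^m - 1, so a^(2^m - 1) = 1 for every nonzero a. This gives a quick sanity check for any multiplier implementation.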
Abstract:
Today we live in a technological society, in which most things contain one or more processors and computation is needed to make human life more comfortable. This need has let us witness an unprecedented development in history, in which the number of transistors doubled every two years and computing speed improved along with it (Moore, 1965). This has led to the current situation, in which boards with the capacity of the computers of some years ago consume far less energy and take up far less space, although their performance still falls somewhat short of what is required today. Hence the idea of connecting boards that complement each other in the areas where each is limited. In our project we develop a software/hardware interface to facilitate communication between two boards with different capabilities, namely a Raspberry Pi Model A (2012) and a Spartan XSA3S1000 FPGA with an XStend Board V3.0 expansion board. The communication is based on sending and receiving bits serially, and the Raspberry Pi drives the phases of the communication. The project is divided into two parts. The first part consists of developing a Linux kernel module that manages data input and output on the Raspberry Pi when the corresponding write or read calls are made; the first phase of the communication is carried out by controlling the GPIOs and managing the various signals. The second part consists of developing a VHDL design for the FPGA that handles the reception, computation and subsequent transmission of bits, so that the Raspberry Pi can access the data once they have been computed.
Both parts have been developed under free licenses (GPL) so that they are available to anyone interested in the project who wishes to reuse them.
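The abstract says only that data travel bit by bit and that the Raspberry Pi drives each phase. The sketch below is a minimal software model of such a master-driven exchange, with a Python class standing in for the VHDL shift register. The 8-bit word width, MSB-first order, one bit per clock and the example computation are all hypothetical. The real design uses a Linux kernel module and GPIO signals, which are not modeled here.

```python
WIDTH = 8                       # hypothetical word width
MASK = (1 << WIDTH) - 1

class FpgaModel:
    """Software stand-in for the FPGA design: shift registers clocked by the master."""
    def __init__(self, compute):
        self.compute = compute  # hypothetical computation applied to each received word
        self.shift_in = 0
        self.count = 0
        self.shift_out = 0

    def clock_in(self, bit):
        """Receive phase: shift one bit in (MSB first) on each master clock."""
        self.shift_in = ((self.shift_in << 1) | bit) & MASK
        self.count += 1
        if self.count == WIDTH:              # word complete: compute phase
            self.shift_out = self.compute(self.shift_in) & MASK
            self.count = 0

    def clock_out(self):
        """Send phase: present the next result bit (MSB first) on each master clock."""
        bit = (self.shift_out >> (WIDTH - 1)) & 1
        self.shift_out = (self.shift_out << 1) & MASK
        return bit

def master_transfer(fpga, value):
    """Master (Raspberry Pi side) drives every phase: send WIDTH bits, then read WIDTH bits."""
    for i in range(WIDTH - 1, -1, -1):
        fpga.clock_in((value >> i) & 1)
    result = 0
    for _ in range(WIDTH):
        result = (result << 1) | fpga.clock_out()
    return result

fpga = FpgaModel(lambda v: v + 1)
print(master_transfer(fpga, 41))   # 42
```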
Abstract:
PMCTrack is an open-source tool for Linux that monitors application performance using the processor's hardware counters. It supports capturing metrics such as the number of instructions per cycle or the cache miss rate. The goal of this project is to port PMCTrack to the Android operating system on platforms with ARM processors. This involves the following tasks: (1) modifying Android's variant of the Linux kernel to include the extensions required by the PMCTrack kernel module, (2) adapting PMCTrack's user-mode tools, and (3) developing an Android application that displays, in real time, the counter measurements collected for the various applications being monitored. To test the port of PMCTrack to Android and to demonstrate the usefulness of our contributions, we carried out several case studies on the Odroid XU4 development board.
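Instructions per cycle and cache miss rate are both simple ratios of counter deltas taken over a sampling interval. The sketch below uses hypothetical field names and does not reflect PMCTrack's API or output format.

```python
from dataclasses import dataclass

@dataclass
class CounterSample:
    """Counter deltas over one sampling interval (hypothetical field names)."""
    cycles: int
    instructions: int
    cache_accesses: int
    cache_misses: int

def derived_metrics(sample):
    """Instructions per cycle and cache miss rate; zero denominators yield 0.0."""
    ipc = sample.instructions / sample.cycles if sample.cycles else 0.0
    miss_rate = sample.cache_misses / sample.cache_accesses if sample.cache_accesses else 0.0
    return {"ipc": ipc, "miss_rate": miss_rate}

print(derived_metrics(CounterSample(cycles=1_000_000, instructions=2_000_000,
                                    cache_accesses=50_000, cache_misses=5_000)))
# {'ipc': 2.0, 'miss_rate': 0.1}
```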