999 resultados para Co-processors
Resumo:
Los algoritmos basados en registros de desplazamiento con realimentación (en inglés FSR) se han utilizado como generadores de flujos pseudoaleatorios en aplicaciones con recursos limitados como los sistemas de apertura sin llave. Se considera canal primario a aquel que se utiliza para realizar una transmisión de información. La aparición de los ataques de canal auxiliar (en inglés SCA), que explotan información filtrada inintencionadamente a través de canales laterales como el consumo, las emisiones electromagnéticas o el tiempo empleado, supone una grave amenaza para estas aplicaciones, dado que los dispositivos son accesibles por un atacante. El objetivo de esta tesis es proporcionar un conjunto de protecciones que se puedan aplicar de forma automática y que utilicen recursos ya disponibles, evitando un incremento sustancial en los costes y alargando la vida útil de aplicaciones que puedan estar desplegadas. Explotamos el paralelismo existente en algoritmos FSR, ya que sólo hay 1 bit de diferencia entre estados de rondas consecutivas. Realizamos aportaciones en tres niveles: a nivel de sistema, utilizando un coprocesador reconfigurable, a través del compilador y a nivel de bit, aprovechando los recursos disponibles en el procesador. Proponemos un marco de trabajo que nos permite evaluar implementaciones de un algoritmo incluyendo los efectos introducidos por el compilador considerando que el atacante es experto. En el campo de los ataques, hemos propuesto un nuevo ataque diferencial que se adapta mejor a las condiciones de las implementaciones software de FSR, en las que el consumo entre rondas es muy similar. SORU2 es un co-procesador vectorial reconfigurable propuesto para reducir el consumo energético en aplicaciones con paralelismo y basadas en el uso de bucles. Proponemos el uso de SORU2, además, para ejecutar algoritmos basados en FSR de forma segura. Al ser reconfigurable, no supone un sobrecoste en recursos, ya que no está dedicado en exclusiva al algoritmo de cifrado. Proponemos una configuración que ejecuta múltiples algoritmos de cifrado similares de forma simultánea, con distintas implementaciones y claves. A partir de una implementación sin protecciones, que demostramos que es completamente vulnerable ante SCA, obtenemos una implementación segura a los ataques que hemos realizado. A nivel de compilador, proponemos un mecanismo para evaluar los efectos de las secuencias de optimización del compilador sobre una implementación. El número de posibles secuencias de optimizaciones de compilador es extremadamente alto. El marco de trabajo propuesto incluye un algoritmo para la selección de las secuencias de optimización a considerar. Debido a que las optimizaciones del compilador transforman las implementaciones, se pueden generar automáticamente implementaciones diferentes combinamos para incrementar la seguridad ante SCA. Proponemos 2 mecanismos de aplicación de estas contramedidas, que aumentan la seguridad de la implementación original sin poder considerarse seguras. Finalmente hemos propuesto la ejecución paralela a nivel de bit del algoritmo en un procesador. Utilizamos la forma algebraica normal del algoritmo, que automáticamente se paraleliza. La implementación sobre el algoritmo evaluado mejora en rendimiento y evita que se filtre información por una ejecución dependiente de datos. Sin embargo, es más vulnerable ante ataques diferenciales que la implementación original. Proponemos una modificación del algoritmo para obtener una implementación segura, descartando parcialmente ejecuciones del algoritmo, de forma aleatoria. Esta implementación no introduce una sobrecarga en rendimiento comparada con las implementaciones originales. En definitiva, hemos propuesto varios mecanismos originales a distintos niveles para introducir aleatoridad en implementaciones de algoritmos FSR sin incrementar sustancialmente los recursos necesarios. ABSTRACT Feedback Shift Registers (FSR) have been traditionally used to implement pseudorandom sequence generators. These generators are used in Stream ciphers in systems with tight resource constraints, such as Remote Keyless Entry. When communicating electronic devices, the primary channel is the one used to transmit the information. Side-Channel Attack (SCA) use additional information leaking from the actual implementation, including power consumption, electromagnetic emissions or timing information. Side-Channel Attacks (SCA) are a serious threat to FSR-based applications, as an attacker usually has physical access to the devices. The main objective of this Ph.D. thesis is to provide a set of countermeasures that can be applied automatically using the available resources, avoiding a significant cost overhead and extending the useful life of deployed systems. If possible, we propose to take advantage of the inherent parallelism of FSR-based algorithms, as the state of a FSR differs from previous values only in 1-bit. We have contributed in three different levels: architecture (using a reconfigurable co-processor), using compiler optimizations, and at bit level, making the most of the resources available at the processor. We have developed a framework to evaluate implementations of an algorithm including the effects introduced by the compiler. We consider the presence of an expert attacker with great knowledge on the application and the device. Regarding SCA, we have presented a new differential SCA that performs better than traditional SCA on software FSR-based algorithms, where the leaked values are similar between rounds. SORU2 is a reconfigurable vector co-processor. It has been developed to reduce energy consumption in loop-based applications with parallelism. In addition, we propose its use for secure implementations of FSR-based algorithms. The cost overhead is discarded as the co-processor is not exclusively dedicated to the encryption algorithm. We present a co-processor configuration that executes multiple simultaneous encryptions, using different implementations and keys. From a basic implementation, which is proved to be vulnerable to SCA, we obtain an implementation where the SCA applied were unsuccessful. At compiler level, we use the framework to evaluate the effect of sequences of compiler optimization passes on a software implementation. There are many optimization passes available. The optimization sequences are combinations of the available passes. The amount of sequences is extremely high. The framework includes an algorithm for the selection of interesting sequences that require detailed evaluation. As existing compiler optimizations transform the software implementation, using different optimization sequences we can automatically generate different implementations. We propose to randomly switch between the generated implementations to increase the resistance against SCA.We propose two countermeasures. The results show that, although they increase the resistance against SCA, the resulting implementations are not secure. At bit level, we propose to exploit bit level parallelism of FSR-based implementations using pseudo bitslice implementation in a wireless node processor. The bitslice implementation is automatically obtained from the Algebraic Normal Form of the algorithm. The results show a performance improvement, avoiding timing information leakage, but increasing the vulnerability against differential SCA.We provide a secure version of the algorithm by randomly discarding part of the data obtained. The overhead in performance is negligible when compared to the original implementations. To summarize, we have proposed a set of original countermeasures at different levels that introduce randomness in FSR-based algorithms avoiding a heavy overhead on the resources required.
Resumo:
In this paper, we present the outcomes of a project on the exploration of the use of Field Programmable Gate Arrays(FPGAs) as co-processors for scientific computation. We designed a custom circuit for the pipelined solving of multiple tri-diagonal linear systems. The design is well suited for applications that require many independent tri diagonal system solves, such as finite difference methods for solving PDEs or applications utilising cubic spline interpolation. The selected solver algorithm was the Tri Diagonal Matrix Algorithm (TDMA or Thomas Algorithm). Our solver supports user specified precision thought the use of a custom floating point VHDL library supporting addition, subtraction, multiplication and division. The variable precision TDMA solver was tested for correctness in simulation mode. The TDMA pipeline was tested successfully in hardware using a simplified solver model. The details of implementation, the limitations, and future work are also discussed.
Resumo:
In this paper, we present the outcomes of a project on the exploration of the use of Field Programmable Gate Arrays (FPGAs) as co-processors for scientific computation. We designed a custom circuit for the pipelined solving of multiple tri-diagonal linear systems. The design is well suited for applications that require many independent tri-diagonal system solves, such as finite difference methods for solving PDEs or applications utilising cubic spline interpolation. The selected solver algorithm was the Tri-Diagonal Matrix Algorithm (TDMA or Thomas Algorithm). Our solver supports user specified precision thought the use of a custom floating point VHDL library supporting addition, subtraction, multiplication and division. The variable precision TDMA solver was tested for correctness in simulation mode. The TDMA pipeline was tested successfully in hardware using a simplified solver model. The details of implementation, the limitations, and future work are also discussed.
Resumo:
Tridiagonal diagonally dominant linear systems arise in many scientific and engineering applications. The standard Thomas algorithm for solving such systems is inherently serial forming a bottleneck in computation. Algorithms such as cyclic reduction and SPIKE reduce a single large tridiagonal system into multiple small independent systems which can be solved in parallel. We have developed portable cyclic reduction and SPIKE algorithm OpenCL implementations with the intent to target a range of co-processors in a heterogeneous computing environment including Field Programmable Gate Arrays (FPGAs), Graphics Processing Units (GPUs) and other multi-core processors. In this paper, we evaluate these designs in the context of solver performance, resource efficiency and numerical accuracy.
Resumo:
The high performance and capacity of current FPGAs makes them suitable as acceleration co-processors. This article studies the implementation, for such accelerators, of the floating-point power function xy as defined by the C99 and IEEE 754-2008 standards, generalized here to arbitrary exponent and mantissa sizes. Last-bit accuracy at the smallest possible cost is obtained thanks to a careful study of the various subcomponents: a floating-point logarithm, a modified floating-point exponential, and a truncated floating-point multiplier. A parameterized architecture generator in the open-source FloPoCo project is presented in details and evaluated.
Resumo:
Advances in FPGA technology and higher processing capabilities requirements have pushed to the emerge of All Programmable Systems-on-Chip, which incorporate a hard designed processing system and a programmable logic that enable the development of specialized computer systems for a wide range of practical applications, including data and signal processing, high performance computing, embedded systems, among many others. To give place to an infrastructure that is capable of using the benefits of such a reconfigurable system, the main goal of the thesis is to implement an infrastructure composed of hardware, software and network resources, that incorporates the necessary services for the operation, management and interface of peripherals, that coompose the basic building blocks for the execution of applications. The project will be developed using a chip from the Zynq-7000 All Programmable Systems-on-Chip family.
Resumo:
Co-management is typically known to be a resource management system that shares managerial responsibility between the state and other stakeholders of a resource. In the case of Lake Victoria, one would expect the state to be represented by the fisheries departments of Kenya, Uganda and Tanzania, while stakeholder groups may comprise fishing communities, fish processing factories and municipalities. Taking that into account, the survey's objectives were defined as: (a) To identify the difficulties and impracticalities inherent in implementing state-based regulations via a "top-down" management strategy. (b) To assess the prevalence of community-based institutions that either seek to regulate the fishery or have the potential to be used to regulate it. (c) To identify ways in which community-based regulatory and monitory systems may be established, and how these will fare over time. (d) To identify roles for national Fisheries Departments, industrial fish processors and other stakeholders. (e) To develop well-founded policy suggestions for the establishment of a co-management framework to manage the fisheries of Lake Victoria.
Resumo:
Single processor architectures are unable to provide the required performance of high performance embedded systems. Parallel processing based on general-purpose processors can achieve these performances with a considerable increase of required resources. However, in many cases, simplified optimized parallel cores can be used instead of general-purpose processors achieving better performance at lower resource utilization. In this paper, we propose a configurable many-core architecture to serve as a co-processor for high-performance embedded computing on Field-Programmable Gate Arrays. The architecture consists of an array of configurable simple cores with support for floating-point operations interconnected with a configurable interconnection network. For each core it is possible to configure the size of the internal memory, the supported operations and number of interfacing ports. The architecture was tested in a ZYNQ-7020 FPGA in the execution of several parallel algorithms. The results show that the proposed many-core architecture achieves better performance than that achieved with a parallel generalpurpose processor and that up to 32 floating-point cores can be implemented in a ZYNQ-7020 SoC FPGA.
Resumo:
The performance, energy efficiency and cost improvements due to traditional technology scaling have begun to slow down and present diminishing returns. Underlying reasons for this trend include fundamental physical limits of transistor scaling, the growing significance of quantum effects as transistors shrink, and a growing mismatch between transistors and interconnects regarding size, speed and power. Continued Moore's Law scaling will not come from technology scaling alone, and must involve improvements to design tools and development of new disruptive technologies such as 3D integration. 3D integration presents potential improvements to interconnect power and delay by translating the routing problem into a third dimension, and facilitates transistor density scaling independent of technology node. Furthermore, 3D IC technology opens up a new architectural design space of heterogeneously-integrated high-bandwidth CPUs. Vertical integration promises to provide the CPU architectures of the future by integrating high performance processors with on-chip high-bandwidth memory systems and highly connected network-on-chip structures. Such techniques can overcome the well-known CPU performance bottlenecks referred to as memory and communication wall. However the promising improvements to performance and energy efficiency offered by 3D CPUs does not come without cost, both in the financial investments to develop the technology, and the increased complexity of design. Two main limitations to 3D IC technology have been heat removal and TSV reliability. Transistor stacking creates increases in power density, current density and thermal resistance in air cooled packages. Furthermore the technology introduces vertical through silicon vias (TSVs) that create new points of failure in the chip and require development of new BEOL technologies. Although these issues can be controlled to some extent using thermal-reliability aware physical and architectural 3D design techniques, high performance embedded cooling schemes, such as micro-fluidic (MF) cooling, are fundamentally necessary to unlock the true potential of 3D ICs. A new paradigm is being put forth which integrates the computational, electrical, physical, thermal and reliability views of a system. The unification of these diverse aspects of integrated circuits is called Co-Design. Independent design and optimization of each aspect leads to sub-optimal designs due to a lack of understanding of cross-domain interactions and their impacts on the feasibility region of the architectural design space. Co-Design enables optimization across layers with a multi-domain view and thus unlocks new high-performance and energy efficient configurations. Although the co-design paradigm is becoming increasingly necessary in all fields of IC design, it is even more critical in 3D ICs where, as we show, the inter-layer coupling and higher degree of connectivity between components exacerbates the interdependence between architectural parameters, physical design parameters and the multitude of metrics of interest to the designer (i.e. power, performance, temperature and reliability). In this dissertation we present a framework for multi-domain co-simulation and co-optimization of 3D CPU architectures with both air and MF cooling solutions. Finally we propose an approach for design space exploration and modeling within the new Co-Design paradigm, and discuss the possible avenues for improvement of this work in the future.