140 resultados para Fpga
Resumo:
New FPGA architectures for the ordinary Montgomery multiplication algorithm and the FIOS modular multiplication algorithm are presented. The embedded 18×18-bit multipliers and fast carry look-ahead logic located on the Xilinx Virtex2 Pro family of FPGAs are used to perform the ordinary multiplications and additions/subtractions required by these two algorithms. The architectures are developed for use in Elliptic Curve Cryptosystems over GF(p), which require modular field multiplication to perform elliptic curve point addition and doubling. Field sizes of 128-bits and 256-bits are chosen but other field sizes can easily be accommodated, by rapidly reprogramming the FPGA. Overall, the larger the word size of the multiplier, the more efficiently it performs in terms of area/time product. Also, the FIOS algorithm is flexible in that one can tailor the multiplier architecture is to be area efficient, time efficient or a mixture of both by choosing a particular word size. It is estimated that the computation of a 256-bit scalar point multiplication over GF(p) would take about 4.8 ms.
Resumo:
A scalable large vocabulary, speaker independent speech recognition system is being developed using Hidden Markov Models (HMMs) for acoustic modeling and a Weighted Finite State Transducer (WFST) to compile sentence, word, and phoneme models. The system comprises a software backend search and an FPGA-based Gaussian calculation which are covered here. In this paper, we present an efficient pipelined design implemented both as an embedded peripheral and as a scalable, parallel hardware accelerator. Both architectures have been implemented on an Alpha Data XRC-5T1, reconfigurable computer housing a Virtex 5 SX95T FPGA. The core has been tested and is capable of calculating a full set of Gaussian results from 3825 acoustic models in 9.03 ms which coupled with a backend search of 5000 words has provided an accuracy of over 80%. Parallel implementations have been designed with up to 32 cores and have been successfully implemented with a clock frequency of 133?MHz.
Resumo:
The use of accelerators, with compute architectures different and distinct from the CPU, has become a new research frontier in high-performance computing over the past ?ve years. This paper is a case study on how the instruction-level parallelism offered by three accelerator technologies, FPGA, GPU and ClearSpeed, can be exploited in atomic physics. The algorithm studied is the evaluation of two electron integrals, using direct numerical quadrature, a task that arises in the study of intermediate energy electron scattering by hydrogen atoms. The results of our ‘productivity’ study show that while each accelerator is viable, there are considerable differences in the implementation strategies that must be followed on each.
Resumo:
The second round of the NIST-run public competition is underway to find a new hash algorithm(s) for inclusion in the NIST Secure Hash Standard (SHA-3). This paper presents the full implementations of all of the second round candidates in hardware with all of their variants. In order to determine their computational efficiency, an important aspect in NIST's round two evaluation criteria, this paper gives an area/speed comparison of each design both with and without a hardware interface, thereby giving an overall impression of their performance in resource constrained and resource abundant environments. The implementation results are provided for a Virtex-5 FPGA device. The efficiency of the architectures for the hash functions are compared in terms of throughput per unit area. To the best of the authors' knowledge, this is the first work to date to present hardware designs which test for all message digest sizes (224, 256, 384, 512), and also the only work to include the padding as part of the hardware for the SHA-3 hash functions.
Resumo:
Sphere Decoding (SD) is a highly effective detection technique for Multiple-Input Multiple-Output (MIMO) wireless communications receivers, offering quasi-optimal accuracy with relatively low computational complexity as compared to the ideal ML detector. Despite this, the computational demands of even low-complexity SD variants, such as Fixed Complexity SD (FSD), remains such that implementation on modern software-defined network equipment is a highly challenging process, and indeed real-time solutions for MIMO systems such as 4 4 16-QAM 802.11n are unreported. This paper overcomes this barrier. By exploiting large-scale networks of fine-grained softwareprogrammable processors on Field Programmable Gate Array (FPGA), a series of unique SD implementations are presented, culminating in the only single-chip, real-time quasi-optimal SD for 44 16-QAM 802.11n MIMO. Furthermore, it demonstrates that the high performance software-defined architectures which enable these implementations exhibit cost comparable to dedicated circuit architectures.
Resumo:
The use of dataflow digital signal processing system modelling
and synthesis techniques has been a fruitful research theme for many years and has yielded many powerful rapid system synthesis and optimisation capabilities. However, recent years have seen the spectrum of languages and techniques splinter in an application specific manner, resulting in an ad-hoc design process which is increasingly dependent on the particular application under development. This poses a major problem for automated toolflows attempting to provide rapid system synthesis for a wide ranges of applications. By analysing a number of dataflow FPGA implementation case studies, this paper shows that despit ethis common traits may be found in current techniques, which fall largely into three classes. Further, it exposes limitations pertaining to their ability to adapt algorith models to implementations for different operating environments and target platforms.
Resumo:
To enable reliable data transfer in next generation Multiple-Input Multiple-Output (MIMO) communication systems, terminals must be able to react to fluctuating channel conditions by having flexible modulation schemes and antenna configurations. This creates a challenging real-time implementation problem: to provide the high performance required of cutting edge MIMO standards, such as 802.11n, with the flexibility for this behavioural variability. FPGA softcore processors offer a solution to this problem, and in this paper we show how heterogeneous SISD/SIMD/MIMD architectures can enable programmable multicore architectures on FPGA with similar performance and cost as traditional dedicated circuit-based architectures. When applied to a 4×4 16-QAM Fixed-Complexity Sphere Decoder (FSD) detector we present the first soft-processor based solution for real-time 802.11n MIMO.