Biblioteca Digital

New FPGA architectures for the ordinary Montgomery multiplication algorithm and the FIOS modular multiplication algorithm are presented. The embedded 18×18-bit multipliers and fast carry look-ahead logic located on the Xilinx Virtex2 Pro family of FPGAs are used to perform the ordinary multiplications and additions/subtractions required by these two algorithms. The architectures are developed for use in Elliptic Curve Cryptosystems over GF(p), which require modular field multiplication to perform elliptic curve point addition and doubling. Field sizes of 128-bits and 256-bits are chosen but other field sizes can easily be accommodated, by rapidly reprogramming the FPGA. Overall, the larger the word size of the multiplier, the more efficiently it performs in terms of area/time product. Also, the FIOS algorithm is flexible in that one can tailor the multiplier architecture is to be area efficient, time efficient or a mixture of both by choosing a particular word size. It is estimated that the computation of a 256-bit scalar point multiplication over GF(p) would take about 4.8 ms.

Veja mais

An efficient feature selection method for mobile devices with application to activity recognition

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper presents a feature selection method for data classification, which combines a model-based variable selection technique and a fast two-stage subset selection algorithm. The relationship between a specified (and complete) set of candidate features and the class label is modelled using a non-linear full regression model which is linear-in-the-parameters. The performance of a sub-model measured by the sum of the squared-errors (SSE) is used to score the informativeness of the subset of features involved in the sub-model. The two-stage subset selection algorithm approaches a solution sub-model with the SSE being locally minimized. The features involved in the solution sub-model are selected as inputs to support vector machines (SVMs) for classification. The memory requirement of this algorithm is independent of the number of training patterns. This property makes this method suitable for applications executed in mobile devices where physical RAM memory is very limited. An application was developed for activity recognition, which implements the proposed feature selection algorithm and an SVM training procedure. Experiments are carried out with the application running on a PDA for human activity recognition using accelerometer data. A comparison with an information gain based feature selection method demonstrates the effectiveness and efficiency of the proposed algorithm.

Veja mais

FPGA Implementation of a Pipelined Gaussian Calculation for HMM-Based Large Vocabulary Speech Recognition

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A scalable large vocabulary, speaker independent speech recognition system is being developed using Hidden Markov Models (HMMs) for acoustic modeling and a Weighted Finite State Transducer (WFST) to compile sentence, word, and phoneme models. The system comprises a software backend search and an FPGA-based Gaussian calculation which are covered here. In this paper, we present an efficient pipelined design implemented both as an embedded peripheral and as a scalable, parallel hardware accelerator. Both architectures have been implemented on an Alpha Data XRC-5T1, reconfigurable computer housing a Virtex 5 SX95T FPGA. The core has been tested and is capable of calculating a full set of Gaussian results from 3825 acoustic models in 9.03 ms which coupled with a backend search of 5000 words has provided an accuracy of over 80%. Parallel implementations have been designed with up to 32 cores and have been successfully implemented with a clock frequency of 133?MHz.

Veja mais

Comparing the implementation of two-dimensional numerical quadrature on GPU, FPGA and ClearSpeed systems to study electron scattering by atoms

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The use of accelerators, with compute architectures different and distinct from the CPU, has become a new research frontier in high-performance computing over the past ?ve years. This paper is a case study on how the instruction-level parallelism offered by three accelerator technologies, FPGA, GPU and ClearSpeed, can be exploited in atomic physics. The algorithm studied is the evaluation of two electron integrals, using direct numerical quadrature, a task that arises in the study of intermediate energy electron scattering by hydrogen atoms. The results of our ‘productivity’ study show that while each accelerator is viable, there are considerable differences in the implementation strategies that must be followed on each.

Veja mais