260 results for Programmable array logic
Abstract:
Methods by which bit level systolic array chips can be made fault tolerant are discussed briefly. Using a simple analysis based on both Poisson and Bose-Einstein statistics, the authors demonstrate that such techniques can be used to obtain significant yield enhancement. Alternatively, the dimensions of an array can be increased considerably for the same initial (non-fault-tolerant) chip yield.
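As a rough illustration of the kind of yield comparison this abstract describes, the sketch below evaluates two textbook yield formulas: the Poisson model Y = exp(-A·D) and a commonly quoted Bose-Einstein form Y = 1/(1 + A·D)^n. The redundancy model (an array that works when at least `needed` of `needed + spares` independent cells are good), the function names, and all numbers are assumptions made for illustration; they are not taken from the paper, and the sketch ignores the extra area that spare cells occupy.

```python
import math

def poisson_yield(area, defect_density):
    """Poisson model: probability of zero defects in the given area."""
    return math.exp(-area * defect_density)

def bose_einstein_yield(area, defect_density, levels=1):
    """Assumed Bose-Einstein form: 1 / (1 + A*D)**n for n critical levels."""
    return 1.0 / (1.0 + area * defect_density) ** levels

def redundant_array_yield(cell_yield, needed, spares):
    """Probability that at least `needed` of `needed + spares` cells work,
    assuming cells fail independently (a simplifying assumption)."""
    total = needed + spares
    return sum(
        math.comb(total, good)
        * cell_yield ** good
        * (1 - cell_yield) ** (total - good)
        for good in range(needed, total + 1)
    )

# Hypothetical numbers chosen only for illustration.
cell = poisson_yield(area=0.01, defect_density=2.0)          # per-cell yield
plain = redundant_array_yield(cell, needed=64, spares=0)     # no fault tolerance
tolerant = redundant_array_yield(cell, needed=64, spares=4)  # four spare cells
```

Under these assumed numbers, `tolerant` exceeds `plain`, which is the qualitative effect the abstract refers to as yield enhancement.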
Abstract:
A bit level systolic array for computing the convolution operation is described. The circuit in question is highly regular and ideally suited to VLSI chip design. It is also optimized in the sense that all the cells contribute to the computation on each clock cycle. This makes the array almost four times more efficient than one which was previously described.
Abstract:
Two major UK systolic array projects are described. The first concerns the development of a wavefront array processor for adaptive beamforming; the second concerns the design of bit-level systolic arrays for high-performance signal processing.
Abstract:
We show how the architecture of two recently reported bit-level systolic array circuits - a single-bit coefficient correlator and a multibit convolver - may be modified to incorporate unidirectional data flow. This feature has advantages in terms of chip cascadability, fault tolerance and possible wafer-scale integration.
Abstract:
A pipelined array multiplier which has been derived by applying 'systolic array' principles at the bit level is described. Attention is focused on a circuit which is used to multiply streams of parallel unsigned data. Then an algorithm is given which demonstrates that, with only a simple modification to the basic cell, the same array can cope with two's complement numbers. The resulting structure has a number of features which make it attractive for LSI and VLSI implementation. These include regularity and modularity.
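The abstract does not give the authors' algorithm, so the following is only a minimal sketch of one standard way to see why a small change suffices for two's complement data: the sign bit carries weight -2^(width-1), so 1-bit partial products that involve exactly one sign bit are subtracted instead of added. The function names are hypothetical, and the loop models the arithmetic only, not the pipelining or cell layout of the array.

```python
def to_bits(value, width):
    """Two's complement bits of `value`, least significant bit first."""
    return [(value >> i) & 1 for i in range(width)]

def bit_level_multiply(a, b, width):
    """Multiply two signed `width`-bit integers from 1-bit partial products."""
    a_bits, b_bits = to_bits(a, width), to_bits(b, width)
    sign = width - 1
    total = 0
    for i, a_bit in enumerate(a_bits):
        for j, b_bit in enumerate(b_bits):
            partial = (a_bit & b_bit) << (i + j)
            # A partial product involving exactly one sign bit has negative
            # weight; all other partial products are added as usual.
            if (i == sign) != (j == sign):
                total -= partial
            else:
                total += partial
    return total
```

For example, `bit_level_multiply(-3, 5, 4)` agrees with ordinary signed multiplication of the two 4-bit operands.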
Abstract:
A bit-level systolic array for computing matrix × vector products is described. The operation is carried out on bit parallel input data words and the basic circuit takes the form of a 1-bit slice. Several bit-slice components must be connected together to form the final result, and the authors outline two different ways in which this can be done. The basic array also has considerable potential as a stand-alone device, and its use in computing the Walsh-Hadamard transform and discrete Fourier transform operations is briefly discussed.
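To make the link between matrix × vector products and the Walsh-Hadamard transform concrete, here is a small sketch that builds a Hadamard matrix with the Sylvester construction and applies it as a plain matrix × vector product. It uses natural (Hadamard) ordering and ordinary word-level arithmetic; the function names are illustrative assumptions and nothing here reflects the bit-slice circuit itself.

```python
def hadamard(order):
    """Sylvester construction of the 2**order x 2**order Hadamard matrix."""
    matrix = [[1]]
    for _ in range(order):
        matrix = ([row + row for row in matrix] +
                  [row + [-x for x in row] for row in matrix])
    return matrix

def matvec(matrix, vector):
    """Plain matrix x vector product."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def walsh_hadamard(vector):
    """Unnormalised Walsh-Hadamard transform of a power-of-two-length vector."""
    n = len(vector)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    return matvec(hadamard(n.bit_length() - 1), vector)
```

Because every matrix entry is +1 or -1, the product needs only additions and subtractions, which is one reason the transform is a natural candidate for a matrix × vector array.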
Abstract:
Details are presented of the DAC (DSP ASIC Compiler) silicon compiler framework. DAC allows a non-specialist to automatically design DSP ASICs and DSP ASIC cores directly from a high level specification. Typical designs take only several minutes and the resulting layouts are comparable in area and performance to handcrafted designs.
Abstract:
A novel design for multibit convolver circuits is described. The circuits take the form of systolic arrays of simple one-bit processor and memory cells, with the result that they can operate at very high data rates and should be easy to implement using VLSI technology. An efficient method for handling two's complement data within the array is described and the relative advantages of this convolver design compared with more conventional circuits are discussed.
Abstract:
The use of bit-level systolic arrays in the design of a vector quantized transformed subband coding system for speech signals is described. It is shown how the major components of this system can be decomposed into a small number of highly regular building blocks that interface directly to one another. These include circuits for the computation of the discrete cosine transform, the inverse discrete cosine transform, and vector quantization codebook search.
Abstract:
A bit level systolic array system is proposed for the Winograd Fourier transform algorithm. The design uses bit-serial arithmetic and, in common with other systolic arrays, features nearest-neighbor interconnections, regularity and high throughput. The short interconnections in this method contrast favorably with the long interconnections between butterflies required in the FFT. The structure is well suited to VLSI implementations. It is demonstrated how long transforms can be implemented with components designed to perform a short length transform. These components build into longer transforms preserving the regularity and structure of the short length transform design.
Abstract:
Whilst conventional bit level pipelining introduces an m cycle delay, it does allow m separate computations to be processed at throughput rates comparable to that using word level systolic arrays. We concentrate on exploiting this delay and describe a systematic method for the design of high performance multiplexed IIR filters. Two multiply and accumulate structures are identified based on shift-and-add and carry-save data organisations which can be used as building blocks in the design of IIR filters. By replacing the word level multiply and accumulate units in word level systolic structures with their equivalent bit level circuits and introducing latches to ensure correct timing, numerous architectures can be designed that process multiplexed data directly without any additional circuit overhead.
Abstract:
A novel hardware architecture for elliptic curve cryptography (ECC) over GF(p) is introduced. This can perform the main prime field arithmetic functions needed in these cryptosystems, including modular inversion and multiplication. It is based on a new unified modular inversion algorithm that offers considerable improvement over previous ECC techniques that use Fermat's Little Theorem for this operation. The processor described uses a full-word multiplier which requires far fewer clock cycles than previous methods, while still maintaining a competitive critical path delay. The benefits of the approach have been demonstrated by utilizing these techniques to create a field-programmable gate array (FPGA) design. This can perform a 256-bit prime field scalar point multiplication in 3.86 ms, the fastest FPGA time reported to date. The ECC architecture described can also perform four different types of modular inversion, making it suitable for use in many different ECC applications. © 2006 IEEE.
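For context on the operation being compared, the sketch below shows two generic software ways to invert an element of GF(p): exponentiation by p - 2, which follows from Fermat's Little Theorem, and the extended Euclidean algorithm. The abstract does not describe the paper's unified inversion algorithm, so the Euclidean routine is only a standard stand-in, and the function names are hypothetical. The NIST P-256 prime is used purely as an example 256-bit modulus.

```python
def inverse_fermat(a, p):
    """Inverse via Fermat's Little Theorem: a**(p-2) mod p, for prime p."""
    return pow(a, p - 2, p)

def inverse_euclid(a, p):
    """Inverse via the extended Euclidean algorithm.

    Invariant: t0 * a == r0 (mod p) and t1 * a == r1 (mod p).
    """
    r0, r1 = p, a % p
    t0, t1 = 0, 1
    while r1:
        q = r0 // r1
        r0, r1 = r1, r0 - q * r1
        t0, t1 = t1, t0 - q * t1
    if r0 != 1:
        raise ValueError("a has no inverse modulo p")
    return t0 % p

# Example 256-bit modulus (NIST P-256 field prime), chosen for illustration.
P256 = 2**256 - 2**224 + 2**192 + 2**96 - 1
a = 123456789
inv = inverse_euclid(a, P256)
```

The Fermat route costs a full modular exponentiation, whereas Euclid-style methods need only a sequence of divisions or shifts and subtractions, which is the general motivation for dedicated inversion algorithms in hardware.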
Abstract:
A scheduling method for implementing a generic linear QR array processor architecture is presented. This improves on previous work. It also considerably simplifies the derivation of schedules for a folded linear system, where detailed account has to be taken of processor cell latency. The architecture and scheduling derived provide the basis of a generator for the rapid design of System-on-a-Chip (SoC) cores for QR decomposition.
Abstract:
Security devices are vulnerable to Differential Power Analysis (DPA), which reveals the key by monitoring the power consumption of the circuits. In this paper, we present the first DPA attack against an FPGA implementation of the Camellia encryption algorithm with all key sizes and evaluate the DPA resistance of the algorithm. The Camellia cryptographic algorithm involves several different key-dependent intermediate operations including S-Box operations. In previous research, it was believed that Camellia is stronger than AES due to the additional Whitening phase protecting the S-Box operation. However, we propose an attack that bypasses the Whitening phase and targets the S-Box. In this paper, we also discuss a low-cost countermeasure strategy to protect the Pre-whitening / Post-whitening and FL function of Camellia using Dual-rail Precharged Logic and to protect against attacks on the S-Box using Random Delay Insertion. © 2009 IEEE.