97 resultados para implementations
Resumo:
A variation of the least means squares (LMS) algorithm, called the delayed LMS (DLMS) algorithm is an ideally suited to achieve highly pipelined, adaptive digital filter implementations. The paper presents an efficient method of determining the delays in the DLMS filter and then transferring these delays using retiming in order to achieve fully pipelined circuit architectures for FPGA implementation. The method has been used to derive a series of retimed delayed LMS (RDLMS) architectures, which considerable reduce the number of delays and convergence time and give superior performance in terms of throughput rate when compared to previous work. Three circuit architectures and three hardware shared versions are presented which have been implemented using the Virtex-II FPGA technology resulting in a throughout rate of 182 Msample/s.
Resumo:
The second round of the NIST-run public competition is underway to find a new hash algorithm(s) for inclusion in the NIST Secure Hash Standard (SHA-3). This paper presents the full implementations of all of the second round candidates in hardware with all of their variants. In order to determine their computational efficiency, an important aspect in NIST's round two evaluation criteria, this paper gives an area/speed comparison of each design both with and without a hardware interface, thereby giving an overall impression of their performance in resource constrained and resource abundant environments. The implementation results are provided for a Virtex-5 FPGA device. The efficiency of the architectures for the hash functions are compared in terms of throughput per unit area. To the best of the authors' knowledge, this is the first work to date to present hardware designs which test for all message digest sizes (224, 256, 384, 512), and also the only work to include the padding as part of the hardware for the SHA-3 hash functions.
Resumo:
This paper presents single-chip FPGA Rijndael algorithm implementations of the Advanced Encryption Standard (AES) algorithm, Rijndael. In particular, the designs utilise look-up tables to implement the entire Rijndael Round function. A comparison is provided between these designs and similar existing implementations. Hardware implementations of encryption algorithms prove much faster than equivalent software implementations and since there is a need to perform encryption on data in real time, speed is very important. In particular, Field Programmable Gate Arrays (FPGAs) are well suited to encryption implementations due to their flexibility and an architecture, which can be exploited to accommodate typical encryption transformations. In this paper, a Look-Up Table (LUT) methodology is introduced where complex and slow operations are replaced by simple LUTs. A LUT-based fully pipelined Rijndael implementation is described which has a pre-placement performance of 12 Gbits/sec, which is a factor 1.2 times faster than an alternative design in which look-up tables are utilised to implement only one of the Round function transformations, and 6 times faster than other previous single-chip implementations. Iterative Rijndael implementations based on the Look-Up-Table design approach are also discussed and prove faster than typical iterative implementations.
Resumo:
We test current numerical implementations of laser-matter interactions by comparison with exact analytical results. Focusing on photon emission processes, it is found that the numerics accurately reproduce analytical emission spectra in all considered regimes, except for the harmonic structures often singled out as the most significant high-intensity (multiphoton) effects. We find that this discrepancy originates in the use of the locally constant field approximation.
Resumo:
A BSP (Bulk Synchronous Parallelism) computation is characterized by the generation of asynchronous messages in packages during independent execution of a number of processes and their subsequent delivery at synchronization points. Bundling messages together represents a significant departure from the traditional ‘one communication at a time’ approach. In this paper the semantic consequences of communication packaging are explored. In particular, the BSP communication structure is identified with a general form of substitution—predicate substitution. Predicate substitution provides a means of reasoning about the synchronized delivery of asynchronous communications when the immediate programming context does not explicitly refer to the variables that are to be updated (unlike traditional operations, such as the assignment $x := e$, where the names of the updated variables can be extracted from the context). Proofs of implementations of Newton's root finding method and prefix sum are used to illustrate the practical application of the proposed approach.
Resumo:
A generic architecture for implementing the advanced encryption standard (AES) encryption algorithm in silicon is proposed. This allows the instantiation of a wide range of chip specifications, with these taking the form of semiconductor intellectual property (IP) cores. Cores implemented from this architecture can perform both encryption and decryption and support four modes of operation: (i) electronic codebook mode; (ii) output feedback mode; (iii) cipher block chaining mode; and (iv) ciphertext feedback mode. Chip designs can also be generated to cover all three AES key lengths, namely 128 bits, 192 bits and 256 bits. On-the-fly generation of the round keys required during decryption is also possible. The general, flexible and multi-functional nature of the approach described contrasts with previous designs which, to date, have been focused on specific implementations. The presented ideas are demonstrated by implementation in FPGA technology. However, the architecture and IP cores derived from this are easily migratable to other silicon technologies including ASIC and PLD and are capable of covering a wide range of modem communication systems cryptographic requirements. Moreover, the designs produced have a gate count and throughput comparable with or better than the previous one-off solutions.
Resumo:
High-speed field-programmable gate array (FPGA) implementations of an adaptive least mean square (LMS) filter with application in an electronic support measures (ESM) digital receiver, are presented. They employ "fine-grained" pipelining, i.e., pipelining within the processor and result in an increased output latency when used in the LMS recursive system. Therefore, the major challenge is to maintain a low latency output whilst increasing the pipeline stage in the filter for higher speeds. Using the delayed LMS (DLMS) algorithm, fine-grained pipelined FPGA implementations using both the direct form (DF) and the transposed form (TF) are considered and compared. It is shown that the direct form LMS filter utilizes the FPGA resources more efficiently thereby allowing a 120 MHz sampling rate.
Resumo:
Hardware implementations of arithmetic operators using signed digit arithmetic have lost some of their earlier popularity. However, SD is revisited and used to realise an efficient radix-16 generic multiplier, which has particular potential for low-power implementation. The SD multiplier algorithm reduces the number of partial products to as much as 1/4, and in initial tests reduces the estimated power consumption to only about 50% of that of the Booth multiplier. It is different from other previous high-radix methods in that it employs a novel method to generate its partial products with zero arithmetic logic.
Resumo:
A methodology for rapid silicon design of biorthogonal wavelet transform systems has been developed. This is based on generic, scalable architectures for the forward and inverse wavelet filters. These architectures offer efficient hardware utilisation by combining the linear phase property of biorthogonal filters with decimation and interpolation. The resulting designs have been parameterised in terms of types of wavelet and wordlengths for data and coefficients. Control circuitry is embedded within these cores that allows them to be cascaded for any desired level of decomposition without any interface logic. The time to produce silicon designs for a biorthogonal wavelet system is only the time required to run synthesis and layout tools with no further design effort required. The resulting silicon cores produced are comparable in area and performance to hand-crafted designs. These designs are also portable across a range of foundries and are suitable for FPGA and PLD implementations.