171 resultados para minimalist hardware architecture
Resumo:
This paper presents a multi-language framework to FPGA hardware development which aims to satisfy the dual requirement of high-level hardware design and efficient hardware implementation. The central idea of this framework is the integration of different hardware languages in a way that harnesses the best features of each language. This is illustrated in this paper by the integration of two hardware languages in the form of HIDE: a structured hardware language which provides more abstract and elegant hardware descriptions and compositions than are possible in traditional hardware description languages such as VHDL or Verilog, and Handel-C: an ANSI C-like hardware language which allows software and hardware engineers alike to target FPGAs from high-level algorithmic descriptions. On the one hand, HIDE has proven to be very successful in the description and generation of highly optimised parameterisable FPGA circuits from geometric descriptions. On the other hand, Handel-C has also proven to be very successful in the rapid design and prototyping of FPGA circuits from algorithmic application descriptions. The proposed integrated framework hence harnesses HIDE for the generation of highly optimised circuits for regular parts of algorithms, while Handel-C is used as a top-level design language from which HIDE functionality is dynamically invoked. The overall message of this paper posits that there need not be an exclusive choice between different hardware design flows. Rather, an integrated framework where different design flows can seamlessly interoperate should be adopted. Although the idea might seem simple prima facie, it could have serious implications on the design of future generations of hardware languages.
Resumo:
A hardware performance analysis of the SHACAL-2 encryption algorithm is presented in this paper. SHACAL-2 was one of four symmetric key algorithms chosen in the New European Schemes for Signatures, Integrity and Encryption (NESSIE) initiative in 2003. The paper describes a fully pipelined encryption SHACAL-2 architecture implemented on a Xilinx Field Programmable Gate Array (FPGA) device that achieves a throughput of over 25 Gbps. This is the fastest private key encryption algorithm architecture currently available. The SHACAL-2 decryption algorithm is also defined in the paper as it was not provided in the NESSIE submission.
Resumo:
A generic architecture for implementing the advanced encryption standard (AES) encryption algorithm in silicon is proposed. This allows the instantiation of a wide range of chip specifications, with these taking the form of semiconductor intellectual property (IP) cores. Cores implemented from this architecture can perform both encryption and decryption and support four modes of operation: (i) electronic codebook mode; (ii) output feedback mode; (iii) cipher block chaining mode; and (iv) ciphertext feedback mode. Chip designs can also be generated to cover all three AES key lengths, namely 128 bits, 192 bits and 256 bits. On-the-fly generation of the round keys required during decryption is also possible. The general, flexible and multi-functional nature of the approach described contrasts with previous designs which, to date, have been focused on specific implementations. The presented ideas are demonstrated by implementation in FPGA technology. However, the architecture and IP cores derived from this are easily migratable to other silicon technologies including ASIC and PLD and are capable of covering a wide range of modem communication systems cryptographic requirements. Moreover, the designs produced have a gate count and throughput comparable with or better than the previous one-off solutions.
Resumo:
The design and implementation of a programmable cyclic redundancy check (CRC) computation circuit architecture, suitable for deployment in network related system-on-chips (SoCs) is presented. The architecture has been designed to be field reprogrammable so that it is fully flexible in terms of the polynomial deployed and the input port width. The circuit includes an embedded configuration controller that has a low reconfiguration time and hardware cost. The circuit has been synthesised and mapped to 130-nm UMC standard cell [application-specific integrated circuit (ASIC)] technology and is capable of supporting line speeds of 5 Gb/s. © 2006 IEEE.
Resumo:
A new domain-specific, reconfigurable system-on-a-chip (SoC) architecture is proposed for video motion estimation. This has been designed to cover most of the common block-based video coding standards, including MPEG-2, MPEG-4, H.264, WMV-9 and AVS. The architecture exhibits simple control, high throughput and relatively low hardware cost when compared with existing circuits. It can also easily handle flexible search ranges without any increase in silicon area and can be configured prior to the start of the motion estimation process for a specific standard. The computational rates achieved make the circuit suitable for high-end video processing applications, such as HDTV. Silicon design studies indicate that circuits based on this approach incur only a relatively small penalty in terms of power dissipation and silicon area when compared with implementations for specific standards. Indeed, the cost/performance achieved exceeds that of existing but specific solutions and greatly exceeds that of general purpose field programmable gate array (FPGA) designs.
Resumo:
A queue manager (QM) is a core traffic management (TM) function used to provide per-flow queuing in access andmetro networks; however current designs have limited scalability. An on-demand QM (OD-QM) which is part of a new modular field-programmable gate-array (FPGA)-based TM is presented that dynamically maps active flows to the available physical resources; its scalability is derived from exploiting the observation that there are only a few hundred active flows in a high speed network. Simulations with real traffic show that it is a scalable, cost-effective approach that enhances per-flow queuing performance, thereby allowing per-flow QM without the need for extra external memory at speeds up to 10 Gbps. It utilizes 2.3%–16.3% of a Xilinx XC5VSX50t FPGA and works at 111 MHz.
Resumo:
MinneSPEC proposes reduced input sets that microprocessor designers can use to model representative short-running workloads. A four-step methodology verifies the program behavior similarity of these input sets to reference sets.
Resumo:
In this paper, a hardware solution for packet classification based on multi-fields is presented. The proposed scheme focuses on a new architecture based on the decomposition method. A hash circuit is used in order to reduce the memory space required for the Recursive Flow Classification (RFC) algorithm. The implementation results show that the proposed architecture achieves significant performance advantage that is comparable to that of some well-known algorithms. The solution is based on Altera Stratix III FPGA technology.
Resumo:
A novel bit-level systolic array architecture for implementing bit-parallel IIR filter sections is presented. The authors have shown previously how the fundamental obstacle of pipeline latency in recursive structures can be overcome by the use of redundant arithmetic in combination with bit-level feedback. These ideas are extended by optimizing the degree of redundancy used in different parts of the circuit and combining redundant circuit techniques with those of conventional arithmetic. The resultant architecture offers significant improvements in hardware complexity and throughput rate.
Resumo:
A bit-level systolic array system for performing a binary tree Vector Quantization codebook search is described. This consists of a linear chain of regular VLSI building blocks and exhibits data rates suitable for a wide range of real-time applications. A technique is described which reduces the computation required at each node in the binary tree to that of a single inner product operation. This method applies to all the common distortion measures (including the Euclidean distance, the Weighted Euclidean distance and the Itakura-Saito distortion measure) and significantly reduces the hardware required to implement the tree search system. © 1990 Kluwer Academic Publishers.