107 resultados para Field-Programmable Gate Array (FPGA)
Resumo:
Dynamic power consumption is very dependent on interconnect, so clever mapping of digital signal processing algorithms to parallelised realisations with data locality is vital. This is a particular problem for fast algorithm implementations where typically, designers will have sacrificed circuit structure for efficiency in software implementation. This study outlines an approach for reducing the dynamic power consumption of a class of fast algorithms by minimising the index space separation; this allows the generation of field programmable gate array (FPGA) implementations with reduced power consumption. It is shown how a 50% reduction in relative index space separation results in a measured power gain of 36 and 37% over a Cooley-Tukey Fast Fourier Transform (FFT)-based solution for both actual power measurements for a Xilinx Virtex-II FPGA implementation and circuit measurements for a Xilinx Virtex-5 implementation. The authors show the generality of the approach by applying it to a number of other fast algorithms namely the discrete cosine, the discrete Hartley and the Walsh-Hadamard transforms.
Resumo:
High-speed field-programmable gate array (FPGA) implementations of an adaptive least mean square (LMS) filter with application in an electronic support measures (ESM) digital receiver, are presented. They employ "fine-grained" pipelining, i.e., pipelining within the processor and result in an increased output latency when used in the LMS recursive system. Therefore, the major challenge is to maintain a low latency output whilst increasing the pipeline stage in the filter for higher speeds. Using the delayed LMS (DLMS) algorithm, fine-grained pipelined FPGA implementations using both the direct form (DF) and the transposed form (TF) are considered and compared. It is shown that the direct form LMS filter utilizes the FPGA resources more efficiently thereby allowing a 120 MHz sampling rate.
Resumo:
A queue manager (QM) is a core traffic management (TM) function used to provide per-flow queuing in access andmetro networks; however current designs have limited scalability. An on-demand QM (OD-QM) which is part of a new modular field-programmable gate-array (FPGA)-based TM is presented that dynamically maps active flows to the available physical resources; its scalability is derived from exploiting the observation that there are only a few hundred active flows in a high speed network. Simulations with real traffic show that it is a scalable, cost-effective approach that enhances per-flow queuing performance, thereby allowing per-flow QM without the need for extra external memory at speeds up to 10 Gbps. It utilizes 2.3%–16.3% of a Xilinx XC5VSX50t FPGA and works at 111 MHz.
Resumo:
Sphere Decoding (SD) is a highly effective detection technique for Multiple-Input Multiple-Output (MIMO) wireless communications receivers, offering quasi-optimal accuracy with relatively low computational complexity as compared to the ideal ML detector. Despite this, the computational demands of even low-complexity SD variants, such as Fixed Complexity SD (FSD), remains such that implementation on modern software-defined network equipment is a highly challenging process, and indeed real-time solutions for MIMO systems such as 4 4 16-QAM 802.11n are unreported. This paper overcomes this barrier. By exploiting large-scale networks of fine-grained softwareprogrammable processors on Field Programmable Gate Array (FPGA), a series of unique SD implementations are presented, culminating in the only single-chip, real-time quasi-optimal SD for 44 16-QAM 802.11n MIMO. Furthermore, it demonstrates that the high performance software-defined architectures which enable these implementations exhibit cost comparable to dedicated circuit architectures.
Resumo:
In this paper, a new field-programmable gate array (FPGA) identification generator circuit is introduced based on physically unclonable function (PUF) technology. The new identification generator is able to convert flip-flop delay path variations to unique n-bit digital identifiers (IDs), while requiring only a single slice per ID bit by using 1-bit ID cells formed as hard-macros. An exemplary 128-bit identification generator is implemented on ten Xilinx Spartan-6 FPGA devices. Experimental results show an uniqueness of 48.52%, and reliability of 92.41% over a 25°C to 70°C temperature range and 10% fluctuation in supply voltage
Resumo:
The Field Programmable Gate Array (FPGA) implementation of the commonly used Histogram of Oriented Gradients (HOG) algorithm is explored. The HOG algorithm is employed to extract features for object detection. A key focus has been to explore the use of a new FPGA-based processor which has been targeted at image processing. The paper gives details of the mapping and scheduling factors that influence the performance and the stages that were undertaken to allow the algorithm to be deployed on FPGA hardware, whilst taking into account the specific IPPro architecture features. We show that multi-core IPPro performance can exceed that of against state-of-the-art FPGA designs by up to 3.2 times with reduced design and implementation effort and increased flexibility all on a low cost, Zynq programmable system.
Resumo:
The increasing design complexity associated with modern Field Programmable Gate Array (FPGA) has prompted the emergence of 'soft'-programmable processors which attempt to replace at least part of the custom circuit design problem with a problem of programming parallel processors. Despite substantial advances in this technology, its performance and resource efficiency for computationally complex operations remains in doubt. In this paper we present the first recorded implementation of a softcore Fast-Fourier Transform (FFT) on Xilinx Virtex FPGA technology. By employing a streaming processing architecture, we show how it is possible to achieve architectures which offer 1.1 GSamples/s throughput and up to 19 times speed-up against the Xilinx Radix-2 FFT dedicated circuit with comparable cost.
Resumo:
Homomorphic encryption offers potential for secure cloud computing. However due to the complexity of homomorphic encryption schemes, performance of implemented schemes to date have been unpractical. This work investigates the use of hardware, specifically Field Programmable Gate Array (FPGA) technology, for implementing the building blocks involved in somewhat and fully homomorphic encryption schemes in order to assess the practicality of such schemes. We concentrate on the selection of a suitable multiplication algorithm and hardware architecture for large integer multiplication, one of the main bottlenecks in many homomorphic encryption schemes. We focus on the encryption step of an integer-based fully homomorphic encryption (FHE) scheme. We target the DSP48E1 slices available on Xilinx Virtex 7 FPGAs to ascertain whether the large integer multiplier within the encryption step of a FHE scheme could fit on a single FPGA device. We find that, for toy size parameters for the FHE encryption step, the large integer multiplier fits comfortably within the DSP48E1 slices, greatly improving the practicality of the encryption step compared to a software implementation. As multiplication is an important operation in other FHE schemes, a hardware implementation using this multiplier could also be used to improve performance of these schemes.
Resumo:
With security and surveillance, there is an increasing need to process image data efficiently and effectively either at source or in a large data network. Whilst a Field-Programmable Gate Array (FPGA) has been seen as a key technology for enabling this, the design process has been viewed as problematic in terms of the time and effort needed for implementation and verification. The work here proposes a different approach of using optimized FPGA-based soft-core processors which allows the user to exploit the task and data level parallelism to achieve the quality of dedicated FPGA implementations whilst reducing design time. The paper also reports some preliminary
progress on the design flow to program the structure. An implementation for a Histogram of Gradients algorithm is also reported which shows that a performance of 328 fps can be achieved with this design approach, whilst avoiding the long design time, verification and debugging steps associated with conventional FPGA implementations.
Resumo:
A hardware performance analysis of the SHACAL-2 encryption algorithm is presented in this paper. SHACAL-2 was one of four symmetric key algorithms chosen in the New European Schemes for Signatures, Integrity and Encryption (NESSIE) initiative in 2003. The paper describes a fully pipelined encryption SHACAL-2 architecture implemented on a Xilinx Field Programmable Gate Array (FPGA) device that achieves a throughput of over 25 Gbps. This is the fastest private key encryption algorithm architecture currently available. The SHACAL-2 decryption algorithm is also defined in the paper as it was not provided in the NESSIE submission.
Resumo:
A new domain-specific, reconfigurable system-on-a-chip (SoC) architecture is proposed for video motion estimation. This has been designed to cover most of the common block-based video coding standards, including MPEG-2, MPEG-4, H.264, WMV-9 and AVS. The architecture exhibits simple control, high throughput and relatively low hardware cost when compared with existing circuits. It can also easily handle flexible search ranges without any increase in silicon area and can be configured prior to the start of the motion estimation process for a specific standard. The computational rates achieved make the circuit suitable for high-end video processing applications, such as HDTV. Silicon design studies indicate that circuits based on this approach incur only a relatively small penalty in terms of power dissipation and silicon area when compared with implementations for specific standards. Indeed, the cost/performance achieved exceeds that of existing but specific solutions and greatly exceeds that of general purpose field programmable gate array (FPGA) designs.
Resumo:
Side-channel attacks (SCA) threaten electronic cryptographic devices and can be carried out by monitoring the physical characteristics of security circuits. Differential Power Analysis (DPA) is one the most widely studied side-channel attacks. Numerous countermeasure techniques, such as Random Delay Insertion (RDI), have been proposed to reduce the risk of DPA attacks against cryptographic devices. The RDI technique was first proposed for microprocessors but it was shown to be unsuccessful when implemented on smartcards as it was vulnerable to a variant of the DPA attack known as the Sliding-Window DPA attack.Previous research by the authors investigated the use of the RDI countermeasure for Field Programmable Gate Array (FPGA) based cryptographic devices. A split-RDI technique wasproposed to improve the security of the RDI countermeasure. A set of critical parameters wasalso proposed that could be utilized in the design stage to optimize a security algorithm designwith RDI in terms of area, speed and power. The authors also showed that RDI is an efficientcountermeasure technique on FPGA in comparison to other countermeasures.In this article, a new RDI logic design is proposed that can be used to cost-efficiently implementRDI on FPGA devices. Sliding-Window DPA and realignment attacks, which were shown to beeffective against RDI implemented on smartcard devices, are performed on the improved RDIFPGA implementation. We demonstrate that these attacks are unsuccessful and we also proposea realignment technique that can be used to demonstrate the weakness of RDI implementations.
Resumo:
A novel hardware architecture for elliptic curve cryptography (ECC) over GF(p) is introduced. This can perform the main prime field arithmetic functions needed in these cryptosystems including modular inversion and multiplication. This is based on a new unified modular inversion algorithm that offers considerable improvement over previous ECC techniques that use Fermat's Little Theorem for this operation. The processor described uses a full-word multiplier which requires much fewer clock cycles than previous methods, while still maintaining a competitive critical path delay. The benefits of the approach have been demonstrated by utilizing these techniques to create a field-programmable gate array (FPGA) design. This can perform a 256-bit prime field scalar point multiplication in 3.86 ms, the fastest FPGA time reported to date. The ECC architecture described can also perform four different types of modular inversion, making it suitable for use in many different ECC applications. © 2006 IEEE.
Resumo:
Software-programmable `soft' processors have shown tremendous potential for efficient realisation of high performance signal processing operations on Field Programmable Gate Array (FPGA), whilst lowering the design burden by avoiding the need to design fine-grained custom circuit archi-tectures. However, the complex data access patterns, high memory bandwidth and computational requirements of sliding window applications, such as Motion Estimation (ME) and Matrix Multiplication (MM), lead to low performance, inefficient soft processor realisations. This paper resolves this issue, showing how by adding support for block data addressing and accelerators for high performance loop execution, performance and resource efficiency over four times better than current best-in-class metrics can be achieved. In addition, it demonstrates the first recorded real-time soft ME estimation realisation for H.263 systems.