144 resultados para FPGA, VHDL, Picoblaze, SERDES
Resumo:
Software-programmable `soft' processors have shown tremendous potential for efficient realisation of high performance signal processing operations on Field Programmable Gate Array (FPGA), whilst lowering the design burden by avoiding the need to design fine-grained custom circuit archi-tectures. However, the complex data access patterns, high memory bandwidth and computational requirements of sliding window applications, such as Motion Estimation (ME) and Matrix Multiplication (MM), lead to low performance, inefficient soft processor realisations. This paper resolves this issue, showing how by adding support for block data addressing and accelerators for high performance loop execution, performance and resource efficiency over four times better than current best-in-class metrics can be achieved. In addition, it demonstrates the first recorded real-time soft ME estimation realisation for H.263 systems.
Resumo:
Pre-processing (PP) of received symbol vector and channel matrices is an essential pre-requisite operation for Sphere Decoder (SD)-based detection of Multiple-Input Multiple-Output (MIMO) wireless systems. PP is a highly complex operation, but relative to the total SD workload it represents a relatively small fraction of the overall computational cost of detecting an OFDM MIMO frame in standards such as 802.11n. Despite this, real-time PP architectures are highly inefficient, dominating the resource cost of real-time SD architectures. This paper resolves this issue. By reorganising the ordering and QR decomposition sub operations of PP, we describe a Field Programmable Gate Array (FPGA)-based PP architecture for the Fixed Complexity Sphere Decoder (FSD) applied to 4 × 4 802.11n MIMO which reduces resource cost by 50% as compared to state-of-the-art solutions whilst maintaining real-time performance.
Resumo:
Field programmable gate array (FPGA) technology is a powerful platform for implementing computationally complex, digital signal processing (DSP) systems. Applications that are multi-modal, however, are designed for worse case conditions. In this paper, genetic sequencing techniques are applied to give a more sophisticated decomposition of the algorithmic variations, thus allowing an unified hardware architecture which gives a 10-25% area saving and 15% power saving for a digital radar receiver.
Resumo:
Lattice-based cryptography has gained credence recently as a replacement for current public-key cryptosystems, due to its quantum-resilience, versatility, and relatively low key sizes. To date, encryption based on the learning with errors (LWE) problem has only been investigated from an ideal lattice standpoint, due to its computation and size efficiencies. However, a thorough investigation of standard lattices in practice has yet to be considered. Standard lattices may be preferred to ideal lattices due to their stronger security assumptions and less restrictive parameter selection process. In this paper, an area-optimised hardware architecture of a standard lattice-based cryptographic scheme is proposed. The design is implemented on a FPGA and it is found that both encryption and decryption fit comfortably on a Spartan-6 FPGA. This is the first hardware architecture for standard lattice-based cryptography reported in the literature to date, and thus is a benchmark for future implementations.
Additionally, a revised discrete Gaussian sampler is proposed which is the fastest of its type to date, and also is the first to investigate the cost savings of implementing with lamda_2-bits of precision. Performance results are promising in comparison to the hardware designs of the equivalent ring-LWE scheme, which in addition to providing a stronger security proof; generate 1272 encryptions per second and 4395 decryptions per second.
Resumo:
Power capping is a fundamental method for reducing the energy consumption of a wide range of modern computing environments, ranging from mobile embedded systems to datacentres. Unfortunately, maximising performance and system efficiency under static power caps remains challenging, while maximising performance under dynamic power caps has been largely unexplored. We present an adaptive power capping method that reduces the power consumption and maximizes the performance of heterogeneous SoCs for mobile and server platforms. Our technique combines power capping with coordinated DVFS, data partitioning and core allocations on a heterogeneous SoC with ARM processors and FPGA resources. We design our framework as a run-time system based on OpenMP and OpenCL to utilise the heterogeneous resources. We evaluate it through five data-parallel benchmarks on the Xilinx SoC which allows fully voltage and frequency control. Our experiments show a significant performance boost of 30% under dynamic power caps with concurrent execution on ARM and FPGA, compared to a naive separate approach.
Resumo:
Cryptographic algorithms have been designed to be computationally secure, however it has been shown that when they are implemented in hardware, that these devices leak side channel information that can be used to mount an attack that recovers the secret encryption key. In this paper an overlapping window power spectral density (PSD) side channel attack, targeting an FPGA device running the Advanced Encryption Standard is proposed. This improves upon previous research into PSD attacks by reducing the amount of pre-processing (effort) required. It is shown that the proposed overlapping window method requires less processing effort than that of using a sliding window approach, whilst overcoming the issues of sampling boundaries. The method is shown to be effective for both aligned and misaligned data sets and is therefore recommended as an improved approach in comparison with existing time domain based correlation attacks.
Resumo:
FPGAs and GPUs are often used when real-time performance in video processing is required. An accelerated processor is chosen based on task-specific priorities (power consumption, processing time and detection accuracy), and this decision is normally made once at design time. All three characteristics are important, particularly in battery-powered systems. Here we propose a method for moving selection of processing platform from a single design-time choice to a continuous run time one.We implement Histogram of Oriented Gradients (HOG) detectors for cars and people and Mixture of Gaussians (MoG) motion detectors running across FPGA, GPU and CPU in a heterogeneous system. We use this to detect illegally parked vehicles in urban scenes. Power, time and accuracy information for each detector is characterised. An anomaly measure is assigned to each detected object based on its trajectory and location, when compared to learned contextual movement patterns. This drives processor and implementation selection, so that scenes with high behavioural anomalies are processed with faster but more power hungry implementations, but routine or static time periods are processed with power-optimised, less accurate, slower versions. Real-time performance is evaluated on video datasets including i-LIDS. Compared to power-optimised static selection, automatic dynamic implementation mapping is 10% more accurate but draws 12W extra power in our testbed desktop system.
Resumo:
This paper proposes a JPEG-2000 compliant architecture capable of computing the 2 -D Inverse Discrete Wavelet Transform. The proposed architecture uses a single processor and a row-based schedule to minimize control and routing complexity and to ensure that processor utilization is kept at 100%. The design incorporates the handling of borders through the use of symmetric extension. The architecture has been implemented on the Xilinx Virtex2 FPGA.
Resumo:
As the development of a viable quantum computer nears, existing widely used public-key cryptosystems, such as RSA, will no longer be secure. Thus, significant effort is being invested into post-quantum cryptography (PQC). Lattice-based cryptography (LBC) is one such promising area of PQC, which offers versatile, efficient, and high performance security services. However, the vulnerabilities of these implementations against side-channel attacks (SCA) remain significantly understudied. Most, if not all, lattice-based cryptosystems require noise samples generated from a discrete Gaussian distribution, and a successful timing analysis attack can render the whole cryptosystem broken, making the discrete Gaussian sampler the most vulnerable module to SCA. This research proposes countermeasures against timing information leakage with FPGA-based designs of the CDT-based discrete Gaussian samplers with constant response time, targeting encryption and signature scheme parameters. The proposed designs are compared against the state-of-the-art and are shown to significantly outperform existing implementations. For encryption, the proposed sampler is 9x faster in comparison to the only other existing time-independent CDT sampler design. For signatures, the first time-independent CDT sampler in hardware is proposed.