260 results for Programmable array logic
Abstract:
Dynamic power consumption depends heavily on interconnect, so mapping digital signal processing algorithms to parallelised realisations that preserve data locality is vital. This is a particular problem for fast algorithm implementations, where designers have typically sacrificed circuit structure for efficiency in software implementation. This study outlines an approach for reducing the dynamic power consumption of a class of fast algorithms by minimising the index space separation; this allows the generation of field programmable gate array (FPGA) implementations with reduced power consumption. It is shown how a 50% reduction in relative index space separation results in measured power gains of 36% and 37% over a Cooley-Tukey Fast Fourier Transform (FFT)-based solution, for actual power measurements on a Xilinx Virtex-II FPGA implementation and circuit measurements on a Xilinx Virtex-5 implementation. The authors show the generality of the approach by applying it to a number of other fast algorithms, namely the discrete cosine, discrete Hartley and Walsh-Hadamard transforms.
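To make the link between fast-algorithm structure and interconnect concrete, the following minimal sketch lists the operand index pairs of an in-place radix-2 Cooley-Tukey FFT and reports the mean index distance per stage. The distance metric and the function names `butterfly_pairs` and `mean_separation_per_stage` are illustrative assumptions, not the index space separation measure defined by the authors.

```python
def butterfly_pairs(n):
    """Yield (stage, i, j) operand index pairs of an in-place radix-2 FFT.

    Hypothetical helper: it only enumerates which data indices each
    butterfly combines; it does not compute the transform itself.
    """
    if n <= 0 or n & (n - 1) != 0:
        raise ValueError("n must be a positive power of two")
    stage = 0
    span = 1  # distance between the two operands of a butterfly
    while span < n:
        for start in range(0, n, 2 * span):
            for k in range(span):
                yield stage, start + k, start + k + span
        span *= 2
        stage += 1


def mean_separation_per_stage(n):
    """Return the mean operand index distance for each FFT stage.

    Assumption: index distance is used here as a rough stand-in for
    wiring length between the data a butterfly needs.
    """
    totals, counts = {}, {}
    for stage, i, j in butterfly_pairs(n):
        totals[stage] = totals.get(stage, 0) + (j - i)
        counts[stage] = counts.get(stage, 0) + 1
    return [totals[s] / counts[s] for s in sorted(totals)]


if __name__ == "__main__":
    print(mean_separation_per_stage(8))
```

In this sketch the operand distance doubles at every stage, which gives an intuition for why later stages of a conventional FFT mapping need longer connections.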
Abstract:
Architectures and methods for the rapid design of silicon cores for implementing discrete wavelet transforms over a wide range of specifications are described. These architectures are efficient, modular, scalable, and cover orthonormal and biorthogonal wavelet transform families. They offer efficient hardware utilization by exploiting a number of core wavelet filter properties and allow the creation of silicon designs that are highly parameterized, including in terms of wavelet type and wordlengths. Control circuitry is embedded within these systems allowing them to be cascaded for any desired level of decomposition without any interface glue logic. The time to produce chip designs for a specific wavelet application is typically less than a day and these are comparable in area and performance to handcrafted designs. They are also portable across a wide range of silicon foundries and suitable for field programmable gate array and programmable logic data implementation. The approach described has also been extended to wavelet packet transforms.
Abstract:
Side-channel attacks (SCA) threaten electronic cryptographic devices and can be carried out by monitoring the physical characteristics of security circuits. Differential Power Analysis (DPA) is one of the most widely studied side-channel attacks. Numerous countermeasure techniques, such as Random Delay Insertion (RDI), have been proposed to reduce the risk of DPA attacks against cryptographic devices. The RDI technique was first proposed for microprocessors, but it was shown to be unsuccessful when implemented on smartcards as it was vulnerable to a variant of the DPA attack known as the Sliding-Window DPA attack. Previous research by the authors investigated the use of the RDI countermeasure for Field Programmable Gate Array (FPGA) based cryptographic devices. A split-RDI technique was proposed to improve the security of the RDI countermeasure. A set of critical parameters was also proposed that could be utilized in the design stage to optimize a security algorithm design with RDI in terms of area, speed and power. The authors also showed that RDI is an efficient countermeasure technique on FPGA in comparison to other countermeasures. In this article, a new RDI logic design is proposed that can be used to cost-efficiently implement RDI on FPGA devices. Sliding-Window DPA and realignment attacks, which were shown to be effective against RDI implemented on smartcard devices, are performed on the improved RDI FPGA implementation. We demonstrate that these attacks are unsuccessful and we also propose a realignment technique that can be used to demonstrate the weakness of RDI implementations.
Abstract:
A queue manager (QM) is a core traffic management (TM) function used to provide per-flow queuing in access and metro networks; however, current designs have limited scalability. An on-demand QM (OD-QM), which is part of a new modular field-programmable gate-array (FPGA)-based TM, is presented that dynamically maps active flows to the available physical resources; its scalability is derived from exploiting the observation that there are only a few hundred active flows in a high speed network. Simulations with real traffic show that it is a scalable, cost-effective approach that enhances per-flow queuing performance, thereby allowing per-flow QM without the need for extra external memory at speeds up to 10 Gbps. It utilizes 2.3%–16.3% of a Xilinx XC5VSX50t FPGA and works at 111 MHz.
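The idea of mapping only active flows onto a small pool of physical queues can be sketched in software as follows. This is a minimal illustration under stated assumptions, not the authors' hardware design: the class name `OnDemandQueueManager`, its methods, and the policy of rejecting a packet when no queue is free are all hypothetical.

```python
from collections import deque


class OnDemandQueueManager:
    """Toy model: physical queues are bound to flows only while active."""

    def __init__(self, num_physical_queues):
        # Pool of physical queue indices not currently bound to any flow.
        self.free = list(range(num_physical_queues))
        # Active flow id -> physical queue index.
        self.flow_to_queue = {}
        self.queues = [deque() for _ in range(num_physical_queues)]

    def enqueue(self, flow_id, packet):
        """Store a packet; bind a free queue if the flow is not yet active.

        Returns False when every physical queue is in use (assumed policy:
        the caller decides whether to drop or fall back).
        """
        q = self.flow_to_queue.get(flow_id)
        if q is None:
            if not self.free:
                return False
            q = self.free.pop()
            self.flow_to_queue[flow_id] = q
        self.queues[q].append(packet)
        return True

    def dequeue(self, flow_id):
        """Remove the oldest packet of a flow, or return None if inactive.

        When the queue drains, the flow becomes inactive and its physical
        queue returns to the free pool.
        """
        q = self.flow_to_queue.get(flow_id)
        if q is None:
            return None
        packet = self.queues[q].popleft()
        if not self.queues[q]:
            del self.flow_to_queue[flow_id]
            self.free.append(q)
        return packet
```

Because a queue is released as soon as its flow drains, the number of physical queues only has to cover the flows that are active at the same time, not every flow that could ever appear.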
Abstract:
The most promising way to maintain reliable data transfer across the rapidly fluctuating channels used by next generation multiple-input multiple-output communications schemes is to exploit run-time variable modulation and antenna configurations. This demands that the baseband signal processing architectures employed in the communications terminals must provide low cost and high performance with runtime reconfigurability. We present a softcore-processor based solution to this issue, and show for the first time that such programmable architectures can enable real-time data operation for cutting-edge standards such as 802.11n; furthermore, by exploiting deep processing pipelines and interleaved task execution, the cost and performance of these architectures is shown to be on a par with traditional dedicated circuit based solutions. We believe this to be the first such programmable architecture to achieve this, and the combination of implementation efficiency and programmability makes this implementation style the most promising approach for hosting such dynamic architectures.
Abstract:
Introduction: Juvenile idiopathic arthritis (JIA) is a heterogeneous disease characterized by chronic joint inflammation of unknown cause in children. JIA is an autoimmune disease and small numbers of auto-antibodies have been reported in JIA patients. The identification of antibody markers could improve the existing clinical management of patients. Methods: A pilot study was performed on the application of a high-throughput platform, nucleic acid programmable protein arrays (NAPPA), to assess the levels of antibodies present in the systemic circulation and synovial joint of a small cohort of juvenile arthritis patients. Plasma and synovial fluid from ten JIA patients were screened for antibodies against 768 proteins on NAPPA. Results: Quantitative reproducibility of NAPPA was demonstrated with >0.95 intra- and inter-array correlations. A strong correlation was also observed for the levels of antibodies between plasma and synovial fluid across the study cohort (r=0.96). Differences in the levels of 18 antibodies were revealed between sample types across all patients. Patients were segregated into two clinical subtypes with distinct antibody signatures by unsupervised hierarchical cluster analysis. Conclusions: NAPPA provides a high-throughput, quantitatively reproducible platform to screen for disease-specific autoantibodies at the proteome level on a microscope slide. The strong correlation between the circulating antibody levels and those of the inflamed joint represents a novel finding and provides confidence to use plasma for discovery of autoantibodies in JIA, thus circumventing the challenges associated with joint aspiration. We expect that autoantibody profiling of JIA patients on NAPPA could yield antibody markers that can act as criteria to stratify patients, predict outcomes and understand disease etiology at the molecular level.
Abstract:
The emergence of programmable logic devices as processing platforms for digital signal processing applications poses challenges concerning rapid implementation and high level optimization of algorithms on these platforms. This paper describes Abhainn, a rapid implementation methodology and toolsuite for translating an algorithmic expression of the system to a working implementation on a heterogeneous multiprocessor/field programmable gate array platform, or a standalone system on programmable chip solution. Two particular focuses for Abhainn are the automated but configurable realisation of inter-processor communication fabrics, and the establishment of novel dedicated hardware component design methodologies allowing algorithm level transformation for system optimization. This paper outlines the approaches employed in both these particular instances.
Abstract:
A new high performance, programmable image processing chip targeted at video and HDTV applications is described. This was initially developed for small object recognition in images but has much broader functional application, including 1D and 2D FIR filtering as well as neural network computation. The core of the circuit is made up of an array of twenty-one multiplication-accumulation cells based on a systolic architecture. Devices can be cascaded to increase the order of the filter both vertically and horizontally. The chip has been fabricated in a 0.6 µm low-power CMOS technology and operates on 10-bit input data at over 54 Megasamples per second. The introduction gives some background to the chip design and highlights that there are few other comparable devices. Section 2 gives a brief introduction to small object detection. The chip architecture and the chip design are described in detail in the later sections.
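As a rough illustration of what an array of multiplication-accumulation cells computes in the 1D FIR case, the sketch below evaluates a direct-form FIR filter, with one multiply-accumulate per coefficient for every input sample. It models only the arithmetic, not the systolic timing or the chip's fixed-point word lengths; the function name `fir_filter` and its arguments are assumptions made for this example.

```python
def fir_filter(samples, coeffs):
    """Direct-form FIR: y[n] = sum over k of coeffs[k] * x[n - k].

    Each coefficient corresponds to one multiply-accumulate cell; the
    delay line starts at zero, so early outputs see zero-padded input.
    """
    delay_line = [0] * len(coeffs)
    outputs = []
    for x in samples:
        # Shift the new sample in and discard the oldest one.
        delay_line = [x] + delay_line[:-1]
        acc = 0
        for c, d in zip(coeffs, delay_line):
            acc += c * d  # one multiply-accumulate per cell
        outputs.append(acc)
    return outputs


if __name__ == "__main__":
    # Feeding a unit impulse returns the coefficients themselves.
    print(fir_filter([1, 0, 0, 0], [1, 2, 3]))
```

With twenty-one coefficients this corresponds to a 21-tap filter; a longer coefficient list plays the role that cascading devices plays in raising the filter order.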
Abstract:
The paper presents a state-of-the-art commercial demonstrator chip for infinite impulse response (IIR) filtering. The programmable IIR filter chip contains eight multiplier/accumulators that can be configured in one of five different modes to implement up to a 16th-order IIR filter. The multiply-accumulate block is based on a highly regular systolic array architecture and uses a redundant number system to overcome problems of pipelining in the feedback loop. The chip has been designed using the GEC Plessey Semiconductors CLA 78000 series gate array, operates on 16-bit two's complement data and has a clock speed of 30 MHz. Issues such as overflow detection and design for testability have also been addressed and are described.
Abstract:
Current variation-aware design methodologies, tuned for worst-case scenarios, are becoming increasingly pessimistic from the perspective of power and performance. A good example of such pessimism is setting the refresh rate of DRAMs according to the worst-case access statistics, thereby resulting in very frequent refresh cycles, which are responsible for the majority of the standby power consumption of these memories. However, such a high refresh rate may not be required, either due to the extremely low probability of the actual occurrence of such a worst case, or due to the inherent error-resilient nature of many applications that can tolerate a certain number of potential failures. In this paper, we exploit and quantify the possibilities that exist in dynamic memory design by shifting to the so-called approximate computing paradigm in order to save power and enhance yield at no cost. The statistical characteristics of the retention time in dynamic memories were revealed by studying a fabricated 2kb CMOS compatible embedded DRAM (eDRAM) memory array based on gain-cells. Measurements show that up to 73% of the retention power can be saved by altering the refresh time and setting it such that a small number of failures is allowed. We show that these savings can be further increased by utilizing known circuit techniques, such as body biasing, which can help not only in extending, but also in preferably shaping the retention time distribution. Our approach is one of the first attempts to assess the data integrity and energy tradeoffs achieved in eDRAMs for utilizing them in error-resilient applications and can prove helpful in the anticipated shift to approximate computing.
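The trade-off between refresh period and tolerated failures can be sketched as a simple selection over per-cell retention times. This is a minimal illustration, not the authors' measurement procedure: the function names, the sample retention values, and the assumption that a cell keeps its data when refreshed at or before its retention time are all hypothetical.

```python
def refresh_period(retention_times, allowed_failures=0):
    """Return the longest refresh period with at most allowed_failures failing cells.

    Assumption: a cell fails only if its retention time is strictly
    shorter than the refresh period.
    """
    ordered = sorted(retention_times)
    if not 0 <= allowed_failures < len(ordered):
        raise ValueError("allowed_failures must be in [0, number of cells)")
    # Sacrificing the allowed_failures weakest cells lets the period
    # extend to the retention time of the next-weakest cell.
    return ordered[allowed_failures]


def failing_cells(retention_times, period):
    """Count cells whose retention time is shorter than the period."""
    return sum(1 for t in retention_times if t < period)


if __name__ == "__main__":
    times = [5, 1, 9, 3, 7]  # made-up retention times, arbitrary units
    worst_case = refresh_period(times)                 # no failures tolerated
    relaxed = refresh_period(times, allowed_failures=2)
    print(worst_case, relaxed, failing_cells(times, relaxed))
```

Allowing a few weak cells to fail lets the refresh period stretch well past the worst-case value, which is the mechanism behind the retention power savings the abstract reports.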