241 results for Bretton woods
Abstract:
A power- and resource-efficient ‘dynamic-range utilisation’ technique is presented that increases the operational capacity of DSP IP cores by exploiting redundancy in the data representation of sampled analogue input data. By partitioning the dynamic range into separable processing threads, several data streams are computed concurrently on the same hardware. Unlike existing techniques, which act solely to reduce power consumption due to sign extension, here the dynamic range is exploited to increase operational capacity while still achieving reduced power consumption. This extends an existing system-level, power-efficient framework for the design of low-power DSP IP cores which, when applied to the design of an FFT IP core in a digital receiver system, gives an architecture requiring 50% fewer multipliers, 12% fewer slices and 51%-56% less power.
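The idea of running several data streams through the same hardware can be illustrated with a small software sketch. The abstract does not describe the actual partitioning scheme, so the following is only an assumed, simplified model: two unsigned narrow samples are packed into one wide word, multiplied once by a shared coefficient, and then separated again. The function name `packed_multiply` and the 8-bit default width are hypothetical choices made for this illustration.

```python
def packed_multiply(a: int, b: int, coeff: int, width: int = 8) -> tuple[int, int]:
    """Multiply two unsigned `width`-bit samples by one coefficient with a single wide multiply.

    Illustrative assumption only: the source does not specify how the dynamic
    range is partitioned, and signed data would need extra correction terms.
    """
    limit = 1 << width
    if not (0 <= a < limit and 0 <= b < limit and 0 <= coeff < limit):
        raise ValueError("all operands must be unsigned and fit in `width` bits")

    # Each partial product needs at most 2 * width bits, so placing the two
    # samples 2 * width bits apart keeps their products from overlapping.
    shift = 2 * width
    packed = (a << shift) | b

    # One multiplication yields (a * coeff) in the upper field
    # and (b * coeff) in the lower field.
    product = packed * coeff

    low = product & ((1 << shift) - 1)
    high = product >> shift
    return high, low
```

For example, `packed_multiply(3, 5, 7)` returns the pair of products `3 * 7` and `5 * 7` from a single multiplication, which is the sense in which spare dynamic range can carry a second data stream.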
Abstract:
Exploiting the underutilisation of variable-length DSP algorithms during normal operation is vital when seeking to maximise the achievable functionality of an application within a peak power budget. A system-level, low-power design methodology for FPGA-based, variable-length DSP IP cores is presented. Algorithmic commonality is identified and resources are mapped onto a configurable datapath to increase achievable functionality. The methodology is applied to a digital receiver application, where a 100% increase in operational capacity is achieved in certain modes without significant increases in the power or area budget. Measured results show that the resulting architectures require 19% less peak power, 33% fewer multipliers and 12% fewer slices than existing architectures.
Abstract:
Generation of hardware architectures directly from dataflow representations is increasingly being considered as research moves toward system-level design methodologies. Creating networks of IP cores to implement actor functionality is a common approach to the problem, but the memory sub-systems produced using these techniques are often inefficiently utilised. This paper explores some of the issues in memory organisation and memory accesses that arise when developing systems from these high-level representations. Using a template matching design study, challenges such as modelling memory reuse and minimising buffer requirements are examined, yielding results with significantly lower memory requirements and fewer costly off-chip memory accesses.
Abstract:
Hardware synthesis from dataflow graphs of signal processing systems is a growing research area as focus shifts to high-level design methodologies. For data-intensive systems, dataflow-based synthesis can lead to inefficient usage of memory due to the restrictive nature of synchronous dataflow and its inability to easily model data reuse. This paper explores how dataflow graph changes can be used to drive both the on-chip and off-chip memory organisation and how these memory architectures can be mapped to a hardware implementation. By exploiting the data reuse inherent to many image processing algorithms and by creating memory hierarchies, off-chip memory bandwidth can be reduced by a factor of a thousand from the original dataflow graph level specification of a motion estimation algorithm, with a minimal increase in memory size. This analysis is verified using results gathered from implementation of the motion estimation algorithm on a Xilinx Virtex-4 FPGA, where the delay between the memories and processing elements drops from 14.2 ns down to 1.878 ns through the refinement of the memory architecture. Care must be taken when modelling these algorithms, however, as inefficiencies in these models can easily be translated into overuse of hardware resources.
Abstract:
Galactic bulge planetary nebulae show evidence of mixed chemistry with emission from both silicate dust and polycyclic aromatic hydrocarbons (PAHs). This mixed chemistry is unlikely to be related to carbon dredge-up, as third dredge-up is not expected to occur in the low-mass bulge stars. We show that the phenomenon is widespread and is seen in 30 nebulae out of 40 of our sample, selected on the basis of their infrared flux. Hubble Space Telescope (HST) images and Ultraviolet and Visual Echelle Spectrograph (UVES) spectra show that the mixed chemistry is not related to the presence of emission-line stars, as it is in the Galactic disc population. We also rule out interaction with the interstellar medium (ISM) as the origin of the PAHs. Instead, a strong correlation is found with morphology and the presence of a dense torus. A chemical model is presented which shows that hydrocarbon chains can form within oxygen-rich gas through gas-phase chemical reactions. The model predicts two layers, one at A_V ~ 1.5, where small hydrocarbons form from reactions with C+, and one at A_V ~ 4, where larger chains (and by implication, PAHs) form from reactions with neutral, atomic carbon. These reactions take place in a mini-photon-dominated region (PDR). We conclude that the mixed-chemistry phenomenon occurring in the Galactic bulge planetary nebulae is best explained through hydrocarbon chemistry in an ultraviolet (UV)-irradiated, dense torus.
Abstract:
Dynamic power consumption is very dependent on interconnect, so careful mapping of digital signal processing algorithms to parallelised realisations with data locality is vital. This is a particular problem for fast algorithm implementations where, typically, designers will have sacrificed circuit structure for efficiency in software implementation. This study outlines an approach for reducing the dynamic power consumption of a class of fast algorithms by minimising the index space separation; this allows the generation of field programmable gate array (FPGA) implementations with reduced power consumption. It is shown how a 50% reduction in relative index space separation results in a measured power gain of 36% and 37% over a Cooley-Tukey Fast Fourier Transform (FFT)-based solution, for actual power measurements of a Xilinx Virtex-II FPGA implementation and circuit measurements of a Xilinx Virtex-5 implementation respectively. The authors show the generality of the approach by applying it to a number of other fast algorithms, namely the discrete cosine, the discrete Hartley and the Walsh-Hadamard transforms.
Abstract:
A scalable, large-vocabulary, speaker-independent speech recognition system is being developed using Hidden Markov Models (HMMs) for acoustic modelling and a Weighted Finite State Transducer (WFST) to compile sentence, word, and phoneme models. The system comprises a software backend search and an FPGA-based Gaussian calculation, both of which are covered here. In this paper, we present an efficient pipelined design implemented both as an embedded peripheral and as a scalable, parallel hardware accelerator. Both architectures have been implemented on an Alpha Data XRC-5T1 reconfigurable computer housing a Virtex 5 SX95T FPGA. The core has been tested and is capable of calculating a full set of Gaussian results from 3825 acoustic models in 9.03 ms which, coupled with a backend search of 5000 words, has provided an accuracy of over 80%. Parallel implementations have been designed with up to 32 cores and have been successfully implemented with a clock frequency of 133 MHz.
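As a rough software illustration of what a "Gaussian calculation" for HMM acoustic models involves, the sketch below scores one feature vector against a set of Gaussians. It assumes diagonal covariances and a single Gaussian per model, neither of which is stated in the abstract, and the names `log_gaussian_diag` and `score_frame` are hypothetical. It is not the paper's pipelined FPGA design, only the arithmetic such a design would typically evaluate.

```python
import math
from typing import Sequence


def log_gaussian_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log-likelihood of x under a Gaussian with diagonal covariance (assumed form)."""
    d = len(x)
    if len(mean) != d or len(var) != d:
        raise ValueError("x, mean and var must have the same length")
    if any(v <= 0.0 for v in var):
        raise ValueError("variances must be positive")

    # Log-determinant of a diagonal covariance is the sum of log-variances.
    log_det = sum(math.log(v) for v in var)
    # Squared Mahalanobis distance reduces to a per-dimension weighted sum.
    mahalanobis = sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var))
    return -0.5 * (d * math.log(2.0 * math.pi) + log_det + mahalanobis)


def score_frame(x: Sequence[float], models: Sequence[tuple[Sequence[float], Sequence[float]]]) -> list[float]:
    """Score one feature vector against every (mean, var) model in turn."""
    return [log_gaussian_diag(x, mean, var) for mean, var in models]
```

Each model's score is independent of the others, which is one reason this kind of calculation lends itself to the pipelined and multi-core parallel implementations the abstract mentions.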
Abstract:
As a potential alternative to CMOS technology, QCA provides an interesting paradigm in both communication and computation. However, QCA's unique four-phase clocking scheme and timing constraints present serious timing issues for interconnection and feedback. In this work, a cut-set retiming design procedure is proposed to resolve these QCA timing issues. The proposed design procedure can accommodate QCA's unique characteristics by performing delay-transfer and time-scaling to reallocate the existing delays so as to achieve efficient clocking zone assignment. Cut-set retiming makes it possible to effectively design relatively complex QCA circuits that include feedback. It utilizes the characteristics common to both QCA and systolic architectures: synchronization, deep pipelines and local interconnections. As a case study, a systolic Montgomery modular multiplier is designed to illustrate the procedure. Furthermore, a nonsystolic architecture, an S27 benchmark circuit, is designed and compared with previous designs. The comparison shows that the cut-set retiming method achieves a more efficient design, with a reduction of 22%, 44%, and 46% in terms of cell count, area, and latency, respectively.
Abstract:
The development of high-performance, low-computational-complexity detection algorithms is a key challenge for real-time Multiple-Input Multiple-Output (MIMO) communication system design. The Fixed-Complexity Sphere Decoder (FSD) algorithm is one of the most promising approaches, enabling quasi-ML decoding accuracy and high-performance implementation due to its deterministic, highly parallel structure. However, it suffers from exponential growth in computational complexity as the number of MIMO transmit antennas increases, critically limiting its scalability to larger MIMO system topologies. In this paper, we present a solution to this problem by applying a novel cutting protocol to the decoding tree of a real-valued FSD algorithm. The resulting Real-valued Fixed-Complexity Sphere Decoder (RFSD) algorithm achieves quasi-ML decoding performance similar to that of FSD, but with an average 70% reduction in computational complexity, as we demonstrate from both theoretical and implementation perspectives for Quadrature Amplitude Modulation (QAM)-MIMO systems.