955 results for Boolean Computations
Abstract:
The initial part of this paper reviews the early challenges (c. 1980) in achieving real-time silicon implementations of DSP computations. In particular, it discusses research on application-specific architectures, including bit-level systolic circuits that led to important advances in achieving the DSP performance levels then required. These were many orders of magnitude greater than those achievable using programmable (including early DSP) processors, and were demonstrated through the design of commercial digital correlator and digital filter chips. As is discussed, an important challenge was the application of these concepts to recursive computations as occur, for example, in Infinite Impulse Response (IIR) filters. An important breakthrough was to show how fine-grained pipelining can be used if arithmetic is performed most significant bit (msb) first. This can be achieved using redundant number systems, including carry-save arithmetic. This research and its practical benefits were again demonstrated through a number of novel IIR filter chip designs which, at the time, exhibited performance much greater than that of previous solutions. The architectural insights gained, coupled with the regular nature of many DSP and video processing computations, also provided the foundation for new methods for the rapid design and synthesis of complex DSP System-on-Chip (SoC) Intellectual Property (IP) cores. This included the creation of a wide portfolio of commercial SoC video compression cores (MPEG2, MPEG4, H.264) for very high-performance applications ranging from cell phones to High Definition TV (HDTV). The work provided the foundation for systematic methodologies, tools and design flows including high-level design optimizations based on "algorithmic engineering" and also led to the creation of the Abhainn tool environment for the design of complex heterogeneous DSP platforms comprising processors and multiple FPGAs. The paper concludes with a discussion of the problems faced by designers in developing complex DSP systems using current SoC technology. © 2007 Springer Science+Business Media, LLC.
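To make the carry-save idea concrete, here is a minimal generic sketch in Python (not the circuits from the paper): each addition step produces independent sum and carry bit-vectors, so no carry ripples across the word before the next operand is absorbed, which is what enables fine-grained, msb-first pipelining of recursive loops.

```python
def carry_save_step(s, c, x):
    """Absorb operand x into a redundant (sum, carry) pair.

    Invariant: s + c + x == s_new + c_new, with every bit position
    computed independently, so no carry propagates across the word.
    """
    s_new = s ^ c ^ x                            # per-bit sum
    c_new = ((s & c) | (s & x) | (c & x)) << 1   # per-bit carry, shifted up
    return s_new, c_new

def carry_save_sum(values):
    s, c = 0, 0
    for x in values:
        s, c = carry_save_step(s, c, x)
    return s + c   # one final carry-propagate add resolves the result

assert carry_save_sum([13, 7, 250, 3]) == 273
```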
Abstract:
The highly structured nature of many digital signal processing operations allows these to be directly implemented as regular VLSI circuits. This feature has been successfully exploited in the design of a number of commercial chips, some examples of which are described. While many of the architectures on which such chips are based were originally derived on a heuristic basis, there is increasing interest in the development of systematic design techniques for the direct mapping of computations onto regular VLSI arrays. The purpose of this paper is to show how the technique proposed by Kung can be readily extended to the design of VLSI signal processing chips where the organisation of computations at the level of individual data bits is of paramount importance. The technique in question allows architectures to be derived using the projection and retiming of data dependence graphs.
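A minimal sketch of the projection idea, in hypothetical Python rather than Kung's notation: the 2-D dependence graph of an FIR computation, with nodes (i, p), can be projected along the i axis onto a linear array, giving the allocation cell = p and the schedule t = i + p.

```python
def systolic_fir(x, w):
    """Emulate a linear systolic FIR array obtained by projecting the
    (i, p) dependence graph of y[i] = sum_p w[p] * x[i - p] along i:
    node (i, p) executes in cell p at time t = i + p."""
    N, K = len(x), len(w)
    y = [0.0] * N
    for t in range(N + K - 1):        # global clock cycles
        for p in range(K):            # each cell does at most one MAC per cycle
            i = t - p                 # output currently passing through cell p
            if 0 <= i < N and i - p >= 0:
                y[i] += w[p] * x[i - p]
    return y

assert systolic_fir([1, 2, 3], [1, 1]) == [1.0, 3.0, 5.0]
```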
Abstract:
Methods are presented for developing synthesizable FFT cores. These are based on a modular approach in which parameterized commutator and processor blocks are cascaded to implement the computations required in many important FFT signal flow graphs. In addition, it is shown how a digit-serial data organization can be used to produce systems that offer 100% processor utilization along with reductions in storage requirements. The approach has been used to create generators for the automated synthesis of FFT cores that are portable across a broad range of silicon technologies. Resulting chip designs are competitive with ones created using manual methods but with significant reductions in design times.
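For orientation, a generic iterative radix-2 FFT in Python (not the generator described in the paper): each pass of the outer loop below corresponds to one column of butterflies in the signal flow graph; a pipelined core maps each column to a processor block, with commutators performing the inter-stage reordering that software handles by indexing.

```python
import cmath

def fft_stages(x):
    """Iterative radix-2 decimation-in-time FFT. Each pass of the
    while-loop is one column of butterflies in the signal flow graph."""
    x = list(x)
    n = len(x)
    j = 0                                  # bit-reversal permutation
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            x[i], x[j] = x[j], x[i]
    size = 2
    while size <= n:                       # one butterfly column per pass
        half = size // 2
        wstep = cmath.exp(-2j * cmath.pi / size)
        for start in range(0, n, size):
            w = 1.0
            for k in range(half):
                a, b = x[start + k], x[start + k + half] * w
                x[start + k], x[start + k + half] = a + b, a - b
                w *= wstep
        size *= 2
    return x

out = fft_stages([1, 1, 1, 1])
assert abs(out[0] - 4) < 1e-12 and all(abs(v) < 1e-12 for v in out[1:])
```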
Abstract:
Whilst conventional bit-level pipelining introduces an m-cycle delay, it does allow m separate computations to be processed at throughput rates comparable to those of word-level systolic arrays. We concentrate on exploiting this delay and describe a systematic method for the design of high-performance multiplexed IIR filters. Two multiply-and-accumulate structures are identified, based on shift-and-add and carry-save data organisations, which can be used as building blocks in the design of IIR filters. By replacing the word-level multiply-and-accumulate units in word-level systolic structures with their equivalent bit-level circuits and introducing latches to ensure correct timing, numerous architectures can be designed that process multiplexed data directly without any additional circuit overhead.
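A behavioural Python sketch of the multiplexing idea (a toy model, not the bit-level circuits themselves): interleaving m independent streams fills the m-cycle pipeline delay, because each channel's slot only needs the feedback value that was produced m cycles earlier by the same channel.

```python
def multiplexed_iir(channels, a, b):
    """m interleaved first-order IIR filters, y[n] = a*y[n-1] + b*x[n].
    Time-multiplexing m independent streams hides the m-cycle pipeline
    latency of a bit-level MAC."""
    m, n = len(channels), len(channels[0])
    stream = [channels[c][i] for i in range(n) for c in range(m)]
    state = [0.0] * m                 # models the m-stage feedback pipeline
    out = []
    for k, x in enumerate(stream):
        c = k % m                     # channel occupying this time slot
        y = a * state[c] + b * x
        state[c] = y                  # needed again exactly m cycles later
        out.append(y)
    return [out[c::m] for c in range(m)]

# impulse responses of two independent channels come out untangled
y0, y1 = multiplexed_iir([[1, 0, 0], [0, 1, 0]], 0.5, 1.0)
assert y0 == [1.0, 0.5, 0.25] and y1 == [0.0, 1.0, 0.5]
```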
Abstract:
Real-time digital signal processing demands high-performance implementations of division and square root. This can only be achieved by the design of fast and efficient arithmetic algorithms which address practical VLSI architectural design issues. In this paper, new algorithms for division and square root are described. The new schemes are based on pre-scaling the operands and modifying the classical SRT method such that the result digits and the remainders are computed concurrently and the computations in adjacent rows are overlapped. Consequently, their performance exceeds that of the SRT methods. The hardware cost for higher radices is considerably higher than that of the SRT methods but, for many applications, this is not prohibitive. A system of equations is presented which enables both an analysis of the method for any radix and the parameters of implementations to be easily determined. This is illustrated for the cases of radix 2 and radix 4. In addition, a highly regular array architecture combining the division and square root method is described. © 1994 Kluwer Academic Publishers.
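As a point of reference, a scalar Python model of the plain radix-2 SRT recurrence (the paper's contributions, operand pre-scaling and row overlapping, are not modelled here):

```python
def srt_divide(x, d, n):
    """Plain radix-2 SRT division: compute x/d for 0.5 <= d < 1 and
    |x| < d, producing n quotient digits q_j in {-1, 0, 1}. Each digit
    is selected from a coarse estimate of 2r (thresholds +-1/2); the
    redundant digit set absorbs the estimation error."""
    r, q = x, 0.0
    for j in range(1, n + 1):
        if 2 * r >= 0.5:
            qj = 1
        elif 2 * r < -0.5:
            qj = -1
        else:
            qj = 0
        r = 2 * r - qj * d            # residual recurrence
        q += qj * 2.0 ** -j
    return q, r                       # x == q*d + r * 2**-n

q, r = srt_divide(0.3, 0.75, 30)
assert abs(q - 0.4) < 2 ** -28
```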
Abstract:
The application of fine-grain pipelining techniques in the design of high-performance wave digital filters (WDFs) is described. The problems of latency in feedback loops can be significantly reduced if computations are organized most significant, as opposed to least significant, bit first and if the results are fed back as soon as they are formed. The result is that chips can be designed which offer significantly higher sampling rates than otherwise can be obtained using conventional methods. How these concepts can be extended to the more challenging problem of WDFs is discussed. It is shown that significant increases in the sampling rate of bit-parallel circuits can be achieved using most significant bit first arithmetic.
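To make the msb-first idea concrete, a toy Python sketch of redundant signed-digit generation (not a WDF circuit): because the digit set {-1, 0, 1} is redundant, each output digit can be selected from a coarse estimate of the residual and emitted immediately, most significant first, which is what lets results be fed back into a loop as soon as they are formed.

```python
def msd_first_digits(x, n):
    """Emit n signed digits d in {-1, 0, 1}, most significant first,
    with x ~ sum(d_k * 2**-(k+1)) for |x| < 1. The overlapping
    selection intervals (redundancy) mean each digit needs only a
    coarse look at the residual, so it can be produced before the less
    significant digits are known."""
    digits, r = [], x
    for _ in range(n):
        d = 1 if r > 0.25 else (-1 if r < -0.25 else 0)
        digits.append(d)
        r = 2 * (r - d * 0.5)         # residual stays inside (-1, 1)
    return digits

d = msd_first_digits(0.40625, 10)
val = sum(dk * 2.0 ** -(k + 1) for k, dk in enumerate(d))
assert abs(val - 0.40625) < 2 ** -10
```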
Abstract:
In real-time digital signal processing, high-performance modules for division and square root are essential if many powerful algorithms are to be implemented. In this paper, new radix-2 algorithms for SRT division and square root are developed. For these new schemes, the result digits and the residuals are computed concurrently and the computations in adjacent rows are overlapped. Consequently, their performance should exceed that of the radix-2 SRT methods. VLSI array architectures to implement the new division and square root schemes are also presented.
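For orientation, the underlying radix-2 digit recurrences (standard SRT formulations, consistent with but not copied from the paper) are:

```latex
% Division x/d, quotient digits q_j \in \{-1, 0, 1\}:
r_j = 2\,r_{j-1} - q_j d, \qquad x = d\sum_{j=1}^{n} q_j 2^{-j} + r_n 2^{-n}
% Square root of x, partial result S_j = S_{j-1} + s_j 2^{-j},
% residual r_j = 2^{j}\,(x - S_j^2), which expands to:
r_j = 2\,r_{j-1} - s_j\left(2 S_{j-1} + s_j 2^{-j}\right)
```

Because q_j and s_j are chosen from only a few most significant digits of an estimate of 2r_{j-1}, the digit selection for one row can proceed while the previous row's full-width residual is still being formed; this is the concurrency and row overlap the abstract refers to.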
Abstract:
Methods are presented for developing synthesizable FFT cores. These are based on a modular approach in which parameterizable blocks are cascaded to implement the computations required across a range of typical FFT signal flow graphs. The underlying architectural approach combines a digit-serial data organization with generic commutator blocks to produce systems that offer 100% processor utilization with storage requirements lower than those of previous designs. The approach has been used to create generators for the automated synthesis of FFT cores that are portable across a broad range of silicon technologies. Resulting chip designs are competitive with manually created ones but with significant reductions in design times.
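To illustrate what a commutator block must accomplish, a generic Python sketch of the first decimation-in-frequency stage (not the cores themselves): the butterfly needs x[k] and x[k + N/2] on the same cycle, so a commutator built from delays and a switch pairs up the serial input stream before each processor block.

```python
import cmath

def dif_stage(x):
    """First decimation-in-frequency butterfly column of an N-point
    FFT: pairs x[k] with x[k + N/2]. In a pipelined core, a commutator
    (shift registers plus a switch) forms exactly these pairs from a
    serial input stream."""
    n, half = len(x), len(x) // 2
    top, bot = [], []
    for k in range(half):
        a, b = x[k], x[k + half]
        top.append(a + b)                                        # even outputs
        bot.append((a - b) * cmath.exp(-2j * cmath.pi * k / n))  # odd outputs
    return top, bot   # each list feeds an N/2-point sub-FFT in the next stage

top, bot = dif_stage([1, 2, 3, 4])
assert top == [4, 6] and abs(bot[0] + 2) < 1e-12
```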
Abstract:
The effect of photon frequency redistribution by line branching on mass-loss rates for hot luminous stars is investigated. Monte Carlo simulations are carried out for a range of OB star models which show that previous mass-loss calculations which neglect non-resonance line scattering overestimate mass-loss rates for luminous O stars by ~20 per cent. For luminous B stars the effect is somewhat larger, typically ~50 per cent. A Wolf-Rayet star model is used to investigate line branching in the strong wind limit. In this case the effect of line branching is much greater, giving mass-loss rates that are smaller by a factor of ~3 than those from computations which neglect branching.
Abstract:
We introduce a scheme to reconstruct arbitrary states of networks composed of quantum oscillators, e.g., the motional state of trapped ions or the radiation state of coupled cavities. The scheme involves minimal resources and minimal access, in the sense that it (i) requires only the interaction between a one-qubit probe and a single node of the network; (ii) provides the Weyl characteristic function of the network directly from the data, avoiding any tomographic transformation; (iii) involves the tuning of only one coupling parameter. In addition, we show that a number of quantum properties can be extracted without full reconstruction of the state. The scheme can be used for probing quantum simulations of anharmonic many-body systems and quantum computations with continuous variables. Experimental implementation with trapped ions is also discussed and shown to be within reach of current technology.
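For concreteness, the reconstructed quantity can be computed directly for a single node when the state is known. A QuTiP sketch, assuming a truncated Fock space and an example coherent state (the scheme in the paper instead reads chi off one-qubit probe measurements):

```python
import qutip as qt

# Weyl characteristic function chi(alpha) = Tr[rho D(alpha)] of one
# oscillator node; rho here is an example state, not reconstructed data.
N = 20                                   # Fock-space truncation
rho = qt.coherent_dm(N, 1.0 + 0.5j)      # example node state

def chi(alpha):
    return (rho * qt.displace(N, alpha)).tr()

assert abs(chi(0) - 1) < 1e-9            # chi(0) = Tr[rho] = 1
print(chi(0.3 - 0.2j))
```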
Abstract:
Modal analysis is a popular approach used in structural dynamic and aeroelastic problems due to its efficiency. The response of a structure is composed of the sum of orthogonal eigenvectors, or modeshapes, and corresponding modal frequencies. This paper investigates the importance of modeshapes on the aeroelastic response of the Goland wing subject to structural uncertainties. The wing undergoes limit cycle oscillations (LCOs) as a result of the inclusion of polynomial stiffness nonlinearities. The LCO computations are performed using a Harmonic Balance approach for speed; the modal properties of the system are extracted from MSC NASTRAN. Variability in both the wing's structure and the store centre of gravity location is investigated for two cases: supercritical and subcritical type LCOs. Results show that the LCO behaviour is only sensitive to changes in modeshapes when the nature of the modes changes significantly.
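To show the flavour of the Harmonic Balance method on a problem small enough to verify by hand, a single-harmonic Duffing example in Python (a stand-in for the polynomial stiffness nonlinearities, not the Goland wing model):

```python
import numpy as np

# Single-harmonic balance for x'' + x + eps*x**3 = 0. Substituting
# x = A*cos(w*t), using cos(u)**3 = (3*cos(u) + cos(3*u)) / 4 and
# balancing the cos(w*t) terms gives w**2 = 1 + 0.75*eps*A**2:
# the LCO frequency depends on the oscillation amplitude.
eps = 0.1
for A in (0.5, 1.0, 2.0):
    w = np.sqrt(1.0 + 0.75 * eps * A ** 2)
    print(f"amplitude {A:.1f} -> balanced frequency {w:.4f}")
```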
Abstract:
In this paper, we propose a system-level design approach considering voltage over-scaling (VOS) that achieves error resiliency using unequal error protection of different computation elements, while incurring minor quality degradation. Depending on user specifications and the severity of process variations/channel noise, the degree of VOS in each block of the system is adaptively tuned to ensure minimum system power while providing "just-the-right" amount of quality and robustness. This is achieved by taking into consideration system-level interactions and ensuring that, under any change of operating conditions, only the "less crucial" computations, which contribute less to block/system output quality, are affected. The design methodology applied to a DCT/IDCT system shows large power benefits (up to 69%) at reasonable image quality while tolerating errors induced by varying operating conditions (VOS, process variations, channel noise). Interestingly, the proposed IDCT scheme conceals channel noise at scaled voltages. ©2009 IEEE.
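A toy Python model of why unequal protection bounds the damage, assuming delay failures manifest as lost carries (a generic illustration, not the paper's DCT/IDCT system):

```python
def truncated_carry_add(a, b, p):
    """Toy model of a delay failure in an adder: the carry chain is cut
    at bit p, so a carry out of the low p bits is lost. The error is
    then either 0 or exactly 2**p, i.e. bounded by where the cut is
    made; protecting the high-order (more crucial) bits bounds the
    damage from over-scaled voltage."""
    mask = (1 << p) - 1
    low = ((a & mask) + (b & mask)) & mask   # carry out of bit p-1 dropped
    high = ((a >> p) << p) + ((b >> p) << p)
    return high + low

a, b = 183, 109
for p in (0, 2, 4, 6):
    err = a + b - truncated_carry_add(a, b, p)
    print(f"cut at bit {p}: error {err}")    # 0, 4, 16, 64 here
```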
Abstract:
Power dissipation and robustness to process variation have conflicting design requirements. Scaling of voltage is associated with larger variations, while Vdd upscaling or transistor upsizing for parametric-delay variation tolerance can be detrimental for power dissipation. However, for a class of signal-processing systems, effective tradeoff can be achieved between Vdd scaling, variation tolerance, and output quality. In this paper, we develop a novel low-power variation-tolerant algorithm/architecture for color interpolation that allows a graceful degradation in the peak-signal-to-noise ratio (PSNR) under aggressive voltage scaling as well as extreme process variations. This feature is achieved by exploiting the fact that all computations used in interpolating the pixel values do not equally contribute to PSNR improvement. In the presence of Vdd scaling and process variations, the architecture ensures that only the less important computations are affected by delay failures. We also propose a different sliding-window size than the conventional one to improve interpolation performance by a factor of two with negligible overhead. Simulation results show that, even at a scaled voltage of 77% of nominal value, our design provides reasonable image PSNR with 40% power savings. © 2006 IEEE.
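For readers unfamiliar with the underlying computation, a minimal Python sketch of bilinear green-channel interpolation on an RGGB Bayer mosaic (a generic illustration of the interpolation discussed above, not the variation-tolerant architecture itself):

```python
import numpy as np

def green_bilinear(mosaic):
    """Bilinear interpolation of the green channel from an RGGB Bayer
    mosaic: green samples are kept, red/blue sites get the average of
    their in-bounds green neighbours."""
    h, w = mosaic.shape
    g = np.zeros((h, w))
    for y in range(h):
        for x in range(w):
            if (y + x) % 2 == 1:           # green site in RGGB
                g[y, x] = mosaic[y, x]
            else:                           # red/blue site: average neighbours
                nb = [mosaic[y + dy, x + dx]
                      for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1))
                      if 0 <= y + dy < h and 0 <= x + dx < w]
                g[y, x] = sum(nb) / len(nb)
    return g

print(green_bilinear(np.arange(16, dtype=float).reshape(4, 4)))
```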
Abstract:
2-D Discrete Cosine Transform (DCT) is widely used as the core of digital image and video compression. In this paper, we present a novel DCT architecture that allows aggressive voltage scaling by exploiting the fact that not all intermediate computations in a DCT system are equally important for obtaining "good" image quality, i.e., a Peak Signal to Noise Ratio (PSNR) > 30 dB. This observation has led us to propose a DCT architecture where the signal paths that are less contributive to PSNR improvement are designed to be longer than those that are more contributive. It should also be noted that robustness with respect to parameter variations and low-power operation typically impose contradictory requirements in terms of architecture design. However, the proposed architecture lends itself to aggressive voltage scaling for low-power dissipation even under process parameter variations. Under a scaled supply voltage and/or variations in process parameters, any possible delay errors would only appear from the long paths that are less contributive towards PSNR improvement, providing large improvement in power dissipation with small PSNR degradation. Results show that even under large process variation and supply voltage scaling (0.8 V), there is a gradual degradation of image quality with considerable power savings (62.8%) for the proposed architecture when compared to existing implementations in 70 nm process technology.
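A quick numerical illustration of why the premise holds, using a toy NumPy sketch with a smooth test block (not the proposed architecture): for typical smooth image content, the energy compaction of the DCT makes the high-frequency coefficients nearly negligible, so the computations feeding them can fail with little PSNR cost, while the low-frequency paths cannot.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal n-point DCT-II matrix: C @ block @ C.T transforms,
    C.T @ coef @ C inverts."""
    k, j = np.arange(n)[:, None], np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * j + 1) * k / (2 * n))
    C[0] /= np.sqrt(2.0)
    return C

def psnr(a, b):
    return 10 * np.log10(255.0 ** 2 / np.mean((a - b) ** 2))  # 8-bit peak

C = dct_matrix()
block = 8.0 * np.add.outer(np.arange(8.0), np.arange(8.0))  # smooth ramp
coef = C @ block @ C.T
freq = np.add.outer(np.arange(8), np.arange(8))             # frequency index i+j
for name, keep in (("drop high-frequency terms", freq < 4),
                   ("drop low-frequency terms", freq >= 4)):
    rec = C.T @ (coef * keep) @ C
    print(f"{name}: PSNR {psnr(block, rec):.1f} dB")
```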
Abstract:
Power dissipation and tolerance to process variations pose conflicting design requirements. Scaling of voltage is associated with larger variations, while Vdd upscaling or transistor up-sizing for process tolerance can be detrimental for power dissipation. However, for certain signal processing systems such as those used in color image processing, we noted that effective trade-offs can be achieved between Vdd scaling, process tolerance and "output quality". In this paper we demonstrate how these trade-offs can be effectively utilized in the development of novel low-power variation-tolerant architectures for color interpolation. The proposed architecture supports a graceful degradation in the PSNR (Peak Signal to Noise Ratio) under aggressive voltage scaling as well as extreme process variations in sub-70nm technologies. This is achieved by exploiting the fact that some computations are more important and contribute more to the PSNR improvement compared to the others. The computations are mapped to the hardware in such a way that only the less important computations are affected by Vdd-scaling and process variations. Simulation results show that even at a scaled voltage of 60% of nominal Vdd value, our design provides reasonable image PSNR with 69% power savings.
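A back-of-envelope Python illustration of ranking computations by importance, using BT.601 luminance weights as a stand-in for the per-computation contributions (the paper ranks computations by their PSNR impact, not by luminance weight):

```python
# The same-size pixel error propagates into luminance with very
# different weights, so an error in a blue-channel interpolation costs
# far less perceived quality than the same error in green.
weights = {"R": 0.299, "G": 0.587, "B": 0.114}   # ITU-R BT.601
err = 8.0                      # identical error injected in each channel
for ch, w in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{ch}: luminance error {w * err:.2f}")
```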