999 resultados para VLSI implementation


Relevância:

100.00% 100.00%

Publicador:

Resumo:

We propose the design and implementation of hardware architecture for spatial prediction based image compression scheme, which consists of prediction phase and quantization phase. In prediction phase, the hierarchical tree structure obtained from the test image is used to predict every central pixel of an image by its four neighboring pixels. The prediction scheme generates an error image, to which the wavelet/sub-band coding algorithm can be applied to obtain efficient compression. The software model is tested for its performance in terms of entropy, standard deviation. The memory and silicon area constraints play a vital role in the realization of the hardware for hand-held devices. The hardware architecture is constructed for the proposed scheme, which involves the aspects of parallelism in instructions and data. The processor consists of pipelined functional units to obtain the maximum throughput and higher speed of operation. The hardware model is analyzed for performance in terms throughput, speed and power. The results of hardware model indicate that the proposed architecture is suitable for power constrained implementations with higher data rate

Relevância:

70.00% 70.00%

Publicador:

Resumo:

The K-best detector is considered as a promising technique in the MIMO-OFDM detection because of its good performance and low complexity. In this paper, a new K-best VLSI architecture is presented. In the proposed architecture, the metric computation units (MCUs) expand each surviving path only to its partial branches, based on the novel expansion scheme, which can predetermine the branches' ascending order by their local distances. Then a distributed sorter sorts out the new K surviving paths from the expanded branches in pipelines. Compared to the conventional K-best scheme, the proposed architecture can approximately reduce fundamental operations by 50% and 75% for the 16-QAM and the 64-QAM cases, respectively, and, consequently, lower the demand on the hardware resource significantly. Simulation results prove that the proposed architecture can achieve a performance very similar to conventional K-best detectors. Hence, it is an efficient solution to the K-best detector's VLSI implementation for high-throughput MIMO-OFDM systems.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Multiprocessor systems which afford a high degree of parallelism are used in a variety of applications. The extremely stringent reliability requirement has made the provision of fault-tolerance an important aspect in the design of such systems. This paper presents a review of the various approaches towards tolerating hardware faults in multiprocessor systems. It. emphasizes the basic concepts of fault tolerant design and the various problems to be taken care of by the designer. An indepth survey of the various models, techniques and methods for fault diagnosis is given. Further, we consider the strategies for fault-tolerance in specialized multiprocessor architectures which have the ability of dynamic reconfiguration and are suited to VLSI implementation. An analysis of the state-óf-the-art is given which points out the major aspects of fault-tolerance in such architectures.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

High-speed evaluation of a large number of linear, quadratic, and cubic expressions is very important for the modeling and real-time display of objects in computer graphics. Using VLSI techniques, chips called pixel planes have actually been built by H. Fuchs and his group to evaluate linear expressions. In this paper, we describe a topological variant of Fuchs' pixel planes which can evaluate linear, quadratic, cubic, and higher-order polynomials. In our design, we make use of local interconnections only, i.e., interconnections between neighboring processing cells. This leads to the concept of tiling the processing cells for VLSI implementation.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

With the advent of VLSI it has become possible to map parallel algorithms for compute-bound problems directly on silicon. Systolic architecture is very good candidate for VLSI implementation because of its regular and simple design, and regular communication pattern. In this paper, a systolic algorithm and corresponding systolic architecture, a linear systolic array, for the scanline-based hidden surface removal problem in three-dimensional computer graphics have been proposed. The algorithm is based on the concept of sample spans or intervals. The worst case time taken by the algorithm is O(n), n being the number of segments in a scanline. The time taken by the algorithm for a given scene depends on the scene itself, and on an average considerable improvement over the worst case behaviour is expected. A pipeline scheme for handling the I/O process has also been proposed which is suitable for VLSI implementation of the algorithm.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This paper addresses the problem of efficiently computing the motor torques required to drive a lower-pair kinematic chain (e.g., a typical manipulator arm in free motion, or a mechanical leg in the swing phase) given the desired trajectory; i.e., the Inverse Dynamics problem. It investigates the high degree of parallelism inherent in the computations, and presents two "mathematically exact" formulations especially suited to high-speed, highly parallel implementations using special-purpose hardware or VLSI devices. In principle, the formulations should permit the calculations to run at a speed bounded only by I/O. The first presented is a parallel version of the recent linear Newton-Euler recursive algorithm. The time cost is also linear in the number of joints, but the real-time coefficients are reduced by almost two orders of magnitude. The second formulation reports a new parallel algorithm which shows that it is possible to improve upon the linear time dependency. The real time required to perform the calculations increases only as the [log2] of the number of joints. Either formulation is susceptible to a systolic pipelined architecture in which complete sets of joint torques emerge at successive intervals of four floating-point operations. Hardware requirements necessary to support the algorithm are considered and found not to be excessive, and a VLSI implementation architecture is suggested. We indicate possible applications to incorporating dynamical considerations into trajectory planning, e.g. it may be possible to build an on-line trajectory optimizer.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Real time digital signal processing requires the development of high performance arithmetic algorithms suitable for VLSI design. In this paper, a new online, circular coordinate system CORDIC algorithm is described, which has a constant scale factor. This algorithm was developed using a new Angular Representation (AR) model A radix 2 version of the CORDIC algorithm is presented, along with an architecture suitable for VLSI implementation.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In this paper, we propose a novel finite impulse response (FIR) filter design methodology that reduces the number of operations with a motivation to reduce power consumption and enhance performance. The novelty of our approach lies in the generation of filter coefficients such that they conform to a given low-power architecture, while meeting the given filter specifications. The proposed algorithm is formulated as a mixed integer linear programming problem that minimizes chebychev error and synthesizes coefficients which consist of pre-specified alphabets. The new modified coefficients can be used for low-power VLSI implementation of vector scaling operations such as FIR filtering using computation sharing multiplier (CSHM). Simulations in 0.25um technology show that CSHM FIR filter architecture can result in 55% power and 34% speed improvement compared to carry save multiplier (CSAM) based filters.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The design and VLSI implementation of two key components of the class-IV partial response maximum likelihood channel (PR-IV) the adaptive filter and the Viterbi decoder are described. These blocks are implemented using parameterised VHDL modules, from a library of common digital signal processing (DSP) and arithmetic functions. Design studies, based on 0.6 micron 3.3V standard cell processes, indicate that worst case sampling rates of 49 mega-samples per second are achievable for this system, with proportionally high sampling rates for full custom designs and smaller dimension processes. Significant increases in the sampling rate, from 49 MHz to approximately 180 MHz, can be achieved by operating four filter modules in parallel, and this implementation has 50% lower power consumption than a pipelined filter operating at the same speed.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In this paper, the performance and convergence time comparisons of various low-complexity LMS algorithms used for the coefficient update of adaptive I/Q corrector for quadrature receivers are presented. We choose the optimum LMS algorithm suitable for low complexity, high performance and high order QAM and PSK constellations. What is more, influence of the finite bit precision on VLSI implementation of such algorithms is explored through extensive simulations and optimum wordlengths established.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Decimal multiplication is an integral part offinancial, commercial, and internet-based computations. The basic building block of a decimal multiplier is a single digit multiplier. It accepts two Binary Coded Decimal (BCD) inputs and gives a product in the range [0, 81] represented by two BCD digits. A novel design for single digit decimal multiplication that reduces the critical path delay and area is proposed in this research. Out of the possible 256 combinations for the 8-bit input, only hundred combinations are valid BCD inputs. In the hundred valid combinations only four combinations require 4 x 4 multiplication, combinations need x multiplication, and the remaining combinations use either x or x 3 multiplication. The proposed design makes use of this property. This design leads to more regular VLSI implementation, and does not require special registers for storing easy multiples. This is a fully parallel multiplier utilizing only combinational logic, and is extended to a Hex/Decimal multiplier that gives either a decimal output or a binary output. The accumulation ofpartial products generated using single digit multipliers is done by an array of multi-operand BCD adders for an (n-digit x n-digit) multiplication.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This paper presents a performance analysis of reversible, fault tolerant VLSI implementations of carry select and hybrid decimal adders suitable for multi-digit BCD addition. The designs enable partial parallel processing of all digits that perform high-speed addition in decimal domain. When the number of digits is more than 25 the hybrid decimal adder can operate 5 times faster than conventional decimal adder using classical logic gates. The speed up factor of hybrid adder increases above 10 when the number of decimal digits is more than 25 for reversible logic implementation. Such highspeed decimal adders find applications in real time processors and internet-based applications. The implementations use only reversible conservative Fredkin gates, which make it suitable for VLSI circuits.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Clenshaw’s recurrenee formula is used to derive recursive algorithms for the discrete cosine transform @CT) and the inverse discrete cosine transform (IDCT). The recursive DCT algorithm presented here requires one fewer delay element per coefficient and one fewer multiply operation per coeflident compared with two recently proposed methods. Clenshaw’s recurrence formula provides a unified development for the recursive DCT and IDCT algorithms. The M v e al gorithms apply to arbitrary lengtb algorithms and are appropriate for VLSI implementation.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Coupled map lattices (CML) can describe many relaxation and optimization algorithms currently used in image processing. We recently introduced the ‘‘plastic‐CML’’ as a paradigm to extract (segment) objects in an image. Here, the image is applied by a set of forces to a metal sheet which is allowed to undergo plastic deformation parallel to the applied forces. In this paper we present an analysis of our ‘‘plastic‐CML’’ in one and two dimensions, deriving the nature and stability of its stationary solutions. We also detail how to use the CML in image processing, how to set the system parameters and present examples of it at work. We conclude that the plastic‐CML is able to segment images with large amounts of noise and large dynamic range of pixel values, and is suitable for a very large scale integration(VLSI) implementation.