Biblioteca Digital

131 resultados para 291605 Processor Architectures

Application domains for fixed-length block structured architectures

Relevância:

20.00% 20.00%

Publicador:

Veja mais

Targeting heterogeneous architectures via macro data flow

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We propose a data flow based run time system as an efficient tool for supporting execution of parallel code on heterogeneous architectures hosting both multicore CPUs and GPUs. We discuss how the proposed run time system may be the target of both structured parallel applications developed using algorithmic skeletons/parallel design patterns and also more "domain specific" programming models. Experimental results demonstrating the feasibility of the approach are presented. © 2012 World Scientific Publishing Company.

Veja mais

Rapid Implementation and Optimisation of DSP Systems on SoPC Heterogeneous Platforms

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The emergence of programmable logic devices as processing platforms for digital signal processing applications poses challenges concerning rapid implementation and high level optimization of algorithms on these platforms. This paper describes Abhainn, a rapid implementation methodology and toolsuite for translating an algorithmic expression of the system to a working implementation on a heterogeneous multiprocessor/field programmable gate array platform, or a standalone system on programmable chip solution. Two particular focuses for Abhainn are the automated but configurable realisation of inter-processor communuication fabrics, and the establishment of novel dedicated hardware component design methodologies allowing algorithm level transformation for system optimization. This paper outlines the approaches employed in both these particular instances.

Veja mais

FPGA based Soft-core SIMD Processing: A MIMO-OFDM Fixed-Complexity Sphere Decoder Case Study

Relevância:

20.00% 20.00%

Publicador:

Resumo:

To enable reliable data transfer in next generation Multiple-Input Multiple-Output (MIMO) communication systems, terminals must be able to react to fluctuating channel conditions by having flexible modulation schemes and antenna configurations. This creates a challenging real-time implementation problem: to provide the high performance required of cutting edge MIMO standards, such as 802.11n, with the flexibility for this behavioural variability. FPGA softcore processors offer a solution to this problem, and in this paper we show how heterogeneous SISD/SIMD/MIMD architectures can enable programmable multicore architectures on FPGA with similar performance and cost as traditional dedicated circuit-based architectures. When applied to a 4×4 16-QAM Fixed-Complexity Sphere Decoder (FSD) detector we present the first soft-processor based solution for real-time 802.11n MIMO.

Veja mais

Embodied Sound: Aural Architectures and the Body

Relevância:

20.00% 20.00%

Publicador:

Veja mais

Automated design of DSP array processor chips

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Details are presented of the DAC (DSP ASIC Compiler) silicon compiler framework. DAC allows a non-specialist to automatically design DSP ASICs and DSP ASIC cores directly form a high level specification. Typical designs take only several minutes and the resulting layouts are comparable in area and performance to handcrafted designs.

Veja mais

Design and analysis of matching circuit architectures for a closest match lookup

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper investigates the implementation of a number of circuits used to perform a high speed closest value match lookup. The design is targeted particularly for use in a search trie, as used in various networking lookup applications, but can be applied to many other areas where such a match is required. A range of different designs have been considered and implemented on FPGA. A detailed description of the architectures investigated is followed by an analysis of the synthesis results. © 2006 IEEE.

Veja mais

Algorithms and architectures for high performance recursive filtering

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Recently, a number of most significant digit (msd) first bit parallel multipliers for recursive filtering have been reported. However, the design approach which has been used has, in general, been heuristic and consequently, optimality has not always been assured. In this paper, msd first multiply accumulate algorithms are described and important relationships governing the dependencies between latency, number representations, etc are derived. A more systematic approach to designing recursive filters is illustrated by applying the algorithms and associated relationships to the design of cascadable modules for high sample rate IIR filtering and wave digital filtering.

Veja mais

VLSI architectures for vector quantization

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The real time implementation of an efficient signal compression technique, Vector Quantization (VQ), is of great importance to many digital signal coding applications. In this paper, we describe a new family of bit level systolic VLSI architectures which offer an attractive solution to this problem. These architectures are based on a bit serial, word parallel approach and high performance and efficiency can be achieved for VQ applications of a wide range of bandwidths. Compared with their bit parallel counterparts, these bit serial circuits provide better alternatives for VQ implementations in terms of performance and cost. © 1995 Kluwer Academic Publishers.

Veja mais

VLSI architectures for digital image coding

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A number of high-performance VLSI architectures for real-time image coding applications are described. In particular, attention is focused on circuits for computing the 2-D DCT (discrete cosine transform) and for 2-D vector quantization. The former circuits are based on Winograd algorithms and comprise a number of bit-level systolic arrays with a bit-serial, word-parallel input. The latter circuits exhibit a similar data organization and consist of a number of inner product array circuits. Both circuits are highly regular and allow extremely high data rates to be achieved through extensive use of parallelism.

Veja mais

Systolic array architectures for parameterised multiplexed IIR filters

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Whilst conventional bit level pipelining introduces an m cycle delay, it does allow m separate computations to be processed at throughput rates comparable to that using word level systolic arrays. We concentrate on exploiting this delay and describe a systematic method for the design of high performance multiplexed IIR filters. Two multiply and accumulate structures are identified based on shift-and-add and carry-save data organisations which can be used as building blocks in the design of IIR filters. By replacing the word level multiply and accumulate units in word level systolic structures with their equivalent bit level circuits and introducing latches to ensure correct timing, numerous architectures can be designed that process multiplexed data directly without any additional circuit overhead.

Veja mais

Bit-Level systolic architectures for high performance IIR filtering

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Several novel systolic architectures for implementing densely pipelined bit parallel IIR filter sections are presented. The fundamental problem of latency in the feedback loop is overcome by employing redundant arithmetic in combination with bit-level feedback, allowing a basic first-order section to achieve a wordlength-independent latency of only two clock cycles. This is extended to produce a building block from which higher order sections can be constructed. The architecture is then refined by combining the use of both conventional and redundant arithmetic, resulting in two new structures offering substantial hardware savings over the original design. In contrast to alternative techniques, bit-level pipelinability is achieved with no net cost in hardware. © 1989 Kluwer Academic Publishers.

Veja mais

VLSI processor for high-performance arithmetic computations

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A high performance VLSI architecture to perform combined multiply-accumulate, divide, and square root operations is proposed. The circuit is highly regular, requires only minimal control, and can be pipelined right down to the bit level. The system can also be reconfigured on every cycle to perform one or more of these operations. The throughput rate for each operation is the same and is wordlength independent. This is achieved using redundant arithmetic. With current CMOS technology, throughput rates in excess of 80 million operations per second are expected.

Veja mais

Hardware elliptic curve cryptographic processor over GF(p)

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A novel hardware architecture for elliptic curve cryptography (ECC) over GF(p) is introduced. This can perform the main prime field arithmetic functions needed in these cryptosystems including modular inversion and multiplication. This is based on a new unified modular inversion algorithm that offers considerable improvement over previous ECC techniques that use Fermat's Little Theorem for this operation. The processor described uses a full-word multiplier which requires much fewer clock cycles than previous methods, while still maintaining a competitive critical path delay. The benefits of the approach have been demonstrated by utilizing these techniques to create a field-programmable gate array (FPGA) design. This can perform a 256-bit prime field scalar point multiplication in 3.86 ms, the fastest FPGA time reported to date. The ECC architecture described can also perform four different types of modular inversion, making it suitable for use in many different ECC applications. © 2006 IEEE.

Veja mais

Design and implementation of the symmetrically extended 2-D wavelet transform

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The inclusion of the Discrete Wavelet Transform in the JPEG-2000 standard has added impetus to the research of hardware architectures for the two-dimensional wavelet transform. In this paper, a VLSI architecture for performing the symmetrically extended two-dimensional transform is presented. This architecture conforms to the JPEG-2000 standard and is capable of near-optimal performance when dealing with the image boundaries. The architecture also achieves efficient processor utilization. Implementation results based on a Xilinx Virtex-2 FPGA device are included.

Veja mais

131 resultados para 291605 Processor Architectures

Filtro por publicador