Biblioteca Digital

We propose a data flow based run time system as an efficient tool for supporting execution of parallel code on heterogeneous architectures hosting both multicore CPUs and GPUs. We discuss how the proposed run time system may be the target of both structured parallel applications developed using algorithmic skeletons/parallel design patterns and also more "domain specific" programming models. Experimental results demonstrating the feasibility of the approach are presented. © 2012 World Scientific Publishing Company.

Veja mais

Rapid Implementation and Optimisation of DSP Systems on SoPC Heterogeneous Platforms

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The emergence of programmable logic devices as processing platforms for digital signal processing applications poses challenges concerning rapid implementation and high level optimization of algorithms on these platforms. This paper describes Abhainn, a rapid implementation methodology and toolsuite for translating an algorithmic expression of the system to a working implementation on a heterogeneous multiprocessor/field programmable gate array platform, or a standalone system on programmable chip solution. Two particular focuses for Abhainn are the automated but configurable realisation of inter-processor communuication fabrics, and the establishment of novel dedicated hardware component design methodologies allowing algorithm level transformation for system optimization. This paper outlines the approaches employed in both these particular instances.

Veja mais

New algorithms and VLSI architectures for SRT division and square root

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In real time digital signal processing, high performance modules for division and square root are essential if many powerful algorithms are to be implemented. In this paper, a new radix 2 algorithms for SRT division and square root are developed. For these new schemes, the result digits and the residuals are computed concurrently and the computations in adjacent rows are overlapped. Consequently, their performance should exceed that of the radix 2 SRT methods. VLSI array architectures to implement the new division and square root schemes are also presented.

Veja mais

Optimized bit level architectures for IIR filtering

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Optimized circuits for implementing high-performance bit-parallel IIR filters are presented. Circuits constructed mainly from simple carry save adders and based on most-significant-bit (MSB) first arithmetic are described. Two methods resulting in systems which are 100% efficient in that they are capable of sampling data every cycle are presented. In the first approach the basic circuit is modified so that the level of pipelining used is compatible with the small, but fixed, latency associated with the computation in question. This is achieved through insertion of pipeline delays (half latches) on every second row of cells. This produces an area-efficient solution in which the throughput rate is determined by a critical path of 76 gate delays. A second approach combines the MSB first arithmetic methods with the scattered look-ahead methods. Important design issues are addressed, including wordlength truncation, overflow detection, and saturation.

Veja mais

Evaluation of Streaming Aggregation on Parallel Hardware Architectures

Relevância:

30.00% 30.00%

Publicador:

Veja mais

Formic: Cost-Efficient and Scalable Prototyping of Manycore Architectures

Relevância:

30.00% 30.00%

Publicador:

Veja mais

On the performance of LDPC and turbo decoder architectures with unreliable memories

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this paper, we investigate the impact of faulty memory bit-cells on the performance of LDPC and Turbo channel decoders based on realistic memory failure models. Our study investigates the inherent error resilience of such codes to potential memory faults affecting the decoding process. We develop two mitigation mechanisms that reduce the impact of memory faults rather than correcting every single error. We show how protection of only few bit-cells is sufficient to deal with high defect rates. In addition, we show how the use of repair-iterations specifically helps mitigating the impact of faults that occur inside the decoder itself.

Veja mais