979 resultados para minimalist hardware architecture


Relevância:

20.00% 20.00%

Publicador:

Resumo:

A BSP (Bulk Synchronous Parallelism) computation is characterized by the generation of asynchronous messages in packages during independent execution of a number of processes and their subsequent delivery at synchronization points. Bundling messages together represents a significant departure from the traditional ‘one communication at a time’ approach. In this paper the semantic consequences of communication packaging are explored. In particular, the BSP communication structure is identified with a general form of substitution—predicate substitution. Predicate substitution provides a means of reasoning about the synchronized delivery of asynchronous communications when the immediate programming context does not explicitly refer to the variables that are to be updated (unlike traditional operations, such as the assignment $x := e$, where the names of the updated variables can be extracted from the context). Proofs of implementations of Newton's root finding method and prefix sum are used to illustrate the practical application of the proposed approach.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

With the advent of new video standards such as MPEG-4 part-10 and H.264/H.26L, demands for advanced video coding, particularly in the area of variable block size video motion estimation (VBSME), are increasing. In this paper, we propose a new one-dimensional (1-D) very large-scale integration architecture for full-search VBSME (FSVBSME). The VBS sum of absolute differences (SAD) computation is performed by re-using the results of smaller sub-block computations. These are distributed and combined by incorporating a shuffling mechanism within each processing element. Whereas a conventional 1-D architecture can process only one motion vector (MV), this new architecture can process up to 41 MV sub-blocks (within a macroblock) in the same number of clock cycles.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A novel application-specific instruction set processor (ASIP) for use in the construction of modern signal processing systems is presented. This is a flexible device that can be used in the construction of array processor systems for the real-time implementation of functions such as singular-value decomposition (SVD) and QR decomposition (QRD), as well as other important matrix computations. It uses a coordinate rotation digital computer (CORDIC) module to perform arithmetic operations and several approaches are adopted to achieve high performance including pipelining of the micro-rotations, the use of parallel instructions and a dual-bus architecture. In addition, a novel method for scale factor correction is presented which only needs to be applied once at the end of the computation. This also reduces computation time and enhances performance. Methods are described which allow this processor to be used in reduced dimension (i.e., folded) array processor structures that allow tradeoffs between hardware and performance. The net result is a flexible matrix computational processing element (PE) whose functionality can be changed under program control for use in a wider range of scenarios than previous work. Details are presented of the results of a design study, which considers the application of this decomposition PE architecture in a combined SVD/QRD system and demonstrates that a combination of high performance and efficient silicon implementation are achievable. © 2005 IEEE.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper, chosen as a best paper from the 2005 SAMOS Workshop on Computer Systems: describes the for the first time the major Abhainn project for automated system level design of embedded signal processing systems. In particular, this describes four key novelties: novel algorithm modelling techniques for DSP systems, automated implementation realisation, algorithm transformation for system optimisation and automated inter-processor communication. This is applied to two complex systems: a radar and sonar system. In both cases technology which allows non-experts to automatically create low-overhead, high performance embedded signal processing systems is exhibited.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

An application specific programmable processor (ASIP) suitable for the real-time implementation of matrix computations such as Singular Value and QR Decomposition is presented. The processor incorporates facilities for the issue of parallel instructions and a dual-bus architecture that are designed to achieve high performance. Internally, it uses a CORDIC module to perform arithmetic operations, with pipelining of the internal recursive loop exploited to multiplex the two independent micro-rotations onto a single piece of hardware. The net result is a flexible processing element whose functionality can be changed under program control, which combines high performance with efficient silicon implementation. This is illustrated through the results of a detailed silicon design study and the applications of the techniques to a combined SVD/QRD system.