818 resultados para LDPC, CUDA, GPGPU, computing, GPU, DVB, S2, SDR


Relevância:

20.00% 20.00%

Publicador:

Resumo:

Bit-level systolic-array structures for computing sums of products are studied in detail. It is shown that these can be subdivided into two classes and that within each class architectures can be described in terms of a set of constraint equations. It is further demonstrated that high-performance system-level functions with attractive VLSI properties can be constructed by matching data-flow geometries in bit-level and word-level architectures.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A systolic array is an array of individual processing cells each of which has some local memory and is connected only to its nearest neighbours in the form of a regular lattice. On each cycle of a simple clock every cell receives data from its neighbouring cells and performs a specific processing operation on it. The resulting data is stored within the cell and passed on to neighbouring cells on the next clock cycle. This paper gives an overview of work to date and illustrates the application of bit-level systolic arrays by means of two examples: (1) a pipelined bit-slice circuit for computing matrix x vector transforms; and (2) a bit serial structure for multi-bit convolution.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A number of high-performance VLSI architectures for real-time image coding applications are described. In particular, attention is focused on circuits for computing the 2-D DCT (discrete cosine transform) and for 2-D vector quantization. The former circuits are based on Winograd algorithms and comprise a number of bit-level systolic arrays with a bit-serial, word-parallel input. The latter circuits exhibit a similar data organization and consist of a number of inner product array circuits. Both circuits are highly regular and allow extremely high data rates to be achieved through extensive use of parallelism.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Most parallel computing applications in highperformance computing use the Message Passing Interface (MPI) API. Given the fundamental importance of parallel computing to science and engineering research, application correctness is paramount. MPI was originally developed around 1993 by the MPI Forum, a group of vendors, parallel programming researchers, and computational scientists. However, the document defining the standard is not issued by an official standards organization but has become a de facto standard © 2011 ACM.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This work presents a novel approach for human action recognition based on the combination of computer vision techniques and common-sense knowledge and reasoning capabilities. The emphasis of this work is on how common sense has to be leveraged to a vision-based human action recognition so that nonsensical errors can be amended at the understanding stage. The proposed framework is to be deployed in a realistic environment in which humans behave rationally, that is, motivated by an aim or a reason. © 2012 Springer-Verlag.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper proposes an optimisation of the adaptive Gaussian mixture background model that allows the deployment of the method on processors with low memory capacity. The effect of the granularity of the Gaussian mean-value and variance in an integer-based implementation is investigated and novel updating rules of the mixture weights are described. Based on the proposed framework, an implementation for a very low power consumption micro-controller is presented. Results show that the proposed method operates in real time on the micro-controller and has similar performance to the original model. © 2012 Springer-Verlag.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We propose a novel admission control policy for database queries. Our methodology uses system measurements of CPU utilization and query backlogs to determine interference between queries in execution on the same database server. Query interference may arise due to the concurrent access of hardware and software resources and can affect performance in positive and negative ways. Specifically our admission control considers the mix of jobs in service and prioritizes the query classes consuming CPU resources more efficiently. The policy ignores I/O subsystems and is therefore highly appropriate for in-memory databases. We validate our approach in trace-driven simulation and show performance increases of query slowdowns and throughputs compared to first-come first-served and shortest expected processing time first scheduling. Simulation experiments are parameterized from system traces of a SAP HANA in-memory database installation with TPC-H type workloads. © 2012 IEEE.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Data flow techniques have been around since the early '70s when they were used in compilers for sequential languages. Shortly after their introduction they were also consideredas a possible model for parallel computing, although the impact here was limited. Recently, however, data flow has been identified as a candidate for efficient implementation of various programming models on multi-core architectures. In most cases, however, the burden of determining data flow "macro" instructions is left to the programmer, while the compiler/run time system manages only the efficient scheduling of these instructions. We discuss a structured parallel programming approach supporting automatic compilation of programs to macro data flow and we show experimental results demonstrating the feasibility of the approach and the efficiency of the resulting "object" code on different classes of state-of-the-art multi-core architectures. The experimental results use different base mechanisms to implement the macro data flow run time support, from plain pthreads with condition variables to more modern and effective lock- and fence-free parallel frameworks. Experimental results comparing efficiency of the proposed approach with those achieved using other, more classical, parallel frameworks are also presented. © 2012 IEEE.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The R-matrix method when applied to the study of intermediate energy electron scattering by the hydrogen atom gives rise to a large number of two electron integrals between numerical basis functions. Each integral is evaluated independently of the others, thereby rendering this a prime candidate for a parallel implementation. In this paper, we present a parallel implementation of this routine which uses a Graphical Processing Unit as a co-processor, giving a speedup of approximately 20 times when compared with a sequential version. We briefly consider properties of this calculation which make a GPU implementation appropriate with a view to identifying other calculations which might similarly benet.

Relevância:

20.00% 20.00%

Publicador: