Biblioteca Digital

818 results for LDPC, CUDA, GPGPU, computing, GPU, DVB, S2, SDR

MAPPING SYSTEM LEVEL FUNCTIONS ON TO BIT LEVEL SYSTOLIC ARRAYS.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Bit-level systolic-array structures for computing sums of products are studied in detail. It is shown that these can be subdivided into two classes and that within each class architectures can be described in terms of a set of constraint equations. It is further demonstrated that high-performance system-level functions with attractive VLSI properties can be constructed by matching data-flow geometries in bit-level and word-level architectures.

BIT-LEVEL SYSTOLIC ARRAYS.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

A systolic array is an array of individual processing cells each of which has some local memory and is connected only to its nearest neighbours in the form of a regular lattice. On each cycle of a simple clock every cell receives data from its neighbouring cells and performs a specific processing operation on it. The resulting data is stored within the cell and passed on to neighbouring cells on the next clock cycle. This paper gives an overview of work to date and illustrates the application of bit-level systolic arrays by means of two examples: (1) a pipelined bit-slice circuit for computing matrix x vector transforms; and (2) a bit serial structure for multi-bit convolution.
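The cell behaviour described in this abstract can be illustrated with a small software simulation. The sketch below is only an assumption-laden illustration, not the circuit from the paper: it models a word-level (not bit-level) linear array computing a matrix x vector product, and the function name `systolic_matvec` and the choice to stream `x` from left to right are hypothetical. Each cell keeps one accumulator as local memory, reads only from its left-hand neighbour, and passes the value on at the next clock cycle.

```python
def systolic_matvec(a, x):
    """Simulate a linear systolic array computing y = a * x, cycle by cycle.

    Hypothetical word-level model: cell i holds the accumulator for y[i],
    and the elements of x travel through the cells one hop per clock cycle.
    """
    n, m = len(a), len(x)
    acc = [0] * n        # local memory: one accumulator per cell
    x_reg = [None] * n   # value each cell stored on the previous cycle
    for t in range(n + m - 1):
        # Cell 0 reads the external input stream; every other cell reads
        # what its left-hand neighbour stored on the previous cycle.
        incoming = [x[t] if t < m else None] + x_reg[:-1]
        for i in range(n):
            if incoming[i] is not None:
                j = t - i  # index of the x element now sitting at cell i
                acc[i] += a[i][j] * incoming[i]
        x_reg = incoming  # stored values are passed on at the next cycle
    return acc


if __name__ == "__main__":
    print(systolic_matvec([[1, 2], [3, 4]], [5, 6]))
```

With `n` cells and `m` input elements, the last element reaches the last cell after `n + m - 1` cycles, which is why the loop runs for that many steps.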

VLSI architectures for digital image coding

Relevance:

20.00% 20.00%

Publisher:

Abstract:

A number of high-performance VLSI architectures for real-time image coding applications are described. In particular, attention is focused on circuits for computing the 2-D DCT (discrete cosine transform) and for 2-D vector quantization. The former circuits are based on Winograd algorithms and comprise a number of bit-level systolic arrays with a bit-serial, word-parallel input. The latter circuits exhibit a similar data organization and consist of a number of inner product array circuits. Both circuits are highly regular and allow extremely high data rates to be achieved through extensive use of parallelism.

Formal Analysis of MPI-based Parallel Programs

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Most parallel computing applications in high-performance computing use the Message Passing Interface (MPI) API. Given the fundamental importance of parallel computing to science and engineering research, application correctness is paramount. MPI was originally developed around 1993 by the MPI Forum, a group of vendors, parallel programming researchers, and computational scientists. However, the document defining the standard is not issued by an official standards organization but has become a de facto standard. © 2011 ACM.

Minimizing damage in postbuckling stiffened composite panels: An optimization strategy using High Performance Computing

Relevance:

20.00% 20.00%

Publisher:

Common-sense knowledge for a computer vision system for human action recognition

Relevance:

20.00% 20.00%

Publisher:

Abstract:

This work presents a novel approach for human action recognition based on the combination of computer vision techniques and common-sense knowledge and reasoning capabilities. The emphasis of this work is on how common sense has to be leveraged to a vision-based human action recognition so that nonsensical errors can be amended at the understanding stage. The proposed framework is to be deployed in a realistic environment in which humans behave rationally, that is, motivated by an aim or a reason. © 2012 Springer-Verlag.

Gaussian mixture background modelling optimisation for micro-controllers

Relevance:

20.00% 20.00%

Publisher:

Abstract:

This paper proposes an optimisation of the adaptive Gaussian mixture background model that allows the deployment of the method on processors with low memory capacity. The effect of the granularity of the Gaussian mean-value and variance in an integer-based implementation is investigated and novel updating rules of the mixture weights are described. Based on the proposed framework, an implementation for a very low power consumption micro-controller is presented. Results show that the proposed method operates in real time on the micro-controller and has similar performance to the original model. © 2012 Springer-Verlag.

WIQ: Work-intensive query scheduling for in-memory database systems

Relevance:

20.00% 20.00%

Publisher:

Abstract:

We propose a novel admission control policy for database queries. Our methodology uses system measurements of CPU utilization and query backlogs to determine interference between queries in execution on the same database server. Query interference may arise due to the concurrent access of hardware and software resources and can affect performance in positive and negative ways. Specifically, our admission control considers the mix of jobs in service and prioritizes the query classes consuming CPU resources more efficiently. The policy ignores I/O subsystems and is therefore highly appropriate for in-memory databases. We validate our approach in trace-driven simulation and show performance increases of query slowdowns and throughputs compared to first-come first-served and shortest expected processing time first scheduling. Simulation experiments are parameterized from system traces of a SAP HANA in-memory database installation with TPC-H type workloads. © 2012 IEEE.

Parallel patterns + Macro Data Flow for multi-core programming

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Data flow techniques have been around since the early '70s when they were used in compilers for sequential languages. Shortly after their introduction they were also considered as a possible model for parallel computing, although the impact here was limited. Recently, however, data flow has been identified as a candidate for efficient implementation of various programming models on multi-core architectures. In most cases, however, the burden of determining data flow "macro" instructions is left to the programmer, while the compiler/run time system manages only the efficient scheduling of these instructions. We discuss a structured parallel programming approach supporting automatic compilation of programs to macro data flow and we show experimental results demonstrating the feasibility of the approach and the efficiency of the resulting "object" code on different classes of state-of-the-art multi-core architectures. The experimental results use different base mechanisms to implement the macro data flow run time support, from plain pthreads with condition variables to more modern and effective lock- and fence-free parallel frameworks. Experimental results comparing efficiency of the proposed approach with those achieved using other, more classical, parallel frameworks are also presented. © 2012 IEEE.

Enabling science through emerging HPC technologies: accelerating numerical quadrature using a GPU

Relevance:

20.00% 20.00%

Publisher:

Abstract:

The R-matrix method, when applied to the study of intermediate energy electron scattering by the hydrogen atom, gives rise to a large number of two-electron integrals between numerical basis functions. Each integral is evaluated independently of the others, thereby rendering this a prime candidate for a parallel implementation. In this paper, we present a parallel implementation of this routine which uses a Graphical Processing Unit as a co-processor, giving a speedup of approximately 20 times when compared with a sequential version. We briefly consider properties of this calculation which make a GPU implementation appropriate with a view to identifying other calculations which might similarly benefit.

Using Machine Descriptors to Select Parallelization Models and Strategies on Hierarchical Systems: Supercomputing'2001: High Performance Computing and Networking Conference (SC)

Relevance:

20.00% 20.00%

Publisher:

Application Awareness in Adaptation Middleware: Balancing Transparency with Performance and Adaptivity: SIAM Conference on Parallel Processing for Scientific Computing (SIAM PP), Miniworkshop on Adaptivity in Parallel and Distributed Computing through Interoperating Systems and Applications

Relevance:

20.00% 20.00%

Publisher:

Unified Scheduling of Polymorphic Parallelism on the Cell Processor: Abstracts of the 2008 SIAM Conference on Parallel Processing for Scientific Computing, Miniworkshop on the Cell Processor (SIAM PP)

Relevance:

20.00% 20.00%

Publisher:

Model-Based Hybrid MPI/OpenMP Power-Aware Computing: ACM/IEEE Supercomputing'2009: High-performance Computing, Networking, Storage and Analysis (SC): Poster Session

Relevance:

20.00% 20.00%

Publisher:

Hybrid MPI/OpenMP Power-Aware Computing

Relevance:

20.00% 20.00%

Publisher:
