Digital Library

6 results for Routines

at Indian Institute of Science - Bangalore - India

Lenient Execution and Concurrent Execution of Re-entrant Routines: Efficient Implementation in Data Flow Systems

Relevance: 20.00%

Abstract:

This paper discusses implementation details of efficient schemes for lenient execution and concurrent execution of re-entrant routines in a data flow model. The proposed schemes require no extra hardware support and make effective use of existing hardware resources, such as the Matching Unit and the Memory Network Interface, to achieve these goals.

Design and evaluation of a dual-microcomputer shared memory system with a shared I/O bus

Relevance: 10.00%

Abstract:

This paper describes the design, implementation, and evaluation of a dual-microcomputer system based on the concept of shared memory. Shared memory is useful for passing large blocks of data, and it also provides a means to hold and work with shared data. In addition to the shared memory, a separate bus is provided between the I/O ports of the microcomputers; this bus is used for interprocessor synchronization. Software routines helpful in applying the dual-microcomputer system to realistic problems are presented. The performance of the system is evaluated using benchmarks.

Design and Performance Evaluation of EXMAN: An EXtended MANchester Data Flow Computer

Relevance: 10.00%

Abstract:

Data flow computers are high-speed machines in which an instruction is executed as soon as all its operands are available. This paper describes the EXtended MANchester (EXMAN) data flow computer, which incorporates three major extensions to the basic Manchester machine: a multiple matching units scheme, an efficient implementation of the array data structure, and a facility to concurrently execute reentrant routines. A simulator for the EXMAN computer has been coded in the discrete-event simulation language SIMULA 67 on the DEC 1090 system. Performance analysis studies have been conducted on the simulated EXMAN computer to assess the effectiveness of the proposed extensions. The performance experiments have been carried out using three sample problems: matrix multiplication, Bresenham's line drawing algorithm, and the polygon scan-conversion algorithm.
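
The firing rule stated in the first sentence of this abstract (an instruction executes as soon as all its operands are available) can be illustrated with a small sketch. This is not the EXMAN or Manchester design; the function name `run_dataflow`, the graph encoding, and the sample expression are hypothetical, chosen only for illustration.

```python
from collections import deque
from operator import add, mul, sub


def run_dataflow(nodes, inputs):
    """Evaluate a data flow graph using the operand-availability firing rule.

    nodes:  instruction name -> (function, list of operand names)  (hypothetical encoding)
    inputs: name -> initial value
    Returns the computed values and the order in which instructions fired.
    """
    values = dict(inputs)
    # Number of operands each instruction is still waiting for.
    missing = {
        name: sum(1 for op in ops if op not in values)
        for name, (_, ops) in nodes.items()
    }
    # For each value, the instructions that consume it.
    consumers = {}
    for name, (_, ops) in nodes.items():
        for op in ops:
            consumers.setdefault(op, []).append(name)

    ready = deque(name for name, count in missing.items() if count == 0)
    order = []
    while ready:
        name = ready.popleft()
        fn, ops = nodes[name]
        values[name] = fn(*(values[op] for op in ops))
        order.append(name)
        # Producing a result may complete the operand set of its consumers.
        for consumer in consumers.get(name, []):
            missing[consumer] -= 1
            if missing[consumer] == 0:
                ready.append(consumer)
    return values, order


# (a + b) * (a - b): "sum" and "diff" can fire immediately, "prod" must wait.
nodes = {
    "sum": (add, ["a", "b"]),
    "diff": (sub, ["a", "b"]),
    "prod": (mul, ["sum", "diff"]),
}
values, order = run_dataflow(nodes, {"a": 5, "b": 3})
print(values["prod"], order)
```

In this sketch no program counter decides the order: `prod` fires only because both of its operands have arrived, which is the property the abstract attributes to data flow machines.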

Optimization of two-dimensional NMR by matched accumulation

Relevance: 10.00%

Abstract:

It is well known that in the time-domain acquisition of NMR data, signal-to-noise (S/N) improves as the square root of the number of transients accumulated. However, the amplitude of the measured signal varies during the time of detection, having a functional form dependent on the coherence detected. Matching the time spent signal averaging to the expected amplitude of the signal observed should also improve the detected signal-to-noise. Following this reasoning, Barna et al. (J. Magn. Reson. 75, 384, 1987) demonstrated the utility of exponential sampling in one- and two-dimensional NMR, using maximum-entropy methods to analyze the data. It is proposed here that for two-dimensional experiments the exponential sampling be replaced by exponential averaging. The data thus collected can be analyzed by standard fast-Fourier-transform routines. We demonstrate the utility of exponential averaging in 2D NOESY spectra of the protein ubiquitin, in which an enhanced S/N is observed. It is also shown that the method acquires delayed double-quantum-filtered COSY without phase distortion.
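
The two ideas in this abstract, square-root S/N growth and averaging effort matched to the expected signal amplitude, can be sketched as follows. The abstract does not give the authors' actual schedule, so `matched_schedule`, its exponential envelope, the decay parameter, and the rounding rule are all assumptions made for illustration.

```python
import math


def snr_gain(n_transients):
    """Relative S/N after co-adding n transients with uncorrelated noise.

    The signal adds linearly (n) while the noise adds in quadrature (sqrt(n)),
    so the ratio grows as sqrt(n).
    """
    return math.sqrt(n_transients)


def matched_schedule(n_increments, max_transients, decay_increments):
    """Hypothetical averaging schedule: transients per t1 increment.

    Assumes the signal envelope decays as exp(-i / decay_increments), where i
    is the increment index, and assigns transients in proportion to that
    envelope, with at least one transient per increment.
    """
    return [
        max(1, round(max_transients * math.exp(-i / decay_increments)))
        for i in range(n_increments)
    ]


print(snr_gain(16))  # 4.0
print(matched_schedule(8, 64, 2.0))
```

Because every t1 increment is still sampled, a schedule of this kind yields a uniformly spaced data set, which is consistent with the abstract's remark that standard fast-Fourier-transform routines can be used.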

Generating Efficient Data Movement Code for Heterogeneous Architectures with Distributed-Memory

Relevance: 10.00%

Abstract:

Programming for parallel architectures that do not have a shared address space is extremely difficult due to the need for explicit communication between memories of different compute devices. A heterogeneous system with CPUs and multiple GPUs and a distributed-memory cluster are examples of such systems. Past work that tries to automate data movement for distributed-memory architectures can lead to excessive redundant communication. In this paper, we propose an automatic data movement scheme that minimizes the volume of communication between compute devices in heterogeneous and distributed-memory systems. We show that by partitioning data dependences in a particular non-trivial way, one can generate data movement code that results in the minimum volume for a vast majority of cases. The techniques are applicable to any sequence of affine loop nests and work on top of any choice of loop transformations, parallelization, and computation placement. The data movement code generated minimizes the volume of communication for a particular configuration of these. We use a combination of powerful static analyses relying on the polyhedral compiler framework and lightweight runtime routines they generate, to build a source-to-source transformation tool that automatically generates communication code. We demonstrate that the tool is scalable and leads to substantial gains in efficiency. On a heterogeneous system, the communication volume is reduced by a factor of 11X to 83X over state-of-the-art, translating into a mean execution time speedup of 1.53X. On a distributed-memory cluster, our scheme reduces the communication volume by a factor of 1.4X to 63.5X over state-of-the-art, resulting in a mean speedup of 1.55X. In addition, our scheme yields a mean speedup of 2.19X over hand-optimized UPC codes.

PolyMage: Automatic Optimization for Image Processing Pipelines

Relevance: 10.00%

Abstract:

This paper presents the design and implementation of PolyMage, a domain-specific language and compiler for image processing pipelines. An image processing pipeline can be viewed as a graph of interconnected stages which process images successively. Each stage typically performs one of point-wise, stencil, reduction, or data-dependent operations on image pixels. Individual stages in a pipeline typically exhibit abundant data parallelism that can be exploited with relative ease. However, the stages also require high memory bandwidth, which prevents effective utilization of the parallelism available on modern architectures. For applications that demand high performance, the traditional options are to use optimized libraries like OpenCV or to optimize manually. While using libraries precludes optimization across library routines, manual optimization accounting for both parallelism and locality is very tedious. The focus of our system, PolyMage, is on automatically generating high-performance implementations of image processing pipelines expressed in a high-level declarative language. Our optimization approach primarily relies on the transformation and code generation capabilities of the polyhedral compiler framework. To the best of our knowledge, this is the first model-driven compiler for image processing pipelines that performs complex fusion, tiling, and storage optimization automatically. Experimental results on a modern multicore system show that the performance achieved by our automatic approach is up to 1.81x better than that achieved through manual tuning in Halide, a state-of-the-art language and compiler for image processing pipelines. For a camera raw image processing pipeline, our performance is comparable to that of a hand-tuned implementation.
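
To make the point about optimizing across stages concrete, here is a minimal sketch of a two-stage pipeline (a point-wise stage feeding a stencil stage) and a hand-fused equivalent. It is not PolyMage's language or its generated code; the stage names `brighten` and `blur3`, the one-dimensional "image", and the fusion by manual inlining are assumptions made for illustration.

```python
def brighten(row, gain):
    """Point-wise stage: each output pixel depends on one input pixel."""
    return [gain * p for p in row]


def blur3(row):
    """Stencil stage: each output pixel averages a 3-pixel neighbourhood."""
    return [(row[i - 1] + row[i] + row[i + 1]) / 3 for i in range(1, len(row) - 1)]


def staged(row, gain):
    """Library-style composition: the full intermediate row is written out,
    then read back by the next stage."""
    return blur3(brighten(row, gain))


def fused(row, gain):
    """Hand-fused version: the point-wise stage is inlined into the stencil,
    so no intermediate row is stored."""
    return [
        (gain * row[i - 1] + gain * row[i] + gain * row[i + 1]) / 3
        for i in range(1, len(row) - 1)
    ]


row = [3, 6, 9, 12, 15]
print(staged(row, 2))  # [12.0, 18.0, 24.0]
print(fused(row, 2))   # [12.0, 18.0, 24.0]
```

The fused version trades recomputation of the point-wise stage for the removal of an intermediate buffer. Deciding when that trade pays off, and combining it with tiling, is the kind of choice the abstract says PolyMage makes automatically.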
