46 results for Parallel computing

in QUB Research Portal - Research Directory and Institutional Repository for Queen's University Belfast


Relevance: 70.00%

Publisher:

Abstract:

Most parallel computing applications in high-performance computing use the Message Passing Interface (MPI) API. Given the fundamental importance of parallel computing to science and engineering research, application correctness is paramount. MPI was originally developed around 1993 by the MPI Forum, a group of vendors, parallel programming researchers, and computational scientists. However, the document defining the standard is not issued by an official standards organization but has become a de facto standard. © 2011 ACM
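
As a point of reference for the MPI API the abstract mentions, the following is a minimal sketch of a typical MPI message exchange in C. The program and its values are illustrative assumptions, not material from the paper.

    /* Minimal MPI sketch: rank 0 sends one integer to rank 1.
       Illustrative only; not taken from the paper under discussion. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, size, value = 0;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0 && size > 1) {
            value = 42;
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("rank 1 received %d\n", value);
        }

        MPI_Finalize();
        return 0;
    }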

Relevance: 70.00%

Publisher:

Abstract:

Data flow techniques have been around since the early '70s, when they were used in compilers for sequential languages. Shortly after their introduction they were also considered as a possible model for parallel computing, although the impact here was limited. Recently, however, data flow has been identified as a candidate for efficient implementation of various programming models on multi-core architectures. In most cases, though, the burden of determining the data flow "macro" instructions is left to the programmer, while the compiler/run-time system manages only the efficient scheduling of these instructions. We discuss a structured parallel programming approach supporting automatic compilation of programs to macro data flow and we show experimental results demonstrating the feasibility of the approach and the efficiency of the resulting "object" code on different classes of state-of-the-art multi-core architectures. The experimental results use different base mechanisms to implement the macro data flow run-time support, from plain pthreads with condition variables to more modern and effective lock- and fence-free parallel frameworks. Experimental results comparing the efficiency of the proposed approach with that achieved using other, more classical, parallel frameworks are also presented. © 2012 IEEE
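
To make the "plain pthreads with condition variables" run-time option concrete, the sketch below is a minimal macro data flow scheduler: each task (macro instruction) waits until all of its input tokens have arrived, worker threads pop ready tasks from a shared queue, and firing a task releases tokens to its successors. The task graph, task bodies, and limits are assumptions for illustration, not the authors' implementation.

    /* Minimal macro data flow scheduler on plain pthreads and a condition
       variable. Illustrative sketch only; not the authors' run-time. */
    #include <pthread.h>
    #include <stdio.h>

    #define MAX_TASKS 8

    typedef struct task {
        void (*fire)(void);            /* body of the "macro" instruction   */
        int unmet_inputs;              /* input tokens still missing        */
        int n_succ;                    /* number of successor tasks         */
        struct task *succ[MAX_TASKS];  /* tasks fed by this task's output   */
    } task_t;

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  ready_cv = PTHREAD_COND_INITIALIZER;
    static task_t *ready[MAX_TASKS];
    static int n_ready = 0, n_done = 0, n_total = 0;

    static void enqueue_ready(task_t *t) {   /* caller must hold the lock */
        ready[n_ready++] = t;
        pthread_cond_signal(&ready_cv);
    }

    static void *worker(void *arg) {
        (void)arg;
        for (;;) {
            pthread_mutex_lock(&lock);
            while (n_ready == 0 && n_done < n_total)
                pthread_cond_wait(&ready_cv, &lock);
            if (n_done == n_total) {            /* every macro instruction fired */
                pthread_mutex_unlock(&lock);
                return NULL;
            }
            task_t *t = ready[--n_ready];
            pthread_mutex_unlock(&lock);

            t->fire();                          /* execute the macro instruction */

            pthread_mutex_lock(&lock);
            n_done++;
            for (int i = 0; i < t->n_succ; i++)     /* propagate output tokens */
                if (--t->succ[i]->unmet_inputs == 0)
                    enqueue_ready(t->succ[i]);
            pthread_cond_broadcast(&ready_cv);      /* wake waiters, also for exit */
            pthread_mutex_unlock(&lock);
        }
    }

    static void work_a(void) { puts("A"); }
    static void work_b(void) { puts("B"); }
    static void work_c(void) { puts("C"); }

    int main(void) {
        /* Toy graph: A feeds B and C, so B and C can run in parallel after A. */
        task_t a = { work_a, 0, 0, { 0 } };
        task_t b = { work_b, 1, 0, { 0 } };
        task_t c = { work_c, 1, 0, { 0 } };
        a.n_succ = 2; a.succ[0] = &b; a.succ[1] = &c;
        n_total = 3;

        pthread_mutex_lock(&lock);
        enqueue_ready(&a);                      /* A has no pending inputs */
        pthread_mutex_unlock(&lock);

        pthread_t th[2];
        for (int i = 0; i < 2; i++) pthread_create(&th[i], NULL, worker, NULL);
        for (int i = 0; i < 2; i++) pthread_join(th[i], NULL);
        return 0;
    }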

Relevance: 60.00%

Publisher:

Abstract:

The use of the computational Grid processor network has become a common way for researchers and scientists without access to local processor clusters to gain the benefits of parallel processing for compute-intensive applications. This demand requires effective and efficient dynamic allocation of the available resources. Although static scheduling and allocation techniques have proved effective, the dynamic nature of the Grid requires innovative techniques for reacting to change and maintaining stability for users. The dynamic scheduling process calls for powerful optimization techniques, which can themselves be too slow to react in time to produce an effective schedule; there is often a trade-off between solution quality and the speed with which a solution is obtained. This paper presents an extension of a technique used in optimization and scheduling that provides a means of achieving this balance and improves on similar approaches currently published.
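
The paper's specific technique is not reproduced in the abstract; as a hedged illustration of the speed/quality trade-off it discusses, the sketch below implements a simple greedy earliest-completion-time dispatcher, the kind of fast heuristic that produces a schedule quickly but without optimality guarantees. Node loads and task costs are invented example data.

    /* Greedy earliest-completion-time dispatch: assign each task to the node
       where it would finish soonest. Illustrative stand-in only; not the
       paper's scheduling technique. Example data is invented. */
    #include <stdio.h>

    #define N_NODES 4
    #define N_TASKS 6

    int main(void) {
        double ready_time[N_NODES] = { 0.0, 0.5, 0.2, 0.0 };          /* current node load */
        double task_cost[N_TASKS]  = { 3.0, 1.0, 2.5, 0.5, 4.0, 1.5 }; /* per-task run time */

        for (int t = 0; t < N_TASKS; t++) {
            int best = 0;
            double best_finish = ready_time[0] + task_cost[t];
            for (int n = 1; n < N_NODES; n++) {
                double finish = ready_time[n] + task_cost[t];
                if (finish < best_finish) { best_finish = finish; best = n; }
            }
            ready_time[best] = best_finish;   /* commit the assignment */
            printf("task %d -> node %d (finishes at %.1f)\n", t, best, best_finish);
        }
        return 0;
    }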

Relevance: 60.00%

Publisher:

Relevance: 60.00%

Publisher:

Abstract:

Traditional static analysis fails to auto-parallelize programs with complex control and data flow. Furthermore, thread-level parallelism in such programs is often restricted to pipeline parallelism, which can be hard for a programmer to discover. In this paper we propose a tool that, based on profiling information, helps the programmer to discover parallelism. The programmer hand-picks the code transformations from among the proposed candidates, which are then applied by automatic code transformation techniques.
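
As a hedged illustration of the pipeline parallelism the tool targets, the sketch below splits an outer loop into two stages connected by a bounded queue, so that the second stage of one iteration overlaps the first stage of later iterations. The stage bodies and queue size are assumptions, not code from the paper or its tool.

    /* Coarse-grain two-stage pipeline over an outer loop, using pthreads and
       a bounded queue. Illustrative sketch only; stage bodies are invented. */
    #include <pthread.h>
    #include <stdio.h>

    #define N_ITEMS 8
    #define Q_CAP   4

    static int queue[Q_CAP];
    static int head = 0, tail = 0, count = 0;
    static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t not_full  = PTHREAD_COND_INITIALIZER;
    static pthread_cond_t not_empty = PTHREAD_COND_INITIALIZER;

    static void push(int v) {                     /* hand a work item downstream */
        pthread_mutex_lock(&m);
        while (count == Q_CAP) pthread_cond_wait(&not_full, &m);
        queue[tail] = v; tail = (tail + 1) % Q_CAP; count++;
        pthread_cond_signal(&not_empty);
        pthread_mutex_unlock(&m);
    }

    static int pop(void) {                        /* receive a work item upstream */
        pthread_mutex_lock(&m);
        while (count == 0) pthread_cond_wait(&not_empty, &m);
        int v = queue[head]; head = (head + 1) % Q_CAP; count--;
        pthread_cond_signal(&not_full);
        pthread_mutex_unlock(&m);
        return v;
    }

    static void *stage1(void *arg) {   /* first half of the original loop body */
        (void)arg;
        for (int i = 0; i < N_ITEMS; i++)
            push(i * i);               /* produce one whole work item per iteration */
        return NULL;
    }

    static void *stage2(void *arg) {   /* second half of the original loop body */
        (void)arg;
        for (int i = 0; i < N_ITEMS; i++)
            printf("consumed %d\n", pop());
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, stage1, NULL);
        pthread_create(&t2, NULL, stage2, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        return 0;
    }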

This paper contributes to the literature by presenting a profiling tool for discovering thread-level parallelism. We track dependencies at the whole-data-structure level rather than at the element or byte level in order to limit the profiling overhead. We perform a thorough analysis of the needs and costs of this technique. Furthermore, we present and validate the hypothesis that programs with complex control and data flow contain significant amounts of exploitable coarse-grain pipeline parallelism in their outer loops. This observation validates our approach of tracking whole-data-structure dependencies. As state-of-the-art compilers focus on loops iterating over data structure members, this observation also explains why our approach finds coarse-grain pipeline parallelism in cases that have remained out of reach for state-of-the-art compilers. In cases where traditional compilation techniques do find parallelism, our approach allows higher degrees of parallelism to be discovered, yielding a 40% speedup over traditional compilation techniques. Moreover, we demonstrate real speedups on multiple hardware platforms.
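
A toy illustration of whole-data-structure dependence tracking, as opposed to per-element or per-byte tracking, might look like the following: each read and write is logged against the structure as a unit, and a read served by a write from an earlier outer-loop iteration is reported as a cross-iteration dependence, the pattern that suggests pipeline parallelism. The instrumentation hooks and the loop are hypothetical, not the paper's profiler.

    /* Whole-data-structure dependence tracking, sketched: one record per
       structure, not per element. Hook names and the loop are hypothetical. */
    #include <stdio.h>

    enum { N_STRUCTS = 2 };
    static int last_writer_iter[N_STRUCTS] = { -1, -1 };

    static void record_write(int ds, int iter) { last_writer_iter[ds] = iter; }

    static void record_read(int ds, int iter) {
        /* A read served by a write from an earlier iteration is a
           cross-iteration dependence: a candidate pipeline boundary. */
        if (last_writer_iter[ds] >= 0 && last_writer_iter[ds] < iter)
            printf("struct %d: iteration %d reads data written in iteration %d\n",
                   ds, iter, last_writer_iter[ds]);
    }

    int main(void) {
        /* Instrumented outer loop: the consumer stage reads what the
           previous iteration's producer stage wrote into struct 0. */
        for (int iter = 0; iter < 3; iter++) {
            record_read(0, iter);      /* consumer stage */
            record_write(0, iter);     /* producer stage */
        }
        return 0;
    }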