18 resultados para Parallelizing Compilers

em QUB Research Portal - Research Directory and Institutional Repository for Queen's University Belfast


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Parallelizing compilers have difficulty analysing and optimising complex code. To address this, some analysis may be delayed until run-time, and techniques such as speculative execution used. Furthermore, to enhance performance, a feedback loop may be setup between the compile time and run-time analysis systems, as in iterative compilation. To extend this, it is proposed that the run-time analysis collects information about the values of variables not already determined, and estimates a probability measure for the sampled values. These measures may be used to guide optimisations in further analyses of the program. To address the problem of variables with measures as values, this paper also presents an outline of a novel combination of previous probabilistic denotational semantics models, applied to a simple imperative language.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The efficient development of multi-threaded software has, for many years, been an unsolved problem in computer science. Finding a solution to this problem has become urgent with the advent of multi-core processors. Furthermore, the problem has become more complicated because multi-cores are everywhere (desktop, laptop, embedded system). As such, they execute generic programs which exhibit very different characteristics than the scientific applications that have been the focus of parallel computing in the past.
Implicitly parallel programming is an approach to parallel pro- gramming that promises high productivity and efficiency and rules out synchronization errors and race conditions by design. There are two main ingredients to implicitly parallel programming: (i) a con- ventional sequential programming language that is extended with annotations that describe the semantics of the program and (ii) an automatic parallelizing compiler that uses the annotations to in- crease the degree of parallelization.
It is extremely important that the annotations and the automatic parallelizing compiler are designed with the target application do- main in mind. In this paper, we discuss the Paralax approach to im- plicitly parallel programming and we review how the annotations and the compiler design help to successfully parallelize generic programs. We evaluate Paralax on SPECint benchmarks, which are a model for such programs, and demonstrate scalable speedups, up to a factor of 6 on 8 cores.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

With the emergence of multicore and manycore processors, engineers must design and develop software in drastically new ways to benefit from the computational power of all cores. However, developing parallel software is much harder than sequential software because parallelism can't be abstracted away easily. Authors Hans Vandierendonck and Tom Mens provide an overview of technologies and tools to support developers in this complex and error-prone task. © 2012 IEEE.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

A FORTRAN 90 program is presented which calculates the total cross sections, and the electron energy spectra of the singly and doubly differential cross sections for the single target ionization of neutral atoms ranging from hydrogen up to and including argon. The code is applicable for the case of both high and low Z projectile impact in fast ion-atom collisions. The theoretical models provided for the program user are based on two quantum mechanical approximations which have proved to be very successful in the study of ionization in ion-atom collisions. These are the continuum-distorted-wave (CDW) and continuum-distorted-wave eikonal-initial-state (CDW-EIS) approximations. The codes presented here extend previously published. codes for single ionization of. target hydrogen [Crothers and McCartney, Comput. Phys. Commun. 72 (1992) 288], target helium [Nesbitt, O'Rourke and Crothers, Comput. Phys. Commun. 114 (1998) 385] and target atoms ranging from lithium to neon [O'Rourke, McSherry and Crothers, Comput. Phys. Commun. 131 (2000) 129]. Cross sections for all of these target atoms may be obtained as limiting cases from the present code. Title of program: ARGON Catalogue identifier: ADSE Program summary URL: http://cpc.cs.qub.ac.uk/cpc/summaries/ADSE Program obtainable from: CPC Program Library Queen's University of Belfast, N. Ireland Licensing provisions: none Computer for which the program is designed and others on which it is operable: Computers: Four by 200 MHz Pro Pentium Linux server, DEC Alpha 21164; Four by 400 MHz Pentium 2 Xeon 450 Linux server, IBM SP2 and SUN Enterprise 3500 Installations: Queen's University, Belfast Operating systems under which the program has been tested: Red-hat Linux 5.2, Digital UNIX Version 4.0d, AIX, Solaris SunOS 5.7 Compilers: PGI workstations, DEC CAMPUS Programming language used: FORTRAN 90 with MPI directives No. of bits in a word: 64, except on Linux servers 32 Number of processors used: any number Has the code been vectorized or parallelized? Parallelized using MPI No. of bytes in distributed program, including test data, etc.: 32 189 Distribution format: tar gzip file Keywords: Single ionization, cross sections, continuum-distorted-wave model, continuum- distorted-wave eikonal-initial-state model, target atoms, wave treatment Nature of physical problem: The code calculates total, and differential cross sections for the single ionization of target atoms ranging from hydrogen up to and including argon by both light and heavy ion impact. Method of solution: ARGON allows the user to calculate the cross sections using either the CDW or CDW-EIS [J. Phys. B 16 (1983) 3229] models within the wave treatment. Restrictions on the complexity of the program: Both the CDW and CDW-EIS models are two-state perturbative approximations. Typical running time: Times vary according to input data and number of processors. For one processor the test input data for double differential cross sections (40 points) took less than one second, whereas the test input for total cross sections (20 points) took 32 minutes. Unusual features of the program: none (C) 2003 Elsevier B.V All rights reserved.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Traditional static analysis fails to auto-parallelize programs with a complex control and data flow. Furthermore, thread-level parallelism in such programs is often restricted to pipeline parallelism, which can be hard to discover by a programmer. In this paper we propose a tool that, based on profiling information, helps the programmer to discover parallelism. The programmer hand-picks the code transformations from among the proposed candidates which are then applied by automatic code transformation techniques.

This paper contributes to the literature by presenting a profiling tool for discovering thread-level parallelism. We track dependencies at the whole-data structure level rather than at the element level or byte level in order to limit the profiling overhead. We perform a thorough analysis of the needs and costs of this technique. Furthermore, we present and validate the belief that programs with complex control and data flow contain significant amounts of exploitable coarse-grain pipeline parallelism in the program’s outer loops. This observation validates our approach to whole-data structure dependencies. As state-of-the-art compilers focus on loops iterating over data structure members, this observation also explains why our approach finds coarse-grain pipeline parallelism in cases that have remained out of reach for state-of-the-art compilers. In cases where traditional compilation techniques do find parallelism, our approach allows to discover higher degrees of parallelism, allowing a 40% speedup over traditional compilation techniques. Moreover, we demonstrate real speedups on multiple hardware platforms.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The Cell Broadband Engine (BE) Architecture is a new heterogeneous multi-core architecture targeted at compute-intensive workloads. The architecture of the Cell BE has several features that are unique in high-performance general-purpose processors, most notably the extensive support for vectorization, scratch pad memories and explicit programming of direct memory accesses (DMAs) and mailbox communication. While these features strongly increase programming complexity, it is generally claimed that significant speedups can be obtained by using Cell BE processors. This paper presents our experiences with using the Cell BE architecture to accelerate Clustal W, a bio-informatics program for multiple sequence alignment. We report on how we apply the unique features of the Cell BE to Clustal W and how important each is in obtaining high performance. By making extensive use of vectorization and by parallelizing the application across all cores, we demonstrate a speedup of 24.4 times when using 16 synergistic processor units on a QS21 Cell Blade compared to single-thread execution on the power processing unit. As the Cell BE exploits a large number of slim cores, our highly optimized implementation is just 3.8 times faster than a 3-thread version running on an Intel Core2 Duo, as the latter processor exploits a small number of fat cores.

Relevância:

10.00% 10.00%

Publicador:

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Data flow techniques have been around since the early '70s when they were used in compilers for sequential languages. Shortly after their introduction they were also consideredas a possible model for parallel computing, although the impact here was limited. Recently, however, data flow has been identified as a candidate for efficient implementation of various programming models on multi-core architectures. In most cases, however, the burden of determining data flow "macro" instructions is left to the programmer, while the compiler/run time system manages only the efficient scheduling of these instructions. We discuss a structured parallel programming approach supporting automatic compilation of programs to macro data flow and we show experimental results demonstrating the feasibility of the approach and the efficiency of the resulting "object" code on different classes of state-of-the-art multi-core architectures. The experimental results use different base mechanisms to implement the macro data flow run time support, from plain pthreads with condition variables to more modern and effective lock- and fence-free parallel frameworks. Experimental results comparing efficiency of the proposed approach with those achieved using other, more classical, parallel frameworks are also presented. © 2012 IEEE.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

On multiprocessors with explicitly managed memory hierarchies (EMM), software has the responsibility of moving data in and out of fast local memories. This task can be complex and error-prone even for expert programmers. Before we can allow compilers to handle the complexity for us, we must identify the abstractions that are general enough to allow us to write applications with reasonable effort, yet speci?c enough to exploit the vast on-chip memory bandwidth of EMM multi-processors. To this end, we compare two programming models against hand-tuned codes on the STI Cell, paying attention to programmability and performance. The ?rst programming model, Sequoia, abstracts the memory hierarchy as private address spaces, each corresponding to a parallel task. The second, Cellgen, is a new framework which provides OpenMP-like semantics and the abstraction of a shared address spaces divided into private and shared data. We compare three applications programmed using these models against their hand-optimized counterparts in terms of abstractions, programming complexity, and performance.