9 resultados para Gzip OpenMP

em Greenwich Academic Literature Archive - UK


Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this work we show how automatic relative debugging can be used to find differences in computation between a correct serial program and an OpenMP parallel version of that program that does not yield correct results. Backtracking and re-execution are used to determine the first OpenMP parallel region that produces a difference in computation that may lead to an incorrect value the user has indicated. Our approach also lends itself to finding differences between parallel computations, where executing with M threads produces expected results but an N thread execution does not (M, N > 1, M ≠ N). OpenMP programs created using a parallelization tool are addressed by utilizing static analysis and directive information from the tool. Hand-parallelized programs, where OpenMP directives are inserted by the user, are addressed by performing data dependence and directive analysis.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This chapter discusses the code parallelization environment, where a number of tools that address the main tasks, such as code parallelization, debugging, and optimization are available. The parallelization tools include ParaWise and CAPO, which enable the near automatic parallelization of real world scientific application codes for shared and distributed memory-based parallel systems. The chapter discusses the use of ParaWise and CAPO to transform the original serial code into an equivalent parallel code that contains appropriate OpenMP directives. Additionally, as user involvement can introduce errors, a relative debugging tool (P2d2) is also available and can be used to perform near automatic relative debugging of an OpenMP program that has been parallelized either using the tools or manually. In order for these tools to be effective in parallelizing a range of applications, a high quality fully inter-procedural dependence analysis, as well as user interaction is vital to the generation of efficient parallel code and in the optimization of the backtracking and speculation process used in relative debugging. Results of parallelized NASA codes are discussed and show the benefits of using the environment.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Despite the apparent simplicity of the OpenMP directive shared memory programming model and the sophisticated dependence analysis and code generation capabilities of the ParaWise/CAPO tools, experience shows that a level of expertise is required to produce efficient parallel code. In a real world application the investigation of a single loop in a generated parallel code can soon become an in-depth inspection of numerous dependencies in many routines. The additional understanding of dependencies is also needed to effectively interpret the information provided and supply the required feedback. The ParaWise Expert Assistant has been developed to automate this investigation and present questions to the user about, and in the context of, their application code. In this paper, we demonstrate that knowledge of dependence information and OpenMP are no longer essential to produce efficient parallel code with the Expert Assistant. It is hoped that this will enable a far wider audience to use the tools and subsequently, exploit the benefits of large parallel systems.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Code parallelization using OpenMP for shared memory systems is relatively easier than using message passing for distributed memory systems. Despite this, it is still a challenge to use OpenMP to parallelize application codes in a way that yields effective scalable performance when executed on a shared memory parallel system. We describe an environment that will assist the programmer in the various tasks of code parallelization and this is achieved in a greatly reduced time frame and level of skill required. The parallelization environment includes a number of tools that address the main tasks of parallelism detection, OpenMP source code generation, debugging and optimization. These tools include a high quality, fully interprocedural dependence analysis with user interaction capabilities to facilitate the generation of efficient parallel code, an automatic relative debugging tool to identify erroneous user decisions in that interaction and also performance profiling to identify bottlenecks. Finally, experiences of parallelizing some NASA application codes are presented to illustrate some of the benefits of using the evolving environment.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

This paper briefly describes an interactive parallelisation toolkit that can be used to generate parallel code suitable for either a distributed memory system (using message passing) or a shared memory system (using OpenMP). This study focuses on how the toolkit is used to parallelise a complex heterogeneous ocean modelling code within a few hours for use on a shared memory parallel system. The generated parallel code is essentially the serial code with OpenMP directives added to express the parallelism. The results show that substantial gains in performance can be achieved over the single thread version with very little effort.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

This paper describes an interactive parallelisation toolkit that can be used to generate parallel code suitable for either a distributed memory system (using message passing) or a shared memory system (using OpenMP). This study focuses on how the toolkit is used to parallelise a complex heterogeneous ocean modelling code within a few hours for use on a shared memory parallel system. The generated parallel code is essentially the serial code with OpenMP directives added to express the parallelism. The results show that substantial gains in performance can be achieved over the single thread version with very little effort.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The shared-memory programming model can be an effective way to achieve parallelism on shared memory parallel computers. Historically however, the lack of a programming standard using directives and the limited scalability have affected its take-up. Recent advances in hardware and software technologies have resulted in improvements to both the performance of parallel programs with compiler directives and the issue of portability with the introduction of OpenMP. In this study, the Computer Aided Parallelisation Toolkit has been extended to automatically generate OpenMP-based parallel programs with nominal user assistance. We categorize the different loop types and show how efficient directives can be placed using the toolkit's in-depth interprocedural analysis. Examples are taken from the NAS parallel benchmarks and a number of real-world application codes. This demonstrates the great potential of using the toolkit to quickly parallelise serial programs as well as the good performance achievable on up to 300 processors for hybrid message passing-directive parallelisations.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

A simulation program has been developed to calculate the power-spectral density of thin avalanche photodiodes, which are used in optical networks. The program extends the time-domain analysis of the dead-space multiplication model to compute the autocorrelation function of the APD impulse response. However, the computation requires a large amount of memory space and is very time consuming. We describe our experiences in parallelizing the code using both MPI and OpenMP. Several array partitioning schemes and scheduling policies are implemented and tested Our results show that the OpenMP code is scalable up to 64 processors on an SGI Origin 2000 machine and has small average errors.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

An important factor for high-speed optical communication is the availability of ultrafast and low-noise photodetectors. Among the semiconductor photodetectors that are commonly used in today’s long-haul and metro-area fiber-optic systems, avalanche photodiodes (APDs) are often preferred over p-i-n photodiodes due to their internal gain, which significantly improves the receiver sensitivity and alleviates the need for optical pre-amplification. Unfortunately, the random nature of the very process of carrier impact ionization, which generates the gain, is inherently noisy and results in fluctuations not only in the gain but also in the time response. Recently, a theory characterizing the autocorrelation function of APDs has been developed by us which incorporates the dead-space effect, an effect that is very significant in thin, high-performance APDs. The research extends the time-domain analysis of the dead-space multiplication model to compute the autocorrelation function of the APD impulse response. However, the computation requires a large amount of memory space and is very time consuming. In this research, we describe our experiences in parallelizing the code in MPI and OpenMP using CAPTools. Several array partitioning schemes and scheduling policies are implemented and tested. Our results show that the code is scalable up to 64 processors on a SGI Origin 2000 machine and has small average errors.