56 results for structured parallel computations

in Greenwich Academic Literature Archive - UK


Relevance:

80.00%

Publisher:

Abstract:

In this work we show how automatic relative debugging can be used to find differences in computation between a correct serial program and an OpenMP parallel version of that program that does not yield correct results. Backtracking and re-execution are used to determine the first OpenMP parallel region that produces a difference in computation that may lead to an incorrect value the user has indicated. Our approach also lends itself to finding differences between parallel computations, where executing with M threads produces expected results but an N thread execution does not (M, N > 1, M ≠ N). OpenMP programs created using a parallelization tool are addressed by utilizing static analysis and directive information from the tool. Hand-parallelized programs, where OpenMP directives are inserted by the user, are addressed by performing data dependence and directive analysis.
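
As a rough illustration of the relative-debugging idea, the sketch below compares values recorded at the end of each parallel region in a reference run and in a suspect run, and reports the first region whose values differ beyond a tolerance. The trace format, the function name `first_divergent_region`, and the tolerances are hypothetical. The work summarised above additionally relies on backtracking, re-execution, and dependence analysis, none of which is modelled here.

```python
import math

def first_divergent_region(reference, suspect, rel_tol=1e-9, abs_tol=1e-12):
    """Return (region, variable) for the first difference, or None.

    Both traces are lists of (region_name, {variable: value}) pairs
    recorded in execution order (a hypothetical trace format).
    """
    for (ref_region, ref_vals), (sus_region, sus_vals) in zip(reference, suspect):
        if ref_region != sus_region:
            raise ValueError("traces are not aligned region by region")
        for name in sorted(ref_vals):
            if not math.isclose(ref_vals[name], sus_vals[name],
                                rel_tol=rel_tol, abs_tol=abs_tol):
                return ref_region, name
    return None

# Made-up traces: the suspect run first goes wrong in region_2.
serial_trace = [
    ("region_1", {"x": 1.0, "s": 10.0}),
    ("region_2", {"x": 2.0, "s": 30.0}),
    ("region_3", {"x": 4.0, "s": 70.0}),
]
openmp_trace = [
    ("region_1", {"x": 1.0, "s": 10.0}),
    ("region_2", {"x": 2.0, "s": 29.5}),
    ("region_3", {"x": 4.0, "s": 69.5}),
]

print(first_divergent_region(serial_trace, openmp_trace))
```

The same comparison applies to an M-thread run against an N-thread run by passing the M-thread trace as the reference.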

Relevance:

40.00%

Publisher:

Abstract:

Parallel computing is now widely used in numerical simulation, particularly for application codes based on finite difference and finite element methods. A popular and successful technique for parallelizing such codes on large distributed memory systems is to partition the mesh into sub-domains that are then allocated to processors. The code then executes in parallel, using the SPMD methodology, with message passing for inter-processor interactions. To improve the parallel efficiency of an imbalanced structured mesh CFD code, a new dynamic load balancing (DLB) strategy has been developed in which the processor partition range limits of just one of the partitioned dimensions use non-coincidental limits, as opposed to coincidental limits. The ‘local’ partition limit change allows greater flexibility in obtaining a balanced load distribution, as the workload increase, or decrease, on a processor is no longer restricted by the ‘global’ (coincidental) limit change. The automatic implementation of this generic DLB strategy within an existing parallel code is presented in this chapter, along with some preliminary results.

Relevance:

40.00%

Publisher:

Abstract:

General-purpose parallel processing for solving day-to-day industrial problems has been slow to develop, partly because of the lack of suitable hardware from well-established, mainstream computer manufacturers and of suitably parallelized application software. This work concerns the parallelization of a CFD (computational fluid dynamics) flow solution code known as ESAUNA. This code is part of SAUNA, a large CFD suite aimed at computing the flow around very complex aircraft configurations, including complete aircraft. A novel feature of the SAUNA suite is that it is designed to use either block-structured hexahedral grids, unstructured tetrahedral grids, or a hybrid combination of both grid types. ESAUNA is designed to solve the Euler equations or the Navier-Stokes equations, the latter in conjunction with various turbulence models. Two fundamental parallelization concepts are used, namely grid partitioning and encapsulation of communications. Grid partitioning is applied to both block-structured grid modules and unstructured grid modules. ESAUNA can also be coupled with other simulation codes for multidisciplinary computations such as flow simulations around an aircraft coupled with flutter prediction for transient flight simulations.

Relevance:

30.00%

Publisher:

Abstract:

Computer Aided Parallelisation Tools (CAPTools) is a toolkit designed to automate as much as possible of the process of parallelising scalar FORTRAN 77 codes. The toolkit combines a very powerful dependence analysis with user-supplied knowledge to build an extremely comprehensive and accurate dependence graph. The initial version has been targeted at structured mesh computational mechanics codes (e.g. heat transfer, Computational Fluid Dynamics (CFD)), and the associated simple mesh decomposition paradigm is utilised in the automatic code partition, execution control mask generation and communication call insertion. In this, the first of a series of papers [1–3], the authors discuss the parallelisation of a number of case study codes, showing how the various component tools may be used to develop a highly efficient parallel implementation in a few hours or days. The details of the parallelisation of the TEAMKE1 CFD code are described together with the results of three other numerical codes. The resulting parallel implementations are then tested on workstation clusters using PVM and on an i860-based parallel system, showing efficiencies well over 80%.

Relevance:

30.00%

Publisher:

Abstract:

The most common parallelisation strategy for many Computational Mechanics (CM) codes (typified by Computational Fluid Dynamics (CFD) applications) that use structured meshes involves a 1D partition based upon slabs of cells. However, many CFD codes employ pipeline operations in their solution procedure. For parallelised versions of such codes to scale well, they must employ two (or more) dimensional partitions. This paper describes an algorithmic approach to multi-dimensional mesh partitioning in code parallelisation, its implementation in a toolkit for almost automatically transforming scalar codes to parallel form, and its testing on a range of ‘real-world’ FORTRAN codes. The concept of multi-dimensional partitioning is straightforward, but it is non-trivial to represent as an algorithm generic enough to be embedded in a code transformation tool. The results of the tests on fine real-world codes demonstrate clear improvements in parallel performance and scalability (over a 1D partition). This is matched by a huge reduction in the time required to develop the parallel versions compared with hand coding: from weeks/months down to hours/days.

Relevance:

30.00%

Publisher:

Abstract:

The parallelization of an industrially important in-house computational fluid dynamics (CFD) code for calculating the airflow over complex aircraft configurations using the Euler or Navier–Stokes equations is presented. The code discussed is the flow solver module of the SAUNA CFD suite. This suite uses a novel grid system that may include block-structured hexahedral or pyramidal grids, unstructured tetrahedral grids or a hybrid combination of both. To assist in the rapid convergence to a solution, a number of convergence acceleration techniques are employed including implicit residual smoothing and a multigrid full approximation storage scheme (FAS). Key features of the parallelization approach are the use of domain decomposition and encapsulated message passing to enable the execution in parallel using a single programme multiple data (SPMD) paradigm. In the case where a hybrid grid is used, a unified grid partitioning scheme is employed to define the decomposition of the mesh. The parallel code has been tested using both structured and hybrid grids on a number of different distributed memory parallel systems and is now routinely used to perform industrial scale aeronautical simulations. Copyright © 2000 John Wiley & Sons, Ltd.

Relevance:

30.00%

Publisher:

Abstract:

Three paradigms for distributed-memory parallel computation that free the application programmer from the details of message passing are compared for an archetypal structured scientific computation -- a nonlinear, structured-grid partial differential equation boundary value problem -- using the same algorithm on the same hardware. All of the paradigms -- parallel languages represented by the Portland Group's HPF, (semi-)automated serial-to-parallel source-to-source translation represented by CAP-Tools from the University of Greenwich, and parallel libraries represented by Argonne's PETSc -- are found to be easy to use for this problem class, and all are reasonably effective in exploiting concurrency after a short learning curve. The level of involvement required by the application programmer under any paradigm includes specification of the data partitioning, corresponding to a geometrically simple decomposition of the domain of the PDE. Programming in SPMD style for the PETSc library requires writing only the routines that discretize the PDE and its Jacobian, managing subdomain-to-processor mappings (affine global-to-local index mappings), and interfacing to library solver routines. Programming for HPF requires a complete sequential implementation of the same algorithm as a starting point, introduction of concurrency through subdomain blocking (a task similar to the index mapping), and modest experimentation with rewriting loops to elucidate to the compiler the latent concurrency. Programming with CAPTools involves feeding the same sequential implementation to the CAPTools interactive parallelization system, and guiding the source-to-source code transformation by responding to various queries about quantities knowable only at runtime. 
Results representative of "the state of the practice" for a scaled sequence of structured grid problems are given on three of the most important contemporary high-performance platforms: the IBM SP, the SGI Origin 2000, and the Cray T3E.
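
The affine global-to-local index mappings mentioned above can be sketched for a one-dimensional block decomposition as follows. The function names and the rule for distributing leftover points to the lowest ranks are assumptions made for illustration; they are not taken from PETSc, HPF, or CAPTools.

```python
def block_range(n, nprocs, rank):
    """Half-open global index range [lo, hi) owned by `rank` for n points."""
    base, extra = divmod(n, nprocs)
    # The first `extra` ranks each own one additional point.
    lo = rank * base + min(rank, extra)
    hi = lo + base + (1 if rank < extra else 0)
    return lo, hi

def global_to_local(i_global, n, nprocs, rank):
    """Affine map: local index = global index minus the subdomain offset."""
    lo, hi = block_range(n, nprocs, rank)
    if not lo <= i_global < hi:
        raise IndexError("global index is not owned by this rank")
    return i_global - lo

def local_to_global(i_local, n, nprocs, rank):
    """Inverse affine map: global index = subdomain offset plus local index."""
    lo, hi = block_range(n, nprocs, rank)
    if not 0 <= i_local < hi - lo:
        raise IndexError("local index is outside this rank's block")
    return lo + i_local

# Ten grid points over three processes.
ranges = [block_range(10, 3, r) for r in range(3)]
print(ranges)
```

A multi-dimensional structured grid can apply the same mapping independently in each partitioned dimension.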

Relevance:

30.00%

Publisher:

Abstract:

The Computer Aided Parallelisation Tools (CAPTools) [Ierotheou C, Johnson SP, Cross M, Leggett PF. Computer aided parallelisation tools (CAPTools): conceptual overview and performance on the parallelisation of structured mesh codes. Parallel Computing 1996;22:163–195] is a set of interactive tools that aim to provide automatic parallelisation of serial FORTRAN Computational Mechanics (CM) programs. CAPTools analyses the user's serial code and then, through stages of array partitioning, mask and communication calculation, generates parallel SPMD (Single Program Multiple Data) message-passing FORTRAN. The parallel code generated by CAPTools contains calls to a collection of routines that form the CAPTools communications Library (CAPLib). The library provides a portable layer and user-friendly abstraction over the underlying parallel environment. CAPLib contains optimised message-passing routines for data exchange between parallel processes and other utility routines for parallel execution control, initialisation and debugging. By compiling and linking with different implementations of the library, the user is able to run on many different parallel environments. Even with today's parallel systems, the concept of a single version of a parallel application code is more of an aspiration than a reality. However, for CM codes, the data partitioning SPMD paradigm requires a relatively small set of message-passing communication calls. This set can be implemented as an intermediate 'thin layer' library of message-passing calls that enables the parallel code (especially that generated automatically by a parallelisation tool such as CAPTools) to be as generic as possible. CAPLib is just such a 'thin layer' message-passing library that supports parallel CM codes by mapping generic calls onto machine-specific libraries (such as CRAY SHMEM) and portable general-purpose libraries (such as PVM and MPI).
This paper describes CAPLib together with its three perceived advantages over other routes:
- as a high-level abstraction, it is both easy to understand (especially when generated automatically by tools) and easy to implement by hand for the CM community (who are not generally parallel computing specialists);
- the one parallel version of the application code is truly generic and portable;
- the parallel application can readily utilise whatever message-passing libraries on a given machine yield optimum performance.

Relevance:

30.00%

Publisher:

Abstract:

This paper provides an overview of the developing needs for simulation software technologies for the computational modelling of problems that involve combinations of interactions amongst varying physical phenomena over a variety of time and space scales. Computational modelling of such problems requires software technologies that enable the mathematical description of the interacting physical phenomena together with the solution of the resulting suites of equations in a numerically consistent and compatible manner. This functionality requires the structuring of simulation modules for specific physical phenomena so that the coupling can be effectively represented. These multi-physics and multi-scale computations are very compute intensive and the simulation software must operate effectively in parallel if it is to be used in this context. An approach to these classes of multi-disciplinary simulation in parallel is described, with some key examples of application to challenging engineering problems.

Relevance:

30.00%

Publisher:

Abstract:

A large class of computational problems is characterised by frequent synchronisation and by computational requirements that change as a function of time. When such a problem is solved on a message passing multiprocessor machine [5], the combination of these characteristics leads to system performance that deteriorates over time. As the communication performance of parallel hardware steadily improves, load balance becomes a dominant factor in obtaining high parallel efficiency. Performance can be improved with periodic redistribution of computational load; however, redistribution can sometimes be very costly. We study the issue of deciding when to invoke a global load re-balancing mechanism. Such a decision policy must actively weigh the costs of remapping against the performance benefits, and should be general enough to apply automatically to a wide range of computations. This paper discusses a generic strategy for Dynamic Load Balancing (DLB) in unstructured mesh computational mechanics applications. The strategy is intended to handle varying levels of load changes throughout the run. The major issues involved in a generic dynamic load balancing scheme are investigated, together with techniques to automate the implementation of a dynamic load balancing mechanism within the Computer Aided Parallelisation Tools (CAPTools) environment, which is a semi-automatic tool for parallelisation of mesh based FORTRAN codes.
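
One simple way to weigh remapping cost against benefit is sketched below. It is a hypothetical policy chosen for illustration, not the strategy proposed in the paper: the time lost to imbalance is accumulated step by step since the last rebalance, and redistribution is triggered once that loss reaches the estimated remap cost. The function name, the per-processor timing lists, and the cost values are all assumptions.

```python
def should_rebalance(step_times, remap_cost):
    """Decide whether to redistribute load.

    step_times: one list of per-processor compute times for each step
                since the last rebalance (hypothetical measurements).
    remap_cost: estimated time needed to redistribute the load.
    """
    lost = 0.0
    for times in step_times:
        # A synchronised step takes as long as the slowest processor;
        # a perfectly balanced step would take the mean time.
        lost += max(times) - sum(times) / len(times)
    return lost >= remap_cost

# Made-up timings in which one processor gradually becomes overloaded.
history = [
    [1.0, 1.0, 1.0, 1.0],
    [1.2, 1.0, 0.9, 0.9],
    [1.6, 0.9, 0.8, 0.7],
]

print(should_rebalance(history, remap_cost=0.5))
print(should_rebalance(history, remap_cost=2.0))
```

With these timings the accumulated loss is about 0.8 time units, so a cheap remap is worthwhile while an expensive one is deferred.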

Relevance:

30.00%

Publisher:

Abstract:

A method is outlined for optimising graph partitions which arise in mapping unstructured mesh calculations to parallel computers. The method employs a combination of iterative techniques to both evenly balance the workload and minimise the number and volume of interprocessor communications. These techniques are designed to work efficiently in parallel as well as sequentially. When combined with a fast direct partitioning technique (such as the Greedy algorithm) to give an initial partition, the resulting two-stage process proves itself to be both a powerful and flexible solution to the static graph-partitioning problem. The algorithms can also be used for dynamic load-balancing, and a clustering technique can additionally be employed to speed up the whole process. Experiments indicate that the resulting parallel code can provide high quality partitions, independent of the initial partition, within a few seconds.
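
The first stage of such a two-stage process can be illustrated with a simple greedy, breadth-first graph-growing partitioner, together with the edge-cut metric that an optimisation stage would try to reduce. This is a minimal sketch under stated assumptions: the seed choice, the tie-breaking, and the function names are hypothetical, and the iterative optimisation described in the abstract is not implemented.

```python
from collections import deque

def greedy_partition(adjacency, nparts):
    """Grow each part breadth-first from a seed until it reaches its share."""
    n = len(adjacency)
    part = [-1] * n
    unassigned = n
    for p in range(nparts):
        target = unassigned // (nparts - p)  # the last part takes the rest
        size = 0
        queue = deque()
        while size < target:
            if not queue:
                # Start (or restart) from the first unassigned vertex.
                seed = next(v for v in range(n) if part[v] == -1)
                part[seed] = p
                size += 1
                queue.append(seed)
                continue
            v = queue.popleft()
            for w in adjacency[v]:
                if part[w] == -1 and size < target:
                    part[w] = p
                    size += 1
                    queue.append(w)
        unassigned -= size
    return part

def edge_cut(adjacency, part):
    """Number of edges whose endpoints lie in different parts."""
    return sum(1 for v, nbrs in enumerate(adjacency)
               for w in nbrs if v < w and part[v] != part[w])

def grid_graph(nx, ny):
    """Adjacency lists of an nx-by-ny grid, used here as a stand-in mesh graph."""
    adjacency = [[] for _ in range(nx * ny)]
    for i in range(nx):
        for j in range(ny):
            v = i * ny + j
            if i + 1 < nx:
                adjacency[v].append(v + ny)
                adjacency[v + ny].append(v)
            if j + 1 < ny:
                adjacency[v].append(v + 1)
                adjacency[v + 1].append(v)
    return adjacency

mesh = grid_graph(4, 4)
parts = greedy_partition(mesh, 4)
print(parts, edge_cut(mesh, parts))
```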

Relevance:

30.00%

Publisher:

Abstract:

In many areas of simulation, a crucial component for efficient numerical computations is the use of solution-driven adaptive features: locally adapted meshing or re-meshing, and dynamically changing computational tasks. The full advantages of high performance computing (HPC) technology can thus only be exploited when efficient parallel adaptive solvers can be realised. The resulting requirement for HPC software is dynamic load balancing, which for many mesh-based applications means dynamic mesh re-partitioning. The DRAMA project has been initiated to address this issue, with a particular focus on the requirements of industrial Finite Element codes, but codes using Finite Volume formulations will also be able to make use of the project results.

Relevance:

30.00%

Publisher:

Abstract:

We present a dynamic distributed load balancing algorithm for parallel, adaptive finite element simulations using preconditioned conjugate gradient solvers based on domain decomposition. The load balancer is designed to maintain good partition aspect ratios. It can calculate a balancing flow using different versions of diffusion and a variant of breadth first search. Elements to be migrated are chosen according to a cost function aiming at the optimization of subdomain shapes. We show how to use information from the second step to guide the first. Experimental results using Bramble's preconditioner and comparisons to existing state-of-the-art load balancers show the benefits of the construction.
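
A balancing flow computed by diffusion can be sketched with the plain first-order scheme below, in which each pair of neighbouring processors repeatedly exchanges a fixed fraction of their load difference. This is an assumed textbook variant shown for illustration only; the specific diffusion versions, the breadth-first-search variant, and the shape-aware element selection of the paper are not reproduced. The names `diffusion_flow` and `alpha` are hypothetical.

```python
def diffusion_flow(loads, edges, alpha, iterations):
    """First-order diffusion on a processor graph.

    loads: initial load per processor.
    edges: list of (i, j) pairs of neighbouring processors.
    alpha: fraction of the load difference moved per step; it should stay
           below 1 / (maximum processor degree) for the scheme to converge.
    Returns (final_loads, flow), where flow[(i, j)] is the net load sent
    from i to j (negative values mean load moves from j to i).
    """
    loads = list(loads)
    flow = {edge: 0.0 for edge in edges}
    for _ in range(iterations):
        # All transfers in one sweep use the loads from the start of the sweep.
        transfers = {(i, j): alpha * (loads[i] - loads[j]) for (i, j) in edges}
        for (i, j), amount in transfers.items():
            loads[i] -= amount
            loads[j] += amount
            flow[(i, j)] += amount
    return loads, flow

# Four processors connected in a ring, with all load initially on processor 0.
ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
final_loads, flow = diffusion_flow([8.0, 0.0, 0.0, 0.0], ring, alpha=0.25, iterations=50)
print(final_loads)
print(flow)
```

The accumulated `flow` values indicate how much load should cross each processor boundary; choosing which elements to migrate is a separate step.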

Relevance:

30.00%

Publisher:

Abstract:

This paper presents a new dynamic load balancing technique for structured mesh computational mechanics codes in which the processor partition range limits of just one of the partitioned dimensions use non-coincidental limits, as opposed to using coincidental limits in all of the partitioned dimensions. The partition range limits are 'staggered', allowing greater flexibility in obtaining a balanced load distribution in comparison to when the limits are changed 'globally', as the load increase/decrease on one processor no longer restricts the load decrease/increase on a neighbouring processor. The automatic implementation of this 'staggered' load balancing strategy within an existing parallel code is presented in this paper, along with some preliminary results.
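
The difference between coincidental and staggered limits can be illustrated on a small two-dimensional workload grid split into two strips in x and two blocks in y. In the sketch below, the coincidental case forces both x-strips to share one y-limit, while the staggered case lets each strip choose its own. The workload values, the brute-force search, and the function names are assumptions made for illustration and do not represent the paper's implementation.

```python
def strip_profile(work, x_lo, x_hi):
    """Total work per y index within the x-strip [x_lo, x_hi)."""
    ny = len(work[0])
    return [sum(work[i][j] for i in range(x_lo, x_hi)) for j in range(ny)]

def split_loads(profile, cut):
    """Loads of the two processors in a strip when y is cut at `cut`."""
    return sum(profile[:cut]), sum(profile[cut:])

def best_cut(profiles):
    """y-limit minimising the maximum load over all strips sharing that limit."""
    ny = len(profiles[0])
    return min(range(1, ny),
               key=lambda c: max(max(split_loads(p, c)) for p in profiles))

def max_load(profiles, cuts):
    """Maximum processor load when strip k uses the y-limit cuts[k]."""
    return max(max(split_loads(p, c)) for p, c in zip(profiles, cuts))

# Made-up workload: heavy at low y in the first strip, at high y in the second.
work = [
    [4, 1, 1, 1],
    [4, 1, 1, 1],
    [1, 1, 1, 4],
    [1, 1, 1, 4],
]
x_limits = [(0, 2), (2, 4)]
profiles = [strip_profile(work, lo, hi) for lo, hi in x_limits]

shared = best_cut(profiles)                          # one limit for all strips
coincident_max = max_load(profiles, [shared] * len(profiles))
staggered_cuts = [best_cut([p]) for p in profiles]   # one limit per strip
staggered_max = max_load(profiles, staggered_cuts)

print(coincident_max, staggered_max)  # prints "10 8"
```

In this example the staggered limits reduce the maximum processor load from 10 to 8, because each strip can place its limit where its own workload is balanced.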