34 resultados para Parallel computing

em Greenwich Academic Literature Archive - UK


Relevância:

100.00% 100.00%

Publicador:

Resumo:

We report on practical experience using the Oxford BSP Library to parallelize a large electromagnetic code, the British Aerospace finite-difference time-domain code EMMA T:FD3D. The Oxford BS Library is one of the first realizations of the Bulk Synchronous Parallel computational model to be targeted at numerically intensive scientific (typically Fortran) computing. The BAe EMMA code is one of the first large-scale applications to be parallelized using this library, and it is an important demonstration of the cost effectiveness of the BSP approach. We illustrate how BSP cost-modelling techniques can be used to predict and optimize performance for single-source programs across different parallel platforms. We provide predicted and observed performance figures for an industrial-strength, single-source parallel code for a variety of real parallel architectures: shared memory multiprocessors, workstation clusters and massively parallel platforms.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

The availability of a very accurate dependence graph for a scalar code is the basis for the automatic generation of an efficient parallel implementation. The strategy for this task which is encapsulated in a comprehensive data partitioning code generation algorithm is described. This algorithm involves the data partition, calculation of assignment ranges for partitioned arrays, addition of a comprehensive set of execution control masks, altering loop limits, addition and optimisation of communications for all data. In this context, the development and implementation of strategies to merge communications wherever possible has proved an important feature in producing efficient parallel implementations for numerical mesh based codes. The code generation strategies described here are embedded within the Computer Aided Parallelisation tools (CAPTools) software as a key part of a toolkit for automating as much as possible of the parallelisation process for mesh based numerical codes. The algorithms used enables parallelisation of real computational mechanics codes with only minor user interaction and without any prior manual customisation of the serial code to suit the parallelisation tool.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

User supplied knowledge and interaction is a vital component of a toolkit for producing high quality parallel implementations of scalar FORTRAN numerical code. In this paper we consider the necessary components that such a parallelisation toolkit should possess to provide an effective environment to identify, extract and embed user relevant user knowledge. We also examine to what extent these facilities are available in leading parallelisation tools; in particular we discuss how these issues have been addressed in the development of the user interface of the Computer Aided Parallelisation Tools (CAPTools). The CAPTools environment has been designed to enable user exploration, interaction and insertion of user knowledge to facilitate the automatic generation of very efficient parallel code. A key issue in the user's interaction is control of the volume of information so that the user is focused on only that which is needed. User control over the level and extent of information revealed at any phase is supplied using a wide variety of filters. Another issue is the way in which information is communicated. Dependence analysis and its resulting graphs involve a lot of sophisticated rather abstract concepts unlikely to be familiar to most users of parallelising tools. As such, considerable effort has been made to communicate with the user in terms that they will understand. These features, amongst others, and their use in the parallelisation process are described and their effectiveness discussed.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Parallel computing is now widely used in numerical simulation, particularly for application codes based on finite difference and finite element methods. A popular and successful technique employed to parallelize such codes onto large distributed memory systems is to partition the mesh into sub-domains that are then allocated to processors. The code then executes in parallel, using the SPMD methodology, with message passing for inter-processor interactions. In order to improve the parallel efficiency of an imbalanced structured mesh CFD code, a new dynamic load balancing (DLB) strategy has been developed in which the processor partition range limits of just one of the partitioned dimensions uses non-coincidental limits, as opposed to coincidental limits. The ‘local’ partition limit change allows greater flexibility in obtaining a balanced load distribution, as the workload increase, or decrease, on a processor is no longer restricted by the ‘global’ (coincidental) limit change. The automatic implementation of this generic DLB strategy within an existing parallel code is presented in this chapter, along with some preliminary results.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

We present a dynamic distributed load balancing algorithm for parallel, adaptive Finite Element simulations in which we use preconditioned Conjugate Gradient solvers based on domain-decomposition. The load balancing is designed to maintain good partition aspect ratio and we show that cut size is not always the appropriate measure in load balancing. Furthermore, we attempt to answer the question why the aspect ratio of partitions plays an important role for certain solvers. We define and rate different kinds of aspect ratio and present a new center-based partitioning method of calculating the initial distribution which implicitly optimizes this measure. During the adaptive simulation, the load balancer calculates a balancing flow using different versions of the diffusion algorithm and a variant of breadth first search. Elements to be migrated are chosen according to a cost function aiming at the optimization of subdomain shapes. Experimental results for Bramble's preconditioner and comparisons to state-of-the-art load balancers show the benefits of the construction.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Three parallel optimisation algorithms, for use in the context of multilevel graph partitioning of unstructured meshes, are described. The first, interface optimisation, reduces the computation to a set of independent optimisation problems in interface regions. The next, alternating optimisation, is a restriction of this technique in which mesh entities are only allowed to migrate between subdomains in one direction. The third treats the gain as a potential field and uses the concept of relative gain for selecting appropriate vertices to migrate. The results are compared and seen to produce very high global quality partitions, very rapidly. The results are also compared with another partitioning tool and shown to be of higher quality although taking longer to compute.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

The Computer Aided Parallelisation Tools (CAPTools) [Ierotheou, C, Johnson SP, Cross M, Leggett PF, Computer aided parallelisation tools (CAPTools)-conceptual overview and performance on the parallelisation of structured mesh codes, Parallel Computing, 1996;22:163±195] is a set of interactive tools aimed to provide automatic parallelisation of serial FORTRAN Computational Mechanics (CM) programs. CAPTools analyses the user's serial code and then through stages of array partitioning, mask and communication calculation, generates parallel SPMD (Single Program Multiple Data) messages passing FORTRAN. The parallel code generated by CAPTools contains calls to a collection of routines that form the CAPTools communications Library (CAPLib). The library provides a portable layer and user friendly abstraction over the underlying parallel environment. CAPLib contains optimised message passing routines for data exchange between parallel processes and other utility routines for parallel execution control, initialisation and debugging. By compiling and linking with different implementations of the library, the user is able to run on many different parallel environments. Even with today's parallel systems the concept of a single version of a parallel application code is more of an aspiration than a reality. However for CM codes the data partitioning SPMD paradigm requires a relatively small set of message-passing communication calls. This set can be implemented as an intermediate `thin layer' library of message-passing calls that enables the parallel code (especially that generated automatically by a parallelisation tool such as CAPTools) to be as generic as possible. CAPLib is just such a `thin layer' message passing library that supports parallel CM codes, by mapping generic calls onto machine specific libraries (such as CRAY SHMEM) and portable general purpose libraries (such as PVM an MPI). This paper describe CAPLib together with its three perceived advantages over other routes: - as a high level abstraction, it is both easy to understand (especially when generated automatically by tools) and to implement by hand, for the CM community (who are not generally parallel computing specialists); - the one parallel version of the application code is truly generic and portable; - the parallel application can readily utilise whatever message passing libraries on a given machine yield optimum performance.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

The central product of the DRAMA (Dynamic Re-Allocation of Meshes for parallel Finite Element Applications) project is a library comprising a variety of tools for dynamic re-partitioning of unstructured Finite Element (FE) applications. The input to the DRAMA library is the computational mesh, and corresponding costs, partitioned into sub-domains. The core library functions then perform a parallel computation of a mesh re-allocation that will re-balance the costs based on the DRAMA cost model. We discuss the basic features of this cost model, which allows a general approach to load identification, modelling and imbalance minimisation. Results from crash simulations are presented which show the necessity for multi-phase/multi-constraint partitioning components

Relevância:

70.00% 70.00%

Publicador:

Resumo:

In this paper, we first demonstrate that the classical Purcell's vector method when combined with row pivoting yields a consistently small growth factor in comparison to the well-known Gauss elimination method, the Gauss–Jordan method and the Gauss–Huard method with partial pivoting. We then present six parallel algorithms of the Purcell method that may be used for direct solution of linear systems. The algorithms differ in ways of pivoting and load balancing. We recommend algorithms V and VI for their reliability and algorithms III and IV for good load balance if local pivoting is acceptable. Some numerical results are presented.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

As the complexity of parallel applications increase, the performance limitations resulting from computational load imbalance become dominant. Mapping the problem space to the processors in a parallel machine in a manner that balances the workload of each processors will typically reduce the run-time. In many cases the computation time required for a given calculation cannot be predetermined even at run-time and so static partition of the problem returns poor performance. For problems in which the computational load across the discretisation is dynamic and inhomogeneous, for example multi-physics problems involving fluid and solid mechanics with phase changes, the workload for a static subdomain will change over the course of a computation and cannot be estimated beforehand. For such applications the mapping of loads to process is required to change dynamically, at run-time in order to maintain reasonable efficiency. The issue of dynamic load balancing are examined in the context of PHYSICA, a three dimensional unstructured mesh multi-physics continuum mechanics computational modelling code.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

The central product of the DRAMA (Dynamic Re-Allocation of Meshes for parallel Finite Element Applications) project is a library comprising a variety of tools for dynamic re-partitioning of unstructured Finite Element (FE) applications. The input to the DRAMA library is the computational mesh, and corresponding costs, partitioned into sub-domains. The core library functions then perform a parallel computation of a mesh re-allocation that will re-balance the costs based on the DRAMA cost model. We discuss the basic features of this cost model, which allows a general approach to load identification, modelling and imbalance minimisation. Results from crash simulations are presented which show the necessity for multi-phase/multi-constraint partitioning components.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Computer Aided Parallelisation Tools (CAPTools) is a toolkit designed to automate as much as possible of the process of parallelising scalar FORTRAN 77 codes. The toolkit combines a very powerful dependence analysis together with user supplied knowledge to build an extremely comprehensive and accurate dependence graph. The initial version has been targeted at structured mesh computational mechanics codes (eg. heat transfer, Computational Fluid Dynamics (CFD)) and the associated simple mesh decomposition paradigm is utilised in the automatic code partition, execution control mask generation and communication call insertion. In this, the first of a series of papers [1–3] the authors discuss the parallelisations of a number of case study codes showing how the various component tools may be used to develop a highly efficient parallel implementation in a few hours or days. The details of the parallelisation of the TEAMKE1 CFD code are described together with the results of three other numerical codes. The resulting parallel implementations are then tested on workstation clusters using PVM and an i860-based parallel system showing efficiencies well over 80%.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The requirement for a very accurate dependence analysis to underpin software tools to aid the generation of efficient parallel implementations of scalar code is argued. The current status of dependence analysis is shown to be inadequate for the generation of efficient parallel code, causing too many conservative assumptions to be made. This paper summarises the limitations of conventional dependence analysis techniques, and then describes a series of extensions which enable the production of a much more accurate dependence graph. The extensions include analysis of symbolic variables, the development of a symbolic inequality disproof algorithm and its exploitation in a symbolic Banerjee inequality test; the use of inference engine proofs; the exploitation of exact dependence and dependence pre-domination attributes; interprocedural array analysis; conditional variable definition tracing; integer array tracing and division calculations. Analysis case studies on typical numerical code is shown to reduce the total dependencies estimated from conventional analysis by up to 50%. The techniques described in this paper have been embedded within a suite of tools, CAPTools, which combines analysis with user knowledge to produce efficient parallel implementations of numerical mesh based codes.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This paper addresses the exploitation of overlapping communication with calculation within parallel FORTRAN 77 codes for computational fluid dynamics (CFD) and computational structured dynamics (CSD). The obvious objective is to overlap interprocessor communication with calculation on each processor in a distributed memory parallel system and so improve the efficiency of the parallel implementation. A general strategy for converting synchronous to overlapped communication is presented together with tools to enable its automatic implementation in FORTRAN 77 codes. This strategy is then implemented within the parallelisation toolkit, CAPTools, to facilitate the automatic generation of parallel code with overlapped communications. The success of these tools are demonstrated on two codes from the NAS-PAR and PERFECT benchmark suites. In each case, the tools produce parallel code with overlapped communications which is as good as that which could be generated manually. The parallel performance of the codes also improve in line with expectation.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The most common parallelisation strategy for many Computational Mechanics (CM) (typified by Computational Fluid Dynamics (CFD) applications) which use structured meshes, involves a 1D partition based upon slabs of cells. However, many CFD codes employ pipeline operations in their solution procedure. For parallelised versions of such codes to scale well they must employ two (or more) dimensional partitions. This paper describes an algorithmic approach to the multi-dimensional mesh partitioning in code parallelisation, its implementation in a toolkit for almost automatically transforming scalar codes to parallel form, and its testing on a range of ‘real-world’ FORTRAN codes. The concept of multi-dimensional partitioning is straightforward, but non-trivial to represent as a sufficiently generic algorithm so that it can be embedded in a code transformation tool. The results of the tests on fine real-world codes demonstrate clear improvements in parallel performance and scalability (over a 1D partition). This is matched by a huge reduction in the time required to develop the parallel versions when hand coded – from weeks/months down to hours/days.