56 results for structured parallel computations


Relevance:

20.00%

Publisher:

Abstract:

The availability of a very accurate dependence graph for a scalar code is the basis for the automatic generation of an efficient parallel implementation. The strategy for this task, which is encapsulated in a comprehensive data-partitioning code generation algorithm, is described. The algorithm involves partitioning the data, calculating assignment ranges for partitioned arrays, adding a comprehensive set of execution control masks, altering loop limits, and adding and optimising communications for all data. In this context, the development and implementation of strategies to merge communications wherever possible has proved an important feature in producing efficient parallel implementations of numerical mesh-based codes. The code generation strategies described here are embedded within the Computer Aided Parallelisation Tools (CAPTools) software as a key part of a toolkit for automating as much as possible of the parallelisation process for mesh-based numerical codes. The algorithms used enable parallelisation of real computational mechanics codes with only minor user interaction and without any prior manual customisation of the serial code to suit the parallelisation tool.
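As an illustration of three of the steps named in this abstract (assignment ranges, altered loop limits and execution control masks), here is a minimal Python sketch for a one-dimensional block partition. It is not CAPTools output; the function names `block_range`, `scale_owned` and `masked_assign` are hypothetical, and the block distribution itself is an assumption.

```python
def block_range(n, nprocs, rank):
    """Return the half-open assignment range [low, high) owned by `rank`
    when n array elements are block-partitioned over nprocs processors."""
    base, extra = divmod(n, nprocs)
    low = rank * base + min(rank, extra)
    high = low + base + (1 if rank < extra else 0)
    return low, high


def scale_owned(a, nprocs, rank, factor):
    """Serial loop `for i in range(len(a))` rewritten for one processor.

    The loop limits are altered so that only owned elements are assigned.
    """
    low, high = block_range(len(a), nprocs, rank)
    out = list(a)
    for i in range(low, high):  # altered loop limits
        out[i] = factor * a[i]
    return out


def masked_assign(a, nprocs, rank, index, value):
    """Single assignment guarded by an execution control mask.

    Only the processor that owns `index` performs the assignment.
    Returns True if this rank executed the statement.
    """
    low, high = block_range(len(a), nprocs, rank)
    if low <= index < high:  # execution control mask
        a[index] = value
        return True
    return False
```

With 10 elements on 3 processors, this partition gives the ranges [0, 4), [4, 7) and [7, 10), so every element is assigned by exactly one processor.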

Relevance:

20.00%

Publisher:

Abstract:

User-supplied knowledge and interaction is a vital component of a toolkit for producing high-quality parallel implementations of scalar FORTRAN numerical code. In this paper we consider the components that such a parallelisation toolkit should possess to provide an effective environment in which to identify, extract and embed relevant user knowledge. We also examine to what extent these facilities are available in leading parallelisation tools; in particular, we discuss how these issues have been addressed in the development of the user interface of the Computer Aided Parallelisation Tools (CAPTools). The CAPTools environment has been designed to enable user exploration, interaction and insertion of user knowledge to facilitate the automatic generation of very efficient parallel code. A key issue in the user's interaction is control of the volume of information, so that the user is focused only on what is needed. User control over the level and extent of information revealed at any phase is supplied through a wide variety of filters. Another issue is the way in which information is communicated. Dependence analysis and its resulting graphs involve many sophisticated, rather abstract concepts that are unlikely to be familiar to most users of parallelising tools. Considerable effort has therefore been made to communicate with users in terms that they will understand. These features, amongst others, and their use in the parallelisation process are described and their effectiveness discussed.

Relevance:

20.00%

Publisher:

Abstract:

A parallel method for the dynamic partitioning of unstructured meshes is described. The method introduces a new iterative optimization technique known as relative gain optimization, which both balances the workload and attempts to minimize the interprocessor communications overhead. Experiments on a series of adaptively refined meshes indicate that the algorithm provides partitions of equivalent or higher quality than those of static partitioners (which do not reuse the existing partition), and does so much more rapidly. Perhaps more importantly, the algorithm results in only a small fraction of the data migration required by the static partitioners.

Relevance:

20.00%

Publisher:

Abstract:

This paper addresses the exploitation of overlapping communication with calculation within parallel FORTRAN 77 codes for computational fluid dynamics (CFD) and computational structural dynamics (CSD). The objective is to overlap interprocessor communication with calculation on each processor in a distributed memory parallel system and so improve the efficiency of the parallel implementation. A general strategy for converting synchronous to overlapped communication is presented, together with tools to enable its automatic implementation in FORTRAN 77 codes. This strategy is then implemented within the parallelisation toolkit, CAPTools, to facilitate the automatic generation of parallel code with overlapped communications. The success of these tools is demonstrated on two codes from the NAS-PAR and PERFECT benchmark suites. In each case, the tools produce parallel code with overlapped communications that is as good as that which could be generated manually. The parallel performance of the codes also improves in line with expectation.
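The general idea of overlapping communication with calculation can be sketched in a few lines of Python. This is an illustration only, not the strategy implemented in CAPTools. The halo exchange is simulated by a hypothetical `fetch_halo` function running in a worker thread, and the three-point averaging stencil is an assumed example kernel.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_halo(left_value, right_value):
    """Hypothetical stand-in for an interprocessor halo exchange."""
    return left_value, right_value


def smooth_overlapped(u, left_value, right_value):
    """Three-point average of the local block u (len(u) >= 2),
    with the halo exchange overlapped with the interior update."""
    n = len(u)
    new = [0.0] * n
    with ThreadPoolExecutor(max_workers=1) as pool:
        # 1. Start the communication without waiting for it.
        pending = pool.submit(fetch_halo, left_value, right_value)
        # 2. Update interior points, which need no halo data.
        for i in range(1, n - 1):
            new[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0
        # 3. Wait for the halo values to arrive.
        left, right = pending.result()
    # 4. Update the two boundary points that depend on the halo.
    new[0] = (left + u[0] + u[1]) / 3.0
    new[n - 1] = (u[n - 2] + u[n - 1] + right) / 3.0
    return new
```

The synchronous version would wait for the halo before starting any computation. The overlapped version gives the same result but hides the communication time behind the interior loop.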

Relevance:

20.00%

Publisher:

Abstract:

This work is concerned with the development of a numerical scheme capable of producing accurate simulations of sound propagation in the presence of a mean flow field. The method is based on the concept of variable decomposition, which leads to two separate sets of equations: the linearised Euler equations and the Reynolds-averaged Navier–Stokes equations. This paper concentrates on the development of numerical schemes for the linearised Euler equations, which lead to a computational aeroacoustics (CAA) code. The resulting CAA code is a non-diffusive, time- and space-staggered finite volume code for the acoustic perturbation, and it is validated against analytic results for pure 1D sound propagation and 2D benchmark problems involving sound scattering from a cylindrical obstacle. Predictions are also given for the case of prescribed source sound propagation in a laminar boundary layer as an illustration of the effects of mean convection. Copyright © 1999 John Wiley & Sons, Ltd.

Relevance:

20.00%

Publisher:

Abstract:

The manual effort required to convert sequential computational mechanics programs into a useful, scalable parallel form is considerable. Tools that can assist in the conversion process are clearly required. The Computer Aided Parallelisation Tools (CAPTools) have been developed to generate efficient parallel code for real-world structured grid application codes such as Computational Fluid Dynamics. Automatable single-program multi-data (SPMD) overlapping domain decomposition (DD) techniques established for structured grid codes have been adapted by the authors to manually parallelise unstructured mesh applications. Inspector loops have been used to provide generic techniques for the run-time support necessary to extend the capabilities of CAPTools to automatic implementation of SPMD DD techniques in the parallelisation of unstructured mesh codes. Copyright © 1999 John Wiley & Sons, Ltd.
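The role of an inspector loop can be illustrated with the generic inspector/executor pattern, sketched below in Python. This is not the CAPTools run-time support itself. The names `inspector` and `executor`, the dictionary used as a halo buffer, and the summation kernel are assumptions made for illustration.

```python
def inspector(index_list, low, high):
    """Scan the indirection array once and return the sorted list of
    indices that fall outside the locally owned range [low, high)."""
    return sorted({j for j in index_list if not (low <= j < high)})


def executor(index_list, owned, low, high, halo):
    """Sum x[j] over the indirection array.

    Owned values are read from `owned` (stored from index `low`),
    and off-processor values from `halo`, a dict filled by a gather
    of the indices that the inspector reported.
    """
    total = 0.0
    for j in index_list:
        if low <= j < high:
            total += owned[j - low]
        else:
            total += halo[j]
    return total
```

For an unstructured mesh the access pattern is known only at run time, so the inspector runs before the main loop, the reported indices are fetched in one communication, and the executor then runs without further communication.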

Relevance:

20.00%

Publisher:

Abstract:

The demands of the process of engineering design, particularly for structural integrity, have exploited computational modelling techniques and software tools for decades. Frequently, the shape of structural components or assemblies is determined to optimise the flow distribution or heat transfer characteristics, and to ensure that the structural performance in service is adequate. From the perspective of computational modelling these activities are typically separated into:
• fluid flow and the associated heat transfer analysis (possibly with chemical reactions), based upon Computational Fluid Dynamics (CFD) technology
• structural analysis, again possibly with heat transfer, based upon finite element analysis (FEA) techniques.

Relevance:

20.00%

Publisher:

Abstract:

We report on practical experience using the Oxford BSP Library to parallelize a large electromagnetic code, the British Aerospace finite-difference time-domain code EMMA T:FD3D. The Oxford BSP Library is one of the first realizations of the Bulk Synchronous Parallel computational model to be targeted at numerically intensive scientific (typically Fortran) computing. The BAe EMMA code is one of the first large-scale applications to be parallelized using this library, and it is an important demonstration of the cost effectiveness of the BSP approach. We illustrate how BSP cost-modelling techniques can be used to predict and optimize performance for single-source programs across different parallel platforms. We provide predicted and observed performance figures for an industrial-strength, single-source parallel code for a variety of real parallel architectures: shared memory multiprocessors, workstation clusters and massively parallel platforms.
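BSP cost modelling is commonly based on the standard cost of a superstep, w + h·g + l, where w is the maximum local work, h the maximum number of words sent or received by any processor, g the per-word communication cost and l the barrier synchronisation cost. The Python sketch below applies that textbook formula. The superstep figures and machine parameters in the usage note are invented for illustration and are not taken from the EMMA study.

```python
def superstep_cost(w, h, g, l):
    """Standard BSP cost of one superstep: w + h*g + l."""
    return w + h * g + l


def program_cost(supersteps, g, l):
    """Predicted cost of a program given as a list of (w, h) pairs,
    one pair per superstep, on a machine with parameters g and l."""
    return sum(superstep_cost(w, h, g, l) for w, h in supersteps)
```

For a hypothetical program with supersteps `[(1000, 50), (400, 10)]`, a machine with g = 4 and l = 100 gives a predicted cost of 1840, while a machine with g = 1 and l = 2000 gives 5460. The same single-source program description yields a prediction for each platform once its g and l are known.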

Relevance:

20.00%

Publisher:

Abstract:

In this paper the results obtained from the parallelisation of some 3D industrial electromagnetic Finite Element codes within the ESPRIT Europort 2 project PARTEL are presented. The basic guidelines for the parallelisation procedure, based on the Bulk Synchronous Parallel approach, are presented and the encouraging results obtained in terms of speed-up on some selected test cases of practical design significance are outlined and discussed.

Relevance:

20.00%

Publisher:

Abstract:

This paper examines scheduling problems in which the setup phase of each operation needs to be attended by a single server, common to all jobs and distinct from the processing machines. The objective in each situation is to minimize the makespan. For the processing system consisting of two parallel dedicated machines, we prove that the problem of finding an optimal schedule is NP-hard in the strong sense even if all setup times are equal or if all processing times are equal. For the case of m parallel dedicated machines, a simple greedy algorithm is shown to create a schedule with a makespan that is at most twice the optimum value. For the two-machine case, an improved heuristic guarantees a tight worst-case ratio of 3/2. We also describe several polynomially solvable cases of the latter problem. The two-machine flow shop and the open shop problems with a single server are also shown to be NP-hard in the strong sense. However, we reduce the two-machine flow shop no-wait problem with a single server to the Gilmore-Gomory traveling salesman problem and solve it in polynomial time. Copyright © 2000 John Wiley & Sons, Inc.
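The abstract does not spell out the greedy rule, so the Python sketch below shows one plausible greedy rule for parallel dedicated machines with a single setup server. It should not be read as the authors' algorithm. Each job is a (setup, processing) pair, jobs on a machine are taken in the given order, and the server always starts the setup that can begin earliest. The helper `lower_bound` uses two elementary bounds: the server must perform all setups, and each machine must complete its own jobs.

```python
def greedy_makespan(jobs_per_machine):
    """Simulate a greedy single-server schedule and return its makespan.

    jobs_per_machine[i] is the ordered list of (setup, processing)
    pairs dedicated to machine i.
    """
    queues = [list(jobs) for jobs in jobs_per_machine]
    machine_free = [0] * len(queues)
    server_free = 0
    while any(queues):
        # Choose the machine with pending jobs that becomes idle first.
        i = min((m for m, q in enumerate(queues) if q),
                key=lambda m: machine_free[m])
        setup, processing = queues[i].pop(0)
        start = max(server_free, machine_free[i])
        server_free = start + setup  # server is busy during the setup
        machine_free[i] = server_free + processing
    return max(machine_free)


def lower_bound(jobs_per_machine):
    """Elementary lower bound on the optimal makespan."""
    total_setup = sum(s for jobs in jobs_per_machine for s, _ in jobs)
    max_load = max(sum(s + p for s, p in jobs) for jobs in jobs_per_machine)
    return max(total_setup, max_load)
```

On the instance `[[(1, 3), (1, 1)], [(2, 2)]]` this rule produces a makespan of 6, which equals the lower bound, so the schedule is optimal for that instance.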

Relevance:

20.00%

Publisher:

Abstract:

We present a dynamic distributed load balancing algorithm for parallel, adaptive Finite Element simulations in which we use preconditioned Conjugate Gradient solvers based on domain-decomposition. The load balancing is designed to maintain good partition aspect ratio and we show that cut size is not always the appropriate measure in load balancing. Furthermore, we attempt to answer the question why the aspect ratio of partitions plays an important role for certain solvers. We define and rate different kinds of aspect ratio and present a new center-based partitioning method of calculating the initial distribution which implicitly optimizes this measure. During the adaptive simulation, the load balancer calculates a balancing flow using different versions of the diffusion algorithm and a variant of breadth first search. Elements to be migrated are chosen according to a cost function aiming at the optimization of subdomain shapes. Experimental results for Bramble's preconditioner and comparisons to state-of-the-art load balancers show the benefits of the construction.
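The abstract mentions computing a balancing flow with versions of the diffusion algorithm. As a rough illustration, the Python sketch below implements the classical first-order diffusion scheme, in which each processor exchanges a fixed fraction `alpha` of the load difference with each neighbour. It is not the specific variant used by the authors. The processor graph and the value of `alpha` in the usage note are assumptions.

```python
def diffusion_step(load, neighbours, alpha):
    """One first-order diffusion iteration.

    load[i] is the load of processor i and neighbours[i] lists the
    processors adjacent to i (the graph must be symmetric). The flow
    sent from i to j in this step is alpha * (load[i] - load[j]).
    """
    new = list(load)
    for i, nbrs in enumerate(neighbours):
        for j in nbrs:
            new[i] += alpha * (load[j] - load[i])
    return new


def diffuse(load, neighbours, alpha, steps):
    """Apply `steps` diffusion iterations and return the final loads."""
    for _ in range(steps):
        load = diffusion_step(load, neighbours, alpha)
    return load
```

On a chain of three processors, `neighbours = [[1], [0, 2], [1]]`, with initial load `[12.0, 0.0, 0.0]` and `alpha = 0.25`, the first step gives `[9.0, 3.0, 0.0]`, and repeated steps approach the balanced load of 4 per processor while the total stays constant.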

Relevance:

20.00%

Publisher:

Abstract:

Three parallel optimisation algorithms, for use in the context of multilevel graph partitioning of unstructured meshes, are described. The first, interface optimisation, reduces the computation to a set of independent optimisation problems in interface regions. The next, alternating optimisation, is a restriction of this technique in which mesh entities are only allowed to migrate between subdomains in one direction. The third treats the gain as a potential field and uses the concept of relative gain for selecting appropriate vertices to migrate. The results are compared and seen to produce very high global quality partitions, very rapidly. The results are also compared with another partitioning tool and shown to be of higher quality although taking longer to compute.

Relevance:

20.00%

Publisher:

Abstract:

The central product of the DRAMA (Dynamic Re-Allocation of Meshes for parallel Finite Element Applications) project is a library comprising a variety of tools for dynamic re-partitioning of unstructured Finite Element (FE) applications. The input to the DRAMA library is the computational mesh, and corresponding costs, partitioned into sub-domains. The core library functions then perform a parallel computation of a mesh re-allocation that will re-balance the costs based on the DRAMA cost model. We discuss the basic features of this cost model, which allows a general approach to load identification, modelling and imbalance minimisation. Results from crash simulations are presented which show the necessity for multi-phase/multi-constraint partitioning components.