966 resultados para scalable parallel programming
Resumo:
User supplied knowledge and interaction is a vital component of a toolkit for producing high quality parallel implementations of scalar FORTRAN numerical code. In this paper we consider the necessary components that such a parallelisation toolkit should possess to provide an effective environment to identify, extract and embed user relevant user knowledge. We also examine to what extent these facilities are available in leading parallelisation tools; in particular we discuss how these issues have been addressed in the development of the user interface of the Computer Aided Parallelisation Tools (CAPTools). The CAPTools environment has been designed to enable user exploration, interaction and insertion of user knowledge to facilitate the automatic generation of very efficient parallel code. A key issue in the user's interaction is control of the volume of information so that the user is focused on only that which is needed. User control over the level and extent of information revealed at any phase is supplied using a wide variety of filters. Another issue is the way in which information is communicated. Dependence analysis and its resulting graphs involve a lot of sophisticated rather abstract concepts unlikely to be familiar to most users of parallelising tools. As such, considerable effort has been made to communicate with the user in terms that they will understand. These features, amongst others, and their use in the parallelisation process are described and their effectiveness discussed.
Resumo:
Of key importance to oil and gas companies is the size distribution of fields in the areas that they are drilling. Recent arguments suggest that there are many more fields yet to be discovered in mature provinces than had previously been thought because the underlying distribution is monotonic not peaked. According to this view the peaked nature of the distribution for discovered fields reflects not the underlying distribution but the effect of economic truncation. This paper contributes to the discussion by analysing up-to-date exploration and discovery data for two mature provinces using the discovery-process model, based on sampling without replacement and implicitly including economic truncation effects. The maximum likelihood estimation involved generates a high-dimensional mixed-integer nonlinear optimization problem. A highly efficient solution strategy is tested, exploiting the separable structure and handling the integer constraints by treating the problem as a masked allocation problem in dynamic programming.
Resumo:
A parallel method for the dynamic partitioning of unstructured meshes is described. The method introduces a new iterative optimization technique known as relative gain optimization which both balances the workload and attempts to minimize the interprocessor communications overhead. Experiments on a series of adaptively refined meshes indicate that the algorithm provides partitions of an equivalent or higher quality to static partitioners (which do not reuse the existing partition) and much more rapidly. Perhaps more importantly, the algorithm results in only a small fraction of the amount of data migration compared to the static partitioners.
Resumo:
Realizing scalable performance on high performance computing systems is not straightforward for single-phenomenon codes (such as computational fluid dynamics [CFD]). This task is magnified considerably when the target software involves the interactions of a range of phenomena that have distinctive solution procedures involving different discretization methods. The problems of addressing the key issues of retaining data integrity and the ordering of the calculation procedures are significant. A strategy for parallelizing this multiphysics family of codes is described for software exploiting finite-volume discretization methods on unstructured meshes using iterative solution procedures. A mesh partitioning-based SPMD approach is used. However, since different variables use distinct discretization schemes, this means that distinct partitions are required; techniques for addressing this issue are described using the mesh-partitioning tool, JOSTLE. In this contribution, the strategy is tested for a variety of test cases under a wide range of conditions (e.g., problem size, number of processors, asynchronous / synchronous communications, etc.) using a variety of strategies for mapping the mesh partition onto the processor topology.
Resumo:
Parallel computing is now widely used in numerical simulation, particularly for application codes based on finite difference and finite element methods. A popular and successful technique employed to parallelize such codes onto large distributed memory systems is to partition the mesh into sub-domains that are then allocated to processors. The code then executes in parallel, using the SPMD methodology, with message passing for inter-processor interactions. In order to improve the parallel efficiency of an imbalanced structured mesh CFD code, a new dynamic load balancing (DLB) strategy has been developed in which the processor partition range limits of just one of the partitioned dimensions uses non-coincidental limits, as opposed to coincidental limits. The ‘local’ partition limit change allows greater flexibility in obtaining a balanced load distribution, as the workload increase, or decrease, on a processor is no longer restricted by the ‘global’ (coincidental) limit change. The automatic implementation of this generic DLB strategy within an existing parallel code is presented in this chapter, along with some preliminary results.
Resumo:
The demands of the process of engineering design, particularly for structural integrity, have exploited computational modelling techniques and software tools for decades. Frequently, the shape of structural components or assemblies is determined to optimise the flow distribution or heat transfer characteristics, and to ensure that the structural performance in service is adequate. From the perspective of computational modelling these activities are typically separated into: • fluid flow and the associated heat transfer analysis (possibly with chemical reactions), based upon Computational Fluid Dynamics (CFD) technology • structural analysis again possibly with heat transfer, based upon finite element analysis (FEA) techniques.
Resumo:
The parallelization of an industrially important in-house computational fluid dynamics (CFD) code for calculating the airflow over complex aircraft configurations using the Euler or Navier–Stokes equations is presented. The code discussed is the flow solver module of the SAUNA CFD suite. This suite uses a novel grid system that may include block-structured hexahedral or pyramidal grids, unstructured tetrahedral grids or a hybrid combination of both. To assist in the rapid convergence to a solution, a number of convergence acceleration techniques are employed including implicit residual smoothing and a multigrid full approximation storage scheme (FAS). Key features of the parallelization approach are the use of domain decomposition and encapsulated message passing to enable the execution in parallel using a single programme multiple data (SPMD) paradigm. In the case where a hybrid grid is used, a unified grid partitioning scheme is employed to define the decomposition of the mesh. The parallel code has been tested using both structured and hybrid grids on a number of different distributed memory parallel systems and is now routinely used to perform industrial scale aeronautical simulations. Copyright © 2000 John Wiley & Sons, Ltd.
Resumo:
We report on practical experience using the Oxford BSP Library to parallelize a large electromagnetic code, the British Aerospace finite-difference time-domain code EMMA T:FD3D. The Oxford BS Library is one of the first realizations of the Bulk Synchronous Parallel computational model to be targeted at numerically intensive scientific (typically Fortran) computing. The BAe EMMA code is one of the first large-scale applications to be parallelized using this library, and it is an important demonstration of the cost effectiveness of the BSP approach. We illustrate how BSP cost-modelling techniques can be used to predict and optimize performance for single-source programs across different parallel platforms. We provide predicted and observed performance figures for an industrial-strength, single-source parallel code for a variety of real parallel architectures: shared memory multiprocessors, workstation clusters and massively parallel platforms.
Resumo:
In this paper the results obtained from the parallelisation of some 3D industrial electromagnetic Finite Element codes within the ESPRIT Europort 2 project PARTEL are presented. The basic guidelines for the parallelisation procedure, based on the Bulk Synchronous Parallel approach, are presented and the encouraging results obtained in terms of speed-up on some selected test cases of practical design significance are outlined and discussed.
Resumo:
This paper examines scheduling problems in which the setup phase of each operation needs to be attended by a single server, common for all jobs and different from the processing machines. The objective in each situation is to minimize the makespan. For the processing system consisting of two parallel dedicated machines we prove that the problem of finding an optimal schedule is NP-hard in the strong sense even if all setup times are equal or if all processing times are equal. For the case of m parallel dedicated machines, a simple greedy algorithm is shown to create a schedule with the makespan that is at most twice the optimum value. For the two machine case, an improved heuristic guarantees a tight worst-case ratio of 3/2. We also describe several polynomially solvable cases of the later problem. The two-machine flow shop and the open shop problems with a single server are also shown to be NP-hard in the strong sense. However, we reduce the two-machine flow shop no-wait problem with a single server to the Gilmore-Gomory traveling salesman problem and solve it in polynomial time. (c) 2000 John Wiley & Sons, Inc.
Resumo:
We present a dynamic distributed load balancing algorithm for parallel, adaptive Finite Element simulations in which we use preconditioned Conjugate Gradient solvers based on domain-decomposition. The load balancing is designed to maintain good partition aspect ratio and we show that cut size is not always the appropriate measure in load balancing. Furthermore, we attempt to answer the question why the aspect ratio of partitions plays an important role for certain solvers. We define and rate different kinds of aspect ratio and present a new center-based partitioning method of calculating the initial distribution which implicitly optimizes this measure. During the adaptive simulation, the load balancer calculates a balancing flow using different versions of the diffusion algorithm and a variant of breadth first search. Elements to be migrated are chosen according to a cost function aiming at the optimization of subdomain shapes. Experimental results for Bramble's preconditioner and comparisons to state-of-the-art load balancers show the benefits of the construction.
Resumo:
Three parallel optimisation algorithms, for use in the context of multilevel graph partitioning of unstructured meshes, are described. The first, interface optimisation, reduces the computation to a set of independent optimisation problems in interface regions. The next, alternating optimisation, is a restriction of this technique in which mesh entities are only allowed to migrate between subdomains in one direction. The third treats the gain as a potential field and uses the concept of relative gain for selecting appropriate vertices to migrate. The results are compared and seen to produce very high global quality partitions, very rapidly. The results are also compared with another partitioning tool and shown to be of higher quality although taking longer to compute.