955 resultados para Partition graphique
Resumo:
The availability of a very accurate dependence graph for a scalar code is the basis for the automatic generation of an efficient parallel implementation. The strategy for this task which is encapsulated in a comprehensive data partitioning code generation algorithm is described. This algorithm involves the data partition, calculation of assignment ranges for partitioned arrays, addition of a comprehensive set of execution control masks, altering loop limits, addition and optimisation of communications for all data. In this context, the development and implementation of strategies to merge communications wherever possible has proved an important feature in producing efficient parallel implementations for numerical mesh based codes. The code generation strategies described here are embedded within the Computer Aided Parallelisation tools (CAPTools) software as a key part of a toolkit for automating as much as possible of the parallelisation process for mesh based numerical codes. The algorithms used enables parallelisation of real computational mechanics codes with only minor user interaction and without any prior manual customisation of the serial code to suit the parallelisation tool.
Resumo:
A parallel method for the dynamic partitioning of unstructured meshes is described. The method introduces a new iterative optimization technique known as relative gain optimization which both balances the workload and attempts to minimize the interprocessor communications overhead. Experiments on a series of adaptively refined meshes indicate that the algorithm provides partitions of an equivalent or higher quality to static partitioners (which do not reuse the existing partition) and much more rapidly. Perhaps more importantly, the algorithm results in only a small fraction of the amount of data migration compared to the static partitioners.
Resumo:
Multilevel algorithms are a successful class of optimization techniques that address the mesh partitioning problem for mapping meshes onto parallel computers. They usually combine a graph contraction algorithm together with a local optimization method that refines the partition at each graph level. To date, these algorithms have been used almost exclusively to minimize the cut-edge weight in the graph with the aim of minimizing the parallel communication overhead. However, it has been shown that for certain classes of problems, the convergence of the underlying solution algorithm is strongly influenced by the shape or aspect ratio of the subdomains. Therefore, in this paper, the authors modify the multilevel algorithms to optimize a cost function based on the aspect ratio. Several variants of the algorithms are tested and shown to provide excellent results.
Resumo:
Realizing scalable performance on high performance computing systems is not straightforward for single-phenomenon codes (such as computational fluid dynamics [CFD]). This task is magnified considerably when the target software involves the interactions of a range of phenomena that have distinctive solution procedures involving different discretization methods. The problems of addressing the key issues of retaining data integrity and the ordering of the calculation procedures are significant. A strategy for parallelizing this multiphysics family of codes is described for software exploiting finite-volume discretization methods on unstructured meshes using iterative solution procedures. A mesh partitioning-based SPMD approach is used. However, since different variables use distinct discretization schemes, this means that distinct partitions are required; techniques for addressing this issue are described using the mesh-partitioning tool, JOSTLE. In this contribution, the strategy is tested for a variety of test cases under a wide range of conditions (e.g., problem size, number of processors, asynchronous / synchronous communications, etc.) using a variety of strategies for mapping the mesh partition onto the processor topology.
Resumo:
Parallel computing is now widely used in numerical simulation, particularly for application codes based on finite difference and finite element methods. A popular and successful technique employed to parallelize such codes onto large distributed memory systems is to partition the mesh into sub-domains that are then allocated to processors. The code then executes in parallel, using the SPMD methodology, with message passing for inter-processor interactions. In order to improve the parallel efficiency of an imbalanced structured mesh CFD code, a new dynamic load balancing (DLB) strategy has been developed in which the processor partition range limits of just one of the partitioned dimensions uses non-coincidental limits, as opposed to coincidental limits. The ‘local’ partition limit change allows greater flexibility in obtaining a balanced load distribution, as the workload increase, or decrease, on a processor is no longer restricted by the ‘global’ (coincidental) limit change. The automatic implementation of this generic DLB strategy within an existing parallel code is presented in this chapter, along with some preliminary results.
Resumo:
The most common parallelisation strategy for many Computational Mechanics (CM) (typified by Computational Fluid Dynamics (CFD) applications) which use structured meshes, involves a 1D partition based upon slabs of cells. However, many CFD codes employ pipeline operations in their solution procedure. For parallelised versions of such codes to scale well they must employ two (or more) dimensional partitions. This paper describes an algorithmic approach to the multi-dimensional mesh partitioning in code parallelisation, its implementation in a toolkit for almost automatically transforming scalar codes to parallel form, and its testing on a range of ‘real-world’ FORTRAN codes. The concept of multi-dimensional partitioning is straightforward, but non-trivial to represent as a sufficiently generic algorithm so that it can be embedded in a code transformation tool. The results of the tests on fine real-world codes demonstrate clear improvements in parallel performance and scalability (over a 1D partition). This is matched by a huge reduction in the time required to develop the parallel versions when hand coded – from weeks/months down to hours/days.
Resumo:
Multilevel algorithms are a successful class of optimisation techniques which address the mesh partitioning problem. They usually combine a graph contraction algorithm together with a local optimisation method which refines the partition at each graph level. To date these algorithms have been used almost exclusively to minimise the cut-edge weight, however it has been shown that for certain classes of solution algorithm, the convergence of the solver is strongly influenced by the subdomain aspect ratio. In this paper therefore, we modify the multilevel algorithms in order to optimise a cost function based on aspect ratio. Several variants of the algorithms are tested and shown to provide excellent results.
Resumo:
Multilevel algorithms are a successful class of optimization techniques which addresses the mesh partitioning problem. They usually combine a graph contraction algorithm together with a local optimization method which refines the partition at each graph level. In this paper we present an enhancement of the technique which uses imbalance to achieve higher quality partitions. We also present a formulation of the Kernighan-Lin partition optimization algorithm which incorporates load-balancing. The resulting algorithm is tested against a different but related state-of-the-art partitioner and shown to provide improved results.
Resumo:
We present a dynamic distributed load balancing algorithm for parallel, adaptive Finite Element simulations in which we use preconditioned Conjugate Gradient solvers based on domain-decomposition. The load balancing is designed to maintain good partition aspect ratio and we show that cut size is not always the appropriate measure in load balancing. Furthermore, we attempt to answer the question why the aspect ratio of partitions plays an important role for certain solvers. We define and rate different kinds of aspect ratio and present a new center-based partitioning method of calculating the initial distribution which implicitly optimizes this measure. During the adaptive simulation, the load balancer calculates a balancing flow using different versions of the diffusion algorithm and a variant of breadth first search. Elements to be migrated are chosen according to a cost function aiming at the optimization of subdomain shapes. Experimental results for Bramble's preconditioner and comparisons to state-of-the-art load balancers show the benefits of the construction.
Resumo:
Multilevel algorithms are a successful class of optimisation techniques which address the mesh partitioning problem for distributing unstructured meshes onto parallel computers. They usually combine a graph contraction algorithm together with a local optimisation method which refines the partition at each graph level. To date these algorithms have been used almost exclusively to minimise the cut edge weight in the graph with the aim of minimising the parallel communication overhead, but recently there has been a perceived need to take into account the communications network of the parallel machine. For example the increasing use of SMP clusters (systems of multiprocessor compute nodes with very fast intra-node communications but relatively slow inter-node networks) suggest the use of hierarchical network models. Indeed this requirement is exacerbated in the early experiments with meta-computers (multiple supercomputers combined together, in extreme cases over inter-continental networks). In this paper therefore, we modify a multilevel algorithm in order to minimise a cost function based on a model of the communications network. Several network models and variants of the algorithm are tested and we establish that it is possible to successfully guide the optimisation to reflect the chosen architecture.
Resumo:
Social network analysts have tried to capture the idea of social role explicitly by proposing a framework that precisely gives conditions under which group actors are playing equivalent roles. They term these methods positional analysis techniques. The most general definition is regular equivalence which captures the idea that equivalent actors are related in a similar way to equivalent alters. Regular equivalence gives rise to a whole class of partitions on a network. Given a network we have two different computational problems. The first is how to find a particular regular equivalence. An algorithm exists to find the largest regular partition but there are not efficient algorithms to test whether there is a regular k-partition. That is a partition in k groups that is regular. In addition, when dealing with real data, it is unlikely that any regular partitions exist. To overcome this problem relaxations of regular equivalence have been proposed along with optimisation techniques to find nearly regular partitions. In this paper we review the algorithms that have developed to find particular regular equivalences and look at some of the recent theoretical results which give an insight into the complexity of finding regular partitions.
Resumo:
The concept of a “true” ground-truth map is introduced, from which the inaccuracy/error of any production map may be measured. A partition of the mapped region is defined in terms of the “residual rectification” transformation. Geometric RMS-type and Geometric Distortion error criteria are defined as well as a map mis-classification error criterion (the latter for hard and fuzzy produc-tion maps). The total map error is defined to be the sum (over each set of the map partition men-tioned above) of these three error components integrated over each set of the partition.
Resumo:
In this paper, we study a problem of scheduling and batching on two machines in a flow-shop and open-shop environment. Each machine processes operations in batches, and the processing time of a batch is the sum of the processing times of the operations in that batch. A setup time, which depends only on the machine, is required before a batch is processed on a machine, and all jobs in a batch remain at the machine until the entire batch is processed. The aim is to make batching and sequencing decisions, which specify a partition of the jobs into batches on each machine, and a processing order of the batches on each machine, respectively, so that the makespan is minimized. The flow-shop problem is shown to be strongly NP-hard. We demonstrate that there is an optimal solution with the same batches on the two machines; we refer to these as consistent batches. A heuristic is developed that selects the best schedule among several with one, two, or three consistent batches, and is shown to have a worst-case performance ratio of 4/3. For the open-shop, we show that the problem is NP-hard in the ordinary sense. By proving the existence of an optimal solution with one, two or three consistent batches, a close relationship is established with the problem of scheduling two or three identical parallel machines to minimize the makespan. This allows a pseudo-polynomial algorithm to be derived, and various heuristic methods to be suggested.
Resumo:
The problem of deriving parallel mesh partitioning algorithms for mapping unstructured meshes to parallel computers is discussed in this chapter. In itself this raises a paradox - we seek to find a high quality partition of the mesh, but to compute it in parallel we require a partition of the mesh. In fact, we overcome this difficulty by deriving an optimisation strategy which can find a high quality partition even if the quality of the initial partition is very poor and then use a crude distribution scheme for the initial partition. The basis of this strategy is to use a multilevel approach combined with local refinement algorithms. Three such refinement algorithms are outlined and some example results presented which show that they can produce very high global quality partitions, very rapidly. The results are also compared with a similar multilevel serial partitioner and shown to be almost identical in quality. Finally we consider the impact of the initial partition on the results and demonstrate that the final partition quality is, modulo a certain amount of noise, independent of the initial partition.
Resumo:
In this Chapter we discuss the load-balancing issues arising in parallel mesh based computational mechanics codes for which the processor loading changes during the run. We briefly touch on geometric repartitioning ideas and then focus on different ways of using a graph both to solve the load-balancing problem and the optimisation problem, both locally and globally. We also briefly discuss whether repartitioning is always valid. Sample illustrative results are presented and we conclude that repartitioning is an attractive option if the load changes are not too dramatic and that there is a certain trade-off between partition quality and volume of data that the underlying application needs to migrate.