877 resultados para Load Balancing
Resumo:
A parallel method for the dynamic partitioning of unstructured meshes is outlined. The method includes diffusive load-balancing techniques and an iterative optimisation technique known as relative gain optimisationwhich both balances theworkload and attempts to minimise the interprocessor communications overhead. It can also optionally include amultilevel strategy. Experiments on a series of adaptively refined meshes indicate that the algorithmprovides partitions of an equivalent or higher quality to static partitioners (which do not reuse the existing partition) and much more rapidly. Perhaps more importantly, the algorithm results in only a small fraction of the amount of data migration compared to the static partitioners.
Resumo:
This chapter describes a parallel optimization technique that incorporates a distributed load-balancing algorithm and provides an extremely fast solution to the problem of load-balancing adaptive unstructured meshes. Moreover, a parallel graph contraction technique can be employed to enhance the partition quality and the resulting strategy outperforms or matches results from existing state-of-the-art static mesh partitioning algorithms. The strategy can also be applied to static partitioning problems. Dynamic procedures have been found to be much faster than static techniques, to provide partitions of similar or higher quality and, in comparison, involve the migration of a fraction of the data. The method employs a new iterative optimization technique that balances the workload and attempts to minimize the interprocessor communications overhead. Experiments on a series of adaptively refined meshes indicate that the algorithm provides partitions of an equivalent or higher quality to static partitioners (which do not reuse the existing partition) and much more quickly. The dynamic evolution of load has three major influences on possible partitioning techniques; cost, reuse, and parallelism. The unstructured mesh may be modified every few time-steps and so the load-balancing must have a low cost relative to that of the solution algorithm in between remeshing.
Resumo:
A parallel method for dynamic partitioning of unstructured meshes is described. The method employs a new iterative optimisation technique which both balances the workload and attempts to minimise the interprocessor communications overhead. Experiments on a series of adaptively refined meshes indicate that the algorithm provides partitions of an equivalent or higher quality to static partitioners (which do not reuse the existing partition) and much more quickly. Perhaps more importantly, the algorithm results in only a small fraction of the amount of data migration compared to the static partitioners.
Resumo:
In parallel adaptive finite element simulations the work load on the individual processors may change frequently. To (re)distribute the load evenly over the processors a load balancing heuristic is needed. Common strategies try to minimise subdomain dependencies by optimising the cutsize of the partitioning. However for certain solvers cutsize only plays a minor role, and their convergence is highly dependent on the subdomain shapes. Degenerated subdomain shapes cause them to need significantly more iterations to converge. In this work a new parallel load balancing strategy is introduced which directly addresses the problem of generating and conserving reasonably good subdomain shapes in a dynamically changing Finite Element Simulation. Geometric data is used to formulate several cost functions to rate elements in terms of their suitability to be migrated. The well known diffusive method which calculates the necessary load flow is enhanced by weighting the subdomain edges with the help of these cost functions. The proposed methods have been tested and results are presented.
Resumo:
We present a dynamic distributed load balancing algorithm for parallel, adaptive finite element simulations using preconditioned conjugate gradient solvers based on domain-decomposition. The load balancer is designed to maintain good partition aspect ratios. It calculates a balancing flow using different versions of diffusion and a variant of breadth first search. Elements to be migrated are chosen according to a cost function aiming at the optimization of subdomain shapes. We show how to use information from the second step to guide the first. Experimental results using Bramble's preconditioner and comparisons to existing state-of-the-art balancers show the benefits of the construction.
Resumo:
As the complexity of parallel applications increase, the performance limitations resulting from computational load imbalance become dominant. Mapping the problem space to the processors in a parallel machine in a manner that balances the workload of each processors will typically reduce the run-time. In many cases the computation time required for a given calculation cannot be predetermined even at run-time and so static partition of the problem returns poor performance. For problems in which the computational load across the discretisation is dynamic and inhomogeneous, for example multi-physics problems involving fluid and solid mechanics with phase changes, the workload for a static subdomain will change over the course of a computation and cannot be estimated beforehand. For such applications the mapping of loads to process is required to change dynamically, at run-time in order to maintain reasonable efficiency. The issue of dynamic load balancing are examined in the context of PHYSICA, a three dimensional unstructured mesh multi-physics continuum mechanics computational modelling code.
Resumo:
A method is outlined for optimising graph partitions which arise in mapping unstructured mesh calculations to parallel computers. The method employs a relative gain iterative technique to both evenly balance the workload and minimise the number and volume of interprocessor communications. A parallel graph reduction technique is also briefly described and can be used to give a global perspective to the optimisation. The algorithms work efficiently in parallel as well as sequentially and when combined with a fast direct partitioning technique (such as the Greedy algorithm) to give an initial partition, the resulting two-stage process proves itself to be both a powerful and flexible solution to the static graph-partitioning problem. Experiments indicate that the resulting parallel code can provide high quality partitions, independent of the initial partition, within a few seconds. The algorithms can also be used for dynamic load-balancing, reusing existing partitions and in this case the procedures are much faster than static techniques, provide partitions of similar or higher quality and, in comparison, involve the migration of a fraction of the data.
Resumo:
This paper presents a new dynamic load balancing technique for structured mesh computational mechanics codes in which the processor partition range limits of just one of the partitioned dimensions uses non-coincidental limits, as opposed to using coincidental limits in all of the partitioned dimensions. The partition range limits are 'staggered', allowing greater flexibility in obtaining a balanced load distribution in comparison to when the limits are changed 'globally'. as the load increase/decrease on one processor no longer restricts the load decrease/increase on a neighbouring processor. The automatic implementation of this 'staggered' load balancing strategy within an existing parallel code is presented in this paper, along with some preliminary results.
Resumo:
Solving a complex Constraint Satisfaction Problem (CSP) is a computationally hard task which may require a considerable amount of time. Parallelism has been applied successfully to the job and there are already many applications capable of harnessing the parallel power of modern CPUs to speed up the solving process. Current Graphics Processing Units (GPUs), containing from a few hundred to a few thousand cores, possess a level of parallelism that surpasses that of CPUs and there are much less applications capable of solving CSPs on GPUs, leaving space for further improvement. This paper describes work in progress in the solving of CSPs on GPUs, CPUs and other devices, such as Intel Many Integrated Cores (MICs), in parallel. It presents the gains obtained when applying more devices to solve some problems and the main challenges that must be faced when using devices with as different architectures as CPUs and GPUs, with a greater focus on how to effectively achieve good load balancing between such heterogeneous devices.
Resumo:
In this paper, we present a decentralized dynamic load scheduling/balancing algorithm called ELISA (Estimated Load Information Scheduling Algorithm) for general purpose distributed computing systems. ELISA uses estimated state information based upon periodic exchange of exact state information between neighbouring nodes to perform load scheduling. The primary objective of the algorithm is to cut down on the communication and load transfer overheads by minimizing the frequency of status exchange and by restricting the load transfer and status exchange within the buddy set of a processor. It is shown that the resulting algorithm performs almost as well as a perfect information algorithm and is superior to other load balancing schemes based on the random sharing and Ni-Hwang algorithms. A sensitivity analysis to study the effect of various design parameters on the effectiveness of load balancing is also carried out. Finally, the algorithm's performance is tested on large dimensional hypercubes in the presence of time-varying load arrival process and is shown to perform well in comparison to other algorithms. This makes ELISA a viable and implementable load balancing algorithm for use in general purpose distributed computing systems.
Resumo:
Load balancing is often used to ensure that nodes in a distributed systems are equally loaded. In this paper, we show that for real-time systems, load balancing is not desirable. In particular, we propose a new load-profiling strategy that allows the nodes of a distributed system to be unequally loaded. Using load profiling, the system attempts to distribute the load amongst its nodes so as to maximize the chances of finding a node that would satisfy the computational needs of incoming real-time tasks. To that end, we describe and evaluate a distributed load-profiling protocol for dynamically scheduling time-constrained tasks in a loosely-coupled distributed environment. When a task is submitted to a node, the scheduling software tries to schedule the task locally so as to meet its deadline. If that is not feasible, it tries to locate another node where this could be done with a high probability of success, while attempting to maintain an overall load profile for the system. Nodes in the system inform each other about their state using a combination of multicasting and gossiping. The performance of the proposed protocol is evaluated via simulation, and is contrasted to other dynamic scheduling protocols for real-time distributed systems. Based on our findings, we argue that keeping a diverse availability profile and using passive bidding (through gossiping) are both advantageous to distributed scheduling for real-time systems.
Resumo:
High-speed networks, such as ATM networks, are expected to support diverse Quality of Service (QoS) constraints, including real-time QoS guarantees. Real-time QoS is required by many applications such as those that involve voice and video communication. To support such services, routing algorithms that allow applications to reserve the needed bandwidth over a Virtual Circuit (VC) have been proposed. Commonly, these bandwidth-reservation algorithms assign VCs to routes using the least-loaded concept, and thus result in balancing the load over the set of all candidate routes. In this paper, we show that for such reservation-based protocols|which allow for the exclusive use of a preset fraction of a resource's bandwidth for an extended period of time-load balancing is not desirable as it results in resource fragmentation, which adversely affects the likelihood of accepting new reservations. In particular, we show that load-balancing VC routing algorithms are not appropriate when the main objective of the routing protocol is to increase the probability of finding routes that satisfy incoming VC requests, as opposed to equalizing the bandwidth utilization along the various routes. We present an on-line VC routing scheme that is based on the concept of "load profiling", which allows a distribution of "available" bandwidth across a set of candidate routes to match the characteristics of incoming VC QoS requests. We show the effectiveness of our load-profiling approach when compared to traditional load-balancing and load-packing VC routing schemes.
Resumo:
To support the diverse Quality of Service (QoS) requirements of real-time (e.g. audio/video) applications in integrated services networks, several routing algorithms that allow for the reservation of the needed bandwidth over a Virtual Circuit (VC) established on one of several candidate routes have been proposed. Traditionally, such routing is done using the least-loaded concept, and thus results in balancing the load across the set of candidate routes. In a recent study, we have established the inadequacy of this load balancing practice and proposed the use of load profiling as an alternative. Load profiling techniques allow the distribution of "available" bandwidth across a set of candidate routes to match the characteristics of incoming VC QoS requests. In this paper we thoroughly characterize the performance of VC routing using load profiling and contrast it to routing using load balancing and load packing. We do so both analytically and via extensive simulations of multi-class traffic routing in Virtual Path (VP) based networks. Our findings confirm that for routing guaranteed bandwidth flows in VP networks, load balancing is not desirable as it results in VP bandwidth fragmentation, which adversely affects the likelihood of accepting new VC requests. This fragmentation is more pronounced when the granularity of VC requests is large. Typically, this occurs when a common VC is established to carry the aggregate traffic flow of many high-bandwidth real-time sources. For VP-based networks, our simulation results show that our load-profiling VC routing scheme performs better or as well as the traditional load-balancing VC routing in terms of revenue under both skewed and uniform workloads. Furthermore, load-profiling routing improves routing fairness by proactively increasing the chances of admitting high-bandwidth connections.
Resumo:
Multilevel algorithms are a successful class of optimization techniques which addresses the mesh partitioning problem. They usually combine a graph contraction algorithm together with a local optimization method which refines the partition at each graph level. In this paper we present an enhancement of the technique which uses imbalance to achieve higher quality partitions. We also present a formulation of the Kernighan-Lin partition optimization algorithm which incorporates load-balancing. The resulting algorithm is tested against a different but related state-of-the-art partitioner and shown to provide improved results.
Resumo:
Multilevel algorithms are a successful class of optimisation techniques which address the mesh partitioning problem. They usually combine a graph contraction algorithm together with a local optimisation method which refines the partition at each graph level. In this paper we present an enhancement of the technique which uses imbalance to achieve higher quality partitions. We also present a formulation of the Kernighan-Lin partition optimisation algorithm which incorporates load-balancing. The resulting algorithm is tested against a different but related state-of the-art partitioner and shown to provide improved results.