905 resultados para Parallel algorithm
Resumo:
A parallel method for the dynamic partitioning of unstructured meshes is outlined. The method includes diffusive load-balancing techniques and an iterative optimisation technique known as relative gain optimisationwhich both balances theworkload and attempts to minimise the interprocessor communications overhead. It can also optionally include amultilevel strategy. Experiments on a series of adaptively refined meshes indicate that the algorithmprovides partitions of an equivalent or higher quality to static partitioners (which do not reuse the existing partition) and much more rapidly. Perhaps more importantly, the algorithm results in only a small fraction of the amount of data migration compared to the static partitioners.
Resumo:
This chapter describes a parallel optimization technique that incorporates a distributed load-balancing algorithm and provides an extremely fast solution to the problem of load-balancing adaptive unstructured meshes. Moreover, a parallel graph contraction technique can be employed to enhance the partition quality and the resulting strategy outperforms or matches results from existing state-of-the-art static mesh partitioning algorithms. The strategy can also be applied to static partitioning problems. Dynamic procedures have been found to be much faster than static techniques, to provide partitions of similar or higher quality and, in comparison, involve the migration of a fraction of the data. The method employs a new iterative optimization technique that balances the workload and attempts to minimize the interprocessor communications overhead. Experiments on a series of adaptively refined meshes indicate that the algorithm provides partitions of an equivalent or higher quality to static partitioners (which do not reuse the existing partition) and much more quickly. The dynamic evolution of load has three major influences on possible partitioning techniques; cost, reuse, and parallelism. The unstructured mesh may be modified every few time-steps and so the load-balancing must have a low cost relative to that of the solution algorithm in between remeshing.
Resumo:
A parallel method for dynamic partitioning of unstructured meshes is described. The method employs a new iterative optimisation technique which both balances the workload and attempts to minimise the interprocessor communications overhead. Experiments on a series of adaptively refined meshes indicate that the algorithm provides partitions of an equivalent or higher quality to static partitioners (which do not reuse the existing partition) and much more quickly. Perhaps more importantly, the algorithm results in only a small fraction of the amount of data migration compared to the static partitioners.
Resumo:
A method is outlined for optimising graph partitions which arise in mapping un- structured mesh calculations to parallel computers. The method employs a combination of iterative techniques to both evenly balance the workload and minimise the number and volume of interprocessor communications. They are designed to work efficiently in parallel as well as sequentially and when combined with a fast direct partitioning technique (such as the Greedy algorithm) to give an initial partition, the resulting two-stage process proves itself to be both a powerful and flexible solution to the static graph-partitioning problem. The algorithms can also be used for dynamic load-balancing and a clustering technique can additionally be employed to speed up the whole process. Experiments indicate that the resulting parallel code can provide high quality partitions, independent of the initial partition, within a few seconds.
Resumo:
We present a dynamic distributed load balancing algorithm for parallel, adaptive finite element simulations using preconditioned conjugate gradient solvers based on domain-decomposition. The load balancer is designed to maintain good partition aspect ratios. It can calculate a balancing flow using different versions of diffusion and a variant of breadth first search. Elements to be migrated are chosen according to a cost function aiming at the optimization of subdomain shapes. We show how to use information from the second step to guide the first. Experimental results using Bramble's preconditioner and comparisons to existing state-ot-the-art load balancers show the benefits of the construction.
Resumo:
We present a dynamic distributed load balancing algorithm for parallel, adaptive finite element simulations using preconditioned conjugate gradient solvers based on domain-decomposition. The load balancer is designed to maintain good partition aspect ratios. It calculates a balancing flow using different versions of diffusion and a variant of breadth first search. Elements to be migrated are chosen according to a cost function aiming at the optimization of subdomain shapes. We show how to use information from the second step to guide the first. Experimental results using Bramble's preconditioner and comparisons to existing state-of-the-art balancers show the benefits of the construction.
Resumo:
A method is outlined for optimising graph partitions which arise in mapping unstructured mesh calculations to parallel computers. The method employs a relative gain iterative technique to both evenly balance the workload and minimise the number and volume of interprocessor communications. A parallel graph reduction technique is also briefly described and can be used to give a global perspective to the optimisation. The algorithms work efficiently in parallel as well as sequentially and when combined with a fast direct partitioning technique (such as the Greedy algorithm) to give an initial partition, the resulting two-stage process proves itself to be both a powerful and flexible solution to the static graph-partitioning problem. Experiments indicate that the resulting parallel code can provide high quality partitions, independent of the initial partition, within a few seconds. The algorithms can also be used for dynamic load-balancing, reusing existing partitions and in this case the procedures are much faster than static techniques, provide partitions of similar or higher quality and, in comparison, involve the migration of a fraction of the data.
Resumo:
A parallel method for the dynamic partitioning of unstructured meshes is described. The method introduces a new iterative optimisation technique known as relative gain optimisation which both balances the workload and attempts to minimise the interprocessor communications overhead. Experiments on a series of adaptively refined meshes indicate that the algorithm provides partitions of an equivalent or higher quality to static partitioners (which do not reuse the existing partition) and much more rapidly. Perhaps more importantly, the algorithm results in only a small fraction of the amount of data migration compared to the static partitioners.
Resumo:
Abstract: This paper reports a lot-sizing and scheduling problem, which minimizes inventory and backlog costs on m parallel machines with sequence-dependent set-up times over t periods. Problem solutions are represented as product subsets ordered and/or unordered for each machine m at each period t. The optimal lot sizes are determined applying a linear program. A genetic algorithm searches either over ordered or over unordered subsets (which are implicitly ordered using a fast ATSP-type heuristic) to identify an overall optimal solution. Initial computational results are presented, comparing the speed and solution quality of the ordered and unordered genetic algorithm approaches.
Resumo:
A poster of this paper will be presented at the 25th International Conference on Parallel Architecture and Compilation Technology (PACT ’16), September 11-15, 2016, Haifa, Israel.
Resumo:
As one of the newest members in Articial Immune Systems (AIS), the Dendritic Cell Algorithm (DCA) has been applied to a range of problems. These applications mainly belong to the eld of anomaly detection. However, real-time detection, a new challenge to anomaly detection, requires improvement on the real-time capability of the DCA. To assess such capability, formal methods in the research of real-time systems can be employed. The ndings of the assessment can provide guideline for the future development of the algorithm. Therefore, in this paper we use an interval logic based method, named the Duration Calcu- lus (DC), to specify a simplied single-cell model of the DCA. Based on the DC specications with further induction, we nd that each individual cell in the DCA can perform its function as a detector in real-time. Since the DCA can be seen as many such cells operating in parallel, it is potentially capable of performing real-time detection. However, the analysis process of the standard DCA constricts its real-time capability. As a result, we conclude that the analysis process of the standard DCA should be replaced by a real-time analysis component, which can perform periodic analysis for the purpose of real-time detection.
Resumo:
Underactuated cable-driven parallel robots (UACDPRs) shift a 6-degree-of-freedom end-effector (EE) with fewer than 6 cables. This thesis proposes a new automatic calibration technique that is applicable for under-actuated cable-driven parallel robots. The purpose of this work is to develop a method that uses free motion as an exciting trajectory for the acquisition of calibration data. The key point of this approach is to find a relationship between the unknown parameters to be calibrated (the lengths of the cables) and the parameters that could be measured by sensors (the swivel pulley angles measured by the encoders and roll-and-pitch angles measured by inclinometers on the platform). The equations involved are the geometrical-closure equations and the finite-difference velocity equations, solved using the least-squares algorithm. Simulations are performed on a parallel robot driven by 4 cables for validation. The final purpose of the calibration method is, still, the determination of the platform initial pose. As a consequence of underactuation, the EE is underconstrained and, for assigned cable lengths, the EE pose cannot be obtained by means of forward kinematics only. Hence, a direct-kinematics algorithm for a 4-cable UACDPR using redundant sensor measurements is proposed. The proposed method measures two orientation parameters of the EE besides cable lengths, in order to determine the other four pose variables, namely 3 position coordinates and one additional orientation parameter. Then, we study the performance of the direct-kinematics algorithm through the computation of the sensitivity of the direct-kinematics solution to measurement errors. Furthermore, position and orientation error upper limits are computed for bounded cable lengths errors resulting from the calibration procedure, and roll and pitch angles errors which are due to inclinometer inaccuracies.
Resumo:
Modern High-Performance Computing HPC systems are gradually increasing in size and complexity due to the correspondent demand of larger simulations requiring more complicated tasks and higher accuracy. However, as side effects of the Dennard’s scaling approaching its ultimate power limit, the efficiency of software plays also an important role in increasing the overall performance of a computation. Tools to measure application performance in these increasingly complex environments provide insights into the intricate ways in which software and hardware interact. The monitoring of the power consumption in order to save energy is possible through processors interfaces like Intel Running Average Power Limit RAPL. Given the low level of these interfaces, they are often paired with an application-level tool like Performance Application Programming Interface PAPI. Since several problems in many heterogeneous fields can be represented as a complex linear system, an optimized and scalable linear system solver algorithm can decrease significantly the time spent to compute its resolution. One of the most widely used algorithms deployed for the resolution of large simulation is the Gaussian Elimination, which has its most popular implementation for HPC systems in the Scalable Linear Algebra PACKage ScaLAPACK library. However, another relevant algorithm, which is increasing in popularity in the academic field, is the Inhibition Method. This thesis compares the energy consumption of the Inhibition Method and Gaussian Elimination from ScaLAPACK to profile their execution during the resolution of linear systems above the HPC architecture offered by CINECA. Moreover, it also collates the energy and power values for different ranks, nodes, and sockets configurations. The monitoring tools employed to track the energy consumption of these algorithms are PAPI and RAPL, that will be integrated with the parallel execution of the algorithms managed with the Message Passing Interface MPI.
Resumo:
Lipidic mixtures present a particular phase change profile highly affected by their unique crystalline structure. However, classical solid-liquid equilibrium (SLE) thermodynamic modeling approaches, which assume the solid phase to be a pure component, sometimes fail in the correct description of the phase behavior. In addition, their inability increases with the complexity of the system. To overcome some of these problems, this study describes a new procedure to depict the SLE of fatty binary mixtures presenting solid solutions, namely the Crystal-T algorithm. Considering the non-ideality of both liquid and solid phases, this algorithm is aimed at the determination of the temperature in which the first and last crystal of the mixture melts. The evaluation is focused on experimental data measured and reported in this work for systems composed of triacylglycerols and fatty alcohols. The liquidus and solidus lines of the SLE phase diagrams were described by using excess Gibbs energy based equations, and the group contribution UNIFAC model for the calculation of the activity coefficients of both liquid and solid phases. Very low deviations of theoretical and experimental data evidenced the strength of the algorithm, contributing to the enlargement of the scope of the SLE modeling.