204 resultados para scalable parallel programming
Resumo:
In this paper we develop a Linear Programming (LP) based decentralized algorithm for a group of multiple autonomous agents to achieve positional consensus. Each agent is capable of exchanging information about its position and orientation with other agents within their sensing region. The method is computationally feasible and easy to implement. Analytical results are presented. The effectiveness of the approach is illustrated with simulation results.
Resumo:
We describe a compiler for the Flat Concurrent Prolog language on a message passing multiprocessor architecture. This compiler permits symbolic and declarative programming in the syntax of Guarded Horn Rules, The implementation has been verified and tested on the 64-node PARAM parallel computer developed by C-DAC (Centre for the Development of Advanced Computing, India), Flat Concurrent Prolog (FCP) is a logic programming language designed for concurrent programming and parallel execution, It is a process oriented language, which embodies dataflow synchronization and guarded-command as its basic control mechanisms. An identical algorithm is executed on every processor in the network, We assume regular network topologies like mesh, ring, etc, Each node has a local memory, The algorithm comprises of two important parts: reduction and communication, The most difficult task is to integrate the solutions of problems that arise in the implementation in a coherent and efficient manner. We have tested the efficacy of the compiler on various benchmark problems of the ICOT project that have been reported in the recent book by Evan Tick, These problems include Quicksort, 8-queens, and Prime Number Generation, The results of the preliminary tests are favourable, We are currently examining issues like indexing and load balancing to further optimize our compiler.
Resumo:
Floquet analysis is widely used for small-order systems (say, order M < 100) to find trim results of control inputs and periodic responses, and stability results of damping levels and frequencies, Presently, however, it is practical neither for design applications nor for comprehensive analysis models that lead to large systems (M > 100); the run time on a sequential computer is simply prohibitive, Accordingly, a massively parallel Floquet analysis is developed with emphasis on large systems, and it is implemented on two SIMD or single-instruction, multiple-data computers with 4096 and 8192 processors, The focus of this development is a parallel shooting method with damped Newton iteration to generate trim results; the Floquet transition matrix (FTM) comes out as a byproduct, The eigenvalues and eigenvectors of the FTM are computed by a parallel QR method, and thereby stability results are generated, For illustration, flap and flap-lag stability of isolated rotors are treated by the parallel analysis and by a corresponding sequential analysis with the conventional shooting and QR methods; linear quasisteady airfoil aerodynamics and a finite-state three-dimensional wake model are used, Computational reliability is quantified by the condition numbers of the Jacobian matrices in Newton iteration, the condition numbers of the eigenvalues and the residual errors of the eigenpairs, and reliability figures are comparable in both the parallel and sequential analyses, Compared to the sequential analysis, the parallel analysis reduces the run time of large systems dramatically, and the reduction increases with increasing system order; this finding offers considerable promise for design and comprehensive-analysis applications.
Resumo:
in this short note, we determine precisely which operators have the property that their (full, symmetric or antisymmetric) second quantisation is an operator which is bounded or belongs to one of the various Schatten ideals; we also note that in 'the interior' of the natural domain, the second quantisation is a continuous map.
Resumo:
A symmetrizer of a nonsymmetric matrix A is the symmetric matrix X that satisfies the equation XA = A(t)X, where t indicates the transpose. A symmetrizer is useful in converting a nonsymmetric eigenvalue problem into a symmetric one which is relatively easy to solve and finds applications in stability problems in control theory and in the study of general matrices. Three designs based on VLSI parallel processor arrays are presented to compute a symmetrizer of a lower Hessenberg matrix. Their scope is discussed. The first one is the Leiserson systolic design while the remaining two, viz., the double pipe design and the fitted diagonal design are the derived versions of the first design with improved performance.
Resumo:
This paper deals with the development of a new model for the cooling process on the runout table of hot strip mills, The suitability of different numerical methods for the solution of the proposed model equation from the point of view of accuracy and computation time are studied, Parallel solutions for the model equation are proposed.
Resumo:
A linear programming problem in an inequality form having a bounded solution is solved error-free using an algorithm that sorts the inequalities, removes the redundant ones, and uses the p-adic arithmetic. (C) Elsevier Science Inc., 1997
Resumo:
This paper discusses the parallel implementation of the solution of a set of linear equations using the Alternative Quadrant Interlocking Factorisation Methods (AQIF), on a star topology. Both the AQIF and LU decomposition methods are mapped onto star topology on an IBM SP2 system, with MPI as the internode communicator. Performance parameters such as speedup, efficiency have been obtained through experimental and theoretical means. The studies demonstrate (i) a mismatch of 15% between the theoretical and experimental results, (ii) scalability of the AQIF algorithm, and (iii) faster executing AQIF algorithm.
Resumo:
Parallel execution of computational mechanics codes requires efficient mesh-partitioning techniques. These mesh-partitioning techniques divide the mesh into specified number of submeshes of approximately the same size and at the same time, minimise the interface nodes of the submeshes. This paper describes a new mesh partitioning technique, employing Genetic Algorithms. The proposed algorithm operates on the deduced graph (dual or nodal graph) of the given finite element mesh rather than directly on the mesh itself. The algorithm works by first constructing a coarse graph approximation using an automatic graph coarsening method. The coarse graph is partitioned and the results are interpolated onto the original graph to initialise an optimisation of the graph partition problem. In practice, hierarchy of (usually more than two) graphs are used to obtain the final graph partition. The proposed partitioning algorithm is applied to graphs derived from unstructured finite element meshes describing practical engineering problems and also several example graphs related to finite element meshes given in the literature. The test results indicate that the proposed GA based graph partitioning algorithm generates high quality partitions and are superior to spectral and multilevel graph partitioning algorithms.
Resumo:
The decision-making process for machine-tool selection and operation allocation in a flexible manufacturing system (FMS) usually involves multiple conflicting objectives. Thus, a fuzzy goal-programming model can be effectively applied to this decision problem. The paper addresses application of a fuzzy goal-programming concept to model the problem of machine-tool selection and operation allocation with explicit considerations given to objectives of minimizing the total cost of machining operation, material handling and set-up. The constraints pertaining to the capacity of machines, tool magazine and tool life are included in the model. A genetic algorithm (GA)-based approach is adopted to optimize this fuzzy goal-programming model. An illustrative example is provided and some results of computational experiments are reported.
Resumo:
Large-grain synchronous dataflow graphs or multi-rate graphs have the distinct feature that the nodes of the dataflow graph fire at different rates. Such multi-rate large-grain dataflow graphs have been widely regarded as a powerful programming model for DSP applications. In this paper we propose a method to minimize buffer storage requirement in constructing rate-optimal compile-time (MBRO) schedules for multi-rate dataflow graphs. We demonstrate that the constraints to minimize buffer storage while executing at the optimal computation rate (i.e. the maximum possible computation rate without storage constraints) can be formulated as a unified linear programming problem in our framework. A novel feature of our method is that in constructing the rate-optimal schedule, it directly minimizes the memory requirement by choosing the schedule time of nodes appropriately. Lastly, a new circular-arc interval graph coloring algorithm has been proposed to further reduce the memory requirement by allowing buffer sharing among the arcs of the multi-rate dataflow graph. We have constructed an experimental testbed which implements our MBRO scheduling algorithm as well as (i) the widely used periodic admissible parallel schedules (also known as block schedules) proposed by Lee and Messerschmitt (IEEE Transactions on Computers, vol. 36, no. 1, 1987, pp. 24-35), (ii) the optimal scheduling buffer allocation (OSBA) algorithm of Ning and Gao (Conference Record of the Twentieth Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, Charleston, SC, Jan. 10-13, 1993, pp. 29-42), and (iii) the multi-rate software pipelining (MRSP) algorithm (Govindarajan and Gao, in Proceedings of the 1993 International Conference on Application Specific Array Processors, Venice, Italy, Oct. 25-27, 1993, pp. 77-88). Schedules generated for a number of random dataflow graphs and for a set of DSP application programs using the different scheduling methods are compared. The experimental results have demonstrated a significant improvement (10-20%) in buffer requirements for the MBRO schedules compared to the schedules generated by the other three methods, without sacrificing the computation rate. The MBRO method also gives a 20% average improvement in computation rate compared to Lee's Block scheduling method.
Resumo:
In this paper, we present a differential-geometric approach to analyze the singularities of task space point trajectories of two and three-degree-of-freedom serial and parallel manipulators. At non-singular configurations, the first-order, local properties are characterized by metric coefficients, and, geometrically, by the shape and size of a velocity ellipse or an ellipsoid. At singular configurations, the determinant of the matrix of metric coefficients is zero and the velocity ellipsoid degenerates to an ellipse, a line or a point, and the area or the volume of the velocity ellipse or ellipsoid becomes zero. The degeneracies of the velocity ellipsoid or ellipse gives a simple geometric picture of the possible task space velocities at a singular configuration. To study the second-order properties at a singularity, we use the derivatives of the metric coefficients and the rate of change of area or volume. The derivatives are shown to be related to the possible task space accelerations at a singular configuration. In the case of parallel manipulators, singularities may lead to either loss or gain of one or more degrees-of-freedom. For loss of one or more degrees-of-freedom, ther possible velocities and accelerations are again obtained from a modified metric and derivatives of the metric coefficients. In the case of a gain of one or more degrees-of-freedom, the possible task space velocities can be pictured as growth to lines, ellipses, and ellipsoids. The theoretical results are illustrated with the help of a general spatial 2R manipulator and a three-degree-of-freedom RPSSPR-SPR parallel manipulator.