985 resultados para Parallel computation
Resumo:
Presented here, in a vector formulation, is an O(mn2) direct concise algorithm that prunes/identifies the linearly dependent (ld) rows of an arbitrary m X n matrix A and computes its reflexive type minimum norm inverse A(mr)-, which will be the true inverse A-1 if A is nonsingular and the Moore-Penrose inverse A+ if A is full row-rank. The algorithm, without any additional computation, produces the projection operator P = (I - A(mr)- A) that provides a means to compute any of the solutions of the consistent linear equation Ax = b since the general solution may be expressed as x = A(mr)+b + Pz, where z is an arbitrary vector. The rank r of A will also be produced in the process. Some of the salient features of this algorithm are that (i) the algorithm is concise, (ii) the minimum norm least squares solution for consistent/inconsistent equations is readily computable when A is full row-rank (else, a minimum norm solution for consistent equations is obtainable), (iii) the algorithm identifies ld rows, if any, and reduces concerned computation and improves accuracy of the result, (iv) error-bounds for the inverse as well as the solution x for Ax = b are readily computable, (v) error-free computation of the inverse, solution vector, rank, and projection operator and its inherent parallel implementation are straightforward, (vi) it is suitable for vector (pipeline) machines, and (vii) the inverse produced by the algorithm can be used to solve under-/overdetermined linear systems.
Resumo:
The distributed implementation of an algorithm for computing fixed points of an infinity-nonexpansive map is shown to converge to the set of fixed points under very general conditions.
Resumo:
A parallel matrix multiplication algorithm is presented, and studies of its performance and estimation are discussed. The algorithm is implemented on a network of transputers connected in a ring topology. An efficient scheme for partitioning the input matrices is introduced which enables overlapping computation with communication. This makes the algorithm achieve near-ideal speed-up for reasonably large matrices. Analytical expressions for the execution time of the algorithm have been derived by analysing its computation and communication characteristics. These expressions are validated by comparing the theoretical results of the performance with the experimental values obtained on a four-transputer network for both square and irregular matrices. The analytical model is also used to estimate the performance of the algorithm for a varying number of transputers and varying problem sizes. Although the algorithm is implemented on transputers, the methodology and the partitioning scheme presented in this paper are quite general and can be implemented on other processors which have the capability of overlapping computation with communication. The equations for performance prediction can also be extended to other multiprocessor systems.
Resumo:
This paper presents recursive algorithms for fast computation of Legendre and Zernike moments of a grey-level image intensity distribution. For a binary image, a contour integration method is developed for the evaluation of Legendre moments using only the boundary information. A method for recursive calculation of Zernike polynomial coefficients is also given. A square-to-circular image transformation scheme is introduced to minimize the computation involved in Zernike moment functions. The recursive formulae can also be used in inverse moment transforms to reconstruct the original image from moments. The mathematical framework of the algorithms is given in detail, and illustrated with binary and grey-level images.
Resumo:
Floquet analysis is widely used for small-order systems (say, order M < 100) to find trim results of control inputs and periodic responses, and stability results of damping levels and frequencies, Presently, however, it is practical neither for design applications nor for comprehensive analysis models that lead to large systems (M > 100); the run time on a sequential computer is simply prohibitive, Accordingly, a massively parallel Floquet analysis is developed with emphasis on large systems, and it is implemented on two SIMD or single-instruction, multiple-data computers with 4096 and 8192 processors, The focus of this development is a parallel shooting method with damped Newton iteration to generate trim results; the Floquet transition matrix (FTM) comes out as a byproduct, The eigenvalues and eigenvectors of the FTM are computed by a parallel QR method, and thereby stability results are generated, For illustration, flap and flap-lag stability of isolated rotors are treated by the parallel analysis and by a corresponding sequential analysis with the conventional shooting and QR methods; linear quasisteady airfoil aerodynamics and a finite-state three-dimensional wake model are used, Computational reliability is quantified by the condition numbers of the Jacobian matrices in Newton iteration, the condition numbers of the eigenvalues and the residual errors of the eigenpairs, and reliability figures are comparable in both the parallel and sequential analyses, Compared to the sequential analysis, the parallel analysis reduces the run time of large systems dramatically, and the reduction increases with increasing system order; this finding offers considerable promise for design and comprehensive-analysis applications.
Resumo:
in this short note, we determine precisely which operators have the property that their (full, symmetric or antisymmetric) second quantisation is an operator which is bounded or belongs to one of the various Schatten ideals; we also note that in 'the interior' of the natural domain, the second quantisation is a continuous map.
Resumo:
A symmetrizer of a nonsymmetric matrix A is the symmetric matrix X that satisfies the equation XA = A(t)X, where t indicates the transpose. A symmetrizer is useful in converting a nonsymmetric eigenvalue problem into a symmetric one which is relatively easy to solve and finds applications in stability problems in control theory and in the study of general matrices. Three designs based on VLSI parallel processor arrays are presented to compute a symmetrizer of a lower Hessenberg matrix. Their scope is discussed. The first one is the Leiserson systolic design while the remaining two, viz., the double pipe design and the fitted diagonal design are the derived versions of the first design with improved performance.
Resumo:
This paper discusses the parallel implementation of the solution of a set of linear equations using the Alternative Quadrant Interlocking Factorisation Methods (AQIF), on a star topology. Both the AQIF and LU decomposition methods are mapped onto star topology on an IBM SP2 system, with MPI as the internode communicator. Performance parameters such as speedup, efficiency have been obtained through experimental and theoretical means. The studies demonstrate (i) a mismatch of 15% between the theoretical and experimental results, (ii) scalability of the AQIF algorithm, and (iii) faster executing AQIF algorithm.
Resumo:
This paper proposes a simple current error space vector based hysteresis controller for two-level inverter fed Induction Motor (IM) drives. This proposed hysteresis controller retains all advantages of conventional current error space vector based hysteresis controllers like fast dynamic response, simple to implement, adjacent voltage vector switching etc. The additional advantage of this proposed hysteresis controller is that it gives a phase voltage frequency spectrum exactly similar to that of a constant switching frequency space vector pulse width modulated (SVPWM) inverter. In this proposed hysteresis controller the boundary is computed online using estimated stator voltages along alpha and beta axes thus completely eliminating look up tables used for obtaining parabolic hysteresis boundary proposed in. The estimation of stator voltage is carried out using current errors along alpha and beta axes and steady state model of induction motor. The proposed scheme is simple and capable of taking inverter upto six step mode operation, if demanded by drive system. The proposed hysteresis controller based inverter fed drive scheme is simulated extensively using SIMULINK toolbox of MATLAB for steady state and transient performance. The experimental verification for steady state performance of the proposed scheme is carried out on a 3.7kW IM.
Resumo:
Parallel execution of computational mechanics codes requires efficient mesh-partitioning techniques. These mesh-partitioning techniques divide the mesh into specified number of submeshes of approximately the same size and at the same time, minimise the interface nodes of the submeshes. This paper describes a new mesh partitioning technique, employing Genetic Algorithms. The proposed algorithm operates on the deduced graph (dual or nodal graph) of the given finite element mesh rather than directly on the mesh itself. The algorithm works by first constructing a coarse graph approximation using an automatic graph coarsening method. The coarse graph is partitioned and the results are interpolated onto the original graph to initialise an optimisation of the graph partition problem. In practice, hierarchy of (usually more than two) graphs are used to obtain the final graph partition. The proposed partitioning algorithm is applied to graphs derived from unstructured finite element meshes describing practical engineering problems and also several example graphs related to finite element meshes given in the literature. The test results indicate that the proposed GA based graph partitioning algorithm generates high quality partitions and are superior to spectral and multilevel graph partitioning algorithms.
Resumo:
Large-grain synchronous dataflow graphs or multi-rate graphs have the distinct feature that the nodes of the dataflow graph fire at different rates. Such multi-rate large-grain dataflow graphs have been widely regarded as a powerful programming model for DSP applications. In this paper we propose a method to minimize buffer storage requirement in constructing rate-optimal compile-time (MBRO) schedules for multi-rate dataflow graphs. We demonstrate that the constraints to minimize buffer storage while executing at the optimal computation rate (i.e. the maximum possible computation rate without storage constraints) can be formulated as a unified linear programming problem in our framework. A novel feature of our method is that in constructing the rate-optimal schedule, it directly minimizes the memory requirement by choosing the schedule time of nodes appropriately. Lastly, a new circular-arc interval graph coloring algorithm has been proposed to further reduce the memory requirement by allowing buffer sharing among the arcs of the multi-rate dataflow graph. We have constructed an experimental testbed which implements our MBRO scheduling algorithm as well as (i) the widely used periodic admissible parallel schedules (also known as block schedules) proposed by Lee and Messerschmitt (IEEE Transactions on Computers, vol. 36, no. 1, 1987, pp. 24-35), (ii) the optimal scheduling buffer allocation (OSBA) algorithm of Ning and Gao (Conference Record of the Twentieth Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, Charleston, SC, Jan. 10-13, 1993, pp. 29-42), and (iii) the multi-rate software pipelining (MRSP) algorithm (Govindarajan and Gao, in Proceedings of the 1993 International Conference on Application Specific Array Processors, Venice, Italy, Oct. 25-27, 1993, pp. 77-88). Schedules generated for a number of random dataflow graphs and for a set of DSP application programs using the different scheduling methods are compared. The experimental results have demonstrated a significant improvement (10-20%) in buffer requirements for the MBRO schedules compared to the schedules generated by the other three methods, without sacrificing the computation rate. The MBRO method also gives a 20% average improvement in computation rate compared to Lee's Block scheduling method.
Resumo:
In this paper, we present a differential-geometric approach to analyze the singularities of task space point trajectories of two and three-degree-of-freedom serial and parallel manipulators. At non-singular configurations, the first-order, local properties are characterized by metric coefficients, and, geometrically, by the shape and size of a velocity ellipse or an ellipsoid. At singular configurations, the determinant of the matrix of metric coefficients is zero and the velocity ellipsoid degenerates to an ellipse, a line or a point, and the area or the volume of the velocity ellipse or ellipsoid becomes zero. The degeneracies of the velocity ellipsoid or ellipse gives a simple geometric picture of the possible task space velocities at a singular configuration. To study the second-order properties at a singularity, we use the derivatives of the metric coefficients and the rate of change of area or volume. The derivatives are shown to be related to the possible task space accelerations at a singular configuration. In the case of parallel manipulators, singularities may lead to either loss or gain of one or more degrees-of-freedom. For loss of one or more degrees-of-freedom, ther possible velocities and accelerations are again obtained from a modified metric and derivatives of the metric coefficients. In the case of a gain of one or more degrees-of-freedom, the possible task space velocities can be pictured as growth to lines, ellipses, and ellipsoids. The theoretical results are illustrated with the help of a general spatial 2R manipulator and a three-degree-of-freedom RPSSPR-SPR parallel manipulator.