107 results for NPB (NAS parallel benchmarks)
Abstract:
In this paper we develop a multithreaded VLSI processor linear array architecture to render complex environments based on the radiosity approach. The processing elements are identical and multithreaded. They work in Single Program Multiple Data (SPMD) mode. A new algorithm to do the radiosity computations based on the progressive refinement approach[2] is proposed. Simulation results indicate that the architecture is latency tolerant and scalable. It is shown that a linear array of 128 uni-threaded processing elements sustains a throughput close to 0.4 million patches/sec.
Abstract:
In this paper, a wireless control strategy for parallel operation of three-phase four-wire inverters is proposed. A generalized situation is considered where the inverters are of unequal power ratings and the loads are nonlinear and unbalanced in nature. The proposed control algorithm exploits the potential of a sinusoidal-domain proportional+multiresonant controller (in the inner voltage regulation loop) to make the system suitable for nonlinear and unbalanced loads with a simple and generalized structure of the virtual output-impedance loop. Decentralized operation is achieved by using three-phase P/Q droop characteristics. The overall control algorithm helps to limit the harmonic content and the degree of unbalance in the output-voltage waveform and to achieve excellent power-sharing accuracy in spite of mismatch in the inverter output impedances. Moreover, a synchronized turn-on with a subsequent changeover to droop mode is applied to each new incoming unit in order to limit the circulating current completely. Simulation and experimental results from 1-kVA and 0.5-kVA paralleled units validate the effectiveness of the scheme.
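The abstract does not give the droop equations, so the following is only a minimal sketch of the conventional P/omega droop idea under stated assumptions: each unit lowers its frequency as `omega = omega_ref - m * P`, and the slope `m` is chosen inversely proportional to the unit's rating. The function names, the 1 rad/s maximum frequency drop, and the 900 W load are hypothetical illustration values, not taken from the paper.

```python
def droop_slopes(ratings_va, max_freq_drop):
    """Assumed design rule: slope m_i = max_freq_drop / rating_i,
    so every unit reaches the same frequency drop at its rated power."""
    return [max_freq_drop / rating for rating in ratings_va]


def steady_state_sharing(slopes, total_power):
    """In steady state all units run at one common frequency, so
    m_1 * P_1 = m_2 * P_2 = ... and power splits in proportion to 1 / m_i."""
    inverse_slopes = [1.0 / m for m in slopes]
    total_inverse = sum(inverse_slopes)
    return [total_power * g / total_inverse for g in inverse_slopes]


# Hypothetical units rated 1 kVA and 0.5 kVA sharing a 900 W load.
slopes = droop_slopes([1000.0, 500.0], max_freq_drop=1.0)
shares = steady_state_sharing(slopes, total_power=900.0)
print(shares)
```

With these slopes the larger unit carries twice the power of the smaller one, which is the proportional sharing that droop control aims for without a communication link.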
Abstract:
In this paper, we introduce an analytical technique based on queueing networks and Petri nets for making a performance analysis of dataflow computations when executed on the Manchester machine. This technique is also applicable for the analysis of parallel computations on multiprocessors. We characterize the parallelism in dataflow computations through a four-parameter characterization, namely, the minimum parallelism, the maximum parallelism, the average parallelism and the variance in parallelism. We observe through detailed investigation of our analytical models that the average parallelism is a good characterization of the dataflow computations only as long as the variance in parallelism is small. However, significant difference in performance measures will result when the variance in parallelism is comparable to or higher than the average parallelism.
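The four-parameter characterization can be illustrated with a short sketch. It assumes a parallelism profile given as a list with the number of ready operations at each time step; the function name and the two sample profiles are hypothetical and are not data from the paper.

```python
from statistics import fmean, pvariance


def characterize_parallelism(profile):
    """Return (minimum, maximum, average, variance) of a parallelism profile.

    `profile` is an assumed representation: the number of operations that
    are ready to execute at each time step of the computation.
    """
    if not profile:
        raise ValueError("profile must contain at least one time step")
    return min(profile), max(profile), fmean(profile), pvariance(profile)


# Two hypothetical profiles with the same average parallelism.
steady = [4, 4, 4, 4, 4, 4]
bursty = [1, 1, 10, 1, 10, 1]
print(characterize_parallelism(steady))
print(characterize_parallelism(bursty))
```

Both profiles have the same average, but the second has a variance larger than its average, which is the regime where the abstract says the average alone stops being a good characterization.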
Abstract:
Adsorption of n-alkane mixtures in the zeolite LTA-5A under liquid-phase conditions has been studied using grand canonical Monte Carlo (GCMC) simulations combined with parallel tempering. Normal GCMC techniques fail for some of these systems due to the preference of linear molecules to coil within a single cage in the zeolite. The narrow zeolite windows severely restrict interactions of the molecules, making it difficult to simulate the cooperative rearrangements necessary to explore configuration space. For these reasons, normal GCMC simulation results show poor reproducibility in some cases. These problems were overcome with parallel tempering techniques. Even with parallel tempering, these are very challenging systems for molecular simulation. Similar problems may arise for other zeolites such as CHA, AFX, ERI, KFI, and RHO, which have cages connected by narrow windows. The simulations capture the complex selectivity behavior observed in experiments, such as selectivity inversion and azeotrope formation.
Abstract:
Floquet analysis is widely used for small-order systems (say, order M < 100) to find trim results of control inputs and periodic responses, and stability results of damping levels and frequencies. Presently, however, it is practical neither for design applications nor for comprehensive analysis models that lead to large systems (M > 100); the run time on a sequential computer is simply prohibitive. Accordingly, a massively parallel Floquet analysis is developed with emphasis on large systems, and it is implemented on two SIMD (single-instruction, multiple-data) computers with 4096 and 8192 processors. The focus of this development is a parallel shooting method with damped Newton iteration to generate trim results; the Floquet transition matrix (FTM) comes out as a byproduct. The eigenvalues and eigenvectors of the FTM are computed by a parallel QR method, and thereby stability results are generated. For illustration, flap and flap-lag stability of isolated rotors are treated by the parallel analysis and by a corresponding sequential analysis with the conventional shooting and QR methods; linear quasisteady airfoil aerodynamics and a finite-state three-dimensional wake model are used. Computational reliability is quantified by the condition numbers of the Jacobian matrices in Newton iteration, the condition numbers of the eigenvalues, and the residual errors of the eigenpairs; the reliability figures are comparable in the parallel and sequential analyses. Compared to the sequential analysis, the parallel analysis reduces the run time of large systems dramatically, and the reduction increases with increasing system order; this finding offers considerable promise for design and comprehensive-analysis applications.
Abstract:
In this short note, we determine precisely which operators have the property that their (full, symmetric or antisymmetric) second quantisation is an operator which is bounded or belongs to one of the various Schatten ideals; we also note that in 'the interior' of the natural domain, the second quantisation is a continuous map.
Abstract:
A symmetrizer of a nonsymmetric matrix A is a symmetric matrix X that satisfies the equation XA = A^T X, where T denotes the transpose. A symmetrizer is useful in converting a nonsymmetric eigenvalue problem into a symmetric one, which is relatively easy to solve, and finds applications in stability problems in control theory and in the study of general matrices. Three designs based on VLSI parallel processor arrays are presented to compute a symmetrizer of a lower Hessenberg matrix. Their scope is discussed. The first one is the Leiserson systolic design, while the remaining two, viz., the double pipe design and the fitted diagonal design, are derived versions of the first design with improved performance.
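The defining equation can be checked directly on a small case. The sketch below only verifies a candidate symmetrizer; it does not implement any of the three array designs. The 2x2 matrices and helper names are hypothetical illustration values.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*m)]


def is_symmetrizer(x, a):
    """True when x is symmetric and x * a equals transpose(a) * x."""
    return x == transpose(x) and matmul(x, a) == matmul(transpose(a), x)


# Hypothetical example with integer entries so the comparison is exact.
A = [[1, 2], [3, 4]]
X = [[3, 0], [0, 2]]
print(is_symmetrizer(X, A))
print(matmul(X, A))
```

Because X is symmetric, the condition is equivalent to requiring that the product XA is itself symmetric, which is easy to confirm by inspecting the printed product.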
Abstract:
This paper deals with the development of a new model for the cooling process on the runout table of hot strip mills. The suitability of different numerical methods for solving the proposed model equation is studied from the point of view of accuracy and computation time. Parallel solutions for the model equation are proposed.
Abstract:
This paper discusses the parallel implementation of the solution of a set of linear equations using the Alternative Quadrant Interlocking Factorisation (AQIF) method on a star topology. Both the AQIF and LU decomposition methods are mapped onto a star topology on an IBM SP2 system, with MPI as the internode communicator. Performance parameters such as speedup and efficiency have been obtained through experimental and theoretical means. The studies demonstrate (i) a mismatch of 15% between the theoretical and experimental results, (ii) scalability of the AQIF algorithm, and (iii) faster execution of the AQIF algorithm.
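For readers unfamiliar with the two performance parameters, here is a minimal sketch using their usual definitions: speedup is serial time divided by parallel time, and efficiency is speedup divided by the processor count. The timings and processor count are hypothetical and are not measurements from the paper.

```python
def speedup(t_serial, t_parallel):
    """Speedup: how many times faster the parallel run is than the serial run."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, processors):
    """Efficiency: speedup per processor, equal to 1.0 for ideal scaling."""
    return speedup(t_serial, t_parallel) / processors


# Hypothetical timings in seconds for a run on 8 processors.
s = speedup(120.0, 20.0)
e = efficiency(120.0, 20.0, 8)
print(s, e)
```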
Abstract:
Parallel execution of computational mechanics codes requires efficient mesh-partitioning techniques. These techniques divide the mesh into a specified number of submeshes of approximately the same size and, at the same time, minimise the number of interface nodes between the submeshes. This paper describes a new mesh-partitioning technique employing Genetic Algorithms. The proposed algorithm operates on the deduced graph (dual or nodal graph) of the given finite element mesh rather than directly on the mesh itself. The algorithm works by first constructing a coarse graph approximation using an automatic graph coarsening method. The coarse graph is partitioned and the results are interpolated onto the original graph to initialise an optimisation of the graph partition problem. In practice, a hierarchy of (usually more than two) graphs is used to obtain the final graph partition. The proposed partitioning algorithm is applied to graphs derived from unstructured finite element meshes describing practical engineering problems and also to several example graphs related to finite element meshes given in the literature. The test results indicate that the proposed GA-based graph partitioning algorithm generates high-quality partitions and is superior to spectral and multilevel graph partitioning algorithms.
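The two goals named in the abstract, balanced submeshes and a small interface, can be scored with a simple objective. The sketch below is an assumed fitness function of the kind a genetic algorithm could minimise; the abstract does not state the paper's actual objective, and the penalty weight, function names, and sample graph are hypothetical.

```python
from collections import Counter


def edge_cut(edges, part):
    """Count edges whose two endpoints lie in different parts."""
    return sum(1 for u, v in edges if part[u] != part[v])


def imbalance(part, num_parts):
    """Largest part size divided by the ideal size (1.0 means perfectly balanced)."""
    sizes = Counter(part)
    ideal = len(part) / num_parts
    return max(sizes.values()) / ideal


def fitness(edges, part, num_parts, penalty=10.0):
    """Assumed objective (lower is better): edge cut plus a weighted imbalance penalty."""
    return edge_cut(edges, part) + penalty * (imbalance(part, num_parts) - 1.0)


# Hypothetical graph: two triangles joined by the single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
good = [0, 0, 0, 1, 1, 1]   # one triangle per part
poor = [0, 1, 0, 1, 0, 1]   # same balance, many more cut edges
print(fitness(edges, good, 2), fitness(edges, poor, 2))
```

In a genetic algorithm each candidate partition vector such as `good` or `poor` would play the role of a chromosome, and a score like this would drive selection.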
Abstract:
In this paper, we present a differential-geometric approach to analyze the singularities of task space point trajectories of two- and three-degree-of-freedom serial and parallel manipulators. At non-singular configurations, the first-order, local properties are characterized by metric coefficients and, geometrically, by the shape and size of a velocity ellipse or ellipsoid. At singular configurations, the determinant of the matrix of metric coefficients is zero, the velocity ellipsoid degenerates to an ellipse, a line or a point, and the area or volume of the velocity ellipse or ellipsoid becomes zero. The degeneracies of the velocity ellipsoid or ellipse give a simple geometric picture of the possible task space velocities at a singular configuration. To study the second-order properties at a singularity, we use the derivatives of the metric coefficients and the rate of change of area or volume. The derivatives are shown to be related to the possible task space accelerations at a singular configuration. In the case of parallel manipulators, singularities may lead to either loss or gain of one or more degrees of freedom. For loss of one or more degrees of freedom, the possible velocities and accelerations are again obtained from a modified metric and derivatives of the metric coefficients. In the case of a gain of one or more degrees of freedom, the possible task space velocities can be pictured as growth to lines, ellipses, and ellipsoids. The theoretical results are illustrated with the help of a general spatial 2R manipulator and a three-degree-of-freedom RPSSPR-SPR parallel manipulator.
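The link between the metric determinant and the velocity ellipse can be seen on a simpler case than the paper's examples. The sketch below assumes a planar 2R arm (not the general spatial 2R manipulator of the paper) with hypothetical unit link lengths, forms the metric as g = J^T J from the Jacobian J, and evaluates the ellipse area for unit-norm joint rates.

```python
import math


def jacobian_2r(theta1, theta2, l1=1.0, l2=1.0):
    """Jacobian of the end-point position of a planar 2R arm (assumed model)."""
    s1, c1 = math.sin(theta1), math.cos(theta1)
    s12, c12 = math.sin(theta1 + theta2), math.cos(theta1 + theta2)
    return [[-l1 * s1 - l2 * s12, -l2 * s12],
            [l1 * c1 + l2 * c12, l2 * c12]]


def metric(j):
    """Metric coefficients g = J^T J, returned as (g11, g12, g22)."""
    g11 = j[0][0] ** 2 + j[1][0] ** 2
    g12 = j[0][0] * j[0][1] + j[1][0] * j[1][1]
    g22 = j[0][1] ** 2 + j[1][1] ** 2
    return g11, g12, g22


def metric_det(g):
    """Determinant of the symmetric 2x2 metric."""
    g11, g12, g22 = g
    return g11 * g22 - g12 ** 2


def ellipse_area(g):
    """Area of the velocity ellipse traced by unit-norm joint rates."""
    return math.pi * math.sqrt(max(metric_det(g), 0.0))


regular = metric(jacobian_2r(0.3, math.pi / 2))   # elbow bent
singular = metric(jacobian_2r(0.3, 0.0))          # arm fully stretched
print(metric_det(regular), metric_det(singular))
```

For this arm the determinant equals (l1 * l2 * sin(theta2))^2, so it vanishes when the arm is fully stretched and the ellipse collapses to a line, matching the degeneration described in the abstract.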
Abstract:
In this paper a new parallel algorithm for nonlinear transient dynamic analysis of large structures is presented. An unconditionally stable Newmark-beta method (constant average acceleration technique) is employed for time integration. The proposed parallel algorithm has been devised within the broad framework of domain decomposition techniques. However, unlike most existing parallel algorithms devised for structural dynamic applications, which are derived using nonoverlapped domains, the proposed algorithm uses overlapped domains. The parallel overlapped domain decomposition algorithm has been formulated by splitting the mass, damping and stiffness matrices arising from the finite element discretisation of a given structure. A predictor-corrector scheme has been formulated for iteratively improving the solution in each step. A computer program based on the proposed algorithm has been developed and implemented with the Message Passing Interface (MPI) as the software development environment. The PARAM-10000 MIMD parallel computer has been used to evaluate the performance. Numerical experiments have been conducted to validate the proposed parallel algorithm and to evaluate its performance. Comparisons have been made with conventional nonoverlapped domain decomposition algorithms. Numerical studies indicate that the proposed algorithm is superior in performance to the conventional domain decomposition algorithms. (C) 2003 Elsevier Ltd. All rights reserved.
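As background for the time integrator named above, here is a minimal sketch of the Newmark-beta method with constant average acceleration (beta = 1/4, gamma = 1/2). It is deliberately much simpler than the paper's setting: a linear single-degree-of-freedom oscillator, serial, with no domain decomposition or predictor-corrector iteration. The function name and parameter values are hypothetical.

```python
def newmark_average_acceleration(m, c, k, force, u0, v0, dt, steps):
    """Integrate m*u'' + c*u' + k*u = force(t) with beta = 1/4, gamma = 1/2.

    Returns a list of (time, displacement, velocity) tuples.
    """
    beta, gamma = 0.25, 0.5
    u, v = u0, v0
    a = (force(0.0) - c * v - k * u) / m          # initial acceleration
    k_hat = k + gamma / (beta * dt) * c + m / (beta * dt ** 2)
    history = [(0.0, u, v)]
    for n in range(1, steps + 1):
        t = n * dt
        # Effective load built from the known state at the previous step.
        p_hat = (force(t)
                 + m * (u / (beta * dt ** 2) + v / (beta * dt)
                        + (0.5 / beta - 1.0) * a)
                 + c * (gamma / (beta * dt) * u + (gamma / beta - 1.0) * v
                        + dt * (0.5 * gamma / beta - 1.0) * a))
        u_new = p_hat / k_hat
        v_new = (gamma / (beta * dt) * (u_new - u) + (1.0 - gamma / beta) * v
                 + dt * (1.0 - 0.5 * gamma / beta) * a)
        a_new = ((u_new - u) / (beta * dt ** 2) - v / (beta * dt)
                 - (0.5 / beta - 1.0) * a)
        u, v, a = u_new, v_new, a_new
        history.append((t, u, v))
    return history


# Hypothetical undamped oscillator with a time step far larger than its period.
result = newmark_average_acceleration(
    m=1.0, c=0.0, k=1.0, force=lambda t: 0.0, u0=1.0, v0=0.0, dt=10.0, steps=50)
print(result[-1])
```

Even with this very coarse time step the displacement stays bounded, which illustrates what unconditional stability means; accuracy, of course, still requires a time step that resolves the response.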