190 results for Parallel computing

at Indian Institute of Science - Bangalore - India


Relevance:

100.00%

Publisher:

Abstract:

Floquet analysis is widely used for small-order systems (say, order M < 100) to find trim results of control inputs and periodic responses, and stability results of damping levels and frequencies. Presently, however, it is practical neither for design applications nor for comprehensive analysis models that lead to large systems (M > 100); the run time on a sequential computer is simply prohibitive. Accordingly, a massively parallel Floquet analysis is developed with emphasis on large systems, and it is implemented on two SIMD or single-instruction, multiple-data computers with 4096 and 8192 processors. The focus of this development is a parallel shooting method with damped Newton iteration to generate trim results; the Floquet transition matrix (FTM) comes out as a byproduct. The eigenvalues and eigenvectors of the FTM are computed by a parallel QR method, and thereby stability results are generated. For illustration, flap and flap-lag stability of isolated rotors are treated by the parallel analysis and by a corresponding sequential analysis with the conventional shooting and QR methods; linear quasisteady airfoil aerodynamics and a finite-state three-dimensional wake model are used. Computational reliability is quantified by the condition numbers of the Jacobian matrices in Newton iteration, the condition numbers of the eigenvalues and the residual errors of the eigenpairs; reliability figures are comparable in both the parallel and sequential analyses. Compared to the sequential analysis, the parallel analysis reduces the run time of large systems dramatically, and the reduction increases with increasing system order; this finding offers considerable promise for design and comprehensive-analysis applications.
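
The heart of the trim computation can be sketched compactly. Below is a minimal sequential Python sketch of the shooting method with damped Newton iteration on the period map, in which the Floquet transition matrix appears as the Newton Jacobian; the paper's massively parallel version distributes the column-wise one-period integrations across SIMD processors, whereas here they form a plain loop. The RK4 propagator and all tolerances are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def integrate_period(f, x0, T, steps=200):
    """Propagate state x0 over one period T with fixed-step RK4
    (a stand-in for the rotor dynamics integrator)."""
    h = T / steps
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(steps):
        k1 = f(x)
        k2 = f(x + 0.5 * h * k1)
        k3 = f(x + 0.5 * h * k2)
        k4 = f(x + h * k3)
        x = x + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return x

def shoot_periodic(f, x0, T, tol=1e-10, max_iter=50, eps=1e-7):
    """Damped Newton iteration on g(x) = x(T; x) - x = 0. Returns the
    periodic initial state and the Floquet transition matrix (FTM),
    which comes out as a byproduct of the Newton Jacobian."""
    n = len(x0)
    x = np.asarray(x0, dtype=float).copy()
    ftm = np.eye(n)
    for _ in range(max_iter):
        xT = integrate_period(f, x, T)
        g = xT - x
        # Each FTM column is an independent one-period integration;
        # this loop is exactly what an SIMD machine can parallelise.
        for j in range(n):
            dx = np.zeros(n)
            dx[j] = eps
            ftm[:, j] = (integrate_period(f, x + dx, T) - xT) / eps
        if np.linalg.norm(g) < tol:
            break
        step = np.linalg.solve(ftm - np.eye(n), -g)
        lam = 1.0  # damping: halve the step until the residual shrinks
        while lam > 1e-4:
            xn = x + lam * step
            if np.linalg.norm(integrate_period(f, xn, T) - xn) < np.linalg.norm(g):
                break
            lam *= 0.5
        x = x + lam * step
    return x, ftm  # stability: eigenvalues of the FTM (e.g. via QR)
```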

Relevance:

70.00%

Publisher:

Abstract:

A new parallel algorithm for transforming an arithmetic infix expression into a parse tree is presented. The technique is based on a result due to Fischer (1980) which enables the construction of the parse tree by appropriately scanning the vector of precedence values associated with the elements of the expression. The algorithm presented here is suitable for execution on a shared-memory model of an SIMD machine with no read/write conflicts permitted. It uses O(n) processors and has a time complexity of O(log² n), where n is the expression length. Parallel algorithms for generating code for an SIMD machine are also presented.
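
For intuition, the following minimal Python sketch shows the precedence-vector idea in its simplest sequential form: each operator's precedence is raised by the surrounding parenthesis depth, and the lowest effective precedence identifies the root of the parse tree. This is only the underlying principle, not Fischer's construction or the O(log² n) parallel algorithm, and the token set and precedence table are illustrative assumptions.

```python
# Illustrative token set: single-letter operands, binary + - * /, parens.
OPS = {'+': 1, '-': 1, '*': 2, '/': 2}

def precedence_vector(tokens):
    """Assign each token a precedence raised by parenthesis depth;
    operands and parentheses get None."""
    depth, vec = 0, []
    for t in tokens:
        if t == '(':
            depth += 10
        elif t == ')':
            depth -= 10
        vec.append(OPS[t] + depth if t in OPS else None)
    return vec

def build_tree(tokens):
    """Split recursively at the rightmost lowest-precedence operator."""
    vec = precedence_vector(tokens)
    ops = [(p, i) for i, p in enumerate(vec) if p is not None]
    if not ops:  # a single operand, possibly wrapped in parentheses
        return [t for t in tokens if t not in '()'][0]
    _, i = min(ops, key=lambda pi: (pi[0], -pi[1]))
    return (tokens[i], build_tree(tokens[:i]), build_tree(tokens[i + 1:]))

# build_tree(list("a+b*c")) -> ('+', 'a', ('*', 'b', 'c'))
```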

Relevance:

70.00%

Publisher:

Abstract:

Knowledge of protein-ligand interactions is essential to understand several biological processes and important for applications ranging from understanding protein function to drug discovery and protein engineering. Here, we describe an algorithm for the comparison of three-dimensional ligand-binding sites in protein structures. A previously described algorithm, PocketMatch (version 1.0), is optimised, expanded, and MPI-enabled for parallel execution. PocketMatch (version 2.0) rapidly quantifies binding-site similarity based on structural descriptors such as residue nature and interatomic distances. Atomic-scale alignments may also be obtained from the amino acid residue pairings generated. It allows an end-user to compute database-wide, all-to-all comparisons in a matter of hours. A demonstration of the algorithm on a sample dataset, a performance analysis, and annotated source code are also included.
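
A toy sketch of the kind of comparison involved: each site is reduced to sorted lists of interatomic distances grouped by residue-type pair, and two sites are scored by the fraction of distances that agree within a tolerance. The grouping, tolerance, and scoring below are illustrative assumptions, not the published PocketMatch (PMScore) definition.

```python
from collections import defaultdict
from itertools import combinations
import math

def distance_lists(site):
    """site: list of (residue_type, (x, y, z)) points describing a
    binding site. Returns sorted distance lists per residue-type pair."""
    lists = defaultdict(list)
    for (t1, p1), (t2, p2) in combinations(site, 2):
        lists[tuple(sorted((t1, t2)))].append(math.dist(p1, p2))
    return {k: sorted(v) for k, v in lists.items()}

def match_sorted(a, b, tol=0.5):
    """Count pairings between two sorted lists with a two-pointer sweep."""
    i = j = matched = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tol:
            matched += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return matched

def similarity(site_a, site_b):
    """Fraction of distances matched across the two sites (0 to 1)."""
    la, lb = distance_lists(site_a), distance_lists(site_b)
    total = sum(map(len, la.values())) + sum(map(len, lb.values()))
    matched = sum(match_sorted(la[k], lb[k]) for k in la.keys() & lb.keys())
    return 2 * matched / total if total else 0.0
```

All-to-all database comparisons built on a pairwise score like this are embarrassingly parallel over site pairs, which is the kind of workload an MPI layer can distribute.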

Relevance:

60.00%

Publisher:

Abstract:

A numerical scheme is presented for accurate simulation of fluid flow using the lattice Boltzmann equation (LBE) on an unstructured mesh. A finite volume approach is adopted to discretize the LBE on a cell-centered, arbitrarily shaped, triangular tessellation. The formulation includes a formal, second-order discretization using a Total Variation Diminishing (TVD) scheme for the terms representing advection of the distribution function in physical space, due to microscopic particle motion. The advantage of the LBE approach is exploited by implementing the scheme in a new computer code to run on a parallel computing system. Performance of the new formulation is systematically investigated by simulating four benchmark flows of increasing complexity, namely (1) flow in a plane channel, (2) unsteady Couette flow, (3) flow caused by a moving lid over a 2D square cavity and (4) flow over a circular cylinder. For each of these flows, the present scheme is validated against results from Navier-Stokes computations as well as lattice Boltzmann simulations on a regular mesh. It is shown that the scheme is robust and accurate for the different test problems studied.
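
For reference, a compact regular-lattice D2Q9 BGK update in Python is sketched below; the paper's contribution replaces the streaming step of exactly this collide-and-stream structure with a finite-volume TVD discretization on unstructured triangles. The relaxation time and periodic boundaries here are illustrative assumptions.

```python
import numpy as np

# D2Q9 velocity set and quadrature weights
c = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
              [1, 1], [-1, 1], [-1, -1], [1, -1]])
w = np.array([4/9] + [1/9] * 4 + [1/36] * 4)

def equilibrium(rho, u):
    cu = np.einsum('qd,xyd->xyq', c, u)                  # c_q . u
    usq = np.einsum('xyd,xyd->xy', u, u)[..., None]      # |u|^2
    return rho[..., None] * w * (1 + 3*cu + 4.5*cu**2 - 1.5*usq)

def lbe_step(f, tau=0.6):
    """One collide-and-stream update of the distributions f(x, y, q)."""
    rho = f.sum(axis=-1)
    u = np.einsum('xyq,qd->xyd', f, c) / rho[..., None]
    f = f + (equilibrium(rho, u) - f) / tau              # BGK collision
    for q in range(9):                                   # periodic streaming
        f[..., q] = np.roll(f[..., q], shift=tuple(c[q]), axis=(0, 1))
    return f

# start from rest: f = equilibrium(np.ones((nx, ny)), np.zeros((nx, ny, 2)))
```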

Relevance:

60.00%

Publisher:

Abstract:

This paper presents an inverse dynamics formulation by the Newton–Euler approach for the Stewart platform manipulator of the most general architecture, modelling all the dynamic and gravity effects as well as the viscous friction at the joints. It is shown that a proper elimination procedure results in a remarkably economical and fast algorithm for the solution of actuator forces, which makes the method quite suitable for on-line control purposes. In addition, the parallelism inherent in the manipulator and in the modelling makes the algorithm quite efficient in a parallel computing environment, where it can be made as fast as the corresponding formulation for the 6-dof serial manipulator. The formulation has been implemented in a program and used for a few trajectories planned for a test manipulator. Simulation results presented in the paper reveal the nature of the variation of actuator forces in the Stewart platform and justify the dynamic modelling for control.
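
The parallelism claim is easy to see structurally: the Newton-Euler terms of the six legs are mutually independent, so they can be evaluated concurrently before a single 6x6 solve yields the actuator forces. The Python sketch below, with `leg_dynamics` as a caller-supplied placeholder, illustrates only this structure; it is not the paper's actual elimination procedure.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def actuator_forces(leg_dynamics, state, platform_wrench):
    """leg_dynamics(i, state) -> (s_i, w_i): unit line vector of leg i
    (a column of the transpose Jacobian) and the wrench contributed by
    that leg's inertia, gravity, and joint friction. The six calls are
    independent, so they are evaluated concurrently."""
    with ThreadPoolExecutor(max_workers=6) as pool:
        results = list(pool.map(lambda i: leg_dynamics(i, state), range(6)))
    lines = np.column_stack([s for s, _ in results])  # 6 x 6
    leg_wrench = sum(w for _, w in results)
    # Solve for the six actuator forces balancing the total wrench.
    return np.linalg.solve(lines, platform_wrench + leg_wrench)
```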

Relevance:

60.00%

Publisher:

Abstract:

With the advent of VLSI it has become possible to map parallel algorithms for compute-bound problems directly onto silicon. A systolic architecture is a very good candidate for VLSI implementation because of its regular and simple design and regular communication pattern. In this paper, a systolic algorithm and a corresponding systolic architecture, a linear systolic array, are proposed for the scanline-based hidden surface removal problem in three-dimensional computer graphics. The algorithm is based on the concept of sample spans or intervals. The worst-case time taken by the algorithm is O(n), n being the number of segments in a scanline. The time taken by the algorithm for a given scene depends on the scene itself, and on average a considerable improvement over the worst-case behaviour is expected. A pipelined scheme for handling the I/O process, suitable for VLSI implementation of the algorithm, has also been proposed.
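
A sequential illustration of the sample-span concept: segment endpoints partition the scanline into spans, and within each span the segment nearest the viewer is visible. The systolic array pipelines these comparisons across processors; for simplicity this sketch assumes a constant depth per segment, an assumption the actual algorithm does not need.

```python
def visible_spans(segments):
    """segments: list of (x_left, x_right, depth, id); smaller depth is
    nearer the viewer. Returns (x0, x1, visible_id) per sample span."""
    xs = sorted({x for s in segments for x in (s[0], s[1])})
    spans = []
    for x0, x1 in zip(xs, xs[1:]):
        covering = [s for s in segments if s[0] <= x0 and s[1] >= x1]
        winner = min(covering, key=lambda s: s[2], default=None)
        spans.append((x0, x1, winner[3] if winner else None))
    return spans

# visible_spans([(0, 10, 5.0, 'A'), (4, 8, 2.0, 'B')])
# -> [(0, 4, 'A'), (4, 8, 'B'), (8, 10, 'A')]
```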

Relevance:

60.00%

Publisher:

Abstract:

Clustered VLIW architectures solve the scalability problem associated with flat VLIW architectures by partitioning the register file and connecting only a subset of the functional units to each register file partition. However, inter-cluster communication in clustered architectures leads to increased leakage in functional components and a high number of register accesses. In this paper, we propose compiler scheduling algorithms targeting two previously ignored power-hungry components in clustered VLIW architectures, viz., the instruction decoder and the register file. We consider a split decoder design and propose a new energy-aware instruction scheduling algorithm that provides 14.5% and 17.3% benefit in decoder power consumption on average over a purely hardware-based scheme in the context of 2-clustered and 4-clustered VLIW machines. In the case of register files, we propose two new scheduling algorithms that exploit limited register snooping capability to reduce extra register file accesses. The proposed algorithms reduce register file power consumption on average by 6.85% and 11.90% (10.39% and 17.78%), respectively, along with performance improvements of 4.81% and 5.34% (9.39% and 11.16%) over a traditional greedy algorithm for a 2-clustered (4-clustered) VLIW machine.
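
A toy illustration of the split-decoder intuition (our construction, not the paper's scheduling algorithm): a cycle whose instruction bundle touches only one decoder half allows the other half to be power-gated, so dependence-free instructions can be grouped by half.

```python
def bundle_by_half(insts, width=4):
    """insts: list of (inst_id, half) with half in {'A', 'B'}, assumed
    mutually independent so that any grouping is legal. Returns cycles
    (bundles) that each touch at most one decoder half."""
    cycles = []
    for half in ('A', 'B'):
        q = [i for i in insts if i[1] == half]
        cycles += [q[k:k + width] for k in range(0, len(q), width)]
    return cycles
```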

Relevance:

60.00%

Publisher:

Abstract:

A state-of-the-art model of the coupled ocean-atmosphere system, the Climate Forecast System (CFS) from the National Centers for Environmental Prediction (NCEP), USA, has been ported onto the PARAM Padma parallel computing system at the Centre for Development of Advanced Computing (CDAC), Bangalore, and retrospective predictions for the summer monsoon (June-September) season of 2009 have been generated, using five initial conditions for the atmosphere and one initial condition for the ocean for May 2009. Whereas a large deficit in the Indian summer monsoon rainfall (ISMR; June-September) was experienced over the Indian region (with all-India rainfall 22% below the average), the ensemble-average prediction was for above-average rainfall during the summer monsoon. The retrospective predictions of ISMR with the CFS from NCEP for 1981-2008 have been analysed. The retrospective predictions from NCEP for the summer monsoon of 1994 and that from CDAC for 2009 have been compared with simulations for each of the seasons with the stand-alone atmospheric component of the model, the Global Forecast System (GFS), and with observations. It is shown that the simulation with the GFS for 2009 exhibited the observed rainfall deficit. The large error in the prediction for the monsoon of 2009 can be attributed to a positive Indian Ocean Dipole event seen in the prediction from July onwards, which was not present in the observations. This suggests that the error could be reduced with improvement of the ocean model over the equatorial Indian Ocean.

Relevance:

60.00%

Publisher:

Abstract:

This paper presents an introduction to neurocomputers and an overview of their history. Direct implementation methods of neurocomputers using techniques from microelectronics and photonics are discussed. Emulation methods using special-purpose hardware are highlighted. The role of parallel computing systems in improving performance is introduced. Some commercially available neurocomputers and performance issues of such systems are also presented.

Relevance:

60.00%

Publisher:

Abstract:

Over the past few years, studies of cultured neuronal networks have opened up avenues for understanding the ion channels, receptor molecules, and synaptic plasticity that may form the basis of learning and memory. Hippocampal neurons from rats are dissociated and cultured on a surface containing a grid of 64 electrodes. The signals from these 64 electrodes are acquired using a fast data acquisition system, MED64 (Alpha MED Sciences, Japan), at a sampling rate of 20 K samples per second with a precision of 16 bits per sample. A few minutes of acquired data runs into a few hundred megabytes. The data processing for neural analysis is highly compute-intensive because the volume of data is huge. The major processing requirements are noise removal, pattern recovery, pattern matching, clustering and so on. In order to interface a neuronal colony to the physical world, these computations need to be performed in real time. A single processor such as a desktop computer may not be adequate to meet these computational requirements. Parallel computing is a method used to satisfy the real-time computational requirements of a neuronal system that interacts with an external world while increasing the flexibility and scalability of the application. In this work, we developed a parallel neuronal system using a multi-node digital signal processing system. With 8 processors, the system is able to compute and map incoming signals, segmented over a period of 200 ms, into an action in a trained cluster system in real time.
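
A minimal sketch of the channel-level parallelism: the 64 electrode streams are independent up to the clustering stage, so per-channel noise estimation and spike detection can be farmed out across processes. The threshold rule below is a generic robust detector, an illustrative assumption rather than the MED64 pipeline from the paper.

```python
import numpy as np
from multiprocessing import Pool

FS = 20_000  # 20 K samples/s, 16 bits per sample, as in the acquisition setup

def detect_spikes(channel):
    """Indices where |signal| first exceeds 5x a median-based noise
    estimate (a common robust threshold; illustrative only)."""
    noise = np.median(np.abs(channel)) / 0.6745
    above = np.abs(channel) > 5 * noise
    return np.flatnonzero(above[1:] & ~above[:-1])  # rising edges only

def process_segment(data):
    """data: array of shape (64, n_samples) holding one 200 ms segment."""
    with Pool() as pool:  # one task per electrode channel
        return pool.map(detect_spikes, list(data))

# e.g. spikes = process_segment(segment) for a segment of shape (64, FS // 5)
```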

Relevance:

60.00%

Publisher:

Abstract:

Precision, sophistication and economic factors in many areas of scientific research demand very high magnitudes of compute power. Advanced research in the area of high-performance computing is thus becoming inevitable. The basic principle of sharing and collaborative work by geographically separated computers has been known by several names, such as metacomputing, scalable computing, cluster computing and internet computing, and has today metamorphosed into a new term known as grid computing. This paper gives an overview of grid computing and compares various grid architectures. We show the role that patterns can play in architecting complex systems, and provide a very pragmatic reference to a set of well-engineered patterns that the practicing developer can apply to crafting his or her own specific applications. We are not aware of a pattern-oriented approach being applied to develop and deploy a grid. There are many grid frameworks that are functional or in the process of becoming so. All these grids differ in some functionality or the other, though the basic principle on which the grids are built is the same. Despite this, there are no standard requirements listed for building a grid. The grid being a very complex system, it is mandatory to have a standard Software Architecture Specification (SAS). We attempt to develop the same for use by any grid user or developer. Specifically, we analyze the grid using an object-oriented approach and present the architecture using UML. This paper proposes the usage of patterns at all levels (analysis, design and architectural) of grid development.

Relevance:

60.00%

Publisher:

Abstract:

The unending quest for performance improvement, coupled with advancements in integrated circuit technology, has led to the development of new architectural paradigms. The speculative multithreaded (SpMT) architecture philosophy relies on aggressive speculative execution for improved performance. However, aggressive speculative execution comes with a mixed flavor: it improves performance when successful, but adversely affects energy consumption (and performance) because of useless computation in the event of mis-speculation. Dynamic instruction criticality information can be usefully applied to control and guide such aggressive speculative execution. In this paper, we present a model of micro-execution for the SpMT architecture that we have developed to determine dynamic instruction criticality. We have also developed two novel techniques utilizing the criticality information, namely delaying the non-critical loads and criticality-based thread prediction, for reducing useless computations and energy consumption. Experimental results showing the break-up of critical instructions and the effectiveness of the proposed techniques in reducing energy consumption are presented in the context of a multiscalar processor that implements the SpMT architecture. Our experiments show 17.7% and 11.6% reductions in dynamic energy for criticality-based thread prediction and the criticality-based delayed load scheme respectively, while the improvements in dynamic energy-delay product are 13.9% and 5.5%, respectively.
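
Conceptually, an instruction is critical if it lies on the longest path through the dynamic dependence graph. The offline sketch below computes that set by longest-path analysis; real SpMT hardware must approximate the same information online, and the latency/dependence representation here is an illustrative assumption.

```python
from functools import lru_cache

def critical_instructions(latency, deps):
    """latency: {inst: cycles}; deps: {inst: iterable of predecessors}.
    Returns the set of instructions lying on some longest path."""
    @lru_cache(maxsize=None)
    def finish(i):  # longest-path length ending at (and including) i
        return latency[i] + max((finish(p) for p in deps.get(i, ())), default=0)

    total = max(finish(i) for i in latency)
    critical = set()

    def mark(i, required_finish):
        # i is on a longest path iff its own longest prefix is exactly
        # the length required at this position on the path.
        if finish(i) != required_finish or i in critical:
            return
        critical.add(i)
        for p in deps.get(i, ()):
            mark(p, required_finish - latency[i])

    for i in latency:
        mark(i, total)
    return critical
```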

Relevance:

60.00%

Publisher:

Abstract:

Decoherence as an obstacle in quantum computation is viewed as a struggle between two forces [1]: the computation, which uses the exponential dimension of Hilbert space, and decoherence, which destroys this entanglement by collapse. In this model of decohered quantum computation, a sequential quantum computer loses the battle, because at each time step only one local operation is carried out while g*(t) gates collapse. With quantum circuits computing in a parallel way, the situation is different: g(t) gates can be applied at each time step while g*(t) gates collapse because of decoherence. As g(t) ≈ g*(t), the competition here is even [1]. Our paper improves on this model by slowing down g*(t): the circuit is encoded in parallel computing architectures and run in the Single Instruction Multiple Data (SIMD) paradigm. We propose a parallel ion-trap architecture for single-bit rotation of a qubit.

Relevance:

40.00%

Publisher:

Abstract:

The contour tree is a topological abstraction of a scalar field that captures the evolution in level set connectivity. It is an effective representation for visual exploration and analysis of scientific data. We describe a work-efficient, output-sensitive, and scalable parallel algorithm for computing the contour tree of a scalar field defined on a domain that is represented using either an unstructured mesh or a structured grid. A hybrid implementation of the algorithm using the GPU and multi-core CPU can compute the contour tree of an input containing 16 million vertices in less than ten seconds, with a speedup factor of up to 13. Experiments based on an implementation in a multi-core CPU environment show near-linear speedup for large data sets.
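
The sequential core that such parallel algorithms build on is the sweep-and-merge construction of the join tree (the split tree is symmetric, and the two are combined into the contour tree). A compact union-find sketch, with illustrative data structures, is given below; the paper's GPU/multi-core algorithm is far more elaborate.

```python
def join_tree(values, neighbors):
    """values: scalar value per vertex; neighbors: adjacency lists.
    Sweep vertices from high to low value, merging components with
    union-find; returns join-tree parent pointers."""
    n = len(values)
    uf = list(range(n))

    def find(v):
        while uf[v] != v:
            uf[v] = uf[uf[v]]  # path halving
            v = uf[v]
        return v

    rep = {}      # component root -> most recently swept (lowest) vertex
    parent = {}   # join-tree edges: vertex -> its parent (lower value)
    seen = set()
    for v in sorted(range(n), key=lambda i: -values[i]):
        seen.add(v)
        for u in neighbors[v]:
            if u not in seen:
                continue
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[rep[ru]] = v  # component of u merges at v
                uf[ru] = rv
        rep[find(v)] = v
    return parent
```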