67 resultados para Parallel and distributed systems
Resumo:
We describe a compiler for the Flat Concurrent Prolog language on a message passing multiprocessor architecture. This compiler permits symbolic and declarative programming in the syntax of Guarded Horn Rules, The implementation has been verified and tested on the 64-node PARAM parallel computer developed by C-DAC (Centre for the Development of Advanced Computing, India), Flat Concurrent Prolog (FCP) is a logic programming language designed for concurrent programming and parallel execution, It is a process oriented language, which embodies dataflow synchronization and guarded-command as its basic control mechanisms. An identical algorithm is executed on every processor in the network, We assume regular network topologies like mesh, ring, etc, Each node has a local memory, The algorithm comprises of two important parts: reduction and communication, The most difficult task is to integrate the solutions of problems that arise in the implementation in a coherent and efficient manner. We have tested the efficacy of the compiler on various benchmark problems of the ICOT project that have been reported in the recent book by Evan Tick, These problems include Quicksort, 8-queens, and Prime Number Generation, The results of the preliminary tests are favourable, We are currently examining issues like indexing and load balancing to further optimize our compiler.
Resumo:
Multi-access techniques are widely used in computer networking and distributed multiprocessor systems. On-the-fly arbitration schemes permit one of the many contenders to access the medium without collisions. Serial arbitration is cost effective but is slow and hence unsuitable for high-speed multiprocessor environments supporting very high data transfer rates. A fully parallel arbitration scheme takes less time but is not practically realisable for large numbers of contenders. In this paper, a generalised parallel-serial scheme is proposed which significantly reduces the arbitration time and is practically realisable.
Resumo:
There are a number of large networks which occur in many problems dealing with the flow of power, communication signals, water, gas, transportable goods, etc. Both design and planning of these networks involve optimization problems. The first part of this paper introduces the common characteristics of a nonlinear network (the network may be linear, the objective function may be non linear, or both may be nonlinear). The second part develops a mathematical model trying to put together some important constraints based on the abstraction for a general network. The third part deals with solution procedures; it converts the network to a matrix based system of equations, gives the characteristics of the matrix and suggests two solution procedures, one of them being a new one. The fourth part handles spatially distributed networks and evolves a number of decomposition techniques so that we can solve the problem with the help of a distributed computer system. Algorithms for parallel processors and spatially distributed systems have been described.There are a number of common features that pertain to networks. A network consists of a set of nodes and arcs. In addition at every node, there is a possibility of an input (like power, water, message, goods etc) or an output or none. Normally, the network equations describe the flows amoungst nodes through the arcs. These network equations couple variables associated with nodes. Invariably, variables pertaining to arcs are constants; the result required will be flows through the arcs. To solve the normal base problem, we are given input flows at nodes, output flows at nodes and certain physical constraints on other variables at nodes and we should find out the flows through the network (variables at nodes will be referred to as across variables).The optimization problem involves in selecting inputs at nodes so as to optimise an objective function; the objective may be a cost function based on the inputs to be minimised or a loss function or an efficiency function. The above mathematical model can be solved using Lagrange Multiplier technique since the equalities are strong compared to inequalities. The Lagrange multiplier technique divides the solution procedure into two stages per iteration. Stage one calculates the problem variables % and stage two the multipliers lambda. It is shown that the Jacobian matrix used in stage one (for solving a nonlinear system of necessary conditions) occurs in the stage two also.A second solution procedure has also been imbedded into the first one. This is called total residue approach. It changes the equality constraints so that we can get faster convergence of the iterations.Both solution procedures are found to coverge in 3 to 7 iterations for a sample network.The availability of distributed computer systems — both LAN and WAN — suggest the need for algorithms to solve the optimization problems. Two types of algorithms have been proposed — one based on the physics of the network and the other on the property of the Jacobian matrix. Three algorithms have been deviced, one of them for the local area case. These algorithms are called as regional distributed algorithm, hierarchical regional distributed algorithm (both using the physics properties of the network), and locally distributed algorithm (a multiprocessor based approach with a local area network configuration). The approach used was to define an algorithm that is faster and uses minimum communications. These algorithms are found to converge at the same rate as the non distributed (unitary) case.
Resumo:
Many real-time database applications arise in electronic financial services, safety-critical installations and military systems where enforcing security is crucial to the success of the enterprise. For real-time database systems supporting applications with firm deadlines, we investigate here the performance implications, in terms of killed transactions, of guaranteeing multilevel secrecy. In particular, we focus on the concurrency control (CC) aspects of this issue. Our main contributions are the following: First, we identify which among the previously proposed real-time CC protocols are capable of providing covert-channel-free security. Second, using a detailed simulation model, we profile the real-time performance of a representative set of these secure CC protocols for a variety of security-classified workloads and system configurations. Our experiments show that a prioritized optimistic CC protocol, OPT-WAIT, provides the best overall performance. Third, we propose and evaluate a novel "dual-CC" approach that allows the real-time database system to simultaneously use different CC mechanisms for guaranteeing security and for improving real-time performance. By appropriately choosing these different mechanisms, concurrency control protocols that provide even better performance than OPT-WAIT are designed. Finally, we propose and evaluate GUARD, an adaptive admission-control policy designed to provide fairness with respect to the distribution of killed transactions across security levels. Our experiments show that GUARD efficiently provides close to ideal fairness for real-time applications that can tolerate covert channel bandwidths of upto one bit per second.
Resumo:
Floquet analysis is widely used for small-order systems (say, order M < 100) to find trim results of control inputs and periodic responses, and stability results of damping levels and frequencies, Presently, however, it is practical neither for design applications nor for comprehensive analysis models that lead to large systems (M > 100); the run time on a sequential computer is simply prohibitive, Accordingly, a massively parallel Floquet analysis is developed with emphasis on large systems, and it is implemented on two SIMD or single-instruction, multiple-data computers with 4096 and 8192 processors, The focus of this development is a parallel shooting method with damped Newton iteration to generate trim results; the Floquet transition matrix (FTM) comes out as a byproduct, The eigenvalues and eigenvectors of the FTM are computed by a parallel QR method, and thereby stability results are generated, For illustration, flap and flap-lag stability of isolated rotors are treated by the parallel analysis and by a corresponding sequential analysis with the conventional shooting and QR methods; linear quasisteady airfoil aerodynamics and a finite-state three-dimensional wake model are used, Computational reliability is quantified by the condition numbers of the Jacobian matrices in Newton iteration, the condition numbers of the eigenvalues and the residual errors of the eigenpairs, and reliability figures are comparable in both the parallel and sequential analyses, Compared to the sequential analysis, the parallel analysis reduces the run time of large systems dramatically, and the reduction increases with increasing system order; this finding offers considerable promise for design and comprehensive-analysis applications.
Resumo:
Programming for parallel architectures that do not have a shared address space is extremely difficult due to the need for explicit communication between memories of different compute devices. A heterogeneous system with CPUs and multiple GPUs, or a distributed-memory cluster are examples of such systems. Past works that try to automate data movement for distributed-memory architectures can lead to excessive redundant communication. In this paper, we propose an automatic data movement scheme that minimizes the volume of communication between compute devices in heterogeneous and distributed-memory systems. We show that by partitioning data dependences in a particular non-trivial way, one can generate data movement code that results in the minimum volume for a vast majority of cases. The techniques are applicable to any sequence of affine loop nests and works on top of any choice of loop transformations, parallelization, and computation placement. The data movement code generated minimizes the volume of communication for a particular configuration of these. We use a combination of powerful static analyses relying on the polyhedral compiler framework and lightweight runtime routines they generate, to build a source-to-source transformation tool that automatically generates communication code. We demonstrate that the tool is scalable and leads to substantial gains in efficiency. On a heterogeneous system, the communication volume is reduced by a factor of 11X to 83X over state-of-the-art, translating into a mean execution time speedup of 1.53X. On a distributed-memory cluster, our scheme reduces the communication volume by a factor of 1.4X to 63.5X over state-of-the-art, resulting in a mean speedup of 1.55X. In addition, our scheme yields a mean speedup of 2.19X over hand-optimized UPC codes.