981 results for distributed-memory algorithms


Relevance: 100.00%

Abstract:

We consider the often-studied problem of sorting on a parallel computer. Given an input array distributed evenly over p processors, the task is to compute the sorted output array, also distributed over the p processors. Many existing algorithms take the approach of approximately load-balancing the output, leaving each processor with Θ(n/p) elements. However, in many cases, approximate load-balancing leads to inefficiencies both in the sorting itself and in further uses of the data after sorting. We provide a deterministic parallel sorting algorithm that uses parallel selection to produce any output distribution exactly, in particular one that is perfectly load-balanced. Furthermore, when using a comparison sort, this algorithm is 1-optimal in both computation and communication. We provide an empirical study that illustrates the efficiency of exact data splitting and shows an improvement over two sample sort algorithms.
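
A minimal serial sketch of the exact-splitting idea may help. It is illustrative only: the function names and the gathered candidate list are inventions of this sketch, whereas the paper's parallel selection operates on distributed data. The splitters are the elements of global rank exactly i·n/p, found by binary search over the value domain using per-processor rank counts.

```python
import bisect

def global_rank(locals_sorted, x):
    """Number of elements <= x across all (sorted) local arrays."""
    return sum(bisect.bisect_right(a, x) for a in locals_sorted)

def select_kth(locals_sorted, k):
    """k-th smallest element (1-based) over all local arrays."""
    candidates = sorted({v for a in locals_sorted for v in a})
    lo, hi = 0, len(candidates) - 1
    while lo < hi:                      # lower-bound binary search on rank
        mid = (lo + hi) // 2
        if global_rank(locals_sorted, candidates[mid]) >= k:
            hi = mid
        else:
            lo = mid + 1
    return candidates[lo]

# p processors, each holding a sorted local run of n/p elements.
locals_sorted = [[1, 5, 9, 13], [2, 6, 10, 14], [3, 7, 11, 15], [4, 8, 12, 16]]
n, p = 16, 4
splitters = [select_kth(locals_sorted, i * n // p) for i in range(1, p)]
print(splitters)  # [4, 8, 12] -> every processor gets exactly n/p = 4 elements
```

In a genuinely distributed setting each rank count becomes a reduction across the p processors, which is what keeps the selection cheap in communication.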

Relevance: 100.00%

Abstract:

The Fibre Distributed Data Interface (FDDI) represents the new generation of local area networks (LANs). These high-speed LANs are capable of supporting up to 500 users over a 100 km distance, with user traffic expected to be as diverse as file transfers, packet voice and video. As the proliferation of FDDI LANs continues, the need to interconnect them arises. FDDI LAN interconnection can be achieved in a variety of ways; some of the most commonly used today are public data networks, dial-up lines and private circuits. For applications that can potentially generate large quantities of traffic, such as an FDDI LAN, it is cost effective to use a private circuit leased from the public carrier.

In order to send traffic from one LAN to another across the leased line, a routing algorithm is required. Much research has been done on the Bellman-Ford algorithm and many implementations of it exist in computer networks. However, due to its instability and problems with routing table loops, it is an unsatisfactory algorithm for interconnected FDDI LANs. A new algorithm, termed ISIS, which is being standardized by the ISO, provides a far better solution and will be implemented in many manufacturers' routing devices. To make the work as practical as possible, this algorithm is used as the basis for all the new algorithms presented.

The ISIS algorithm can be improved by exploiting information that it discards during the calculation process. A new algorithm, called Down Stream Path Splits (DSPS), uses this information and requires only minor modification to some of the ISIS routing procedures. DSPS provides higher network performance with very little additional processing and storage. A second algorithm, also based on ISIS, yields a much larger increase in network performance by selecting alternative paths through the network in times of heavy congestion; the alternative path may be selected at either the originating node or any node along the path. It requires more processing and memory than DSPS, but generates a higher network power. The final algorithm combines DSPS with the alternative path algorithm. It is the most flexible and powerful of the algorithms developed, but it is somewhat complex and requires a fairly large storage area at each node.

The performance of the new routing algorithms is tested in a comprehensive model of interconnected LANs. This model incorporates the protocol stack from the transport layer down to the physical layer and generates random topologies for routing algorithm performance comparisons. Using this model it is possible to determine which algorithm provides the best performance without introducing significant complexity and storage requirements.
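
For context, here is a minimal sketch of the shortest-path-first (Dijkstra) calculation that a link-state protocol such as ISIS runs at every node; the graph, link costs and function name are illustrative. The DSPS idea described above amounts to retaining the alternative next hops that this basic calculation throws away.

```python
import heapq

def shortest_path_tree(graph, source):
    """graph: {node: {neighbour: link_cost}} -> {node: (cost, first_hop)}."""
    dist = {source: (0, None)}
    pq = [(0, source, None)]          # (cost so far, node, first hop from source)
    while pq:
        cost, node, first_hop = heapq.heappop(pq)
        if cost > dist.get(node, (float("inf"),))[0]:
            continue                   # stale queue entry
        for nbr, w in graph[node].items():
            hop = nbr if node == source else first_hop
            if cost + w < dist.get(nbr, (float("inf"),))[0]:
                dist[nbr] = (cost + w, hop)
                heapq.heappush(pq, (cost + w, nbr, hop))
    return dist

lan_graph = {"A": {"B": 1, "C": 4}, "B": {"A": 1, "C": 1, "D": 5},
             "C": {"A": 4, "B": 1, "D": 1}, "D": {"B": 5, "C": 1}}
print(shortest_path_tree(lan_graph, "A"))
# {'A': (0, None), 'B': (1, 'B'), 'C': (2, 'B'), 'D': (3, 'B')}
```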

Relevance: 100.00%

Abstract:

A method is outlined for optimising graph partitions which arise in mapping unstructured mesh calculations to parallel computers. The method employs a relative-gain iterative technique both to balance the workload evenly and to minimise the number and volume of interprocessor communications. A parallel graph-reduction technique is also briefly described; it can be used to give a global perspective to the optimisation. The algorithms work efficiently in parallel as well as sequentially, and when combined with a fast direct partitioning technique (such as the Greedy algorithm) to give an initial partition, the resulting two-stage process proves to be a powerful and flexible solution to the static graph-partitioning problem. Experiments indicate that the resulting parallel code can provide high-quality partitions, independent of the initial partition, within a few seconds. The algorithms can also be used for dynamic load-balancing, reusing existing partitions; in this case the procedures are much faster than static techniques, provide partitions of similar or higher quality and involve the migration of only a fraction of the data.
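
A minimal sketch of the gain calculation that drives this kind of iterative partition optimisation may be useful. This is the plain Kernighan-Lin/Fiduccia-Mattheyses-style gain, which the relative-gain formulation above refines; the graph and weights are a toy example.

```python
def gain(adj, part, v, q):
    """Reduction in cut weight if vertex v moves from part[v] to part q."""
    internal = sum(w for u, w in adj[v] if part[u] == part[v])
    external = sum(w for u, w in adj[v] if part[u] == q)
    return external - internal

# Toy mesh dual graph: vertex -> [(neighbour, edge weight)]
adj = {0: [(1, 1), (2, 1)], 1: [(0, 1), (3, 1)],
       2: [(0, 1), (3, 1)], 3: [(1, 1), (2, 1)]}
part = {0: 0, 1: 0, 2: 1, 3: 1}    # initial two-way partition, cut weight = 2
print(gain(adj, part, 1, 1))        # 0: moving vertex 1 neither helps nor hurts
```

An optimiser repeatedly moves the vertex of highest gain, subject to keeping the parts balanced, which is exactly the tension between cut minimisation and load balance the abstract describes.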

Relevance: 100.00%

Abstract:

Thesis presented in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the subject of Electrical and Computer Engineering

Relevance: 100.00%

Abstract:

Decision-making models need to reflect aspects of human psychology. With this goal, this work builds on Sparse Distributed Memory (SDM), a psychologically and neuroscientifically plausible model of human memory published by Pentti Kanerva in 1988. Kanerva's model has a critical point: a memory item cued from within this distance is quickly retrieved, while items cued from beyond it are not. Kanerva calculated this point for a special case with a select set of fixed parameters. In this work we extend the understanding of this critical point through computer simulations and analyse the behaviour of this "critical distance" under different scenarios: different dimensions, different numbers of items stored in memory, and different numbers of times the target item was stored. We also derive a function that, when minimised, determines the value of the critical distance according to the state of the memory. A secondary goal of this work is to present SDM in a simple and intuitive way, so that researchers from other fields can see how it may help them understand and solve their own problems.
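
A minimal, simplified SDM sketch follows (the word length, number of hard locations, access radius and item counts are illustrative assumptions, not Kanerva's exact configuration). It shows the critical-distance effect the abstract describes: iterated reading from a cue near a stored item converges back to that item, while a cue too far away does not.

```python
import numpy as np

rng = np.random.default_rng(0)
n, N, r = 256, 2000, 112            # word length, hard locations, access radius

hard_addr = rng.integers(0, 2, (N, n), dtype=np.int8)
counters = np.zeros((N, n), dtype=np.int32)

def activated(x):
    """Hard locations within Hamming distance r of address x."""
    return (hard_addr != x).sum(axis=1) <= r

def write(x):
    counters[activated(x)] += np.where(x == 1, 1, -1).astype(np.int32)

def read(x, iters=10):               # iterated reading, as in SDM
    for _ in range(iters):
        s = counters[activated(x)].sum(axis=0)
        x = (s > 0).astype(np.int8)
    return x

stored = rng.integers(0, 2, n, dtype=np.int8)
write(stored)
for _ in range(50):                  # background items sharing the memory
    write(rng.integers(0, 2, n, dtype=np.int8))

for d in (20, 60, 100, 120):         # cue the memory from Hamming distance d
    cue = stored.copy()
    cue[rng.choice(n, d, replace=False)] ^= 1
    err = (read(cue) != stored).sum()
    print(f"cue distance {d:3d} -> residual errors {err}")
```

Sweeping d and recording where convergence fails is, in miniature, the kind of simulation study the abstract reports.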

Relevance: 100.00%

Abstract:

A parallel technique, for a distributed-memory machine, based on domain decomposition for solving the Navier-Stokes equations in cartesian and cylindrical coordinates in two dimensions with free surfaces is described. It is based on the code by Tome and McKee (J. Comp. Phys. 110 (1994) 171-186) and Tome (Ph.D. Thesis, University of Strathclyde, Glasgow, 1993), which in turn is based on the SMAC method by Amsden and Harlow (Report LA-4370, Los Alamos Scientific Laboratory, 1971), which solves the Navier-Stokes equations in three steps: the momentum equations, the Poisson equation and particle movement. These equations are discretized by explicit, 5-point finite differences. The parallelization is performed by splitting the computational domain into vertical panels and assigning each panel to a processor; all the computation can then be performed using nearest-neighbour communication. Test runs comparing the performance of the parallel code against the serial code are presented, together with a discussion of the load-balancing question. PVM is used for communication between processes. © 1999 Elsevier B.V. All rights reserved.
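
A minimal serial sketch of the vertical-panel decomposition and nearest-neighbour exchange pattern described above: the original code exchanges panel boundaries with PVM messages, whereas this sketch simply copies ghost columns between neighbouring panels, and a Jacobi-style 5-point update stands in for the momentum/Poisson steps. Grid sizes and the boundary condition are illustrative.

```python
import numpy as np

ny, nx, p = 8, 16, 4
panels = [np.zeros((ny, nx // p + 2)) for _ in range(p)]  # +2 ghost columns
panels[0][:, 0] = 1.0          # Dirichlet boundary held in the left ghost column

def exchange_ghosts(panels):
    """Copy interior boundary columns into the neighbours' ghost columns."""
    for i in range(len(panels) - 1):
        panels[i][:, -1] = panels[i + 1][:, 1]    # right ghost <- right panel
        panels[i + 1][:, 0] = panels[i][:, -2]    # left ghost  <- left panel

def jacobi_step(u):
    """Explicit 5-point update on the interior of one panel."""
    v = u.copy()
    v[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1] +
                            u[1:-1, :-2] + u[1:-1, 2:])
    return v

for _ in range(10):             # alternate communication and computation
    exchange_ghosts(panels)
    panels = [jacobi_step(u) for u in panels]
print(panels[1][:, 1])          # values have diffused across the panel seam
```

Because every exchange touches only the two adjacent panels, a message-passing version of this loop needs nothing beyond nearest-neighbour sends and receives.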

Relevance: 100.00%

Abstract:

A large class of computational problems is characterised by frequent synchronisation and computational requirements which change as a function of time. When such a problem is solved on a message-passing multiprocessor machine [5], the combination of these characteristics leads to system performance which deteriorates over time. As the communication performance of parallel hardware steadily improves, load balance becomes a dominant factor in obtaining high parallel efficiency. Performance can be improved with periodic redistribution of computational load; however, redistribution can sometimes be very costly. We study the issue of deciding when to invoke a global load re-balancing mechanism. Such a decision policy must actively weigh the costs of remapping against the performance benefits, and should be general enough to apply automatically to a wide range of computations. This paper discusses a generic strategy for Dynamic Load Balancing (DLB) in unstructured mesh computational mechanics applications. The strategy is intended to handle varying levels of load change throughout the run. The major issues involved in a generic dynamic load balancing scheme are investigated, together with techniques to automate the implementation of a dynamic load balancing mechanism within the Computer Aided Parallelisation Tools (CAPTools) environment, a semi-automatic tool for the parallelisation of mesh-based FORTRAN codes.
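
A minimal sketch of the kind of remap-decision policy discussed above (the accounting and threshold are illustrative assumptions, not CAPTools' actual mechanism): accumulate the time lost to imbalance since the last remap, and trigger redistribution once that loss exceeds the assumed cost of remapping.

```python
def should_remap(step_times, remap_cost):
    """step_times: per-processor times for each step since the last remap."""
    lost = sum(max(ts) - sum(ts) / len(ts) for ts in step_times)
    return lost > remap_cost        # remap only when it pays for itself

history = []
for step in range(6):
    times = [1.0, 1.0, 1.0 + 0.2 * step, 1.0]   # one processor slowly overloads
    history.append(times)
    if should_remap(history, remap_cost=1.5):
        print(f"step {step}: remap (accumulated loss exceeds remap cost)")
        history.clear()             # start accounting afresh after the remap
```

Because both the loss and the cost are measured quantities, a policy of this shape applies automatically to any computation, which is the generality the abstract asks for.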

Relevance: 100.00%

Abstract:

As the complexity of parallel applications increases, the performance limitations resulting from computational load imbalance become dominant. Mapping the problem space to the processors of a parallel machine in a manner that balances the workload of each processor will typically reduce the run-time. In many cases the computation time required for a given calculation cannot be predetermined, even at run-time, so a static partitioning of the problem yields poor performance. For problems in which the computational load across the discretisation is dynamic and inhomogeneous, for example multi-physics problems involving fluid and solid mechanics with phase changes, the workload for a static subdomain will change over the course of a computation and cannot be estimated beforehand. For such applications the mapping of load to processors must change dynamically at run-time in order to maintain reasonable efficiency. The issues of dynamic load balancing are examined in the context of PHYSICA, a three-dimensional unstructured-mesh multi-physics continuum mechanics computational modelling code.
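
A minimal sketch of the instantaneous imbalance measure that typically drives such dynamic rebalancing (the loads and tolerance are illustrative): parallel efficiency is bounded above by average load over maximum load, so the ratio max/average indicates when migrating elements is worthwhile.

```python
def imbalance(loads):
    """Ratio of the most loaded processor to the average (1.0 = perfect)."""
    return max(loads) / (sum(loads) / len(loads))

# Element weights drift as the physics changes (e.g. a phase front moves):
loads = [100.0, 104.0, 98.0, 162.0]    # per-processor work this time step
print(f"imbalance {imbalance(loads):.2f}, "
      f"efficiency bound {1 / imbalance(loads):.0%}")
if imbalance(loads) > 1.10:            # tolerance before migrating elements
    print("trigger dynamic repartition")
```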

Relevance: 100.00%

Abstract:

Dissertation presented to obtain the degree of Master in Informatics Engineering

Relevance: 100.00%

Abstract:

The past few decades have seen a considerable increase in the number of parallel and distributed systems. With the development of more complex applications, the need for more powerful systems has emerged, and various parallel and distributed environments have been designed and implemented. Each environment, including hardware and software, has unique strengths and weaknesses; no single parallel environment can be identified as the best for all applications with respect to hardware and software properties. The main goal of this thesis is to provide a novel way of performing data-parallel computation in parallel and distributed environments by utilizing the best characteristics of different aspects of parallel computing.

For the purpose of this thesis, three aspects of parallel computing were identified and studied. First, three parallel environments (shared memory, distributed memory, and a network of workstations) are evaluated to quantify their suitability for different parallel applications. Due to the parallel and distributed nature of these environments, the networks connecting their processors were investigated with respect to their performance characteristics.

Second, scheduling algorithms are studied in order to make them more efficient and effective. A concept of application-specific information scheduling is introduced: the application-specific information is data about the workload, extracted from an application, which is provided to a scheduling algorithm. Three scheduling algorithms are enhanced to utilize this information to further refine their scheduling properties. A more accurate description of the workload is especially important in cases where the work units are heterogeneous and the parallel environment is heterogeneous and/or non-dedicated. The results obtained show that the additional information regarding the workload has a positive impact on the performance of applications.

Third, a programming paradigm for networks of symmetric multiprocessor (SMP) workstations is introduced. The MPIT programming paradigm incorporates the Message Passing Interface (MPI) with threads to provide a methodology for writing parallel applications that efficiently utilize the available resources and minimize overhead. MPIT allows communication and computation to overlap by deploying a dedicated thread for communication. Furthermore, the programming paradigm implements an application-specific scheduling algorithm, executed by the communication thread, so scheduling does not interrupt the execution of the parallel application. Performance results show that MPIT achieves considerable improvements over conventional MPI applications.
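
A minimal sketch of the overlap structure described for MPIT, with the proviso that the real paradigm pairs the dedicated thread with MPI calls; here the send is simulated with a delay so the sketch stays self-contained, and all names are illustrative.

```python
import threading, queue, time

outbox = queue.Queue()

def comm_thread():
    """Dedicated communication thread: send until told to shut down."""
    while True:
        msg = outbox.get()
        if msg is None:                # shutdown sentinel
            break
        time.sleep(0.01)               # stand-in for a blocking MPI send
        outbox.task_done()

t = threading.Thread(target=comm_thread)
t.start()

start = time.perf_counter()
for chunk in range(20):                # computation loop on the main thread
    partial = sum(i * i for i in range(50_000))   # local work for this chunk
    outbox.put(("partial", chunk, partial))       # hand result to comm thread
outbox.join()                          # wait for all sends to complete
outbox.put(None)
t.join()
print(f"elapsed {time.perf_counter() - start:.2f}s with comm/compute overlap")
```

Since the main thread never blocks on a send, communication latency is hidden behind the next chunk of computation; the same queue is a natural place to hang a scheduling decision, as the abstract describes.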

Relevance: 100.00%

Abstract:

São Paulo State Research Foundation (FAPESP)

Relevance: 90.00%

Abstract:

Numerical methods related to Krylov subspaces are widely used in large sparse numerical linear algebra. Vectors in these subspaces are manipulated via their representation onto orthonormal bases. Nowadays, on serial computers, the method of Arnoldi is considered a reliable technique for constructing such bases. However, although easily parallelizable, this technique does not scale as well as expected, owing to its communication requirements. In this work we examine alternative methods aimed at overcoming this drawback. Since, upon completion, they retrieve the same information as Arnoldi's algorithm, they enable us to design a wide family of stable and scalable Krylov approximation methods for various parallel environments. We present timing results obtained from their implementation on two distributed-memory multiprocessor supercomputers: the Intel Paragon and the IBM Scalable POWERparallel SP2. (C) 1997 by John Wiley & Sons, Ltd.
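
For reference, a minimal sketch of the Arnoldi process mentioned above: build an orthonormal basis of the Krylov subspace span{v, Av, A²v, ...} by modified Gram-Schmidt. Each inner product against a previous basis vector is a global reduction on a distributed-memory machine, which is the communication bottleneck the abstract refers to. The matrix and dimensions are illustrative.

```python
import numpy as np

def arnoldi(A, v, m):
    """Return V (n x (m+1)) orthonormal and H ((m+1) x m) Hessenberg
    satisfying A @ V[:, :m] = V @ H."""
    n = len(v)
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = v / np.linalg.norm(v)
    for j in range(m):
        w = A @ V[:, j]
        for i in range(j + 1):                 # orthogonalise against basis;
            H[i, j] = V[:, i] @ w              # each dot product is a global
            w -= H[i, j] * V[:, i]             # reduction when distributed
        H[j + 1, j] = np.linalg.norm(w)
        V[:, j + 1] = w / H[j + 1, j]
    return V, H

rng = np.random.default_rng(1)
A = rng.standard_normal((50, 50))
V, H = arnoldi(A, rng.standard_normal(50), m=10)
print(np.allclose(A @ V[:, :10], V @ H))       # Arnoldi relation holds: True
print(np.allclose(V.T @ V, np.eye(11)))        # basis is orthonormal: True
```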

Relevance: 90.00%

Abstract:

Dissertation presented to obtain the degree of Master in Biomedical Engineering