Biblioteca Digital

43 resultados para parallel architectures

CAPLib - A 'thin layer' message passing library to support computational mechanics codes on distributed memory parallel systems

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The Computer Aided Parallelisation Tools (CAPTools) [Ierotheou, C, Johnson SP, Cross M, Leggett PF, Computer aided parallelisation tools (CAPTools)-conceptual overview and performance on the parallelisation of structured mesh codes, Parallel Computing, 1996;22:163±195] is a set of interactive tools aimed to provide automatic parallelisation of serial FORTRAN Computational Mechanics (CM) programs. CAPTools analyses the user's serial code and then through stages of array partitioning, mask and communication calculation, generates parallel SPMD (Single Program Multiple Data) messages passing FORTRAN. The parallel code generated by CAPTools contains calls to a collection of routines that form the CAPTools communications Library (CAPLib). The library provides a portable layer and user friendly abstraction over the underlying parallel environment. CAPLib contains optimised message passing routines for data exchange between parallel processes and other utility routines for parallel execution control, initialisation and debugging. By compiling and linking with different implementations of the library, the user is able to run on many different parallel environments. Even with today's parallel systems the concept of a single version of a parallel application code is more of an aspiration than a reality. However for CM codes the data partitioning SPMD paradigm requires a relatively small set of message-passing communication calls. This set can be implemented as an intermediate `thin layer' library of message-passing calls that enables the parallel code (especially that generated automatically by a parallelisation tool such as CAPTools) to be as generic as possible. CAPLib is just such a `thin layer' message passing library that supports parallel CM codes, by mapping generic calls onto machine specific libraries (such as CRAY SHMEM) and portable general purpose libraries (such as PVM an MPI). This paper describe CAPLib together with its three perceived advantages over other routes: - as a high level abstraction, it is both easy to understand (especially when generated automatically by tools) and to implement by hand, for the CM community (who are not generally parallel computing specialists); - the one parallel version of the application code is truly generic and portable; - the parallel application can readily utilise whatever message passing libraries on a given machine yield optimum performance.

Dynamic multi-partitioning for parallel finite element applications

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The central product of the DRAMA (Dynamic Re-Allocation of Meshes for parallel Finite Element Applications) project is a library comprising a variety of tools for dynamic re-partitioning of unstructured Finite Element (FE) applications. The input to the DRAMA library is the computational mesh, and corresponding costs, partitioned into sub-domains. The core library functions then perform a parallel computation of a mesh re-allocation that will re-balance the costs based on the DRAMA cost model. We discuss the basic features of this cost model, which allows a general approach to load identification, modelling and imbalance minimisation. Results from crash simulations are presented which show the necessity for multi-phase/multi-constraint partitioning components

Parallel performance in multi-physics simulation

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A comprehensive simulation of solidification/melting processes requires the simultaneous representation of free surface fluid flow, heat transfer, phase change, non-linear solid mechanics and, possibly, electromagnetics together with their interactions in what is now referred to as "multi-physics" simulation. A 3D computational procedure and software tool, PHYSICA, embedding the above multi-physics models using finite volume methods on unstructured meshes (FV-UM) has been developed. Multi-physics simulations are extremely compute intensive and a strategy to parallelise such codes has, therefore, been developed. This strategy has been applied to PHYSICA and evaluated on a range of challenging multi-physics problems drawn from actual industrial cases.

Multiphase mesh partitioning for parallel computational mechanics codes

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We consider the load-balancing problems which arise from parallel scientific codes containing multiple computational phases, or loops over subsets of the data, which are separated by global synchronisation points. We motivate, derive and describe the implementation of an approach which we refer to as the multiphase mesh partitioning strategy to address such issues. The technique is tested on example meshes containing multiple computational phases and it is demonstrated that our method can achieve high quality partitions where a standard mesh partitioning approach fails.

Parallel mesh partitioning on distributed memory systems

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The problem of deriving parallel mesh partitioning algorithms for mapping unstructured meshes to parallel computers is discussed in this chapter. In itself this raises a paradox - we seek to find a high quality partition of the mesh, but to compute it in parallel we require a partition of the mesh. In fact, we overcome this difficulty by deriving an optimisation strategy which can find a high quality partition even if the quality of the initial partition is very poor and then use a crude distribution scheme for the initial partition. The basis of this strategy is to use a multilevel approach combined with local refinement algorithms. Three such refinement algorithms are outlined and some example results presented which show that they can produce very high global quality partitions, very rapidly. The results are also compared with a similar multilevel serial partitioner and shown to be almost identical in quality. Finally we consider the impact of the initial partition on the results and demonstrate that the final partition quality is, modulo a certain amount of noise, independent of the initial partition.

Dynamic mesh partitioning and load-balancing for parallel computational mechanics codes

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this Chapter we discuss the load-balancing issues arising in parallel mesh based computational mechanics codes for which the processor loading changes during the run. We briefly touch on geometric repartitioning ideas and then focus on different ways of using a graph both to solve the load-balancing problem and the optimisation problem, both locally and globally. We also briefly discuss whether repartitioning is always valid. Sample illustrative results are presented and we conclude that repartitioning is an attractive option if the load changes are not too dramatic and that there is a certain trade-off between partition quality and volume of data that the underlying application needs to migrate.

Scheduling parallel dedicated machines under a single non-shared resource

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The scheduling problem of minimizing the makespan for m parallel dedicated machines under single resource constraints is considered. For different variants of the problem the complexity status is established. Heuristic algorithms employing the so-called group technology approach are presented and their worst-case behavior is examined. Finally, a polynomial time approximation scheme is presented for the problem with fixed number of machines.

Scheduling problems for parallel dedicated machines under multiple resource constraints

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The paper considers scheduling problems for parallel dedicated machines subject to resource constraints. A fairly complete computational complexity classification is obtained, a number of polynomial-time algorithms are designed. For the problem with a fixed number of machines in which a job uses at most one resource of unit size a polynomial-time approximation scheme is offered.

Parallel processing for non-linear problems

Relevância:

20.00% 20.00%

Publicador:

An environment for OpenMP code parallelization

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This chapter discusses the code parallelization environment, where a number of tools that address the main tasks, such as code parallelization, debugging, and optimization are available. The parallelization tools include ParaWise and CAPO, which enable the near automatic parallelization of real world scientific application codes for shared and distributed memory-based parallel systems. The chapter discusses the use of ParaWise and CAPO to transform the original serial code into an equivalent parallel code that contains appropriate OpenMP directives. Additionally, as user involvement can introduce errors, a relative debugging tool (P2d2) is also available and can be used to perform near automatic relative debugging of an OpenMP program that has been parallelized either using the tools or manually. In order for these tools to be effective in parallelizing a range of applications, a high quality fully inter-procedural dependence analysis, as well as user interaction is vital to the generation of efficient parallel code and in the optimization of the backtracking and speculation process used in relative debugging. Results of parallelized NASA codes are discussed and show the benefits of using the environment.

Single machine scheduling and due date assignment under series-parallel precedence constraints

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We consider a single machine due date assignment and scheduling problem of minimizing holding costs with no tardy jobs tinder series parallel and somewhat wider class of precedence constraints as well as the properties of series-parallel graphs.

Parallel implementation of the recurrence method for computing the power-spectral density of thin avalanche photodiodes

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A simulation program has been developed to calculate the power-spectral density of thin avalanche photodiodes, which are used in optical networks. The program extends the time-domain analysis of the dead-space multiplication model to compute the autocorrelation function of the APD impulse response. However, the computation requires a large amount of memory space and is very time consuming. We describe our experiences in parallelizing the code using both MPI and OpenMP. Several array partitioning schemes and scheduling policies are implemented and tested Our results show that the OpenMP code is scalable up to 64 processors on an SGI Origin 2000 machine and has small average errors.

Parallel bandwidth characteristics calculations for thin avalanche photodiodes on a SGI Origin 2000 supercomputer

Relevância:

20.00% 20.00%

Publicador:

Resumo:

An important factor for high-speed optical communication is the availability of ultrafast and low-noise photodetectors. Among the semiconductor photodetectors that are commonly used in today’s long-haul and metro-area fiber-optic systems, avalanche photodiodes (APDs) are often preferred over p-i-n photodiodes due to their internal gain, which significantly improves the receiver sensitivity and alleviates the need for optical pre-amplification. Unfortunately, the random nature of the very process of carrier impact ionization, which generates the gain, is inherently noisy and results in fluctuations not only in the gain but also in the time response. Recently, a theory characterizing the autocorrelation function of APDs has been developed by us which incorporates the dead-space effect, an effect that is very significant in thin, high-performance APDs. The research extends the time-domain analysis of the dead-space multiplication model to compute the autocorrelation function of the APD impulse response. However, the computation requires a large amount of memory space and is very time consuming. In this research, we describe our experiences in parallelizing the code in MPI and OpenMP using CAPTools. Several array partitioning schemes and scheduling policies are implemented and tested. Our results show that the code is scalable up to 64 processors on a SGI Origin 2000 machine and has small average errors.

Assessing the scalability of multiphysics tools for modeling solidification and melting processes on parallel clusters

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A comprehensive solution of solidification/melting processes requires the simultaneous representation of free surface fluid flow, heat transfer, phase change, nonlinear solid mechanics and, possibly, electromagnetics together with their interactions, in what is now known as multiphysics simulation. Such simulations are computationally intensive and the implementation of solution strategies for multiphysics calculations must embed their effective parallelization. For some years, together with our collaborators, we have been involved in the development of numerical software tools for multiphysics modeling on parallel cluster systems. This research has involved a combination of algorithmic procedures, parallel strategies and tools, plus the design of a computational modeling software environment and its deployment in a range of real world applications. One output from this research is the three-dimensional parallel multiphysics code, PHYSICA. In this paper we report on an assessment of its parallel scalability on a range of increasingly complex models drawn from actual industrial problems, on three contemporary parallel cluster systems.

The development of parallel implementation for a CFD based fire field model utilising conventional office based PCs

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Parallel processing techniques have been used in the past to provide high performance computing resources for activities such as fire-field modelling. This has traditionally been achieved using specialized hardware and software, the expense of which would be difficult to justify for many fire engineering practices. In this article we demonstrate how typical office-based PCs attached to a Local Area Network has the potential to offer the benefits of parallel processing with minimal costs associated with the purchase of additional hardware or software. It was found that good speedups could be achieved on homogeneous networks of PCs, for example a problem composed of ~100,000 cells would run 9.3 times faster on a network of 12 800MHz PCs than on a single 800MHz PC. It was also found that a network of eight 3.2GHz Pentium 4 PCs would run 7.04 times faster than a single 3.2GHz Pentium computer. A dynamic load balancing scheme was also devised to allow the effective use of the software on heterogeneous PC networks. This scheme also ensured that the impact between the parallel processing task and other computer users on the network was minimized.

«
1
2
3
»