8 resultados para Parallelism

em Greenwich Academic Literature Archive - UK


Relevância:

10.00% 10.00%

Publicador:

Resumo:

The classical Purcell's vector method, for the construction of solutions to dense systems of linear equations is extended to a flexible orthogonalisation procedure. Some properties are revealed of the orthogonalisation procedure in relation to the classical Gauss-Jordan elimination with or without pivoting. Additional properties that are not shared by the classical Gauss-Jordan elimination are exploited. Further properties related to distributed computing are discussed with applications to panel element equations in subsonic compressible aerodynamics. Using an orthogonalisation procedure within panel methods enables a functional decomposition of the sequential panel methods and leads to a two-level parallelism.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Temperature distributions involved in some metal-cutting or surface-milling processes may be obtained by solving a non-linear inverse problem. A two-level concept on parallelism is introduced to compute such temperature distribution. The primary level is based on a problem-partitioning concept driven by the nature and properties of the non-linear inverse problem. Such partitioning results to a coarse-grained parallel algorithm. A simplified 2-D metal-cutting process is used as an example to illustrate the concept. A secondary level exploitation of further parallel properties based on the concept of domain-data parallelism is explained and implemented using MPI. Some experiments were performed on a network of loosely coupled machines consist of SUN Sparc Classic workstations and a network of tightly coupled processors, namely the Origin 2000.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

This paper briefly describes an interactive parallelisation toolkit that can be used to generate parallel code suitable for either a distributed memory system (using message passing) or a shared memory system (using OpenMP). This study focuses on how the toolkit is used to parallelise a complex heterogeneous ocean modelling code within a few hours for use on a shared memory parallel system. The generated parallel code is essentially the serial code with OpenMP directives added to express the parallelism. The results show that substantial gains in performance can be achieved over the single thread version with very little effort.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Numerical solutions of realistic 2-D and 3-D inverse problems may require a very large amount of computation. A two-level concept on parallelism is often used to solve such problems. The primary level uses the problem partitioning concept which is a decomposition based on the mathematical/physical problem. The secondary level utilizes the widely used data partitioning concept. A theoretical performance model is built based on the two-level parallelism. The observed performance results obtained from a network of general purpose Sun Sparc stations are compared with the theoretical values. Restrictions of the theoretical model are also discussed.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

This paper describes an interactive parallelisation toolkit that can be used to generate parallel code suitable for either a distributed memory system (using message passing) or a shared memory system (using OpenMP). This study focuses on how the toolkit is used to parallelise a complex heterogeneous ocean modelling code within a few hours for use on a shared memory parallel system. The generated parallel code is essentially the serial code with OpenMP directives added to express the parallelism. The results show that substantial gains in performance can be achieved over the single thread version with very little effort.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The shared-memory programming model can be an effective way to achieve parallelism on shared memory parallel computers. Historically however, the lack of a programming standard using directives and the limited scalability have affected its take-up. Recent advances in hardware and software technologies have resulted in improvements to both the performance of parallel programs with compiler directives and the issue of portability with the introduction of OpenMP. In this study, the Computer Aided Parallelisation Toolkit has been extended to automatically generate OpenMP-based parallel programs with nominal user assistance. We categorize the different loop types and show how efficient directives can be placed using the toolkit's in-depth interprocedural analysis. Examples are taken from the NAS parallel benchmarks and a number of real-world application codes. This demonstrates the great potential of using the toolkit to quickly parallelise serial programs as well as the good performance achievable on up to 300 processors for hybrid message passing-directive parallelisations.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Code parallelization using OpenMP for shared memory systems is relatively easier than using message passing for distributed memory systems. Despite this, it is still a challenge to use OpenMP to parallelize application codes in a way that yields effective scalable performance when executed on a shared memory parallel system. We describe an environment that will assist the programmer in the various tasks of code parallelization and this is achieved in a greatly reduced time frame and level of skill required. The parallelization environment includes a number of tools that address the main tasks of parallelism detection, OpenMP source code generation, debugging and optimization. These tools include a high quality, fully interprocedural dependence analysis with user interaction capabilities to facilitate the generation of efficient parallel code, an automatic relative debugging tool to identify erroneous user decisions in that interaction and also performance profiling to identify bottlenecks. Finally, experiences of parallelizing some NASA application codes are presented to illustrate some of the benefits of using the evolving environment.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

This chapter describes a parallel optimization technique that incorporates a distributed load-balancing algorithm and provides an extremely fast solution to the problem of load-balancing adaptive unstructured meshes. Moreover, a parallel graph contraction technique can be employed to enhance the partition quality and the resulting strategy outperforms or matches results from existing state-of-the-art static mesh partitioning algorithms. The strategy can also be applied to static partitioning problems. Dynamic procedures have been found to be much faster than static techniques, to provide partitions of similar or higher quality and, in comparison, involve the migration of a fraction of the data. The method employs a new iterative optimization technique that balances the workload and attempts to minimize the interprocessor communications overhead. Experiments on a series of adaptively refined meshes indicate that the algorithm provides partitions of an equivalent or higher quality to static partitioners (which do not reuse the existing partition) and much more quickly. The dynamic evolution of load has three major influences on possible partitioning techniques; cost, reuse, and parallelism. The unstructured mesh may be modified every few time-steps and so the load-balancing must have a low cost relative to that of the solution algorithm in between remeshing.