4 resultados para Bol loops
em Greenwich Academic Literature Archive - UK
Resumo:
The manual effort required to convert sequential computational mechanics programs into a useful, scalable parallel form is considerable. Tools that can assist in the conversion process are clearly required. Computer aided parallelisation tools (CAPTools) have been developed to generate efficient parallel code for real world structured grid application codes such as Computational Fluid Dynamics. Automatable single-program multi-data (SPMD) overlapping domain decomposition (DD) techniques established for structured grid codes have been adapted by the authors to manually parallelise unstructured mesh applications. Inspector loops have been used to provide generic techniques for the run-time support necessary to extend the capabilities of CAPTools to automatic implementation of SPMD DD techniques in the parallelisation of unstructured mesh codes. Copyright © 1999 John Wiley & Sons, Ltd.
Resumo:
Three paradigms for distributed-memory parallel computation that free the application programmer from the details of message passing are compared for an archetypal structured scientific computation -- a nonlinear, structured-grid partial differential equation boundary value problem -- using the same algorithm on the same hardware. All of the paradigms -- parallel languages represented by the Portland Group's HPF, (semi-)automated serial-to-parallel source-to-source translation represented by CAP-Tools from the University of Greenwich, and parallel libraries represented by Argonne's PETSc -- are found to be easy to use for this problem class, and all are reasonably effective in exploiting concurrency after a short learning curve. The level of involvement required by the application programmer under any paradigm includes specification of the data partitioning, corresponding to a geometrically simple decomposition of the domain of the PDE. Programming in SPMD style for the PETSc library requires writing only the routines that discretize the PDE and its Jacobian, managing subdomain-to-processor mappings (affine global-to-local index mappings), and interfacing to library solver routines. Programming for HPF requires a complete sequential implementation of the same algorithm as a starting point, introduction of concurrency through subdomain blocking (a task similar to the index mapping), and modest experimentation with rewriting loops to elucidate to the compiler the latent concurrency. Programming with CAPTools involves feeding the same sequential implementation to the CAPTools interactive parallelization system, and guiding the source-to-source code transformation by responding to various queries about quantities knowable only at runtime. Results representative of "the state of the practice" for a scaled sequence of structured grid problems are given on three of the most important contemporary high-performance platforms: the IBM SP, the SGI Origin 2000, and the CRAYY T3E.
Resumo:
We consider the load-balancing problems which arise from parallel scientific codes containing multiple computational phases, or loops over subsets of the data, which are separated by global synchronisation points. We motivate, derive and describe the implementation of an approach which we refer to as the multiphase mesh partitioning strategy to address such issues. The technique is tested on several examples of meshes, both real and artificial, containing multiple computational phases and it is demonstrated that our method can achieve high quality partitions where a standard mesh partitioning approach fails.
Resumo:
We consider the load-balancing problems which arise from parallel scientific codes containing multiple computational phases, or loops over subsets of the data, which are separated by global synchronisation points. We motivate, derive and describe the implementation of an approach which we refer to as the multiphase mesh partitioning strategy to address such issues. The technique is tested on example meshes containing multiple computational phases and it is demonstrated that our method can achieve high quality partitions where a standard mesh partitioning approach fails.