980 resultados para Grid code
Resumo:
This paper addresses the exploitation of overlapping communication with calculation within parallel FORTRAN 77 codes for computational fluid dynamics (CFD) and computational structured dynamics (CSD). The obvious objective is to overlap interprocessor communication with calculation on each processor in a distributed memory parallel system and so improve the efficiency of the parallel implementation. A general strategy for converting synchronous to overlapped communication is presented together with tools to enable its automatic implementation in FORTRAN 77 codes. This strategy is then implemented within the parallelisation toolkit, CAPTools, to facilitate the automatic generation of parallel code with overlapped communications. The success of these tools are demonstrated on two codes from the NAS-PAR and PERFECT benchmark suites. In each case, the tools produce parallel code with overlapped communications which is as good as that which could be generated manually. The parallel performance of the codes also improve in line with expectation.
Resumo:
The manual effort required to convert sequential computational mechanics programs into a useful, scalable parallel form is considerable. Tools that can assist in the conversion process are clearly required. Computer aided parallelisation tools (CAPTools) have been developed to generate efficient parallel code for real world structured grid application codes such as Computational Fluid Dynamics. Automatable single-program multi-data (SPMD) overlapping domain decomposition (DD) techniques established for structured grid codes have been adapted by the authors to manually parallelise unstructured mesh applications. Inspector loops have been used to provide generic techniques for the run-time support necessary to extend the capabilities of CAPTools to automatic implementation of SPMD DD techniques in the parallelisation of unstructured mesh codes. Copyright © 1999 John Wiley & Sons, Ltd.
Resumo:
Three paradigms for distributed-memory parallel computation that free the application programmer from the details of message passing are compared for an archetypal structured scientific computation -- a nonlinear, structured-grid partial differential equation boundary value problem -- using the same algorithm on the same hardware. All of the paradigms -- parallel languages represented by the Portland Group's HPF, (semi-)automated serial-to-parallel source-to-source translation represented by CAP-Tools from the University of Greenwich, and parallel libraries represented by Argonne's PETSc -- are found to be easy to use for this problem class, and all are reasonably effective in exploiting concurrency after a short learning curve. The level of involvement required by the application programmer under any paradigm includes specification of the data partitioning, corresponding to a geometrically simple decomposition of the domain of the PDE. Programming in SPMD style for the PETSc library requires writing only the routines that discretize the PDE and its Jacobian, managing subdomain-to-processor mappings (affine global-to-local index mappings), and interfacing to library solver routines. Programming for HPF requires a complete sequential implementation of the same algorithm as a starting point, introduction of concurrency through subdomain blocking (a task similar to the index mapping), and modest experimentation with rewriting loops to elucidate to the compiler the latent concurrency. Programming with CAPTools involves feeding the same sequential implementation to the CAPTools interactive parallelization system, and guiding the source-to-source code transformation by responding to various queries about quantities knowable only at runtime. Results representative of "the state of the practice" for a scaled sequence of structured grid problems are given on three of the most important contemporary high-performance platforms: the IBM SP, the SGI Origin 2000, and the CRAYY T3E.
Resumo:
The last few years have seen a substantial increase in the geometric complexity for 3D flow simulation. In this paper we describe the challenges in generating computation grids for 3D aerospace configuations and demonstrate the progress made to eventually achieve a push button technology for CAD to visualized flow. Special emphasis is given to the interfacing from the grid generator to the flow solver by semi-automatic generation of boundary conditions during the grid generation process. In this regard, once a grid has been generated, push button technology of most commercial flow solvers has been achieved. This will be demonstrated by the ad hoc simulation for the Hopper configuration.
Resumo:
This paper briefly describes an interactive parallelisation toolkit that can be used to generate parallel code suitable for either a distributed memory system (using message passing) or a shared memory system (using OpenMP). This study focuses on how the toolkit is used to parallelise a complex heterogeneous ocean modelling code within a few hours for use on a shared memory parallel system. The generated parallel code is essentially the serial code with OpenMP directives added to express the parallelism. The results show that substantial gains in performance can be achieved over the single thread version with very little effort.
Resumo:
In this paper a continuum model for the prediction of segregation in granular material is presented. The numerical framework, a 3-D, unstructured grid, finite-volume code is described, and the micro-physical parametrizations, which are used to describe the processes and interactions at the microscopic level that lead to segregation, are analysed. Numerical simulations and comparisons with experimental data are then presented and conclusions are drawn on the capability of the model to accurately simulate the behaviour of granular matter during flow.
Resumo:
Sound waves are propagating pressure fluctuations, which are typically several orders of magnitude smaller than the pressure variations in the flow field that account for flow acceleration. On the other hand, these fluctuations travel at the speed of sound in the medium, not as a transported fluid quantity. Due to the above two properties, the Reynolds averaged Navier–Stokes equations do not resolve the acoustic fluctuations. This paper discusses a defect correction method for this type of multi-scale problems in aeroacoustics. Numerical examples in one dimensional and two dimensional are used to illustrate the concept. Copyright (C) 2002 John Wiley & Sons, Ltd.
Resumo:
Many code generation tools exist to aid developers in carrying out common mappings, such as from Object to XML or from Object to relational database. Such generated code tends to possess a high binding between the Object code and the target mapping, making integration into a broader application tedious or even impossible. In this paper we suggest XML technologies and the multiple inheritance capabilities of interface based languages such as Java, offer a means to unify such executable specifications, thus building complete, consistent and useful object models declaratively, without sacrificing component flexibility.
Resumo:
This paper describes an interactive parallelisation toolkit that can be used to generate parallel code suitable for either a distributed memory system (using message passing) or a shared memory system (using OpenMP). This study focuses on how the toolkit is used to parallelise a complex heterogeneous ocean modelling code within a few hours for use on a shared memory parallel system. The generated parallel code is essentially the serial code with OpenMP directives added to express the parallelism. The results show that substantial gains in performance can be achieved over the single thread version with very little effort.
Resumo:
This chapter discusses the code parallelization environment, where a number of tools that address the main tasks, such as code parallelization, debugging, and optimization are available. The parallelization tools include ParaWise and CAPO, which enable the near automatic parallelization of real world scientific application codes for shared and distributed memory-based parallel systems. The chapter discusses the use of ParaWise and CAPO to transform the original serial code into an equivalent parallel code that contains appropriate OpenMP directives. Additionally, as user involvement can introduce errors, a relative debugging tool (P2d2) is also available and can be used to perform near automatic relative debugging of an OpenMP program that has been parallelized either using the tools or manually. In order for these tools to be effective in parallelizing a range of applications, a high quality fully inter-procedural dependence analysis, as well as user interaction is vital to the generation of efficient parallel code and in the optimization of the backtracking and speculation process used in relative debugging. Results of parallelized NASA codes are discussed and show the benefits of using the environment.
Resumo:
Despite the apparent simplicity of the OpenMP directive shared memory programming model and the sophisticated dependence analysis and code generation capabilities of the ParaWise/CAPO tools, experience shows that a level of expertise is required to produce efficient parallel code. In a real world application the investigation of a single loop in a generated parallel code can soon become an in-depth inspection of numerous dependencies in many routines. The additional understanding of dependencies is also needed to effectively interpret the information provided and supply the required feedback. The ParaWise Expert Assistant has been developed to automate this investigation and present questions to the user about, and in the context of, their application code. In this paper, we demonstrate that knowledge of dependence information and OpenMP are no longer essential to produce efficient parallel code with the Expert Assistant. It is hoped that this will enable a far wider audience to use the tools and subsequently, exploit the benefits of large parallel systems.
Resumo:
Code parallelization using OpenMP for shared memory systems is relatively easier than using message passing for distributed memory systems. Despite this, it is still a challenge to use OpenMP to parallelize application codes in a way that yields effective scalable performance when executed on a shared memory parallel system. We describe an environment that will assist the programmer in the various tasks of code parallelization and this is achieved in a greatly reduced time frame and level of skill required. The parallelization environment includes a number of tools that address the main tasks of parallelism detection, OpenMP source code generation, debugging and optimization. These tools include a high quality, fully interprocedural dependence analysis with user interaction capabilities to facilitate the generation of efficient parallel code, an automatic relative debugging tool to identify erroneous user decisions in that interaction and also performance profiling to identify bottlenecks. Finally, experiences of parallelizing some NASA application codes are presented to illustrate some of the benefits of using the evolving environment.
Resumo:
This paper presents an investigation into dynamic self-adjustment of task deployment and other aspects of self-management, through the embedding of multiple policies. Non-dedicated loosely-coupled computing environments, such as clusters and grids are increasingly popular platforms for parallel processing. These abundant systems are highly dynamic environments in which many sources of variability affect the run-time efficiency of tasks. The dynamism is exacerbated by the incorporation of mobile devices and wireless communication. This paper proposes an adaptive strategy for the flexible run-time deployment of tasks; to continuously maintain efficiency despite the environmental variability. The strategy centres on policy-based scheduling which is informed by contextual and environmental inputs such as variance in the round-trip communication time between a client and its workers and the effective processing performance of each worker. A self-management framework has been implemented for evaluation purposes. The framework integrates several policy-controlled, adaptive services with the application code, enabling the run-time behaviour to be adapted to contextual and environmental conditions. Using this framework, an exemplar self-managing parallel application is implemented and used to investigate the extent of the benefits of the strategy