29 resultados para Code optimization
em Greenwich Academic Literature Archive - UK
Resumo:
This chapter discusses the code parallelization environment, where a number of tools that address the main tasks, such as code parallelization, debugging, and optimization are available. The parallelization tools include ParaWise and CAPO, which enable the near automatic parallelization of real world scientific application codes for shared and distributed memory-based parallel systems. The chapter discusses the use of ParaWise and CAPO to transform the original serial code into an equivalent parallel code that contains appropriate OpenMP directives. Additionally, as user involvement can introduce errors, a relative debugging tool (P2d2) is also available and can be used to perform near automatic relative debugging of an OpenMP program that has been parallelized either using the tools or manually. In order for these tools to be effective in parallelizing a range of applications, a high quality fully inter-procedural dependence analysis, as well as user interaction is vital to the generation of efficient parallel code and in the optimization of the backtracking and speculation process used in relative debugging. Results of parallelized NASA codes are discussed and show the benefits of using the environment.
Resumo:
Code parallelization using OpenMP for shared memory systems is relatively easier than using message passing for distributed memory systems. Despite this, it is still a challenge to use OpenMP to parallelize application codes in a way that yields effective scalable performance when executed on a shared memory parallel system. We describe an environment that will assist the programmer in the various tasks of code parallelization and this is achieved in a greatly reduced time frame and level of skill required. The parallelization environment includes a number of tools that address the main tasks of parallelism detection, OpenMP source code generation, debugging and optimization. These tools include a high quality, fully interprocedural dependence analysis with user interaction capabilities to facilitate the generation of efficient parallel code, an automatic relative debugging tool to identify erroneous user decisions in that interaction and also performance profiling to identify bottlenecks. Finally, experiences of parallelizing some NASA application codes are presented to illustrate some of the benefits of using the evolving environment.
Resumo:
The availability of a very accurate dependence graph for a scalar code is the basis for the automatic generation of an efficient parallel implementation. The strategy for this task which is encapsulated in a comprehensive data partitioning code generation algorithm is described. This algorithm involves the data partition, calculation of assignment ranges for partitioned arrays, addition of a comprehensive set of execution control masks, altering loop limits, addition and optimisation of communications for all data. In this context, the development and implementation of strategies to merge communications wherever possible has proved an important feature in producing efficient parallel implementations for numerical mesh based codes. The code generation strategies described here are embedded within the Computer Aided Parallelisation tools (CAPTools) software as a key part of a toolkit for automating as much as possible of the parallelisation process for mesh based numerical codes. The algorithms used enables parallelisation of real computational mechanics codes with only minor user interaction and without any prior manual customisation of the serial code to suit the parallelisation tool.
Resumo:
User supplied knowledge and interaction is a vital component of a toolkit for producing high quality parallel implementations of scalar FORTRAN numerical code. In this paper we consider the necessary components that such a parallelisation toolkit should possess to provide an effective environment to identify, extract and embed user relevant user knowledge. We also examine to what extent these facilities are available in leading parallelisation tools; in particular we discuss how these issues have been addressed in the development of the user interface of the Computer Aided Parallelisation Tools (CAPTools). The CAPTools environment has been designed to enable user exploration, interaction and insertion of user knowledge to facilitate the automatic generation of very efficient parallel code. A key issue in the user's interaction is control of the volume of information so that the user is focused on only that which is needed. User control over the level and extent of information revealed at any phase is supplied using a wide variety of filters. Another issue is the way in which information is communicated. Dependence analysis and its resulting graphs involve a lot of sophisticated rather abstract concepts unlikely to be familiar to most users of parallelising tools. As such, considerable effort has been made to communicate with the user in terms that they will understand. These features, amongst others, and their use in the parallelisation process are described and their effectiveness discussed.
Resumo:
This paper addresses the exploitation of overlapping communication with calculation within parallel FORTRAN 77 codes for computational fluid dynamics (CFD) and computational structured dynamics (CSD). The obvious objective is to overlap interprocessor communication with calculation on each processor in a distributed memory parallel system and so improve the efficiency of the parallel implementation. A general strategy for converting synchronous to overlapped communication is presented together with tools to enable its automatic implementation in FORTRAN 77 codes. This strategy is then implemented within the parallelisation toolkit, CAPTools, to facilitate the automatic generation of parallel code with overlapped communications. The success of these tools are demonstrated on two codes from the NAS-PAR and PERFECT benchmark suites. In each case, the tools produce parallel code with overlapped communications which is as good as that which could be generated manually. The parallel performance of the codes also improve in line with expectation.
Resumo:
Parallel computing is now widely used in numerical simulation, particularly for application codes based on finite difference and finite element methods. A popular and successful technique employed to parallelize such codes onto large distributed memory systems is to partition the mesh into sub-domains that are then allocated to processors. The code then executes in parallel, using the SPMD methodology, with message passing for inter-processor interactions. In order to improve the parallel efficiency of an imbalanced structured mesh CFD code, a new dynamic load balancing (DLB) strategy has been developed in which the processor partition range limits of just one of the partitioned dimensions uses non-coincidental limits, as opposed to coincidental limits. The ‘local’ partition limit change allows greater flexibility in obtaining a balanced load distribution, as the workload increase, or decrease, on a processor is no longer restricted by the ‘global’ (coincidental) limit change. The automatic implementation of this generic DLB strategy within an existing parallel code is presented in this chapter, along with some preliminary results.
Resumo:
The parallelization of an industrially important in-house computational fluid dynamics (CFD) code for calculating the airflow over complex aircraft configurations using the Euler or Navier–Stokes equations is presented. The code discussed is the flow solver module of the SAUNA CFD suite. This suite uses a novel grid system that may include block-structured hexahedral or pyramidal grids, unstructured tetrahedral grids or a hybrid combination of both. To assist in the rapid convergence to a solution, a number of convergence acceleration techniques are employed including implicit residual smoothing and a multigrid full approximation storage scheme (FAS). Key features of the parallelization approach are the use of domain decomposition and encapsulated message passing to enable the execution in parallel using a single programme multiple data (SPMD) paradigm. In the case where a hybrid grid is used, a unified grid partitioning scheme is employed to define the decomposition of the mesh. The parallel code has been tested using both structured and hybrid grids on a number of different distributed memory parallel systems and is now routinely used to perform industrial scale aeronautical simulations. Copyright © 2000 John Wiley & Sons, Ltd.
Resumo:
SMARTFIRE, an open architecture integrated CFD code and knowledge based system attempts to make fire field modeling accessible to non-experts in Computational Fluid Dynamics (CFD) such as fire fighters, architects and fire safety engineers. This is achieved by embedding expert knowledge into CFD software. This enables the 'black-art' associated with the CFD analysis such as selection of solvers, relaxation parameters, convergence criteria, time steps, grid and boundary condition specification to be guided by expert advice from the software. The user is however given the option of overriding these decisions, thus retaining ultimate control. SMARTFIRE also makes use of recent developments in CFD technology such as unstructured meshes and group solvers in order to make the CFD analysis more efficient. This paper describes the incorporation within SMARTFIRE of the expert fire modeling knowledge required for automatic problem setup and mesh generation as well as the concept and use of group solvers for automatic and manual dynamic control of the CFD code.
Resumo:
Virtual manufacturing and design assessment increasingly involve the simulation of interacting phenomena, sic. multi-physics, an activity which is very computationally intensive. This chapter describes an attempt to address the parallel issues associated with a multi-physics simulation approach based upon a range of compatible procedures operating on one mesh using a single database - the distinct physics solvers can operate separately or coupled on sub-domains of the whole geometric space. Moreover, the finite volume unstructured mesh solvers use different discretization schemes (and, particularly, different ‘nodal’ locations and control volumes). A two-level approach to the parallelization of this simulation software is described: the code is restructured into parallel form on the basis of the mesh partitioning alone, that is, without regard to the physics. However, at run time, the mesh is partitioned to achieve a load balance, by considering the load per node/element across the whole domain. The latter of course is determined by the problem specific physics at a particular location.
Resumo:
This paper briefly describes an interactive parallelisation toolkit that can be used to generate parallel code suitable for either a distributed memory system (using message passing) or a shared memory system (using OpenMP). This study focuses on how the toolkit is used to parallelise a complex heterogeneous ocean modelling code within a few hours for use on a shared memory parallel system. The generated parallel code is essentially the serial code with OpenMP directives added to express the parallelism. The results show that substantial gains in performance can be achieved over the single thread version with very little effort.
Resumo:
Many code generation tools exist to aid developers in carrying out common mappings, such as from Object to XML or from Object to relational database. Such generated code tends to possess a high binding between the Object code and the target mapping, making integration into a broader application tedious or even impossible. In this paper we suggest XML technologies and the multiple inheritance capabilities of interface based languages such as Java, offer a means to unify such executable specifications, thus building complete, consistent and useful object models declaratively, without sacrificing component flexibility.
Resumo:
This paper describes an interactive parallelisation toolkit that can be used to generate parallel code suitable for either a distributed memory system (using message passing) or a shared memory system (using OpenMP). This study focuses on how the toolkit is used to parallelise a complex heterogeneous ocean modelling code within a few hours for use on a shared memory parallel system. The generated parallel code is essentially the serial code with OpenMP directives added to express the parallelism. The results show that substantial gains in performance can be achieved over the single thread version with very little effort.
Resumo:
This paper demonstrates a modeling and design approach that couples computational mechanics techniques with numerical optimisation and statistical models for virtual prototyping and testing in different application areas concerning reliability of eletronic packages. The integrated software modules provide a design engineer in the electronic manufacturing sector with fast design and process solutions by optimizing key parameters and taking into account complexity of certain operational conditions. The integrated modeling framework is obtained by coupling the multi-phsyics finite element framework - PHYSICA - with the numerical optimisation tool - VisualDOC into a fully automated design tool for solutions of electronic packaging problems. Response Surface Modeling Methodolgy and Design of Experiments statistical tools plus numerical optimisaiton techniques are demonstrated as a part of the modeling framework. Two different problems are discussed and solved using the integrated numerical FEM-Optimisation tool. First, an example of thermal management of an electronic package on a board is illustrated. Location of the device is optimized to ensure reduced junction temperature and stress in the die subject to certain cooling air profile and other heat dissipating active components. In the second example thermo-mechanical simulations of solder creep deformations are presented to predict flip-chip reliability and subsequently used to optimise the life-time of solder interconnects under thermal cycling.