16 resultados para Distributed embedded systems

em Greenwich Academic Literature Archive - UK


Relevância:

100.00% 100.00%

Publicador:

Resumo:

The availability of a very accurate dependence graph for a scalar code is the basis for the automatic generation of an efficient parallel implementation. The strategy for this task which is encapsulated in a comprehensive data partitioning code generation algorithm is described. This algorithm involves the data partition, calculation of assignment ranges for partitioned arrays, addition of a comprehensive set of execution control masks, altering loop limits, addition and optimisation of communications for all data. In this context, the development and implementation of strategies to merge communications wherever possible has proved an important feature in producing efficient parallel implementations for numerical mesh based codes. The code generation strategies described here are embedded within the Computer Aided Parallelisation tools (CAPTools) software as a key part of a toolkit for automating as much as possible of the parallelisation process for mesh based numerical codes. The algorithms used enables parallelisation of real computational mechanics codes with only minor user interaction and without any prior manual customisation of the serial code to suit the parallelisation tool.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Realizing scalable performance on high performance computing systems is not straightforward for single-phenomenon codes (such as computational fluid dynamics [CFD]). This task is magnified considerably when the target software involves the interactions of a range of phenomena that have distinctive solution procedures involving different discretization methods. The problems of addressing the key issues of retaining data integrity and the ordering of the calculation procedures are significant. A strategy for parallelizing this multiphysics family of codes is described for software exploiting finite-volume discretization methods on unstructured meshes using iterative solution procedures. A mesh partitioning-based SPMD approach is used. However, since different variables use distinct discretization schemes, this means that distinct partitions are required; techniques for addressing this issue are described using the mesh-partitioning tool, JOSTLE. In this contribution, the strategy is tested for a variety of test cases under a wide range of conditions (e.g., problem size, number of processors, asynchronous / synchronous communications, etc.) using a variety of strategies for mapping the mesh partition onto the processor topology.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The problem of deriving parallel mesh partitioning algorithms for mapping unstructured meshes to parallel computers is discussed in this chapter. In itself this raises a paradox - we seek to find a high quality partition of the mesh, but to compute it in parallel we require a partition of the mesh. In fact, we overcome this difficulty by deriving an optimisation strategy which can find a high quality partition even if the quality of the initial partition is very poor and then use a crude distribution scheme for the initial partition. The basis of this strategy is to use a multilevel approach combined with local refinement algorithms. Three such refinement algorithms are outlined and some example results presented which show that they can produce very high global quality partitions, very rapidly. The results are also compared with a similar multilevel serial partitioner and shown to be almost identical in quality. Finally we consider the impact of the initial partition on the results and demonstrate that the final partition quality is, modulo a certain amount of noise, independent of the initial partition.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper describes work towards the deployment of flexible self-management into real-time embedded systems. A challenging project which focuses specifically on the development of a dynamic, adaptive automotive middleware is described, and the specific self-management requirements of this project are discussed. These requirements have been identified through the refinement of a wide-ranging set of use cases requiring context-sensitive behaviours. A sample of these use-cases is presented to illustrate the extent of the demands for self-management. The strategy that has been adopted to achieve self-management, based on the use of policies is presented. The embedded and real-time nature of the target system brings the constraints that dynamic adaptation capabilities must not require changes to the run-time code (except during hot update of complete binary modules), adaptation decisions must have low latency, and because the target platforms are resource-constrained the self-management mechanism have low resource requirements (especially in terms of processing and memory). Policy-based computing is thus and ideal candidate for achieving the self-management because the policy itself is loaded at run-time and can be replaced or changed in the future in the same way that a data file is loaded. Policies represent a relatively low complexity and low risk means of achieving self-management, with low run-time costs. Policies can be stored internally in ROM (such as default policies) as well as externally to the system. The architecture of a designed-for-purpose powerful yet lightweight policy library is described. A suitable evaluation platform, supporting the whole life-cycle of feasibility analysis, concept evaluation, development, rigorous testing and behavioural validation has been devised and is described.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The diversity gains achievable in the generalised distributed antenna system with cooperative users (GDAS-CU) are considered. A GDAS-CU is comprised of M largely separated access points (APs) at one side of the link, and N geographically closed user terminals (UTs) at the other side. The UTs are collaborating together to enhance the system performance, where an idealised message sharing among the UTs is assumed. First, geometry-based network models are proposed to describe the topology of a GDAS-CU. The mean cross-correlation coefficients of signals received from non-collocated APs and UTs are calculated based on the network topology and the correlation models derived from the empirical data. The analysis is also extendable to more general scenarios where the APs are placed in a clustered form due to the constraints of street layout or building structure. Subsequently, a generalised signal attenuation model derived from several stochastic ray-tracing-based pathloss models is applied to describe the power-decaying pattern in urban built-up areas, where the GDAS-CU may be deployed. Armed with the cross-correlation and pathloss model preliminaries, an intrinsic measure of cooperative diversity obtainable from a GDAS-CU is then derived, which is the number of independent fading channels that can be averaged over to detect symbols. The proposed analytical framework would provide critical insight into the degree of possible performance improvement when combining multiple copies of the received signal in such systems.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Embedded electronic systems in vehicles are of rapidly increasing commercial importance for the automotive industry. While current vehicular embedded systems are extremely limited and static, a more dynamic configurable system would greatly simplify the integration work and increase quality of vehicular systems. This brings in features like separation of concerns, customised software configuration for individual vehicles, seamless connectivity, and plug-and-play capability. Furthermore, such a system can also contribute to increased dependability and resource optimization due to its inherent ability to adjust itself dynamically to changes in software, hardware resources, and environment condition. This paper describes the architectural approach to achieving the goals of dynamically self-configuring automotive embedded electronic systems by the EU research project DySCAS. The architecture solution outlined in this paper captures the application and operational contexts, expected features, middleware services, functions and behaviours, as well as the basic mechanisms and technologies. The paper also covers the architecture conceptualization by presenting the rationale, concerning the architecture structuring, control principles, and deployment concept. In this paper, we also present the adopted architecture V&V strategy and discuss some open issues in regards to the industrial acceptance.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

This paper describes a methodology for deploying flexible dynamic configuration into embedded systems whilst preserving the reliability advantages of static systems. The methodology is based on the concept of decision points (DP) which are strategically placed to achieve fine-grained distribution of self-management logic to meet application-specific requirements. DP logic can be changed easily, and independently of the host component, enabling self-management behavior to be deferred beyond the point of system deployment. A transparent Dynamic Wrapper mechanism (DW) automatically detects and handles problems arising from the evaluation of self-management logic within each DP and ensures that the dynamic aspects of the system collapse down to statically defined default behavior to ensure safety and correctness despite failures. Dynamic context management contributes to flexibility, and removes the need for design-time binding of context providers and consumers, thus facilitating run-time composition and incremental component upgrade.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Parallel computing is now widely used in numerical simulation, particularly for application codes based on finite difference and finite element methods. A popular and successful technique employed to parallelize such codes onto large distributed memory systems is to partition the mesh into sub-domains that are then allocated to processors. The code then executes in parallel, using the SPMD methodology, with message passing for inter-processor interactions. In order to improve the parallel efficiency of an imbalanced structured mesh CFD code, a new dynamic load balancing (DLB) strategy has been developed in which the processor partition range limits of just one of the partitioned dimensions uses non-coincidental limits, as opposed to coincidental limits. The ‘local’ partition limit change allows greater flexibility in obtaining a balanced load distribution, as the workload increase, or decrease, on a processor is no longer restricted by the ‘global’ (coincidental) limit change. The automatic implementation of this generic DLB strategy within an existing parallel code is presented in this chapter, along with some preliminary results.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Code parallelization using OpenMP for shared memory systems is relatively easier than using message passing for distributed memory systems. Despite this, it is still a challenge to use OpenMP to parallelize application codes in a way that yields effective scalable performance when executed on a shared memory parallel system. We describe an environment that will assist the programmer in the various tasks of code parallelization and this is achieved in a greatly reduced time frame and level of skill required. The parallelization environment includes a number of tools that address the main tasks of parallelism detection, OpenMP source code generation, debugging and optimization. These tools include a high quality, fully interprocedural dependence analysis with user interaction capabilities to facilitate the generation of efficient parallel code, an automatic relative debugging tool to identify erroneous user decisions in that interaction and also performance profiling to identify bottlenecks. Finally, experiences of parallelizing some NASA application codes are presented to illustrate some of the benefits of using the evolving environment.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Unstructured mesh based codes for the modelling of continuum physics phenomena have evolved to provide the facility to model complex interacting systems. Such codes have the potential to provide a high performance on parallel platforms for a small investment in programming. The critical parameters for success are to minimise changes to the code to allow for maintenance while providing high parallel efficiency, scalability to large numbers of processors and portability to a wide range of platforms. The paradigm of domain decomposition with message passing has for some time been demonstrated to provide a high level of efficiency, scalability and portability across shared and distributed memory systems without the need to re-author the code into a new language. This paper addresses these issues in the parallelisation of a complex three dimensional unstructured mesh Finite Volume multiphysics code and discusses the implications of automating the parallelisation process.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The difficulties encountered in implementing large scale CM codes on multiprocessor systems are now fairly well understood. Despite the claims of shared memory architecture manufacturers to provide effective parallelizing compilers, these have not proved to be adequate for large or complex programs. Significant programmer effort is usually required to achieve reasonable parallel efficiencies on significant numbers of processors. The paradigm of Single Program Multi Data (SPMD) domain decomposition with message passing, where each processor runs the same code on a subdomain of the problem, communicating through exchange of messages, has for some time been demonstrated to provide the required level of efficiency, scalability, and portability across both shared and distributed memory systems, without the need to re-author the code into a new language or even to support differing message passing implementations. Extension of the methods into three dimensions has been enabled through the engineering of PHYSICA, a framework for supporting 3D, unstructured mesh and continuum mechanics modeling. In PHYSICA, six inspectors are used. Part of the challenge for automation of parallelization is being able to prove the equivalence of inspectors so that they can be merged into as few as possible.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

The performance of loadsharing algorithms for heterogeneous distributed systems is investigated by simulation. The systems considered are networks of workstations (nodes) which differ in processing power. Two parameters are proposed for characterising system heterogeneity, namely the variance and skew of the distribution of processing power among the network nodes. A variety of networks are investigated, with the same number of nodes and total processing power, but with the processing power distributed differently among the nodes. Two loadsharing algorithms are evaluated, at overall system loadings of 50% and 90%, using job response time as the performance metric. Comparison is made with the ideal situation of ‘perfect sharing’, where it is assumed that the communication delays are zero and that complete knowledge is available about job lengths and the loading at the different nodes, so that an arriving job can be sent to the node where it will be completed in the shortest time. The algorithms studied are based on those already in use for homogeneous networks, but were adapted to take account of system heterogeneity. Both algorithms take into account the differences in the processing powers of the nodes in their location policies, but differ in the extent to which they ‘discriminate’ against the slower nodes. It is seen that the relative performance of the two is strongly influenced by the system utilisation and the distribution of processing power among the nodes.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

The parallelization of an industrially important in-house computational fluid dynamics (CFD) code for calculating the airflow over complex aircraft configurations using the Euler or Navier–Stokes equations is presented. The code discussed is the flow solver module of the SAUNA CFD suite. This suite uses a novel grid system that may include block-structured hexahedral or pyramidal grids, unstructured tetrahedral grids or a hybrid combination of both. To assist in the rapid convergence to a solution, a number of convergence acceleration techniques are employed including implicit residual smoothing and a multigrid full approximation storage scheme (FAS). Key features of the parallelization approach are the use of domain decomposition and encapsulated message passing to enable the execution in parallel using a single programme multiple data (SPMD) paradigm. In the case where a hybrid grid is used, a unified grid partitioning scheme is employed to define the decomposition of the mesh. The parallel code has been tested using both structured and hybrid grids on a number of different distributed memory parallel systems and is now routinely used to perform industrial scale aeronautical simulations. Copyright © 2000 John Wiley & Sons, Ltd.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

The Computer Aided Parallelisation Tools (CAPTools) [Ierotheou, C, Johnson SP, Cross M, Leggett PF, Computer aided parallelisation tools (CAPTools)-conceptual overview and performance on the parallelisation of structured mesh codes, Parallel Computing, 1996;22:163±195] is a set of interactive tools aimed to provide automatic parallelisation of serial FORTRAN Computational Mechanics (CM) programs. CAPTools analyses the user's serial code and then through stages of array partitioning, mask and communication calculation, generates parallel SPMD (Single Program Multiple Data) messages passing FORTRAN. The parallel code generated by CAPTools contains calls to a collection of routines that form the CAPTools communications Library (CAPLib). The library provides a portable layer and user friendly abstraction over the underlying parallel environment. CAPLib contains optimised message passing routines for data exchange between parallel processes and other utility routines for parallel execution control, initialisation and debugging. By compiling and linking with different implementations of the library, the user is able to run on many different parallel environments. Even with today's parallel systems the concept of a single version of a parallel application code is more of an aspiration than a reality. However for CM codes the data partitioning SPMD paradigm requires a relatively small set of message-passing communication calls. This set can be implemented as an intermediate `thin layer' library of message-passing calls that enables the parallel code (especially that generated automatically by a parallelisation tool such as CAPTools) to be as generic as possible. CAPLib is just such a `thin layer' message passing library that supports parallel CM codes, by mapping generic calls onto machine specific libraries (such as CRAY SHMEM) and portable general purpose libraries (such as PVM an MPI). This paper describe CAPLib together with its three perceived advantages over other routes: - as a high level abstraction, it is both easy to understand (especially when generated automatically by tools) and to implement by hand, for the CM community (who are not generally parallel computing specialists); - the one parallel version of the application code is truly generic and portable; - the parallel application can readily utilise whatever message passing libraries on a given machine yield optimum performance.