998 resultados para parallel programs


Relevância:

100.00% 100.00%

Publicador:

Resumo:

With proliferation of chip multicores (CMPs) on desktops and embedded platforms, multi-threaded programs have become ubiquitous. Existence of multiple threads may cause resource contention, such as, in on-chip shared cache and interconnects, depending upon how they access resources. Hence, we propose a tool - Thread Contention Predictor (TCP) to help quantify the number of threads sharing data and their sharing pattern. We demonstrate its use to predict a more profitable shared, last level on-chip cache (LLC) access policy on CMPs. Our cache configuration predictor is 2.2 times faster compared to the cycle-accurate simulations. We also demonstrate its use for identifying hot data structures in a program which may cause performance degradation due to false data sharing. We fix layout of such data structures and show up-to 10% and 18% improvement in execution time and energy-delay product (EDP), respectively.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Predictability -- the ability to foretell that an implementation will not violate a set of specified reliability and timeliness requirements -- is a crucial, highly desirable property of responsive embedded systems. This paper overviews a development methodology for responsive systems, which enhances predictability by eliminating potential hazards resulting from physically-unsound specifications. The backbone of our methodology is the Time-constrained Reactive Automaton (TRA) formalism, which adopts a fundamental notion of space and time that restricts expressiveness in a way that allows the specification of only reactive, spontaneous, and causal computation. Using the TRA model, unrealistic systems – possessing properties such as clairvoyance, caprice, infinite capacity, or perfect timing -- cannot even be specified. We argue that this "ounce of prevention" at the specification level is likely to spare a lot of time and energy in the development cycle of responsive systems -- not to mention the elimination of potential hazards that would have gone, otherwise, unnoticed. The TRA model is presented to system developers through the Cleopatra programming language. Cleopatra features a C-like imperative syntax for the description of computation, which makes it easier to incorporate in applications already using C. It is event-driven, and thus appropriate for embedded process control applications. It is object-oriented and compositional, thus advocating modularity and reusability. Cleopatra is semantically sound; its objects can be transformed, mechanically and unambiguously, into formal TRA automata for verification purposes, which can be pursued using model-checking or theorem proving techniques. Since 1989, an ancestor of Cleopatra has been in use as a specification and simulation language for embedded time-critical robotic processes.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Most parallel computing applications in highperformance computing use the Message Passing Interface (MPI) API. Given the fundamental importance of parallel computing to science and engineering research, application correctness is paramount. MPI was originally developed around 1993 by the MPI Forum, a group of vendors, parallel programming researchers, and computational scientists. However, the document defining the standard is not issued by an official standards organization but has become a de facto standard © 2011 ACM.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The inherent difficulty of thread-based shared-memory programming has recently motivated research in high-level, task-parallel programming models. Recent advances of Task-Parallel models add implicit synchronization, where the system automatically detects and satisfies data dependencies among spawned tasks. However, dynamic dependence analysis incurs significant runtime overheads, because the runtime must track task resources and use this information to schedule tasks while avoiding conflicts and races.
We present SCOOP, a compiler that effectively integrates static and dynamic analysis in code generation. SCOOP combines context-sensitive points-to, control-flow, escape, and effect analyses to remove redundant dependence checks at runtime. Our static analysis can work in combination with existing dynamic analyses and task-parallel runtimes that use annotations to specify tasks and their memory footprints. We use our static dependence analysis to detect non-conflicting tasks and an existing dynamic analysis to handle the remaining dependencies. We evaluate the resulting hybrid dependence analysis on a set of task-parallel programs.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Refactoring is the process of changing the structure of a program without changing its behaviour. Refactoring has so far only really been deployed effectively for sequential programs. However, with the increased availability of multicore (and, soon, manycore) systems, refactoring can play an important role in helping both expert and non-expert parallel programmers structure and implement their parallel programs. This paper describes the design of a new refactoring tool that is aimed at increasing the programmability of parallel systems. To motivate our design, we refactor a number of examples in C, C++ and Erlang into good parallel implementations, using a set of formal pattern rewrite rules. © 2013 Springer-Verlag Berlin Heidelberg.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We present TProf, an energy profiling tool for OpenMP-like task-parallel programs. To compute the energy consumed by each task in a parallel application, TProf dynamically traces the parallel execution and uses a novel technique to estimate the per-task energy consumption. To achieve this estimation, TProf apportions the total processor energy among cores and overcomes the limitation of current works which would otherwise make parallel accounting impossible to achieve. We demonstrate the value of TProf by characterizing a set of task parallel programs, where we find that data locality, memory access patterns and task working sets are responsible for significant variance in energy consumption between seemingly homogeneous tasks. In addition, we identify opportunities for fine-grain energy optimization by applying per-task Dynamic Voltage and Frequency Scaling (DVFS).

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Currently, coordinated scheduling of multiple parallel applications across computers has been considered as the critical factor to achieve high execution performance. We claim in this report that the performance and costs of the execution of parallel applications could be improved if not only dedicated clusters but also non-dedicated clusters were used and several parallel applications were executed concurreontly. To support this claim we carried out experimental study into the performance of multiple NAS parallel programs executing concurrently on a non-dedicated cluster.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Currently, coordinated scheduling of multiple parallel applications across computers has been considered as the critical factor to achieve high execution performance. We claim in this report that the performance and costs of the execution of parallel applications could be improved if not only dedicated clusters but also non-dedicated clusters were used and several parallel applications were executed concurreontly. To support this claim we carried out experimental study into the performance of multiple NAS parallel programs executing concurrently on a non-dedicated cluster.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This paper presents a new programming methodology for introducing and tuning parallelism in Erlang programs, using source-level code refactoring from sequential source programs to parallel programs written using our skeleton library, Skel. High-level cost models allow us to predict with reasonable accuracy the parallel performance of the refactored program, enabling programmers to make informed decisions about which refactorings to apply. Using our approach, we demonstrate easily obtainable, significant and scalable speedups of up to 21 on a 24-core machine over the sequential code.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Symmetric multi-processor (SMP) systems, or multiple-CPU servers, are suitable for implementing parallel algorithms because they employ dedicated communication devices to enhance the inter-processor communication bandwidth, so that a better performance can be obtained. However, the cost for a multiple-CPU server is high and therefore, the server is usually shared among many users. The work-load due to other users will certainly affect the performance of the parallel programs so it is desirable to derive a method to optimize parallel programs under different loading conditions. In this paper, we present a simple method, which can be applied in SPMD type parallel programs, to improve the speedup by controlling the number of threads within the programs.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Current attempts to manage parallel applications on Clusters of Workstations (COWs) have either generally followed the parallel execution environment approach or been extensions to existing network operating systems, both of which do not provide complete or satisfactory solutions. The efficient and transparent management of parallelism within the COW environment requires enhanced methods of process instantiation, mapping of parallel process to workstations, maintenance of process relationships, process communication facilities, and process coordination mechanisms. The aim of this research is to synthesise, design, develop and experimentally study a system capable of efficiently and transparently managing SPMD parallelism on a COW. This system should both improve the performance of SPMD based parallel programs and relieve the programmer from the involvement into parallelism management in order to allow them to concentrate on application programming. It is also the aim of this research to show that such a system, to achieve these objectives, is best achieved by adding new special services and exploiting the existing services of a client/server and microkernel based distributed operating system. To achieve these goals the research methods of the experimental computer science should be employed. In order to specify the scope of this project, this work investigated the issues related to parallel processing on COWs and surveyed a number of relevant systems including PVM, NOW and MOSIX. It was shown that although the MOSIX system provide a number of good services related to parallelism management, none of the system forms a complete solution. The problems identified with these systems include: instantiation services that are not suited to parallel processing; duplication of services between the parallelism management environment and the operating system; and poor levels of transparency. A high performance and transparent system capable of managing the execution of SPMD parallel applications was synthesised and the specific services of process instantiation, process mapping and process interaction detailed. The process instantiation service designed here provides the capability to instantiate parallel processes using either creation or duplication methods and also supports multiple and group based instantiation which is specifically design for SPMD parallel processing. The process mapping service provides the combination of process allocation and dynamic load balancing to ensure the load of a COW remains balanced not only at the time a parallel program is initialised but also during the execution of the program. The process interaction service guarantees to maintain transparently process relationships, communications and coordination services between parallel processes regardless of their location within the COW. The combination of these services provides an original architecture and organisation of a system that is capable of fully managing the execution of SPMD parallel applications on a COW. A logical design of a parallelism management system was developed derived from the synthesised system and was shown that it should ideally be based on a distributed operating system employing the client server model. The client/server based distributed operating system provides the level of transparency, modularity and flexibility necessary for a complete parallelism management system. The services identified in the synthesised system have been mapped to a set of server processes including: Process Instantiation Server providing advanced multiple and group based process creation and duplication; Process Mapping Server combining load collection, process allocation and dynamic load balancing services; and Process Interaction Server providing transparent interprocess communication and coordination. A Process Migration Server was also identified as vital to support both the instantiation and mapping servers. The RHODOS client/server and microkernel based distributed operating system was selected to carry out research into the detailed design and to be used for the implementation this parallelism management system. RHODOS was enhanced to provide the required servers and resulted in the development of the REX Manager, Global Scheduler and Process Migration Manager to provide the services of process instantiation, mapping and migration, respectively. The process interaction services were already provided within RHODOS and only required some extensions to the existing Process Manager and IPC Managers. Through a variety of experiments it was shown that when this system was used to support the execution of SPMD parallel applications the overall execution times were improved, especially when multiple and group based instantiation services are employed. The RHODOS PMS was also shown to greatly reduce the programming burden experienced by users when writing SPMD parallel applications by providing a small set of powerful primitives specially designed to support parallel processing. The system was also shown to be applicable and has been used in a variety of other research areas such as Distributed Shared Memory, Parallelising Compilers and assisting the port of PVM to the RHODOS system. The RHODOS Parallelism Management System (PMS) provides a unique and creative solution to the problem of transparently and efficiently controlling the execution of SPMD parallel applications on COWs. Combining advanced services such as multiple and group based process creation and duplication; combined process allocation and dynamic load balancing; and complete COW wide transparency produces a totally new system that addresses many of the problems not addressed in other systems.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

This research investigated an automated approach to re-writing traditional sequential computer programs into parallel programs for networked computers. A tool was designed and developed for generating parallel programs automatically and also executing these parallel programs on a network of computers. Performance is maximized by utilising all idle resources.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

This work shows the design, simulation, and analysis of two optical interconnection networks for a Dataflow parallel computer architecture. To verify the optical interconnection network performance on the Dataflow architecture, we have analyzed the load balancing among the processors during the parallel programs executions. The load balancing is a very important parameter because it is directly associated to the dataflow parallelism degree. This article proves that optical interconnection networks designed with simple optical devices can provide efficiently the dataflow requirements of a high performance communication system.