980 resultados para parallel applications


Relevância:

70.00% 70.00%

Publicador:

Resumo:

Computational grids with multiple batch systems (batch grids) can be powerful infrastructures for executing long-running multicomponent parallel applications. In this paper, we have constructed a middleware framework for executing such long-running applications spanning multiple submissions to the queues on multiple batch systems. We have used our framework for execution of a foremost long-running multi-component application for climate modeling, the Community Climate System Model (CCSM). Our framework coordinates the distribution, execution, migration and restart of the components of CCSM on the multiple queues where the component jobs of the different queues can have different queue waiting and startup times.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Long running multi-physics coupled parallel applications have gained prominence in recent years. The high computational requirements and long durations of simulations of these applications necessitate the use of multiple systems of a Grid for execution. In this paper, we have built an adaptive middleware framework for execution of long running multi-physics coupled applications across multiple batch systems of a Grid. Our framework, apart from coordinating the executions of the component jobs of an application on different batch systems, also automatically resubmits the jobs multiple times to the batch queues to continue and sustain long running executions. As the set of active batch systems available for execution changes, our framework performs migration and rescheduling of components using a robust rescheduling decision algorithm. We have used our framework for improving the application throughput of a foremost long running multi-component application for climate modeling, the Community Climate System Model (CCSM). Our real multi-site experiments with CCSM indicate that Grid executions can lead to improved application throughput for climate models.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Computational grids with multiple batch systems (batch grids) can be powerful infrastructures for executing long-running multi-component parallel applications. In this paper, we evaluate the potential improvements in throughput of long-running multi-component applications when the different components of the applications are executed on multiple batch systems of batch grids. We compare the multiple batch executions with executions of the components on a single batch system without increasing the number of processors used for executions. We perform our analysis with a foremost long-running multi-component application for climate modeling, the Community Climate System Model (CCSM). We have built a robust simulator that models the characteristics of both the multi-component application and the batch systems. By conducting large number of simulations with different workload characteristics and queuing policies of the systems, processor allocations to components of the application, distributions of the components to the batch systems and inter-cluster bandwidths, we show that multiple batch executions lead to 55% average increase in throughput over single batch executions for long-running CCSM. We also conducted real experiments with a practical middleware infrastructure and showed that multi-site executions lead to effective utilization of batch systems for executions of CCSM and give higher simulation throughput than single-site executions. Copyright (c) 2011 John Wiley & Sons, Ltd.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

For communication-intensive parallel applications, the maximum degree of concurrency achievable is limited by the communication throughput made available by the network. In previous work [HPS94], we showed experimentally that the performance of certain parallel applications running on a workstation network can be improved significantly if a congestion control protocol is used to enhance network performance. In this paper, we characterize and analyze the communication requirements of a large class of supercomputing applications that fall under the category of fixed-point problems, amenable to solution by parallel iterative methods. This results in a set of interface and architectural features sufficient for the efficient implementation of the applications over a large-scale distributed system. In particular, we propose a direct link between the application and network layer, supporting congestion control actions at both ends. This in turn enhances the system's responsiveness to network congestion, improving performance. Measurements are given showing the efficacy of our scheme to support large-scale parallel computations.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

The management of non-functional features (performance, security, power management, etc.) is traditionally a difficult, error prone task for programmers of parallel applications. To take care of these non-functional features, autonomic managers running policies represented as rules using sensors and actuators to monitor and transform a running parallel application may be used. We discuss an approach aimed at providing formal tool support to the integration of independently developed autonomic managers taking care of different non-functional concerns within the same parallel application. Our approach builds on the Behavioural Skeleton experience (autonomic management of non-functional features in structured parallel applications) and on previous results on conflict detection and resolution in rule-based systems. © 2013 Springer-Verlag Berlin Heidelberg.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

High-level parallel languages offer a simple way for application programmers to specify parallelism in a form that easily scales with problem size, leaving the scheduling of the tasks onto processors to be performed at runtime. Therefore, if the underlying system cannot efficiently execute those applications on the available cores, the benefits will be lost. In this paper, we consider how to schedule highly heterogenous parallel applications that require real-time performance guarantees on multicore processors. The paper proposes a novel scheduling approach that combines the global Earliest Deadline First (EDF) scheduler with a priority-aware work-stealing load balancing scheme, which enables parallel realtime tasks to be executed on more than one processor at a given time instant. Experimental results demonstrate the better scalability and lower scheduling overhead of the proposed approach comparatively to an existing real-time deadline-oriented scheduling class for the Linux kernel.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

The recent trends of chip architectures with higher number of heterogeneous cores, and non-uniform memory/non-coherent caches, brings renewed attention to the use of Software Transactional Memory (STM) as a fundamental building block for developing parallel applications. Nevertheless, although STM promises to ease concurrent and parallel software development, it relies on the possibility of aborting conflicting transactions to maintain data consistency, which impacts on the responsiveness and timing guarantees required by embedded real-time systems. In these systems, contention delays must be (efficiently) limited so that the response times of tasks executing transactions are upper-bounded and task sets can be feasibly scheduled. In this paper we assess the use of STM in the development of embedded real-time software, defending that the amount of contention can be reduced if read-only transactions access recent consistent data snapshots, progressing in a wait-free manner. We show how the required number of versions of a shared object can be calculated for a set of tasks. We also outline an algorithm to manage conflicts between update transactions that prevents starvation.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

The single factor limiting the harnessing of the enormous computing power of clusters for parallel computing is the lack of appropriate software. Present cluster operating systems are not built to support parallel computing – they do not provide services to manage parallelism. The cluster operating environments that are used to assist the execution of parallel applications do not provide support for both Message Passing (MP) or Distributed Shared Memory (DSM) paradigms. They are only offered as separate components implemented at the user level as library and independent servers. Due to poor operating systems users must deal with computers of a cluster rather than to see this cluster as a single powerful computer. A Single System Image of the cluster is not offered to users. There is a need for an operating system for clusters. We claim and demonstrate that it is possible to develop a cluster operating system that is
able to efficiently manage parallelism, support Message Passing and DSM and offer the Single System Image. In order to substantiate the claim the first version of a cluster operating system, called GENESIS, that manages parallelism and offers the Single System Image has been developed.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Parallel execution is a very efficient means of processing vast amounts of data in a small amount of time. Creating parallel applications has never been easy, and requires much knowledge of the task and the execution environment used to execute parallel processes. The process of creating parallel applications can be made easier through using a compiler that automatically parallelises a supplied application. Executing the parallel application is also simplified when a well designed execution environment is used. Such an execution environment provides very powerful operations to the programmer transparently. Combining both a parallelising compiler and execution environment and providing a fully automated parallelisation and execution tool is the aim of this research. The advantage of using such a fully automated tool is that the user does not need to provide any additional input to gain the benefits of parallel execution. This report shows the tool and how it transparently supports the programmer creating parallel applications and supports their execution.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Currently, coordinated scheduling of multiple parallel applications across computers has been considered as the critical factor to achieve high execution performance. We claim in this report that the performance and costs of the execution of parallel applications could be improved if not only dedicated clusters but also non-dedicated clusters were used and several parallel applications were executed concurreontly. To support this claim we carried out experimental study into the performance of multiple NAS parallel programs executing concurrently on a non-dedicated cluster.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Computers of a non-dedicated cluster are often idle (users attend meetings, have lunch or coffee breaks) or lightly loaded (users carry out simple computations). These underutilized computers can be employed to execute parallel applications not only during weekends and at nights but also during office hours. Thus, they have to be shared by parallel and sequential applications which could lead to the improvement of their execution performance. However, there is a lack of experimental study showing the behavior and performance of parallel and sequential applications executing concurrently on clusters. We present here the result of an experimental study into load balancing based scheduling of a mixture of parallel and sequential applications on a non-dedicated cluster.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Although individual PCs of a cluster are used by their owners to run sequential applications (local jobs), the cluster as a whole or its subset can also be employed to run parallel applications (cluster jobs) even during working hours. This implies that these computers have to be shared by parallel and sequential applications, which could lead to the improvement of the execution performance and resource utilization. However, there is a lack of experimental study showing the behavior and performance of executing parallel and sequential applications concurrently on a non-dedicated cluster. The result of such research would be beneficial for the development of new global scheduling algorithms. We present the result of an experimental study into scheduling of a mixture of parallel and sequential applications on a non-dedicated cluster. The aim of this study is to learn how the concurrent execution of a communication intensive parallel application and sequential applications influences their execution performance and utilization of the cluster.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Dedicated clusters are becoming commonly used for high performance parallel processing. Computers of a non-dedicated cluster are often idle or lightly loaded. These under utilised computers can be employed to execute parallel applications. Thus, they have to be shared by parallel and sequential applications, which could lead to the improvement of their execution performance. There is a lack of experimental study showing the behaviour and performance of executing parallel and sequential applications concurrently on a non-dedicated cluster. We present the result of an experimental study into load balancing of a mixture of parallel and sequential applications on a non-dedicated cluster.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Current attempts to manage parallel applications on Clusters of Workstations (COWs) have either generally followed the parallel execution environment approach or been extensions to existing network operating systems, both of which do not provide complete or satisfactory solutions. The efficient and transparent management of parallelism within the COW environment requires enhanced methods of process instantiation, mapping of parallel process to workstations, maintenance of process relationships, process communication facilities, and process coordination mechanisms. The aim of this research is to synthesise, design, develop and experimentally study a system capable of efficiently and transparently managing SPMD parallelism on a COW. This system should both improve the performance of SPMD based parallel programs and relieve the programmer from the involvement into parallelism management in order to allow them to concentrate on application programming. It is also the aim of this research to show that such a system, to achieve these objectives, is best achieved by adding new special services and exploiting the existing services of a client/server and microkernel based distributed operating system. To achieve these goals the research methods of the experimental computer science should be employed. In order to specify the scope of this project, this work investigated the issues related to parallel processing on COWs and surveyed a number of relevant systems including PVM, NOW and MOSIX. It was shown that although the MOSIX system provide a number of good services related to parallelism management, none of the system forms a complete solution. The problems identified with these systems include: instantiation services that are not suited to parallel processing; duplication of services between the parallelism management environment and the operating system; and poor levels of transparency. A high performance and transparent system capable of managing the execution of SPMD parallel applications was synthesised and the specific services of process instantiation, process mapping and process interaction detailed. The process instantiation service designed here provides the capability to instantiate parallel processes using either creation or duplication methods and also supports multiple and group based instantiation which is specifically design for SPMD parallel processing. The process mapping service provides the combination of process allocation and dynamic load balancing to ensure the load of a COW remains balanced not only at the time a parallel program is initialised but also during the execution of the program. The process interaction service guarantees to maintain transparently process relationships, communications and coordination services between parallel processes regardless of their location within the COW. The combination of these services provides an original architecture and organisation of a system that is capable of fully managing the execution of SPMD parallel applications on a COW. A logical design of a parallelism management system was developed derived from the synthesised system and was shown that it should ideally be based on a distributed operating system employing the client server model. The client/server based distributed operating system provides the level of transparency, modularity and flexibility necessary for a complete parallelism management system. The services identified in the synthesised system have been mapped to a set of server processes including: Process Instantiation Server providing advanced multiple and group based process creation and duplication; Process Mapping Server combining load collection, process allocation and dynamic load balancing services; and Process Interaction Server providing transparent interprocess communication and coordination. A Process Migration Server was also identified as vital to support both the instantiation and mapping servers. The RHODOS client/server and microkernel based distributed operating system was selected to carry out research into the detailed design and to be used for the implementation this parallelism management system. RHODOS was enhanced to provide the required servers and resulted in the development of the REX Manager, Global Scheduler and Process Migration Manager to provide the services of process instantiation, mapping and migration, respectively. The process interaction services were already provided within RHODOS and only required some extensions to the existing Process Manager and IPC Managers. Through a variety of experiments it was shown that when this system was used to support the execution of SPMD parallel applications the overall execution times were improved, especially when multiple and group based instantiation services are employed. The RHODOS PMS was also shown to greatly reduce the programming burden experienced by users when writing SPMD parallel applications by providing a small set of powerful primitives specially designed to support parallel processing. The system was also shown to be applicable and has been used in a variety of other research areas such as Distributed Shared Memory, Parallelising Compilers and assisting the port of PVM to the RHODOS system. The RHODOS Parallelism Management System (PMS) provides a unique and creative solution to the problem of transparently and efficiently controlling the execution of SPMD parallel applications on COWs. Combining advanced services such as multiple and group based process creation and duplication; combined process allocation and dynamic load balancing; and complete COW wide transparency produces a totally new system that addresses many of the problems not addressed in other systems.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Currently, coordinated scheduling of multiple parallel applications across computers has been considered as the critical factor to achieve high execution performance. We claim in this report that the performance and costs of the execution of parallel applications could be improved if not only dedicated clusters but also non-dedicated clusters were used and several parallel applications were executed concurreontly. To support this claim we carried out experimental study into the performance of multiple NAS parallel programs executing concurrently on a non-dedicated cluster.