77 results for Parallel execution


Relevance:

30.00%

Publisher:

Abstract:

In this paper, we have demonstrated how the existing programming environments, tools and middleware could be used for the study of execution performance of parallel and sequential applications on a non-dedicated cluster. A set of parallel and sequential benchmark applications selected for and used in the experiments were characterized, and experiment requirements shown. 

Relevance:

30.00%

Publisher:

Abstract:

Recent research efforts in parallel processing on non-dedicated clusters have focused on high execution performance, parallelism management, transparent access to resources, and making clusters easy to use. However, as a collection of independent computers used by multiple users, clusters are susceptible to failure. This paper shows the development of a coordinated checkpointing facility for the GENESIS cluster operating system. This facility was developed by exploiting existing operating system services. High performance and low overheads are achieved by allowing the processes of a parallel application to continue executing during the creation of checkpoints, while maintaining low demands on cluster resources by using coordinated checkpointing.
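As a rough illustration of the coordinated checkpointing idea described in this abstract, the following minimal sketch has a coordinator ask every process for a tentative snapshot and commit only if all of them succeed. The `Process` class and the `coordinated_checkpoint` function are hypothetical names introduced here; this is not the GENESIS implementation, and it ignores in-flight messages, which a real protocol has to handle.

```python
import copy
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Process:
    """Hypothetical stand-in for one process of a parallel application."""
    pid: int
    state: dict = field(default_factory=dict)
    tentative: Optional[dict] = None   # snapshot awaiting the coordinator's decision
    committed: Optional[dict] = None   # last globally consistent checkpoint

    def take_tentative_checkpoint(self) -> bool:
        # Copy the state so the process can keep executing afterwards.
        self.tentative = copy.deepcopy(self.state)
        return True

    def commit(self) -> None:
        self.committed, self.tentative = self.tentative, None

    def abort(self) -> None:
        self.tentative = None


def coordinated_checkpoint(processes: List[Process]) -> bool:
    """Commit a checkpoint only if every process produced a tentative one."""
    acks = [p.take_tentative_checkpoint() for p in processes]
    if all(acks):
        for p in processes:
            p.commit()
        return True
    for p in processes:
        p.abort()
    return False


procs = [Process(0, {"step": 1}), Process(1, {"step": 1})]
ok = coordinated_checkpoint(procs)
procs[0].state["step"] = 2  # execution continues; the checkpoint is unaffected
```

Because each snapshot is a deep copy, later changes to a process's state do not alter the committed checkpoint, which mirrors the abstract's point that processes continue executing while checkpoints are created.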

Relevance:

30.00%

Publisher:

Abstract:

Although individual PCs of a cluster are used by their owners to run sequential applications (local jobs), the cluster as a whole or a subset of it can also be employed to run parallel applications (cluster jobs), even during working hours. This implies that these computers have to be shared by parallel and sequential applications, which could improve execution performance and resource utilization. However, there is a lack of experimental studies showing the behavior and performance of parallel and sequential applications executing concurrently on a non-dedicated cluster. The results of such research would be beneficial for the development of new global scheduling algorithms. We present the results of an experimental study into the scheduling of a mixture of parallel and sequential applications on a non-dedicated cluster. The aim of this study is to learn how the concurrent execution of a communication-intensive parallel application and sequential applications influences their execution performance and the utilization of the cluster.

Relevance:

30.00%

Publisher:

Abstract:

Studies have shown that most of the computers in a non-dedicated cluster are often idle or lightly loaded. The underutilized computers in a non-dedicated cluster can be employed to execute parallel applications. The aim of this study is to learn how the concurrent execution of a computation-bound parallel application and sequential applications influences their execution performance and cluster utilization. The study demonstrates that a computation-bound parallel application benefits from load balancing, while sequential applications suffer only an insignificant slowdown of execution. Overall, the utilization of a non-dedicated cluster is improved.

Relevance:

30.00%

Publisher:

Abstract:

We assert that companies can make more money and research institutions can improve their performance if inexpensive clusters and enterprise grids are exploited. In this paper, we demonstrate that this claim is valid by studying how programming environments, tools and middleware can be used to execute parallel and sequential applications, as well as multiple parallel applications simultaneously, on a non-dedicated cluster, and parallel applications on an enterprise grid, and by showing that execution performance was improved. For this purpose, the execution environment and the parallel and sequential benchmark applications selected for, and used in, the experiments were characterised.

Relevance:

30.00%

Publisher:

Abstract:

Cloud computing is the most recent realisation of computing as a utility. Recently, fields with substantial computational requirements, e.g., biology, have been turning to clouds for cheap, on-demand provisioning of resources. Of interest to this paper is the execution of compute-intensive applications on hybrid clouds. If application requirements exceed private cloud resource capacity, clients must scale down their applications. The outcome of this research is Web technology realising a new form of cloud called HPC Hybrid Deakin (H2D) Cloud, an experimental hybrid cloud capable of utilising both local and remote computational services for single large embarrassingly parallel applications.

Relevance:

30.00%

Publisher:

Abstract:

Recently, fields with substantial computing requirements have turned to cloud computing for economical, scalable, and on-demand provisioning of required execution environments. However, current cloud offerings focus on providing individual servers while tasks such as application distribution and data preparation are left to cloud users. This article presents a new form of cloud called HPC Hybrid Deakin (H2D) cloud; an experimental hybrid cloud capable of utilising both local and remote computational services for large embarrassingly parallel applications. As well as supporting execution, H2D also provides a new service, called DataVault, that provides transparent data management services so all cloud-hosted clusters have required datasets before commencing execution.

Relevance:

30.00%

Publisher:

Abstract:

Temporal violations often take place during the running of a large batch of parallel business cloud workflows, and they have a serious impact on the on-time completion of massive numbers of concurrent user requests. Existing studies have shown that local temporal violations (namely the delays of workflow activities) occurring during cloud workflow execution are the fundamental cause of failed on-time completion. Therefore, accurate prediction of temporal violations is a very important yet challenging task for business cloud workflows. In this paper, based on an epidemic model, a novel temporal violation prediction strategy is proposed to estimate, before the execution of workflows, the number of local temporal violations and the number of violations that must be handled to achieve a certain on-time completion rate. The prediction result can serve as an important reference for temporal violation prevention and handling strategies such as static resource reservation and dynamic provision. Specifically, we first analyze the queuing process of the parallel workflow activities, and then predict the number of potential temporal violations based on a novel temporal violation transmission model inspired by an epidemic model. Comprehensive experimental results demonstrate that our strategy can achieve very high prediction accuracy under different situations.

Relevance:

20.00%

Publisher:

Abstract:

Present operating systems are not built to support parallel computing: they do not provide services to manage parallelism, i.e., to globally manage parallel processes and computational resources. The cluster operating environments that are used to assist the execution of parallel applications do not provide support for both programming paradigms, message passing (MP) and distributed shared memory (DSM); these are mainly offered as separate components implemented at the user level as libraries and independent server processes. Because of this poor operating system support, users must deal with a cluster as a set of independent computers rather than as a single powerful computer. A single system image (SSI) of the cluster is not offered to users. There is a need for an operating system for clusters. We claim and demonstrate in this paper that it is possible to develop a cluster operating system that is able to efficiently manage parallelism; use cluster resources efficiently; support MP in the form of standard MP and PVM, and DSM; offer SSI; and make the cluster easy to use. We show that to achieve these aims this operating system should inherit many features of a distributed operating system and provide new services which address the needs of parallel processes, cluster resources, and application developers. To substantiate the claim, the first version of a cluster operating system managing parallelism and offering SSI, called GENESIS, has been developed.

Relevance:

20.00%

Publisher:

Abstract:

Recent research efforts in parallel processing on non-dedicated clusters have focused on high execution performance, parallelism management, transparent access to resources, and making clusters easy to use. However, as a collection of independent computers used by multiple users, clusters are susceptible to failure. This paper shows the development of a coordinated checkpointing facility for the GENESIS cluster operating system. This facility was developed by exploiting existing operating system services. High performance and low overheads are achieved by allowing the processes of a parallel application to continue executing during the creation of checkpoints, while maintaining low demands on cluster resources by using coordinated checkpointing.

Relevance:

20.00%

Publisher:

Abstract:

Job scheduling is a complex problem, yet it is fundamental to sustaining and improving the performance of parallel processing systems. In this paper, we address an on-line parallel job scheduling problem in heterogeneous multi-cluster computing systems. We propose a new space-sharing scheduling policy and show that it performs substantially better than the conventional policies.

Relevance:

20.00%

Publisher:

Abstract:

Grid computing development has recently been moving towards a service-oriented architecture. As service-oriented grid computing systems gain momentum, deploying support for integrated scheduling and fault-tolerant approaches becomes of paramount importance. To this end, we propose a scalable framework that loosely couples a dynamic job scheduling approach with a hybrid replication approach to schedule jobs efficiently while at the same time providing fault tolerance. The novelty of the proposed framework is that it uses a passive replication approach under high system load and an active replication approach under low system load. The switch between these two replication methods is also done dynamically and transparently.
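The load-dependent choice between replication methods described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the function names, the normalised load value in [0, 1], and the 0.7 threshold are all hypothetical and do not come from the paper.

```python
from enum import Enum


class ReplicationMode(Enum):
    ACTIVE = "active"    # every replica executes the job
    PASSIVE = "passive"  # only the primary executes; backups stand by


def choose_replication_mode(system_load: float,
                            high_load_threshold: float = 0.7) -> ReplicationMode:
    """Pick a replication mode from the current load.

    Assumption: load is normalised to [0, 1] and the threshold is a
    tunable parameter; neither detail is taken from the abstract.
    """
    if not 0.0 <= system_load <= 1.0:
        raise ValueError("system_load must be in [0, 1]")
    if system_load >= high_load_threshold:
        return ReplicationMode.PASSIVE
    return ReplicationMode.ACTIVE


def executing_replicas(mode: ReplicationMode, replica_count: int) -> int:
    """Number of replicas that actually run the job under the given mode."""
    return replica_count if mode is ReplicationMode.ACTIVE else 1


busy = choose_replication_mode(0.9)
idle = choose_replication_mode(0.2)
```

Calling `choose_replication_mode` again whenever the measured load changes gives the dynamic switch: under high load only one replica consumes compute resources, while under low load spare capacity is spent on running all replicas.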