32 results for cluster computing

in Deakin Research Online - Australia


Relevance: 100.00%

Abstract:

Job scheduling is a complex problem, yet it is fundamental to sustaining and improving the performance of parallel processing systems. In this paper, we address an on-line parallel job scheduling problem in heterogeneous multi-cluster computing systems. We propose a new space-sharing scheduling policy and show that it performs substantially better than the conventional policies.

Relevance: 100.00%

Abstract:

The widespread adoption of cluster computing as a high-performance computing platform has seen the growth of data-intensive scientific, engineering and commercial applications such as digital libraries, climate modeling, computational chemistry, computational fluid dynamics and image repositories. However, I/O subsystem performance has not been keeping pace with processor and memory performance, and is fast becoming the dominant factor in overall system performance. Thus, parallel I/O has become a necessity in the face of performance improvements in other areas of computing systems. This paper addresses the problem of parallel I/O scheduling on cluster computing systems in the presence of data replication. We propose two new I/O scheduling algorithms and evaluate the relative performance of the proposed policies against two existing approaches. Simulation results show that the proposed policies perform substantially better than the baseline policies.

Relevance: 100.00%

Abstract:

The overall performance of a distributed system often depends on the effectiveness of its interconnection network. The study of communication networks for distributed systems is therefore very important, and it is the focus of this paper. In particular, we address the problem of performance modeling of fat-tree based interconnection networks for multi-user heterogeneous multi-cluster computing systems. To this end, we present an analytical model and validate it through comprehensive simulation. The simulation results demonstrate that the proposed model exhibits a good degree of accuracy for various system organizations and under different working conditions.

Relevance: 100.00%

Abstract:

Cluster computation has been used in applications that demand performance, reliability, and availability, such as cluster server groups, large-scale scientific computations, distributed databases, distributed media-on-demand servers and search engines. In those applications, multicast can play a vital role in disseminating information among groups of servers and users. This paper proposes a set of novel, efficient fault-tolerant multicast routing algorithms for hypercube interconnections of cluster computers using a multicast shared tree approach. We present new algorithms for selecting an optimal core (root) and constructing the shared tree so as to minimize the average delay for multicast messages. Simulation results indicate that our algorithms are efficient in terms of short average end-to-end delay, load balance and low resource utilization over hypercube cluster interconnection networks.
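As a rough illustration of core selection (not the algorithms proposed in the paper, which the abstract does not detail), the sketch below assumes that the nodes of a d-dimensional hypercube are labeled with d-bit integers, so the hop distance between two nodes is the Hamming distance of their labels. The names `hop_distance` and `pick_core` are hypothetical. The core is chosen by brute force as the node with the smallest total hop distance to the group members, and faulty nodes are ignored.

```python
from typing import Sequence


def hop_distance(a: int, b: int) -> int:
    """Hop count between two hypercube nodes: Hamming distance of their labels."""
    return bin(a ^ b).count("1")


def pick_core(dimension: int, members: Sequence[int]) -> int:
    """Return the node label with the smallest total hop distance to all members.

    Brute force over all 2**dimension nodes; ties go to the lowest label.
    Minimizing the total distance also minimizes the mean distance.
    """
    if not members:
        raise ValueError("multicast group must not be empty")
    nodes = range(2 ** dimension)
    return min(nodes, key=lambda c: sum(hop_distance(c, m) for m in members))


# Example: a 3-cube with group members 000, 011 and 101.
core = pick_core(3, [0b000, 0b011, 0b101])
```

In this example node 001 is one hop from every member, so it is selected as the core. A fault-tolerant variant would additionally have to exclude failed nodes and links, which this sketch does not attempt.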

Relevance: 100.00%

Abstract:

This paper addresses the problem of performance modeling of interconnection networks in large-scale distributed systems, with an emphasis on multi-cluster computing systems. The study of interconnection networks is important because the overall performance of a distributed system often hinges critically on the effectiveness of its interconnection network. We present an analytical model that considers stochastic quantities as well as processor heterogeneity of the target system. The model is validated through comprehensive simulation, which demonstrates that the proposed model exhibits a good degree of accuracy for various system sizes and under different operating conditions.

Relevance: 100.00%

Abstract:

This paper addresses the problem of performance modeling of heterogeneous multi-cluster computing systems. We present an analytical model that can be employed to explore the effectiveness of different design approaches, so that one can make an informed choice during the design and evaluation of a cost-effective large-scale heterogeneous distributed computing system. The proposed model considers stochastic quantities as well as processor heterogeneity of the target system. The analysis is based on a parametric fat-tree network, the m-port n-tree, and a deterministic routing algorithm. The correctness of the proposed model is validated through comprehensive simulation of different types of clusters.
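For a sense of scale, the size of an m-port n-tree follows directly from its two parameters. The sketch below relies on the definition of this fat-tree family commonly given in the literature, which the abstract itself does not state: 2(m/2)^n processing nodes and (2n-1)(m/2)^(n-1) switches of m ports each. The function name `mport_ntree_size` is hypothetical, and the code says nothing about the analytical latency model.

```python
from typing import Tuple


def mport_ntree_size(m: int, n: int) -> Tuple[int, int]:
    """Return (processing_nodes, switches) for an m-port n-tree.

    Assumed definition (from the fat-tree literature, not from the abstract):
      nodes    = 2 * (m/2)**n
      switches = (2n - 1) * (m/2)**(n - 1)
    """
    if m < 2 or m % 2 != 0:
        raise ValueError("m must be an even number of ports, at least 2")
    if n < 1:
        raise ValueError("n must be at least 1")
    half = m // 2
    nodes = 2 * half ** n
    switches = (2 * n - 1) * half ** (n - 1)
    return nodes, switches


# Example: a tree built from 4-port switches with n = 2 levels.
nodes, switches = mport_ntree_size(4, 2)
```

With 4-port switches and two levels this gives 8 processing nodes and 6 switches (four leaf switches and two root switches).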

Relevance: 100.00%

Abstract:

The overall performance of a distributed system often depends on the effectiveness of its interconnection network. The study of communication networks for distributed systems, which is the focus of this paper, is therefore very important. In particular, we address the problem of performance modeling of fat-tree based interconnection networks for multi-user heterogeneous multi-cluster computing systems. To this end, we present an analytical model and validate it through comprehensive simulation. The simulation results demonstrate that the proposed model exhibits a good degree of accuracy for various system organizations and under different working conditions. On the basis of the validated model, we propose an adaptive assignment function, based on the existing heterogeneity of the system, to minimize the overhead of the multi-user environment.

Relevance: 70.00%

Abstract:

Recent research efforts on parallel processing on non-dedicated clusters have focused on high execution performance, parallelism management, transparent access to resources, and making clusters easy to use. However, as a collection of independent computers used by multiple users, clusters are susceptible to failure. This paper presents the development of a coordinated checkpointing facility for the GENESIS cluster operating system. This facility was developed by exploiting existing operating system services. High performance and low overheads are achieved by allowing the processes of a parallel application to continue executing during the creation of checkpoints, while maintaining low demands on cluster resources by using coordinated checkpointing.

Relevance: 70.00%

Abstract:

With the current popularity of cluster computing systems, it is increasingly important to understand the capabilities and potential performance of various interconnection networks. In this paper, we propose an analytical model for studying the capabilities and potential performance of interconnection networks for multi-cluster systems. The model takes into account stochastic quantities as well as network heterogeneity in bandwidth and latency in each cluster. Blocking and non-blocking network architecture models are also proposed and used in the performance analysis of the system. The model is validated by constructing a set of simulators to simulate different types of clusters, and by comparing the modeled results with the simulated ones.

Relevance: 70.00%

Abstract:

This paper addresses the problem of performance modeling for large-scale heterogeneous distributed systems, with an emphasis on multi-cluster computing systems. Since the overall performance of a distributed system often depends on the effectiveness of its communication network, the study of the interconnection networks for these systems is very important. Performance modeling is required to avoid poorly chosen components and architectures, and to avoid discovering a serious shortfall during system testing just prior to deployment. However, the multiplicity of components and the associated complexity make performance analysis of distributed computing systems a challenging task. To this end, we present an analytical performance model for the interconnection networks of heterogeneous multi-cluster systems. The analysis is based on a parametric family of fat-trees, the m-port n-tree, and a deterministic routing algorithm, which is proposed in this paper. The model is validated through comprehensive simulation, which demonstrates that the proposed model exhibits a good degree of accuracy for various system organizations and under different working conditions.

Relevance: 70.00%

Abstract:

The study of interconnection networks is important because the overall performance of a distributed system often hinges critically on the effectiveness of its interconnection network. Heterogeneity, meanwhile, is one of the most important factors in such systems. This paper addresses the problem of performance modeling of interconnection networks in large-scale distributed systems, with an emphasis on heterogeneous multi-cluster computing systems. We present an analytical model to predict message latency in multi-cluster systems in the presence of cluster size heterogeneity. The model is validated through comprehensive simulation, which demonstrates that the proposed model exhibits a good degree of accuracy for various system organizations and under different working conditions.

Relevance: 70.00%

Abstract:

The advent of commodity-based high-performance clusters has raised parallel and distributed computing to a new level. However, in order to achieve the best possible performance improvements for large-scale computing problems as well as good resource utilization, efficient resource management and scheduling are required. This paper proposes a new two-level adaptive space-sharing scheduling policy for non-dedicated heterogeneous commodity-based high-performance clusters. Using trace-driven simulation, the performance of the proposed scheduling policy is compared with existing adaptive space-sharing policies. Results of the simulation show that the proposed policy performs substantially better than the existing policies.

Relevance: 60.00%

Abstract:

Parallel execution is a very efficient means of processing vast amounts of data in a small amount of time. Creating parallel applications has never been easy, and requires much knowledge of the task and of the environment used to execute parallel processes. The process of creating parallel applications can be made easier by using a compiler that automatically parallelises a supplied application. Executing the parallel application is also simplified when a well-designed execution environment is used. Such an execution environment provides very powerful operations to the programmer transparently. Combining a parallelising compiler with an execution environment to provide a fully automated parallelisation and execution tool is the aim of this research. The advantage of using such a fully automated tool is that the user does not need to provide any additional input to gain the benefits of parallel execution. This report presents the tool and shows how it transparently supports the programmer in creating parallel applications and supports their execution.

Relevance: 60.00%

Abstract:

Cluster computing has come to prominence as a cost-effective parallel processing tool for solving many complex computational problems. In this paper, we propose a new timesharing opportunistic scheduling policy to support remote batch job executions over networked clusters, to be used in conjunction with the Condor Up-Down scheduling algorithm. We show that timesharing approaches can be used in an opportunistic setting to improve both mean job slowdowns and mean response times with little or no throughput reduction. We also show that the proposed algorithm achieves significant improvement in job response time and slowdown compared to existing approaches and some recently proposed approaches.
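The two metrics named in this abstract can be computed from a job trace as sketched below. This is a minimal illustration using the usual definitions, which the abstract does not spell out: response time is completion time minus arrival time, and slowdown is response time divided by run time. The `Job` record, its field names and the function `mean_metrics` are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Job:
    arrival: float     # time the job was submitted
    completion: float  # time the job finished
    run_time: float    # time the job actually spent executing (must be > 0)


def mean_metrics(jobs: Sequence[Job]) -> Tuple[float, float]:
    """Return (mean response time, mean slowdown) over a non-empty job trace."""
    responses = [j.completion - j.arrival for j in jobs]
    slowdowns = [r / j.run_time for r, j in zip(responses, jobs)]
    return mean(responses), mean(slowdowns)


# Example trace: the first job waited as long as it ran, the second did not wait.
jobs = [Job(arrival=0.0, completion=10.0, run_time=5.0),
        Job(arrival=2.0, completion=6.0, run_time=4.0)]
mean_response, mean_slowdown = mean_metrics(jobs)
```

A slowdown of 1.0 means a job spent no time waiting or suspended, so values closer to 1.0 indicate a scheduler that delays jobs less relative to their length.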