995 resultados para Parallel execution


Relevância:

30.00% 30.00%

Publicador:

Resumo:

A key capability of data-race detectors is to determine whether one thread executes logically in parallel with another or whether the threads must operate in series. This paper provides two algorithms, one serial and one parallel, to maintain series-parallel (SP) relationships "on the fly" for fork-join multithreaded programs. The serial SP-order algorithm runs in O(1) amortized time per operation. In contrast, the previously best algorithm requires a time per operation that is proportional to Tarjan’s functional inverse of Ackermann’s function. SP-order employs an order-maintenance data structure that allows us to implement a more efficient "English-Hebrew" labeling scheme than was used in earlier race detectors, which immediately yields an improved determinacy-race detector. In particular, any fork-join program running in T₁ time on a single processor can be checked on the fly for determinacy races in O(T₁) time. Corresponding improved bounds can also be obtained for more sophisticated data-race detectors, for example, those that use locks. By combining SP-order with Feng and Leiserson’s serial SP-bags algorithm, we obtain a parallel SP-maintenance algorithm, called SP-hybrid. Suppose that a fork-join program has n threads, T₁ work, and a critical-path length of T[subscript ∞]. When executed on P processors, we prove that SP-hybrid runs in O((T₁/P + PT[subscript ∞]) lg n) expected time. To understand this bound, consider that the original program obtains linear speed-up over a 1-processor execution when P = O(T₁/T[subscript ∞]). In contrast, SP-hybrid obtains linear speed-up when P = O(√T₁/T[subscript ∞]), but the work is increased by a factor of O(lg n).

Relevância:

30.00% 30.00%

Publicador:

Resumo:

How can a bridge be built between autonomic computing approaches and parallel computing system? The work reported in this paper is motivated towards bridging this gap by proposing swarm-array computing, a novel technique to achieve autonomy for distributed parallel computing systems. Among three proposed approaches, the second approach, namely 'Intelligent Agents' is of focus in this paper. The task to be executed on parallel computing cores is considered as a swarm of autonomous agents. A task is carried to a computing core by carrier. agents and can be seamlessly transferred between cores in the event of a pre-dicted failure, thereby achieving self-ware objectives of autonomic computing. The feasibility of the proposed approach is validated on a multi-agent simulator.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A parallel formulation for the simulation of a branch prediction algorithm is presented. This parallel formulation identifies independent tasks in the algorithm which can be executed concurrently. The parallel implementation is based on the multithreading model and two parallel programming platforms: pthreads and Cilk++. Improvement in execution performance by up to 7 times is observed for a generic 2-bit predictor in a 12-core multiprocessor system.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The InteGrade project is a multi-university effort to build a novel grid computing middleware based on the opportunistic use of resources belonging to user workstations. The InteGrade middleware currently enables the execution of sequential, bag-of-tasks, and parallel applications that follow the BSP or the MPI programming models. This article presents the lessons learned over the last five years of the InteGrade development and describes the solutions achieved concerning the support for robust application execution. The contributions cover the related fields of application scheduling, execution management, and fault tolerance. We present our solutions, describing their implementation principles and evaluation through the analysis of several experimental results. (C) 2010 Elsevier Inc. All rights reserved.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The single factor limiting the harnessing of the enormous computing power of clusters for parallel computing is the lack of appropriate software. Present cluster operating systems are not built to support parallel computing – they do not provide services to manage parallelism. The cluster operating environments that are used to assist the execution of parallel applications do not provide support for both Message Passing (MP) or Distributed Shared Memory (DSM) paradigms. They are only offered as separate components implemented at the user level as library and independent servers. Due to poor operating systems users must deal with computers of a cluster rather than to see this cluster as a single powerful computer. A Single System Image of the cluster is not offered to users. There is a need for an operating system for clusters. We claim and demonstrate that it is possible to develop a cluster operating system that is
able to efficiently manage parallelism, support Message Passing and DSM and offer the Single System Image. In order to substantiate the claim the first version of a cluster operating system, called GENESIS, that manages parallelism and offers the Single System Image has been developed.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Studies have shown that most of the computers in a non-dedicated cluster are often idle or lightly loaded. The underutilized computers in a non-dedicated cluster can be employed to execute parallel applications. The aim of this study is to learn how concurrent execution of a computation-bound and sequential applications influence their execution performance and cluster utilization. The result of the study has demonstrated that a computation-bound parallel application benefits from load balancing, and at the same time sequential applications suffer only an insignificant slowdown of execution. Overall, the utilization of a non-dedicated cluster is improved.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this paper, we have demonstrated how the existing programming environments, tools and middleware could be used for the study of execution performance of parallel and sequential applications on a non-dedicated cluster. A set of parallel and sequential benchmark applications selected for and used in the experiments were characterized, and experiment requirements shown. 

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Recent research efforts of parallel processing on non-dedicated clusters have focused on high execution performance, parallelism management, transparent access to resources, and making clusters easy to use. However, as a collection of independent computers used by multiple users, clusters are susceptible to failure. This paper shows the development of a coordinated checkpointing facility for the GENESIS cluster operating system. This facility was developed by exploiting existing operating system services. High performance and low overheads are achieved by allowing the processes of a parallel application to continue executing during the creation of checkpoints, while maintaining low demands on cluster resources by using coordinated checkpointing.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Although individual PCs of a cluster are used by their owners to run sequential applications (local jobs), the cluster as a whole or its subset can also be employed to run parallel applications (cluster jobs) even during working hours. This implies that these computers have to be shared by parallel and sequential applications, which could lead to the improvement of the execution performance and resource utilization. However, there is a lack of experimental study showing the behavior and performance of executing parallel and sequential applications concurrently on a non-dedicated cluster. The result of such research would be beneficial for the development of new global scheduling algorithms. We present the result of an experimental study into scheduling of a mixture of parallel and sequential applications on a non-dedicated cluster. The aim of this study is to learn how the concurrent execution of a communication intensive parallel application and sequential applications influences their execution performance and utilization of the cluster.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Studies have shown that most of the computers in a non-dedicated cluster are often idle or lightly loaded. The underutilized computers in a non-dedicated cluster can be employed to execute parallel applications. The aim of this study is to learn how concurrent execution of a computation-bound and sequential applications influence their execution performance and cluster utilization. The result of the study has demonstrated that a computation-bound parallel application benefits from load balancing, and at the same time sequential applications suffer only an insignificant slowdown of execution. Overall, the utilization of a non-dedicated cluster is improved.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We assert that companies can make more money and research institutions can improve their performance if inexpensive clusters and enterprise grids are exploited. In this paper, we have demonstrated that our claim is valid by showing the study of how programming environments, tools and middleware could be used for the execution of parallel and sequential applications, multiple parallel applications executing simultaneously on a non-dedicated cluster, and parallel applications on an enterprise grid and that the execution performance was improved. For this purpose an execution environment, and parallel and sequential benchmark applications selected for, and used in, the experiments were characterised.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Cloud computing is the most recent realisation of computing as a utility. Recently, fields with substantial computational requirements, e.g., biology, are turning to clouds for cheap, on-demand provisioning of resources. Of interest to this paper is the execution of compute intensive applications on hybrid clouds. If application requirements exceed private cloud resource capacity, clients require scaling down their applications. The outcome of this research is Web technology realising a new form of cloud called HPC Hybrid Deakin (H2D) Cloud -- an experimental hybrid cloud capable of utilising both local and remote computational services for single large embarrassingly parallel applications.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Recently, fields with substantial computing requirementshave turned to cloud computing for economical, scalable, and on-demandprovisioning of required execution environments. However, current cloudofferings focus on providing individual servers while tasks such as applicationdistribution and data preparation are left to cloud users. This article presents anew form of cloud called HPC Hybrid Deakin (H2D) cloud; an experimentalhybrid cloud capable of utilising both local and remote computational servicesfor large embarrassingly parallel applications. As well as supporting execution,H2D also provides a new service, called DataVault, that provides transparentdata management services so all cloud-hosted clusters have required datasetsbefore commencing execution.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The evolution of wireless communication systems leads to Dynamic Spectrum Allocation for Cognitive Radio, which requires reliable spectrum sensing techniques. Among the spectrum sensing methods proposed in the literature, those that exploit cyclostationary characteristics of radio signals are particularly suitable for communication environments with low signal-to-noise ratios, or with non-stationary noise. However, such methods have high computational complexity that directly raises the power consumption of devices which often have very stringent low-power requirements. We propose a strategy for cyclostationary spectrum sensing with reduced energy consumption. This strategy is based on the principle that p processors working at slower frequencies consume less power than a single processor for the same execution time. We devise a strict relation between the energy savings and common parallel system metrics. The results of simulations show that our strategy promises very significant savings in actual devices.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

[EN] The accuracy and performance of current variational optical ow methods have considerably increased during the last years. The complexity of these techniques is high and enough care has to be taken for the implementation. The aim of this work is to present a comprehensible implementation of recent variational optical flow methods. We start with an energy model that relies on brightness and gradient constancy terms and a ow-based smoothness term. We minimize this energy model and derive an e cient implicit numerical scheme. In the experimental results, we evaluate the accuracy and performance of this implementation with the Middlebury benchmark database. We show that it is a competitive solution with respect to current methods in the literature. In order to increase the performance, we use a simple strategy to parallelize the execution on multi-core processors.