Biblioteca Digital

984 resultados para load balancing

Implementing intelligent cores using processor virtualization for fault tolerance

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Processor virtualization for process migration in distributed parallel computing systems has formed a significant component of research on load balancing. In contrast, the potential of processor virtualization for fault tolerance has been addressed minimally. The work reported in this paper is motivated towards extending concepts of processor virtualization towards ‘intelligent cores’ as a means to achieve fault tolerance in distributed parallel computing systems. Intelligent cores are an abstraction of the hardware processing cores, with the incorporation of cognitive capabilities, on which parallel tasks can be executed and migrated. When a processing core executing a task is predicted to fail the task being executed is proactively transferred onto another core. A parallel reduction algorithm incorporating concepts of intelligent cores is implemented on a computer cluster using Adaptive MPI and Charm ++. Preliminary results confirm the feasibility of the approach.

Non-uniform data distribution for communication-efficient parallel clustering

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Global communicationrequirements andloadimbalanceof someparalleldataminingalgorithms arethe major obstacles to exploitthe computational power of large-scale systems. This work investigates how non-uniform data distributions can be exploited to remove the global communication requirement and to reduce the communication costin parallel data mining algorithms and, in particular, in the k-means algorithm for cluster analysis. In the straightforward parallel formulation of the k-means algorithm, data and computation loads are uniformly distributed over the processing nodes. This approach has excellent load balancing characteristics that may suggest it could scale up to large and extreme-scale parallel computing systems. However, at each iteration step the algorithm requires a global reduction operationwhichhinders thescalabilityoftheapproach.Thisworkstudiesadifferentparallelformulation of the algorithm where the requirement of global communication is removed, while maintaining the same deterministic nature ofthe centralised algorithm. The proposed approach exploits a non-uniform data distribution which can be either found in real-world distributed applications or can be induced by means ofmulti-dimensional binary searchtrees. The approachcanalso be extended to accommodate an approximation error which allows a further reduction ofthe communication costs. The effectiveness of the exact and approximate methods has been tested in a parallel computing system with 64 processors and in simulations with 1024 processing element

Mesh adaptation on the sphere using optimal transport and the numerical solution of a Monge-Ampère type equation

Relevância:

60.00% 60.00%

Publicador:

Resumo:

An equation of Monge-Ampère type has, for the first time, been solved numerically on the surface of the sphere in order to generate optimally transported (OT) meshes, equidistributed with respect to a monitor function. Optimal transport generates meshes that keep the same connectivity as the original mesh, making them suitable for r-adaptive simulations, in which the equations of motion can be solved in a moving frame of reference in order to avoid mapping the solution between old and new meshes and to avoid load balancing problems on parallel computers. The semi-implicit solution of the Monge-Ampère type equation involves a new linearisation of the Hessian term, and exponential maps are used to map from old to new meshes on the sphere. The determinant of the Hessian is evaluated as the change in volume between old and new mesh cells, rather than using numerical approximations to the gradients. OT meshes are generated to compare with centroidal Voronoi tesselations on the sphere and are found to have advantages and disadvantages; OT equidistribution is more accurate, the number of iterations to convergence is independent of the mesh size, face skewness is reduced and the connectivity does not change. However anisotropy is higher and the OT meshes are non-orthogonal. It is shown that optimal transport on the sphere leads to meshes that do not tangle. However, tangling can be introduced by numerical errors in calculating the gradient of the mesh potential. Methods for alleviating this problem are explored. Finally, OT meshes are generated using observed precipitation as a monitor function, in order to demonstrate the potential power of the technique.

Grid job scheduling using Route with Genetic Algorithm support

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In 2006 the Route load balancing algorithm was proposed and compared to other techniques aiming at optimizing the process allocation in grid environments. This algorithm schedules tasks of parallel applications considering computer neighborhoods (where the distance is defined by the network latency). Route presents good results for large environments, although there are cases where neighbors do not have an enough computational capacity nor communication system capable of serving the application. In those situations the Route migrates tasks until they stabilize in a grid area with enough resources. This migration may take long time what reduces the overall performance. In order to improve such stabilization time, this paper proposes RouteGA (Route with Genetic Algorithm support) which considers historical information on parallel application behavior and also the computer capacities and load to optimize the scheduling. This information is extracted by using monitors and summarized in a knowledge base used to quantify the occupation of tasks. Afterwards, such information is used to parameterize a genetic algorithm responsible for optimizing the task allocation. Results confirm that RouteGA outperforms the load balancing carried out by the original Route, which had previously outperformed others scheduling algorithms from literature.

Scheduling of a parallel computation-bound application and sequential applications executing concurrently on a cluster - a case study

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Studies have shown that most of the computers in a non-dedicated cluster are often idle or lightly loaded. The underutilized computers in a non-dedicated cluster can be employed to execute parallel applications. The aim of this study is to learn how concurrent execution of a computation-bound and sequential applications influence their execution performance and cluster utilization. The result of the study has demonstrated that a computation-bound parallel application benefits from load balancing, and at the same time sequential applications suffer only an insignificant slowdown of execution. Overall, the utilization of a non-dedicated cluster is improved.

Performance evaluation of the concurrent execution of NAS parallel benchmarks with BYTE sequential benchmarks on a cluster

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Computers of a non-dedicated cluster are often idle (users attend meetings, have lunch or coffee breaks) or lightly loaded (users carry out simple computations). These underutilized computers can be employed to execute parallel applications not only during weekends and at nights but also during office hours. Thus, they have to be shared by parallel and sequential applications which could lead to the improvement of their execution performance. However, there is a lack of experimental study showing the behavior and performance of parallel and sequential applications executing concurrently on clusters. We present here the result of an experimental study into load balancing based scheduling of a mixture of parallel and sequential applications on a non-dedicated cluster.

The performance of a parallel TSP program and byte sequential benchmarks executing on a shared cluster

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Although individual PCs of a cluster are used by their owners to run sequential applications (local jobs), the cluster as a whole or its subset can also be employed to run parallel applications (cluster jobs) even during working hours. This implies that these computers have to be shared by parallel and sequential applications, which could lead to the improvement of the execution performance and resource utilization. However, there is a lack of experimental study showing the behavior and performance of executing parallel and sequential applications concurrently on a non-dedicated cluster. The result of such research would be beneficial for the development of new global scheduling algorithms. We present the result of an experimental study into scheduling of a mixture of parallel and sequential applications on a non-dedicated cluster. The aim of this study is to learn how the concurrent execution of a communication intensive parallel application and sequential applications influences their execution performance and utilization of the cluster.

The crossroads approach to information discovery in wireless sensor networks

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Wireless sensor networks (WSN) are attractive for information gathering in large-scale data rich environments. In order to fully exploit the data gathering and dissemination capabilities of these networks, energy-efficient and scalable solutions for data storage and information discovery are essential. In this paper, we formulate the information discovery problem as a load-balancing problem, with the combined aim being to maximize network lifetime and minimize query processing delay resulting in QoS improvements. We propose a novel information storage and distribution mechanism that takes into account the residual energy levels in individual sensors. Further, we propose a hybrid push-pull strategy that enables fast response to information discovery queries.

Simulations results prove the proposed method(s) of information discovery offer significant QoS benefits for global as well as individual queries in comparison to previous approaches.

A study of the concurrent execution of parallel and sequential applications on a non-dedicated cluster

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Computers of a non-dedicated cluster are often idle (users attend meetings, have lunch or coffee breaks) or lightly loaded (users carry out simple computations to support problem solving activities). These underutilised computers can be employed to execute parallel applications. Thus, these computers can be shared by parallel and sequential applications, which could lead to the improvement of their execution performance. However, there is a lack of experimental study showing the applications’ performance and the system utilization of executing parallel and sequential applications concurrently and concurrent execution of multiple parallel applications on a non-dedicated cluster. Here we present the result of an experimental study into load balancing based scheduling of mixtures of NAS Parallel Benchmarks and BYTE sequential applications on a very low cost non-dedicated cluster. This study showed that the proposed sharing provided performance boost as compared to the execution of the parallel load in isolation on a reduced number of computers and better cluster utilization. The results of this research were used not only to validate other researchers’ result generated by simulation but also to support our research mission of widening the use of non-dedicated clusters. Our promising results obtained could promote further research studies to convince universities, business and industry, which require a large amount of computing resources, to run parallel applications on their already owned non-dedicated clusters.

Exploiting affinity propagation for energy-efficient information discovery in sensor networks

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Wireless sensor networks (WSN) are attractive for information gathering in large-scale data rich environments. Emerging WSN applications require dissemination of information to interested clients within the network requiring support for differing traffic patterns. Further, in-network query processing capabilities are required for autonomic information discovery. In this paper, we formulate the information discovery problem as a load-balancing problem, with the combined aim being to maximize network lifetime and minimize query processing delay. We propose novel methods for data dissemination, information discovery and data aggregation that are designed to provide significant QoS benefits. We make use of affinity propagation to group "similar" sensors and have developed efficient mechanisms that can resolve both ALL-type and ANY-type queries in-network with improved energy-efficiency and query resolution time. Simulation results prove the proposed method(s) of information discovery offer significant QoS benefits for ALL-type and ANY-type queries in comparison to previous approaches.

Multiple strategy process migration

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The future of computing lies with distributed systems, i.e. a network of workstations controlled by a modern distributed operating system. By supporting load balancing and parallel execution, the overall performance of a distributed system can be improved dramatically. Process migration, the act of moving a running process from a highly loaded machine to a lightly loaded machine, could be used to support load balancing, parallel execution, reliability etc. This thesis identifies the problems past process migration facilities have had and determines the possible differing strategies that can be used to resolve these problems. The result of this analysis has led to a new design philosophy. This philosophy requires the design of a process migration facility and the design of an operating system to be conducted in parallel. Modern distributed operating systems follow the microkernel and client/server paradigms. Applying these design paradigms, in conjunction with the requirements of both process migration and a distributed operating system, results in a system where each resource is controlled by a separate server process. However, a process is a complex resource composed of simple resources such as data structures, an address space and communication state. For this reason, a process migration facility does not directly migrate the resources of a process. Instead, it requests the appropriate servers to transfer the resources. This novel solution yields a modular, high performance facility that is easy to create, debug and maintain. Furthermore, the design easily incorporates providing multiple migration strategies. In order to verify the validity of this design, a process migration facility was developed and tested within RHODOS (ResearcH Oriented Distributed Operating System). RHODOS is a modern microkernel and client/server based distributed operating system. In RHODOS, a process is composed of at least three separate resources: process state - maintained by a process manager, address space - maintained by a memory manager and communication state - maintained by an InterProcess Communication Manager (IPCM). The RHODOS multiple strategy migration manager utilises the services of the process, memory and IPC Managers to migrate the resources of a process. Performance testing of this facility indicates that this design is as fast or better than existing systems which use faster hardware. Furthermore, by studying the results of the performance test ing, the conditions under which a particular strategy should be employed have been identified. This thesis also addresses heterogeneous process migration. The current trend is to have islands of homogeneous workstations amid a sea of heterogeneity. From this situation and the current literature on the topic, heterogeneous process migration can be seen as too inefficient for general use. Instead, only homogeneous workstations should be used for process migration. This implies a need to locate homogeneous workstations. Entities called traders, which store and disseminate knowledge about the resources of several workstations, should be used to provide resource discovery. Resource discovery will enable the detection of homogeneous workstations to which processes can be migrated.

Management of SPMD based parallel processing on clusters of workstations

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Current attempts to manage parallel applications on Clusters of Workstations (COWs) have either generally followed the parallel execution environment approach or been extensions to existing network operating systems, both of which do not provide complete or satisfactory solutions. The efficient and transparent management of parallelism within the COW environment requires enhanced methods of process instantiation, mapping of parallel process to workstations, maintenance of process relationships, process communication facilities, and process coordination mechanisms. The aim of this research is to synthesise, design, develop and experimentally study a system capable of efficiently and transparently managing SPMD parallelism on a COW. This system should both improve the performance of SPMD based parallel programs and relieve the programmer from the involvement into parallelism management in order to allow them to concentrate on application programming. It is also the aim of this research to show that such a system, to achieve these objectives, is best achieved by adding new special services and exploiting the existing services of a client/server and microkernel based distributed operating system. To achieve these goals the research methods of the experimental computer science should be employed. In order to specify the scope of this project, this work investigated the issues related to parallel processing on COWs and surveyed a number of relevant systems including PVM, NOW and MOSIX. It was shown that although the MOSIX system provide a number of good services related to parallelism management, none of the system forms a complete solution. The problems identified with these systems include: instantiation services that are not suited to parallel processing; duplication of services between the parallelism management environment and the operating system; and poor levels of transparency. A high performance and transparent system capable of managing the execution of SPMD parallel applications was synthesised and the specific services of process instantiation, process mapping and process interaction detailed. The process instantiation service designed here provides the capability to instantiate parallel processes using either creation or duplication methods and also supports multiple and group based instantiation which is specifically design for SPMD parallel processing. The process mapping service provides the combination of process allocation and dynamic load balancing to ensure the load of a COW remains balanced not only at the time a parallel program is initialised but also during the execution of the program. The process interaction service guarantees to maintain transparently process relationships, communications and coordination services between parallel processes regardless of their location within the COW. The combination of these services provides an original architecture and organisation of a system that is capable of fully managing the execution of SPMD parallel applications on a COW. A logical design of a parallelism management system was developed derived from the synthesised system and was shown that it should ideally be based on a distributed operating system employing the client server model. The client/server based distributed operating system provides the level of transparency, modularity and flexibility necessary for a complete parallelism management system. The services identified in the synthesised system have been mapped to a set of server processes including: Process Instantiation Server providing advanced multiple and group based process creation and duplication; Process Mapping Server combining load collection, process allocation and dynamic load balancing services; and Process Interaction Server providing transparent interprocess communication and coordination. A Process Migration Server was also identified as vital to support both the instantiation and mapping servers. The RHODOS client/server and microkernel based distributed operating system was selected to carry out research into the detailed design and to be used for the implementation this parallelism management system. RHODOS was enhanced to provide the required servers and resulted in the development of the REX Manager, Global Scheduler and Process Migration Manager to provide the services of process instantiation, mapping and migration, respectively. The process interaction services were already provided within RHODOS and only required some extensions to the existing Process Manager and IPC Managers. Through a variety of experiments it was shown that when this system was used to support the execution of SPMD parallel applications the overall execution times were improved, especially when multiple and group based instantiation services are employed. The RHODOS PMS was also shown to greatly reduce the programming burden experienced by users when writing SPMD parallel applications by providing a small set of powerful primitives specially designed to support parallel processing. The system was also shown to be applicable and has been used in a variety of other research areas such as Distributed Shared Memory, Parallelising Compilers and assisting the port of PVM to the RHODOS system. The RHODOS Parallelism Management System (PMS) provides a unique and creative solution to the problem of transparently and efficiently controlling the execution of SPMD parallel applications on COWs. Combining advanced services such as multiple and group based process creation and duplication; combined process allocation and dynamic load balancing; and complete COW wide transparency produces a totally new system that addresses many of the problems not addressed in other systems.

Scheduling of a parallel computation-bound application and sequential applications executing concurrently on a cluster - a case study.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Studies have shown that most of the computers in a non-dedicated cluster are often idle or lightly loaded. The underutilized computers in a non-dedicated cluster can be employed to execute parallel applications. The aim of this study is to learn how concurrent execution of a computation-bound and sequential applications influence their execution performance and cluster utilization. The result of the study has demonstrated that a computation-bound parallel application benefits from load balancing, and at the same time sequential applications suffer only an insignificant slowdown of execution. Overall, the utilization of a non-dedicated cluster is improved.

Parallel computing on clusters and enterprise grids : practice and experience

Relevância:

60.00% 60.00%

Publicador:

Resumo:

We assert that companies can make more money and research institutions can improve their performance if inexpensive clusters and enterprise grids are exploited. In this paper, we have demonstrated that our claim is valid by showing the study of how programming environments, tools and middleware could be used for the execution of parallel and sequential applications, multiple parallel applications executing simultaneously on a non-dedicated cluster, and parallel applications on an enterprise grid and that the execution performance was improved. For this purpose an execution environment, and parallel and sequential benchmark applications selected for, and used in, the experiments were characterised.

Joint resource allocation for max-min throughput in multicell networks

Relevância:

60.00% 60.00%

Publicador:

Resumo:

We investigate the resource-allocation problem in multicell networks targeting the max-min throughput of all cells. A joint optimization over power control, channel allocation, and user association is considered, and the problem is then formulated as a nonconvex mixed-integer nonlinear problem (MINLP). To solve this problem, we proposed an alternating-optimization-based algorithm, which applies branch-and-bound and simulated annealing in solving subproblems at each optimization step. We also demonstrate the convergence and efficiency of the proposed algorithms by thorough numerical experiments. The experimental results show that joint optimization over all resources outperforms the restricted optimization over individual resources significantly.

«
1
2
...
5
6
7
8
9
10
11
...
65
66
»