31 resultados para fault tolerant systems

em CentAUR: Central Archive University of Reading - UK


Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this work, a fault-tolerant control scheme is applied to a air handling unit of a heating, ventilation and air-conditioning system. Using the multiple-model approach it is possible to identify faults and to control the system under faulty and normal conditions in an effective way. Using well known techniques to model and control the process, this work focuses on the importance of the cost function in the fault detection and its influence on the reconfigurable controller. Experimental results show how the control of the terminal unit is affected in the presence a fault, and how the recuperation and reconfiguration of the control action is able to deal with the effects of faults.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Recent research in multi-agent systems incorporate fault tolerance concepts. However, the research does not explore the extension and implementation of such ideas for large scale parallel computing systems. The work reported in this paper investigates a swarm array computing approach, namely ‘Intelligent Agents’. In the approach considered a task to be executed on a parallel computing system is decomposed to sub-tasks and mapped onto agents that traverse an abstracted hardware layer. The agents intercommunicate across processors to share information during the event of a predicted core/processor failure and for successfully completing the task. The agents hence contribute towards fault tolerance and towards building reliable systems. The feasibility of the approach is validated by simulations on an FPGA using a multi-agent simulator and implementation of a parallel reduction algorithm on a computer cluster using the Message Passing Interface.

Relevância:

100.00% 100.00%

Publicador:

Relevância:

100.00% 100.00%

Publicador:

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The K-Means algorithm for cluster analysis is one of the most influential and popular data mining methods. Its straightforward parallel formulation is well suited for distributed memory systems with reliable interconnection networks, such as massively parallel processors and clusters of workstations. However, in large-scale geographically distributed systems the straightforward parallel algorithm can be rendered useless by a single communication failure or high latency in communication paths. The lack of scalable and fault tolerant global communication and synchronisation methods in large-scale systems has hindered the adoption of the K-Means algorithm for applications in large networked systems such as wireless sensor networks, peer-to-peer systems and mobile ad hoc networks. This work proposes a fully distributed K-Means algorithm (EpidemicK-Means) which does not require global communication and is intrinsically fault tolerant. The proposed distributed K-Means algorithm provides a clustering solution which can approximate the solution of an ideal centralised algorithm over the aggregated data as closely as desired. A comparative performance analysis is carried out against the state of the art sampling methods and shows that the proposed method overcomes the limitations of the sampling-based approaches for skewed clusters distributions. The experimental analysis confirms that the proposed algorithm is very accurate and fault tolerant under unreliable network conditions (message loss and node failures) and is suitable for asynchronous networks of very large and extreme scale.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Future extreme-scale high-performance computing systems will be required to work under frequent component failures. The MPI Forum's User Level Failure Mitigation proposal has introduced an operation, MPI_Comm_shrink, to synchronize the alive processes on the list of failed processes, so that applications can continue to execute even in the presence of failures by adopting algorithm-based fault tolerance techniques. This MPI_Comm_shrink operation requires a fault tolerant failure detection and consensus algorithm. This paper presents and compares two novel failure detection and consensus algorithms. The proposed algorithms are based on Gossip protocols and are inherently fault-tolerant and scalable. The proposed algorithms were implemented and tested using the Extreme-scale Simulator. The results show that in both algorithms the number of Gossip cycles to achieve global consensus scales logarithmically with system size. The second algorithm also shows better scalability in terms of memory and network bandwidth usage and a perfect synchronization in achieving global consensus.

Relevância:

100.00% 100.00%

Publicador:

Relevância:

100.00% 100.00%

Publicador:

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The K-Means algorithm for cluster analysis is one of the most influential and popular data mining methods. Its straightforward parallel formulation is well suited for distributed memory systems with reliable interconnection networks. However, in large-scale geographically distributed systems the straightforward parallel algorithm can be rendered useless by a single communication failure or high latency in communication paths. This work proposes a fully decentralised algorithm (Epidemic K-Means) which does not require global communication and is intrinsically fault tolerant. The proposed distributed K-Means algorithm provides a clustering solution which can approximate the solution of an ideal centralised algorithm over the aggregated data as closely as desired. A comparative performance analysis is carried out against the state of the art distributed K-Means algorithms based on sampling methods. The experimental analysis confirms that the proposed algorithm is a practical and accurate distributed K-Means implementation for networked systems of very large and extreme scale.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The paper presents how workflow-oriented, single-user Grid portals could be extended to meet the requirements of users with collaborative needs. Through collaborative Grid portals different research and engineering teams would be able to share knowledge and resources. At the same time the workflow concept assures that the shared knowledge and computational capacity is aggregated to achieve the high-level goals of the group. The paper discusses the different issues collaborative support requires from Grid portal environments during the different phases of the workflow-oriented development work. While in the design period the most important task of the portal is to provide consistent and fault tolerant data management, during the workflow execution it must act upon the security framework its back-end Grids are built on.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

An interconnection network with n nodes is four-pancyclic if it contains a cycle of length l for each integer l with 4 <= l <= n. An interconnection network is fault-tolerant four-pancyclic if the surviving network is four-pancyclic in the presence of faults. The fault-tolerant four-pancyclicity of interconnection networks is a desired property because many classical parallel algorithms can be mapped onto such networks in a communication-efficient fashion, even in the presence of failing nodes or edges. Due to some attractive properties as compared with its hypercube counterpart of the same size, the Mobius cube has been proposed as a promising candidate for interconnection topology. Hsieh and Chen [S.Y. Hsieh, C.H. Chen, Pancyclicity on Mobius cubes with maximal edge faults, Parallel Computing, 30(3) (2004) 407-421.] showed that an n-dimensional Mobius cube is four-pancyclic in the presence of up to n-2 faulty edges. In this paper, we show that an n-dimensional Mobius cube is four-pancyclic in the presence of up to n-2 faulty nodes. The obtained result is optimal in that, if n-1 nodes are removed, the surviving network may not be four-pancyclic. (C) 2005 Elsevier B.V. All rights reserved.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

In order to make a full evaluation of an interconnection network, it is essential to estimate the minimum size of a largest connected component of this network provided the faulty vertices in the network may break its connectedness. Star graphs are recognized as promising candidates for interconnection networks. This article addresses the size of a largest connected component of a faulty star graph. We prove that, in an n-star graph (n >= 3) with up to 2n-4 faulty vertices, all fault-free vertices but at most two form a connected component. Moreover, all fault-free vertices but exactly two form a connected component if and only if the set of all faulty vertices is equal to the neighbourhood of a pair of fault-free adjacent vertices. These results show that star graphs exhibit excellent fault-tolerant abilities in the sense that there exists a large functional network in a faulty star graph.