962 results for FAULT TOLERANCE


Relevance:

100.00% 100.00%

Publisher:

Abstract:

Background: Large-scale biological jobs on high-performance computing systems require manual intervention if one or more of the computing cores on which they execute fail. This imposes not only a maintenance cost on the job, but also the cost of the time taken to reinstate it and the risk of losing the data and execution accomplished before the failure. Approaches that proactively detect computing core failures and relocate the affected core's job onto reliable cores can make a significant step towards automating fault tolerance. Method: This paper describes an experimental investigation into the use of multi-agent approaches for fault tolerance. Two approaches are studied, the first at the job level and the second at the core level. The approaches are investigated for single core failure scenarios that can occur in the execution of parallel reduction algorithms on computer clusters. A third approach is proposed that incorporates multi-agent technology at both the job and core levels. Experiments are pursued in the context of genome searching, a popular computational biology application. Result: The key conclusion is that the proposed approaches are feasible for automating fault tolerance in high-performance computing systems with minimal human intervention. In a typical experiment in which fault tolerance is studied, centralised and decentralised checkpointing approaches add on average 90% to the actual time for executing the job. In the same experiment, the multi-agent approaches add only 10% to the overall execution time.
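The overhead figures quoted in this abstract translate directly into wall-clock time. The following is a minimal worked example, assuming a hypothetical baseline job of 600 minutes; the abstract reports only the percentages, so the baseline and the function name are illustrative.

```python
def time_with_overhead(base_minutes: int, overhead_percent: int) -> float:
    """Total run time once a fault-tolerance overhead is added to the base time."""
    return base_minutes * (100 + overhead_percent) / 100


BASE_MINUTES = 600  # hypothetical job length, not taken from the paper

# Overhead percentages as reported in the abstract.
checkpointing = time_with_overhead(BASE_MINUTES, 90)
multi_agent = time_with_overhead(BASE_MINUTES, 10)

print(f"checkpointing: {checkpointing:.0f} min, multi-agent: {multi_agent:.0f} min")
```

Under this assumed baseline, the gap between the two approaches is the difference between the two totals, which scales linearly with the job length.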

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This paper presents an architecture (Multi-μ) being implemented to study and develop software-based fault-tolerant mechanisms for Real-Time Systems, using the Ada language (Ada 95) and Commercial Off-The-Shelf (COTS) components. Several issues regarding fault tolerance are presented, and mechanisms to achieve fault tolerance by software active replication in Ada 95 are discussed. The Multi-μ architecture, based on a specifically proposed Fault Tolerance Manager (FTManager), is then described. Finally, some considerations are made about the work being done and essential future developments.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

An n-dimensional Möbius cube, 0MQ(n) or 1MQ(n), is a variation of the n-dimensional cube Q(n) which possesses many attractive properties such as significantly smaller communication delay and stronger graph-embedding capabilities. In some practical situations, the fault tolerance of a distributed memory multiprocessor system can be measured more precisely by the connectivity of the underlying graph under forbidden fault set models. This article addresses the connectivity of 0MQ(n)/1MQ(n) under two typical forbidden fault set models. We first prove that the connectivity of 0MQ(n)/1MQ(n) is 2n - 2 when the fault set does not contain the neighborhood of any vertex as a subset. We then prove that the connectivity of 0MQ(n)/1MQ(n) is 3n - 5 provided that neither the neighborhood of any vertex nor that of any edge can fail simultaneously. These results demonstrate that 0MQ(n)/1MQ(n) has the same connectivity as Q(n) under either of the previous assumptions.
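To make the graph in this abstract concrete, the sketch below builds a Möbius cube and tabulates the two connectivity values the abstract states. The adjacency rule is an assumption: the abstract does not define the graph, so the code uses the commonly cited rule in which a node's neighbour along dimension i flips bit i when bit i+1 is 0 and flips bits i down to 0 when bit i+1 is 1, with the missing bit n set to 0 for 0MQ(n) and 1 for 1MQ(n). The function names are illustrative, and the code checks only regularity and connectedness; it does not verify the theorems.

```python
from collections import deque


def mobius_cube(n: int, kind: int) -> dict[int, set[int]]:
    """Adjacency sets of kind-MQ(n); `kind` (0 or 1) stands in for bit x_n."""
    adj = {v: set() for v in range(2 ** n)}
    for v in range(2 ** n):
        for i in range(n):
            # Bit to the left of position i; the top dimension uses `kind`.
            left = kind if i == n - 1 else (v >> (i + 1)) & 1
            # left == 0: flip bit i only; left == 1: flip bits i down to 0.
            mask = (1 << i) if left == 0 else (1 << (i + 1)) - 1
            adj[v].add(v ^ mask)
    return adj


def is_connected(adj: dict[int, set[int]]) -> bool:
    """Breadth-first search from vertex 0."""
    seen, queue = {0}, deque([0])
    while queue:
        for w in adj[queue.popleft()]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == len(adj)


def stated_connectivities(n: int) -> tuple[int, int]:
    """The two values given in the abstract: (2n - 2, 3n - 5)."""
    return 2 * n - 2, 3 * n - 5


for n in (4, 5, 6):
    print(n, stated_connectivities(n))
```

Each vertex has exactly n neighbours under this rule, so removing a vertex's whole neighbourhood isolates it; the forbidden fault set models in the abstract exclude exactly such cases.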

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Processor virtualization for process migration in distributed parallel computing systems has formed a significant component of research on load balancing. In contrast, the potential of processor virtualization for fault tolerance has received minimal attention. The work reported in this paper aims to extend concepts of processor virtualization to ‘intelligent cores’ as a means to achieve fault tolerance in distributed parallel computing systems. Intelligent cores are an abstraction of the hardware processing cores, with the incorporation of cognitive capabilities, on which parallel tasks can be executed and migrated. When a processing core executing a task is predicted to fail, the task being executed is proactively transferred onto another core. A parallel reduction algorithm incorporating concepts of intelligent cores is implemented on a computer cluster using Adaptive MPI and Charm++. Preliminary results confirm the feasibility of the approach.
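The proactive transfer described in this abstract can be illustrated with a small simulation. This is a sketch under stated assumptions, not the paper's implementation: it is plain Python rather than Adaptive MPI or Charm++, the failure prediction is supplied as a ready-made set, and the function name, core labels, and round-robin placement are all hypothetical.

```python
def reduce_with_migration(values, cores, predicted_to_fail):
    """Sum `values` by pairwise (tree) reduction, moving partial results
    off any core that is predicted to fail before it does more work."""
    if not values:
        raise ValueError("nothing to reduce")
    healthy = [c for c in cores if c not in predicted_to_fail]
    if not healthy:
        raise RuntimeError("no healthy core left to migrate to")

    # Initial placement: one partial result per slot, assigned round-robin.
    level = [(cores[i % len(cores)], v) for i, v in enumerate(values)]
    migrations = []  # (source core, target core) pairs, in order

    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), 2):
            core, acc = level[i]
            if core in predicted_to_fail:
                # Proactive step: relocate before combining anything here.
                target = healthy[i % len(healthy)]
                migrations.append((core, target))
                core = target
            if i + 1 < len(level):
                acc += level[i + 1][1]  # combine with the partner's partial sum
            next_level.append((core, acc))
        level = next_level
    return level[0][1], migrations


total, moves = reduce_with_migration(
    list(range(1, 9)), ["c0", "c1", "c2", "c3"], {"c2"}
)
print(total, moves)
```

The result of the reduction is unchanged by the migrations; only the placement of the partial sums differs, which is the property a proactive scheme relies on.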

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Recent research in multi-agent systems incorporates fault tolerance concepts, but does not explore the extension and implementation of such ideas for large-scale parallel computing systems. The work reported in this paper investigates a swarm array computing approach, namely 'Intelligent Agents'. A task to be executed on a parallel computing system is decomposed into sub-tasks and mapped onto agents that traverse an abstracted hardware layer. The agents intercommunicate across processors to share information in the event of a predicted core/processor failure and to complete the task successfully. The feasibility of the approach is validated by simulations on an FPGA using a multi-agent simulator, and by the implementation of a parallel reduction algorithm on a computer cluster using the Message Passing Interface.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Service-based architectures enable the development of new classes of Grid and distributed applications. One of the main capabilities provided by such systems is the dynamic and flexible integration of services, according to which services are allowed to be a part of more than one distributed system and simultaneously serve different applications. This increased flexibility in system composition makes it difficult to address classical distributed system issues such as fault-tolerance. While it is relatively easy to make an individual service fault-tolerant, improving fault-tolerance of services collaborating in multiple application scenarios is a challenging task. In this paper, we look at the issue of developing fault-tolerant service-based distributed systems, and propose an infrastructure to implement fault tolerance capabilities transparent to services.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

A distributed database system is subject to site failure and link failure. This paper presents a reactive system approach to achieving fault tolerance in such a system. Reactive system concepts are an attractive paradigm for system design, development and maintenance because they separate policies from mechanisms. In the paper we give a solution using different reactive modules to implement the fault-tolerant policies and the failure detection mechanisms. The solution shows that they can be separated without impact on each other; thus the system can adapt to constant changes in environments and user requirements.
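The separation of policies from mechanisms that this abstract describes can be sketched as follows. Everything here is hypothetical and chosen for illustration: a heartbeat timeout stands in for the failure detection mechanism, two interchangeable functions stand in for fault-tolerant policies, and none of the names come from the paper.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class HeartbeatDetector:
    """Mechanism: reports sites whose last heartbeat is older than `timeout`."""
    timeout: float
    last_seen: dict[str, float] = field(default_factory=dict)

    def heartbeat(self, site: str, now: float) -> None:
        self.last_seen[site] = now

    def failed_sites(self, now: float) -> list[str]:
        return sorted(s for s, t in self.last_seen.items() if now - t > self.timeout)


# Policy: decides what to do about one failed site, given the live replicas.
Policy = Callable[[str, list[str]], str]


def failover_policy(failed: str, live: list[str]) -> str:
    return f"redirect {failed} -> {live[0]}" if live else f"suspend {failed}"


def read_only_policy(failed: str, live: list[str]) -> str:
    return f"mark {failed} read-only"


def react(detector: HeartbeatDetector, policy: Policy,
          replicas: list[str], now: float) -> list[str]:
    """Glue: the detector finds failures, the policy chooses the response."""
    failed = detector.failed_sites(now)
    live = [r for r in replicas if r not in failed]
    return [policy(site, live) for site in failed]


detector = HeartbeatDetector(timeout=5.0)
detector.heartbeat("A", now=0.0)
detector.heartbeat("B", now=8.0)
print(react(detector, failover_policy, ["A", "B"], now=10.0))
```

Swapping `failover_policy` for `read_only_policy` changes the response without touching the detector, and a different detector could be substituted without touching either policy.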

Relevance:

100.00% 100.00%

Publisher:

Abstract:

A distributed database system is subject to site failure and link failure. This paper presents a reactive system approach to achieving fault tolerance in such a system. Reactive system concepts are an attractive paradigm for system design, development and maintenance because they separate policies from mechanisms. In the paper we give a solution using different reactive modules to implement the fault-tolerant policies and the failure detection mechanisms. The solution shows that they can be separated without impact on each other; thus the system can adapt to constant changes in user requirements.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The provision of fault tolerance is an important factor in the success of distributed and cluster computing. Through this research, a transparent, autonomic and efficient fault-tolerant facility was designed and implemented, thereby relieving the user of the burden of having to handle and react to the failure of an application.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Fault-tolerant manipulators maintain their trajectory even if one or more of their joints fail. Assuming that the manipulator is fault tolerant on its trajectory, fault-tolerant compliance manipulators provide the required force at their end-effector even when a joint fails. To achieve this, the contributions of the faulty joints to the force of the end-effector must be mapped into the proper compensating joint torques of the healthy joints to maintain the force. This paper addresses the optimal mapping to minimize the force jump due to a fault, which is the maximum effort to maintain the force when a fault occurs. The paper studies locked-joint faults of redundant manipulators and relates the force jump at the end-effector to the faults within the joints. Building on a previous study on maintaining the trajectory, the objective here is to provide fault-tolerant force at the end-effector of redundant manipulators. The optimal mapping with minimum force jump is presented using a matrix perturbation model, and the force jump is calculated through this model for single and multiple joint faults. The proposed optimal mapping is used in different fault scenarios for a 5-DOF manipulator; it is also deployed to compensate the force at the end-effector of the 5-DOF manipulator in a simulation study, and the results are presented.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Fault tolerance for a class of nonlinear systems is addressed based on the velocity of their output variables. This paper presents a mapping to minimize the possible jump in the velocity of the output due to an actuator failure. The actuator failure is modelled as an actuator lock. The mapping is derived, and it provides the proper input commands for the healthy actuators of the system to tolerate the effect of the faulty actuator on the output of the system. The introduced mapping works as an optimal input reconfiguration for fault recovery, which provides a minimum velocity jump suitable for static nonlinear systems. The proposed mapping is validated through different case studies and a complementary simulation. In the case studies and the simulation, the mapping provides the commands to compensate for the effect of different faults within the joints of a robotic manipulator. The new commands and a comparison of the velocity of the output variables for the healthy and faulty systems are presented.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Autonomous operation or teleoperation of critical tasks in space applications requires fault-tolerant robotic manipulators. These manipulators are able to maintain their tasks even if a joint fails. If it is presumed that the manipulator is fault tolerant on its trajectory, then the next step is to provide a fault-tolerant force at the end-effector of the manipulator. The problem of cooperative fault-tolerant force is addressed in this paper within the operation of two manipulators. The cooperative manipulators are used to compensate for the force jump that occurs at the end-effector of one manipulator due to a joint failure. To achieve fault-tolerant operation, the contribution of the faulty joint to the force of the end-effector of the faulty manipulator must be optimally mapped into the torques of the faulty and healthy manipulators. The optimal joint torque reconfigurations of both manipulators for compensating this force jump are illustrated. The proposed frameworks are deployed for two cooperative PUMA560 manipulators. The results of the case studies validate the fault-tolerant cooperation strategies.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This paper studies the difference between human behaviours for fault tolerance and a pseudo-inverse reconfiguration approach for fault tolerance of robotic arms. If this difference is well understood, it can be used to introduce a hybrid approach for fault-tolerant motion of robotic arms. The proposed approach is expected to combine human fault-tolerance dexterity with the advantages of model-based fault tolerance. The main aim is to add human dexterity to the fault tolerance of robotic arms.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Static nonlinear systems are common when the model of the kinematics of mechanical or civil structures is analyzed, for instance the kinematics of robotic manipulators. This paper addresses the maximum effort toward fault tolerance for any number of locked-actuator failures in static nonlinear systems. It optimally reconfigures the inputs via a mapping that maximally accommodates the failures. The mapping maps the failures to an extra action of the healthy actuators that results in a minimum jump in the velocity of the output variables. From this mapping, the minimum jump of the velocity of the output is then calculated. The conditions for a zero velocity jump of the output variables are discussed. This shows that, when the conditions of fault tolerance are maintained, the proposed framework is capable of fault recovery not only at fault instances but also over the whole output trajectory. The proposed mapping is validated by three case studies.
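The idea of mapping a locked actuator's lost contribution onto the healthy actuators can be illustrated on a small case. This is a sketch under stated assumptions and not the mapping derived in the paper: it uses a hypothetical planar three-link arm with a two-dimensional output velocity, locks one joint, and solves exactly for the extra rates of the two remaining joints, which is possible only when the reduced 2x2 Jacobian is invertible. All names, link lengths, angles, and rates are illustrative.

```python
import math


def planar_jacobian(thetas, lengths):
    """2 x n Jacobian of the end-effector position of a planar serial arm."""
    cumulative, total = [], 0.0
    for theta in thetas:
        total += theta
        cumulative.append(total)
    n = len(thetas)
    # Joint j moves every link from j outward.
    row_x = [-sum(lengths[k] * math.sin(cumulative[k]) for k in range(j, n))
             for j in range(n)]
    row_y = [sum(lengths[k] * math.cos(cumulative[k]) for k in range(j, n))
             for j in range(n)]
    return [row_x, row_y]


def compensate_locked_joint(jacobian, joint_rates, locked):
    """Return (extra rates for the healthy joints, residual velocity jump)."""
    healthy = [j for j in range(len(joint_rates)) if j != locked]
    if len(healthy) != 2:
        raise ValueError("this sketch handles exactly two healthy joints")
    a, b = healthy
    # Output velocity lost when the locked joint stops moving.
    lost = [jacobian[r][locked] * joint_rates[locked] for r in range(2)]
    det = jacobian[0][a] * jacobian[1][b] - jacobian[0][b] * jacobian[1][a]
    if abs(det) < 1e-12:
        raise ValueError("reduced Jacobian is singular; this sketch cannot solve it")
    # Cramer's rule for the 2x2 system [J_a J_b] x = lost.
    extra_a = (lost[0] * jacobian[1][b] - jacobian[0][b] * lost[1]) / det
    extra_b = (jacobian[0][a] * lost[1] - lost[0] * jacobian[1][a]) / det
    # Jump = output velocity after the fault minus output velocity before it.
    jump = [jacobian[r][a] * extra_a + jacobian[r][b] * extra_b - lost[r]
            for r in range(2)]
    return {a: extra_a, b: extra_b}, jump


J = planar_jacobian([0.3, 0.5, -0.4], [1.0, 0.8, 0.5])
extra, jump = compensate_locked_joint(J, [0.2, -0.1, 0.3], locked=1)
print(extra, jump)
```

In this configuration the two healthy joints can reproduce the lost velocity, so the residual jump is zero up to rounding. When the healthy actuators cannot span the lost contribution, an exact solve is unavailable and a minimum, nonzero jump remains, which is the general case the abstract addresses.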