875 resultados para level of fault-tolerance


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Multiprocessor systems which afford a high degree of parallelism are used in a variety of applications. The extremely stringent reliability requirement has made the provision of fault-tolerance an important aspect in the design of such systems. This paper presents a review of the various approaches towards tolerating hardware faults in multiprocessor systems. It. emphasizes the basic concepts of fault tolerant design and the various problems to be taken care of by the designer. An indepth survey of the various models, techniques and methods for fault diagnosis is given. Further, we consider the strategies for fault-tolerance in specialized multiprocessor architectures which have the ability of dynamic reconfiguration and are suited to VLSI implementation. An analysis of the state-óf-the-art is given which points out the major aspects of fault-tolerance in such architectures.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The provision of fault tolerance is an important aspect to the success of distributed and cluster computing. Through this research , a transparent, autonomic and efficient fault tolerant facility was designed and implemented; thereby relieving the burden of a user having to handle and react to the failure of an application.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Static nonlinear systems are common when the model of the kinematics of mechanical or civil structures is analyzed for instance kinematics of robotic manipulators. This paper addresses the maximum effort toward fault tolerance for any number of the locked actuators failures in static nonlinear systems. It optimally reconfigures the inputs via a mapping that maximally accommodates the failures. The mapping maps the failures to an extra action of healthy actuators that results to a minimum jump for the velocity of the output variables. Then from this mapping, the minimum jump of the velocity of the output is calculated. The conditions for a zero velocity jump of the output variables are discussed. This shows that, when the conditions of fault tolerance are maintained, the proposed framework is capable of fault recovery not only at fault instances but also at the whole output trajectory. The proposed mapping is validated by three case studies.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper is aimed at reviewing the notion of Byzantine-resilient distributed computing systems, the relevant protocols and their possible applications as reported in the literature. The three agreement problems, namely, the consensus problem, the interactive consistency problem, and the generals problem have been discussed. Various agreement protocols for the Byzantine generals problem have been summarized in terms of their performance and level of fault-tolerance. The three classes of Byzantine agreement protocols discussed are the deterministic, randomized, and approximate agreement protocols. Finally, application of the Byzantine agreement protocols to clock synchronization is highlighted.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Relentless CMOS scaling coupled with lower design tolerances is making ICs increasingly susceptible to wear-out related permanent faults and transient faults, necessitating on-chip fault tolerance in future chip microprocessors (CMPs). In this paper we introduce a new energy-efficient fault-tolerant CMP architecture known as Redundant Execution using Critical Value Forwarding (RECVF). RECVF is based on two observations: (i) forwarding critical instruction results from the leading to the trailing core enables the latter to execute faster, and (ii) this speedup can be exploited to reduce energy consumption by operating the trailing core at a lower voltage-frequency level. Our evaluation shows that RECVF consumes 37% less energy than conventional dual modular redundant (DMR) execution of a program. It consumes only 1.26 times the energy of a non-fault-tolerant baseline and has a performance overhead of just 1.2%.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A new hybrid multilevel power converter topology is presented in this paper. The proposed power converter topology uses only one DC source and floating capacitors charged to asymmetrical voltage levels, are used for generating different voltage levels. The SVPWM based control strategy used in this converter maintains the capacitor voltages at the required levels in the entire modulation range including the over-modulation region. For the voltage levels: nine and above, the number of components required in the proposed topology is significantly lower, compared to the conventional multilevel inverter topologies. The number of capacitors required in this topology reduces drastically compared to the conventional flying capacitor topology, when the number of levels in the inverter output increases. This topology has better fault tolerance, as it is capable of operating with reduced number of levels, in the entire modulation range, in the event of any failure in the H-bridges. The transient as well as the steady state performance of the nine-level version of the proposed topology is experimentally verified in the entire modulation range including the over-modulation region.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

As the complexity of computing systems grows, reliability and energy are two crucial challenges asking for holistic solutions. In this paper, we investigate the interplay among concurrency, power dissipation, energy consumption and voltage-frequency scaling for a key numerical kernel for the solution of sparse linear systems. Concretely, we leverage a task-parallel implementation of the Conjugate Gradient method, equipped with an state-of-the-art pre-conditioner embedded in the ILUPACK software, and target a low-power multi core processor from ARM.In addition, we perform a theoretical analysis on the impact of a technique like Near Threshold Voltage Computing (NTVC) from the points of view of increased hardware concurrency and error rate.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Background: Large-scale biological jobs on high-performance computing systems require manual intervention if one or more computing cores on which they execute fail. This places not only a cost on the maintenance of the job, but also a cost on the time taken for reinstating the job and the risk of losing data and execution accomplished by the job before it failed. Approaches which can proactively detect computing core failures and take action to relocate the computing core's job onto reliable cores can make a significant step towards automating fault tolerance. Method: This paper describes an experimental investigation into the use of multi-agent approaches for fault tolerance. Two approaches are studied, the first at the job level and the second at the core level. The approaches are investigated for single core failure scenarios that can occur in the execution of parallel reduction algorithms on computer clusters. A third approach is proposed that incorporates multi-agent technology both at the job and core level. Experiments are pursued in the context of genome searching, a popular computational biology application. Result: The key conclusion is that the approaches proposed are feasible for automating fault tolerance in high-performance computing systems with minimal human intervention. In a typical experiment in which the fault tolerance is studied, centralised and decentralised checkpointing approaches on an average add 90% to the actual time for executing the job. On the other hand, in the same experiment the multi-agent approaches add only 10% to the overall execution time

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper presents an architecture (Multi-μ) being implemented to study and develop software based fault tolerant mechanisms for Real-Time Systems, using the Ada language (Ada 95) and Commercial Off-The-Shelf (COTS) components. Several issues regarding fault tolerance are presented and mechanisms to achieve fault tolerance by software active replication in Ada 95 are discussed. The Multi-μ architecture, based on a specifically proposed Fault Tolerance Manager (FTManager), is then described. Finally, some considerations are made about the work being done and essential future developments.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

An n-dimensional Mobius cube, 0MQ(n) or 1MQ(n), is a variation of n-dimensional cube Q(n) which possesses many attractive properties such as significantly smaller communication delay and stronger graph-embedding capabilities. In some practical situations, the fault tolerance of a distributed memory multiprocessor system can be measured more precisely by the connectivity of the underlying graph under forbidden fault set models. This article addresses the connectivity of 0MQ(n)/1MQ(n), under two typical forbidden fault set models. We first prove that the connectivity of 0MQ(n)/1MQ(n) is 2n - 2 when the fault set does not contain the neighborhood of any vertex as a subset. We then prove that the connectivity of 0MQ(n)/1MQ(n) is 3n - 5 provided that the neighborhood of any vertex as well as that of any edge cannot fail simultaneously These results demonstrate that 0MQ(n)/1MQ(n) has the same connectivity as Q(n) under either of the previous assumptions.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Autonomous or teleoperation of critical tasks in space applications require fault tolerant robotic manipulators. These manipulators are able to maintain their tasks even if a joint fails. If it is presumed that the manipulator is fault tolerant on its trajectory, then the next step is to provide a fault tolerant force at the end-effector of the manipulator. The problem of cooperative fault tolerant force is addressed in this paper within the operation of two manipulators. The cooperative manipulators are used to compensate the force jump which occurs on the force of the end-effector of one manipulator due to a joint failure. To achieve fault tolerant operation, the contribution of the faulty joint for the force of the end-effector of the faulty manipulator is required to be optimally mapped into the torque of the faulty and healthy manipulators. The optimal joint torque reconfigurations of both manipulators for compensating this force jump are illustrated. The proposed frameworks are deployed for two cooperative PUMA560 manipulators. The results of the case studies validate the fault tolerant cooperation strategies.