947 resultados para fault tolerant systems


Relevância:

100.00% 100.00%

Publicador:

Resumo:

As the size of digital systems increases, the mean time between single component failures diminishes. To avoid component related failures, large computers must be fault-tolerant. In this paper, we focus on methods for achieving a high degree of fault-tolerance in multistage routing networks. We describe a multipath scheme for providing end-to-end fault-tolerance on large networks. The scheme improves routing performance while keeping network latency low. We also describe the novel routing component, RN1, which implements this scheme, showing how it can be the basic building block for fault-tolerant multistage routing networks.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Recent research in multi-agent systems incorporate fault tolerance concepts. However, the research does not explore the extension and implementation of such ideas for large scale parallel computing systems. The work reported in this paper investigates a swarm array computing approach, namely ‘Intelligent Agents’. In the approach considered a task to be executed on a parallel computing system is decomposed to sub-tasks and mapped onto agents that traverse an abstracted hardware layer. The agents intercommunicate across processors to share information during the event of a predicted core/processor failure and for successfully completing the task. The agents hence contribute towards fault tolerance and towards building reliable systems. The feasibility of the approach is validated by simulations on an FPGA using a multi-agent simulator and implementation of a parallel reduction algorithm on a computer cluster using the Message Passing Interface.

Relevância:

100.00% 100.00%

Publicador:

Relevância:

100.00% 100.00%

Publicador:

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The K-Means algorithm for cluster analysis is one of the most influential and popular data mining methods. Its straightforward parallel formulation is well suited for distributed memory systems with reliable interconnection networks, such as massively parallel processors and clusters of workstations. However, in large-scale geographically distributed systems the straightforward parallel algorithm can be rendered useless by a single communication failure or high latency in communication paths. The lack of scalable and fault tolerant global communication and synchronisation methods in large-scale systems has hindered the adoption of the K-Means algorithm for applications in large networked systems such as wireless sensor networks, peer-to-peer systems and mobile ad hoc networks. This work proposes a fully distributed K-Means algorithm (EpidemicK-Means) which does not require global communication and is intrinsically fault tolerant. The proposed distributed K-Means algorithm provides a clustering solution which can approximate the solution of an ideal centralised algorithm over the aggregated data as closely as desired. A comparative performance analysis is carried out against the state of the art sampling methods and shows that the proposed method overcomes the limitations of the sampling-based approaches for skewed clusters distributions. The experimental analysis confirms that the proposed algorithm is very accurate and fault tolerant under unreliable network conditions (message loss and node failures) and is suitable for asynchronous networks of very large and extreme scale.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Future extreme-scale high-performance computing systems will be required to work under frequent component failures. The MPI Forum's User Level Failure Mitigation proposal has introduced an operation, MPI_Comm_shrink, to synchronize the alive processes on the list of failed processes, so that applications can continue to execute even in the presence of failures by adopting algorithm-based fault tolerance techniques. This MPI_Comm_shrink operation requires a fault tolerant failure detection and consensus algorithm. This paper presents and compares two novel failure detection and consensus algorithms. The proposed algorithms are based on Gossip protocols and are inherently fault-tolerant and scalable. The proposed algorithms were implemented and tested using the Extreme-scale Simulator. The results show that in both algorithms the number of Gossip cycles to achieve global consensus scales logarithmically with system size. The second algorithm also shows better scalability in terms of memory and network bandwidth usage and a perfect synchronization in achieving global consensus.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Most fault-tolerant application programs cannot cope with constant changes in their environments and user requirements because they embed policies and mechanisms together so that if the policies or mechanisms are  changed the whole programs have to be changed as well. This paper presents a reactive system approach to overcoming this limitation. The reactive system concepts are an attractive paradigm for system design, development and maintenance because they separate policies from mechanisms. In the paper we propose a generic reactive system architecture and use group communication primitives to model it. We then implement it as a generic package which can be applied in any distributed applications. The system performance shows that it can be used in a distributed environment effectively.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Fault tolerance for a class of non linear systems is addressed based on the velocity of their output variables. This paper presents a mapping to minimize the possible jump of the velocity of the output, due to the actuator failure. The failure of the actuator is assumed as actuator lock. The mapping is derived and it provides the proper input commands for the healthy actuators of the system to tolerate the effect of the faulty actuator on the output of the system. The introduced mapping works as an optimal input reconfiguration for fault recovery, which provides a minimum velocity jump suitable for static nonlinear systems. The proposed mapping is validated through different case studies and a complementary simulation. In the case studies and the simulation, the mapping provides the commands to compensate the effect of different faults within the joints of a robotic manipulator. The new commands and the compare between the velocity of the output variables for the health and faulty system are presented.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This thesis addresses “Optimal Fault-Tolerant Robotic Manipulators” for locked-joint failures and consists of three components. It begins by investigating the regions of workspace where the manipulator can operate with high reliability. It then continues with an efficient deployment of kinematic redundancies for fault-tolerant operation. Finally, it presents a novel method for design of optimal fault-tolerant manipulators.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper addresses the actuator failure compensation problem of non-linear fourwheel-steering mobile robots based on vehicle kinematics, undergoing both known and unknown failures causing degenerated steering performance or wheels stuck at some observable angles. Terminal sliding mode control technique is used to guarantee the tracking stability infinite time with the presence of actuator fault. Simulation results are given to illustrate the effectiveness of the proposed control scheme. © Institution of Engineers Australia 2012.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

As clouds have been deployed widely in various fields, the reliability and availability of clouds become the major concern of cloud service providers and users. Thereby, fault tolerance in clouds receives a great deal of attention in both industry and academia, especially for real-time applications due to their safety critical nature. Large amounts of researches have been conducted to realize fault tolerance in distributed systems, among which fault-tolerant scheduling plays a significant role. However, few researches on the fault-tolerant scheduling study the virtualization and the elasticity, two key features of clouds, sufficiently. To address this issue, this paper presents a fault-tolerant mechanism which extends the primary-backup model to incorporate the features of clouds. Meanwhile, for the first time, we propose an elastic resource provisioning mechanism in the fault-tolerant context to improve the resource utilization. On the basis of the fault-tolerant mechanism and the elastic resource provisioning mechanism, we design novel fault-tolerant elastic scheduling algorithms for real-time tasks in clouds named FESTAL, aiming at achieving both fault tolerance and high resource utilization in clouds. Extensive experiments injecting with random synthetic workloads as well as the workload from the latest version of the Google cloud tracelogs are conducted by CloudSim to compare FESTAL with three baseline algorithms, i.e., Non-M igration-FESTAL (NMFESTAL), Non-Overlapping-FESTAL (NOFESTAL), and Elastic First Fit (EFF). The experimental results demonstrate that FESTAL is able to effectively enhance the performance of virtualized clouds.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Modern control systems are becoming more and more complex and control algorithms more and more sophisticated. Consequently, Fault Detection and Diagnosis (FDD) and Fault Tolerant Control (FTC) have gained central importance over the past decades, due to the increasing requirements of availability, cost efficiency, reliability and operating safety. This thesis deals with the FDD and FTC problems in a spacecraft Attitude Determination and Control System (ADCS). Firstly, the detailed nonlinear models of the spacecraft attitude dynamics and kinematics are described, along with the dynamic models of the actuators and main external disturbance sources. The considered ADCS is composed of an array of four redundant reaction wheels. A set of sensors provides satellite angular velocity, attitude and flywheel spin rate information. Then, general overviews of the Fault Detection and Isolation (FDI), Fault Estimation (FE) and Fault Tolerant Control (FTC) problems are presented, and the design and implementation of a novel diagnosis system is described. The system consists of a FDI module composed of properly organized model-based residual filters, exploiting the available input and output information for the detection and localization of an occurred fault. A proper fault mapping procedure and the nonlinear geometric approach are exploited to design residual filters explicitly decoupled from the external aerodynamic disturbance and sensitive to specific sets of faults. The subsequent use of suitable adaptive FE algorithms, based on the exploitation of radial basis function neural networks, allows to obtain accurate fault estimations. Finally, this estimation is actively exploited in a FTC scheme to achieve a suitable fault accommodation and guarantee the desired control performances. A standard sliding mode controller is implemented for attitude stabilization and control. Several simulation results are given to highlight the performances of the overall designed system in case of different types of faults affecting the ADCS actuators and sensors.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

"This project is funded in part by NASA grant NSG 1471."

Relevância:

100.00% 100.00%

Publicador:

Resumo:

"August 1980."

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper provides a discussion on future direct current (DC) network development in terms of system protection under DC-side fault scenarios. The argument between appropriate DC circuit breaker and new DC fault-tolerant converters is discussed after a review on DC technology development and bottleneck issues that require proper solutions. The overcurrent/cost curve of power-electronic DC circuit breakers (CB) superimposed to voltage-source converter (VSC) systems is derived and compared with other possible fault-tolerant power conversion options. This in-advance planning of protection capability is essential for the future development of DC networks.