844 resultados para Software-based fault tolerance


Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper presents an architecture (Multi-μ) being implemented to study and develop software based fault tolerant mechanisms for Real-Time Systems, using the Ada language (Ada 95) and Commercial Off-The-Shelf (COTS) components. Several issues regarding fault tolerance are presented and mechanisms to achieve fault tolerance by software active replication in Ada 95 are discussed. The Multi-μ architecture, based on a specifically proposed Fault Tolerance Manager (FTManager), is then described. Finally, some considerations are made about the work being done and essential future developments.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Commercial off-the-shelf microprocessors are the core of low-cost embedded systems due to their programmability and cost-effectiveness. Recent advances in electronic technologies have allowed remarkable improvements in their performance. However, they have also made microprocessors more susceptible to transient faults induced by radiation. These non-destructive events (soft errors), may cause a microprocessor to produce a wrong computation result or lose control of a system with catastrophic consequences. Therefore, soft error mitigation has become a compulsory requirement for an increasing number of applications, which operate from the space to the ground level. In this context, this paper uses the concept of selective hardening, which is aimed to design reduced-overhead and flexible mitigation techniques. Following this concept, a novel flexible version of the software-based fault recovery technique known as SWIFT-R is proposed. Our approach makes possible to select different registers subsets from the microprocessor register file to be protected on software. Thus, design space is enriched with a wide spectrum of new partially protected versions, which offer more flexibility to designers. This permits to find the best trade-offs between performance, code size, and fault coverage. Three case studies have been developed to show the applicability and flexibility of the proposal.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper proposes a new architecture targeting real-time and reliable Distributed Computer-Controlled Systems (DCCS). This architecture provides a structured approach for the integration of soft and/or hard real-time applications with Commercial O -The-Shelf (COTS) components. The Timely Computing Base model is used as the reference model to deal with the heterogeneity of system components with respect to guaranteeing the timeliness of applications. The reliability and availability requirements of hard real-time applications are guaranteed by a software-based fault-tolerance approach.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Building reliable real-time applications on top of commercial off-the-shelf (COTS) components is not a straightforward task. Thus, it is essential to provide a simple and transparent programming model, in order to abstract programmers from the low-level implementation details of distribution and replication. However, the recent trend for incorporating pre-emptive multitasking applications in reliable real-time systems inherently increases its complexity. It is therefore important to provide a transparent programming model, enabling pre-emptive multitasking applications to be implemented without resorting to simultaneously dealing with both system requirements and distribution and replication issues. The distributed embedded architecture using COTS components (DEAR-COTS) architecture has been previously proposed as an architecture to support real-time and reliable distributed computer-controlled systems (DCCS) using COTS components. Within the DEAR-COTS architecture, the hard real-time subsystem provides a framework for the development of reliable real-time applications, which are the core of DCCS applications. This paper presents the proposed framework, and demonstrates how it can be used to support the transparent replication of software components.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Fault tolerance allows a system to remain operational to some degree when some of its components fail. One of the most common fault tolerance mechanisms consists on logging the system state periodically, and recovering the system to a consistent state in the event of a failure. This paper describes a general fault tolerance logging-based mechanism, which can be layered over deterministic systems. Our proposal describes how a logging mechanism can recover the underlying system to a consistent state, even if an action or set of actions were interrupted mid-way, due to a server crash. We also propose different methods of storing the logging information, and describe how to deploy a fault tolerant master-slave cluster for information replication. We adapt our model to a previously proposed framework, which provided common relational features, like transactions with atomic, consistent, isolated and durable properties, to NoSQL database management systems.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

It is imperative to accept that failures can and will occur, even in meticulously designed distributed systems, and design proper measures to counter those failures. Passive replication minimises resource consumption by only activating redundant replicas in case of failures, as typically providing and applying state updates is less resource demanding than requesting execution. However, most existing solutions for passive fault tolerance are usually designed and configured at design time, explicitly and statically identifying the most critical components and their number of replicas, lacking the needed flexibility to handle the runtime dynamics of distributed component-based embedded systems. This paper proposes a cost-effective adaptive fault tolerance solution with a significant lower overhead compared to a strict active redundancy-based approach, achieving a high error coverage with the minimum amount of redundancy. The activation of passive replicas is coordinated through a feedback-based coordination model that reduces the complexity of the needed interactions among components until a new collective global service solution is determined, improving the overall maintainability and robustness of the system.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Recent research in multi-agent systems incorporate fault tolerance concepts, but does not explore the extension and implementation of such ideas for large scale parallel computing systems. The work reported in this paper investigates a swarm array computing approach, namely 'Intelligent Agents'. A task to be executed on a parallel computing system is decomposed to sub-tasks and mapped onto agents that traverse an abstracted hardware layer. The agents intercommunicate across processors to share information during the event of a predicted core/processor failure and for successfully completing the task. The feasibility of the approach is validated by simulations on an FPGA using a multi-agent simulator, and implementation of a parallel reduction algorithm on a computer cluster using the Message Passing Interface.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Service-based architectures enable the development of new classes of Grid and distributed applications. One of the main capabilities provided by such systems is the dynamic and flexible integration of services, according to which services are allowed to be a part of more than one distributed system and simultaneously serve different applications. This increased flexibility in system composition makes it difficult to address classical distributed system issues such as fault-tolerance. While it is relatively easy to make an individual service fault-tolerant, improving fault-tolerance of services collaborating in multiple application scenarios is a challenging task. In this paper, we look at the issue of developing fault-tolerant service-based distributed systems, and propose an infrastructure to implement fault tolerance capabilities transparent to services.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Technology scaling has proceeded into dimensions in which the reliability of manufactured devices is becoming endangered. The reliability decrease is a consequence of physical limitations, relative increase of variations, and decreasing noise margins, among others. A promising solution for bringing the reliability of circuits back to a desired level is the use of design methods which introduce tolerance against possible faults in an integrated circuit. This thesis studies and presents fault tolerance methods for network-onchip (NoC) which is a design paradigm targeted for very large systems-onchip. In a NoC resources, such as processors and memories, are connected to a communication network; comparable to the Internet. Fault tolerance in such a system can be achieved at many abstraction levels. The thesis studies the origin of faults in modern technologies and explains the classification to transient, intermittent and permanent faults. A survey of fault tolerance methods is presented to demonstrate the diversity of available methods. Networks-on-chip are approached by exploring their main design choices: the selection of a topology, routing protocol, and flow control method. Fault tolerance methods for NoCs are studied at different layers of the OSI reference model. The data link layer provides a reliable communication link over a physical channel. Error control coding is an efficient fault tolerance method especially against transient faults at this abstraction level. Error control coding methods suitable for on-chip communication are studied and their implementations presented. Error control coding loses its effectiveness in the presence of intermittent and permanent faults. Therefore, other solutions against them are presented. The introduction of spare wires and split transmissions are shown to provide good tolerance against intermittent and permanent errors and their combination to error control coding is illustrated. At the network layer positioned above the data link layer, fault tolerance can be achieved with the design of fault tolerant network topologies and routing algorithms. Both of these approaches are presented in the thesis together with realizations in the both categories. The thesis concludes that an optimal fault tolerance solution contains carefully co-designed elements from different abstraction levels

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Network diagnosis in Wireless Sensor Networks (WSNs) is a difficult task due to their improvisational nature, invisibility of internal running status, and particularly since the network structure can frequently change due to link failure. To solve this problem, we propose a Mobile Sink (MS) based distributed fault diagnosis algorithm for WSNs. An MS, or mobile fault detector is usually a mobile robot or vehicle equipped with a wireless transceiver that performs the task of a mobile base station while also diagnosing the hardware and software status of deployed network sensors. Our MS mobile fault detector moves through the network area polling each static sensor node to diagnose the hardware and software status of nearby sensor nodes using only single hop communication. Therefore, the fault detection accuracy and functionality of the network is significantly increased. In order to maintain an excellent Quality of Service (QoS), we employ an optimal fault diagnosis tour planning algorithm. In addition to saving energy and time, the tour planning algorithm excludes faulty sensor nodes from the next diagnosis tour. We demonstrate the effectiveness of the proposed algorithms through simulation and real life experimental results.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper presents an analysis of the fault tolerance achieved by an autonomous, fully embedded evolvable hardware system, which uses a combination of partial dynamic reconfiguration and an evolutionary algorithm (EA). It demonstrates that the system may self-recover from both transient and cumulative permanent faults. This self-adaptive system, based on a 2D array of 16 (4×4) Processing Elements (PEs), is tested with an image filtering application. Results show that it may properly recover from faults in up to 3 PEs, that is, more than 18% cumulative permanent faults. Two fault models are used for testing purposes, at PE and CLB levels. Two self-healing strategies are also introduced, depending on whether fault diagnosis is available or not. They are based on scrubbing, fitness evaluation, dynamic partial reconfiguration and in-system evolutionary adaptation. Since most of these adaptability features are already available on the system for its normal operation, resource cost for self-healing is very low (only some code additions in the internal microprocessor core)

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Electric vehicles (EVs) and hybrid electric vehicles (HEVs) can reduce greenhouse gas emissions while switched reluctance motor (SRM) is one of the promising motor for such applications. This paper presents a novel SRM fault-diagnosis and fault-tolerance operation solution. Based on the traditional asymmetric half-bridge topology for the SRM driving, the central tapped winding of the SRM in modular half-bridge configuration are introduced to provide fault-diagnosis and fault-tolerance functions, which are set idle in normal conditions. The fault diagnosis can be achieved by detecting the characteristic of the excitation and demagnetization currents. An SRM fault-tolerance operation strategy is also realized by the proposed topology, which compensates for the missing phase torque under the open-circuit fault, and reduces the unbalanced phase current under the short-circuit fault due to the uncontrolled faulty phase. Furthermore, the current sensor placement strategy is also discussed to give two placement methods for low cost or modular structure. Simulation results in MATLAB/Simulink and experiments on a 750-W SRM validate the effectiveness of the proposed strategy, which may have significant implications and improve the reliability of EVs/HEVs.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this study, further improvements regarding the fault location problem for power distribution systems are presented. The proposed improvements relate to the capacitive effect consideration on impedance-based fault location methods, by considering an exact line segment model for the distribution line. The proposed developments, which consist of a new formulation for the fault location problem and a new algorithm that considers the line shunt admittance matrix, are presented. The proposed equations are developed for any fault type and result in one single equation for all ground fault types, and another equation for line-to-line faults. Results obtained with the proposed improvements are presented. Also, in order to compare the improvements performance and demonstrate how the line shunt admittance affects the state-of-the-art impedance-based fault location methodologies for distribution systems, the results obtained with two other existing methods are presented. Comparative results show that, in overhead distribution systems with laterals and intermediate loads, the line shunt admittance can significantly affect the state-of-the-art methodologies response, whereas in this case the proposed developments present great improvements by considering this effect.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Report for the scientific sojourn at the University of Linköping between April to July 2007. Monitoring of the air intake system of an automotive engine is important to meet emission related legislative diagnosis requirements. During the research the problem of fault detection in the air intake system was stated as a constraint satisfaction problem over continuous domains with a big number of variables and constraints. This problem was solved using Interval-based Consistency Techniques. Interval-based consistency techniques are shown to be particularly efficient for checking the consistency of the Analytical Redundancy Relations (ARRs), dealing with uncertain measurements and parameters, and using experimental data. All experiments were performed on a four-cylinder turbo-charged spark-ignited SAAB engine located in the research laboratory at Vehicular System Group - University of Linköping.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The advent of multiparametric MRI has made it possible to change the way in which prostate biopsy is done, allowing to direct biopsies to suspicious lesions rather than randomly. The subject of this review relates to a computer-assisted strategy, the MRI/US fusion software-based targeted biopsy, and to its performance compared to the other sampling methods. Different devices with different methods to register MR images to live TRUS are currently in use to allow software-based targeted biopsy. Main clinical indications of MRI/US fusion software-based targeted biopsy are re-biopsy in men with persistent suspicious of prostate cancer after first negative standard biopsy and the follow-up of patients under active surveillance. Some studies have compared MRI/US fusion software-based targeted versus standard biopsy. In men at risk with MRI-suspicious lesion, targeted biopsy consistently detects more men with clinically significant disease as compared to standard biopsy; some studies have also shown decreased detection of insignificant disease. Only two studies directly compared MRI/US fusion software-based targeted biopsy with MRI/US fusion visual targeted biopsy, and the diagnostic ability seems to be in favor of the software approach. To date, no study comparing software-based targeted biopsy against in-bore MRI biopsy is available. The new software-based targeted approach seems to have the characteristics to be added in the standard pathway for achieving accurate risk stratification. Once reproducibility and cost-effectiveness will be verified, the actual issue will be to determine whether MRI/TRUS fusion software-based targeted biopsy represents anadd-on test or a replacement to standard TRUS biopsy.