825 results for FAULT TOLERANCE
Abstract:
Future extreme-scale high-performance computing systems will be required to work under frequent component failures. The MPI Forum's User Level Failure Mitigation proposal has introduced an operation, MPI_Comm_shrink, to synchronize the surviving processes on the list of failed processes, so that applications can continue to execute even in the presence of failures by adopting algorithm-based fault tolerance techniques. This MPI_Comm_shrink operation requires a fault-tolerant failure detection and consensus algorithm. This paper presents and compares two novel failure detection and consensus algorithms. The proposed algorithms are based on Gossip protocols and are inherently fault-tolerant and scalable. The proposed algorithms were implemented and tested using the Extreme-scale Simulator. The results show that, for both algorithms, the number of Gossip cycles needed to achieve global consensus scales logarithmically with system size. The second algorithm also shows better scalability in terms of memory and network bandwidth usage, and perfect synchronization in achieving global consensus.
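The abstract gives no pseudocode, so the following is a minimal, hypothetical sketch of the general idea (names and data structures are mine, not the paper's): each alive process keeps a suspect list and merges it with a random peer's list once per Gossip cycle, until every alive process holds the same list of failed processes.

```python
import random

def gossip_consensus(n_procs, failed, seed=0):
    """Toy simulation of Gossip-based failure detection: consensus is
    reached when every alive process knows the full set of failures."""
    rng = random.Random(seed)
    alive = [p for p in range(n_procs) if p not in failed]
    suspects = {p: set() for p in alive}
    for f in failed:                               # each failure is observed
        suspects[alive[f % len(alive)]].add(f)     # locally by some process
    cycles = 0
    while any(suspects[p] != failed for p in alive):
        cycles += 1
        for p in alive:
            peer = rng.choice(alive)               # random gossip partner
            merged = suspects[p] | suspects[peer]
            suspects[p] = suspects[peer] = merged  # push-pull exchange
    return cycles

# Cycle counts grow roughly logarithmically with the number of processes:
for n in (64, 256, 1024):
    print(n, gossip_consensus(n, failed={3, 7}))
```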
Abstract:
This thesis addresses one of the problems arising today in the use of RPAs (Remotely Piloted Aircraft): the management of safety. It can no longer be denied, in other words, that these vehicles are an integral part of civil airspace. Airspace regulatory bodies have recently been focusing their efforts on precisely this issue, aiming to establish a set of regulations that govern, on the one hand, how these vehicles interface with other categories of aircraft and, on the other, the airworthiness criteria they must satisfy to operate safely in that airspace. It is therefore necessary to equip the vehicles themselves with a sufficient degree of safety so that disastrous events can be avoided when a fault occurs in the system; this is the definition of a fail-safe system. The study and development of this class of systems can help the manufacturer overcome the barrier posed today by regulation, which is often the only non-physical obstacle standing between unmanned aircraft and the sky. On the other hand, to guarantee that the remote operator has, for the entire duration of the mission, a clear perception of the current operating state of the system and of how it can (or could) interact with the surrounding environment (situational awareness), the aircraft must be equipped with devices that can detect a malfunction when one occurs: this is the role of fault detection systems. These two fundamental aspects form the foundation of the present work, which concerns the design of a small but critical subsystem of the UAV: the actuation system of the control surfaces. The control surfaces are, in fact, the only means available to the operator for governing the vehicle under normal operating conditions, as well as the last resort for attempting to avoid a disastrous event when other subsystems are clearly outside the vehicle's normal operating conditions.
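As a purely illustrative aside, not taken from the thesis: model-based fault detection for an actuation channel is often implemented as a residual check that compares the commanded surface deflection with the measured one and flags a fault when the discrepancy persists. A minimal sketch with hypothetical thresholds:

```python
def detect_actuator_fault(commanded, measured, threshold=2.0, persistence=5):
    """Flag a fault when |command - measurement| exceeds `threshold` degrees
    for `persistence` consecutive samples (both values are illustrative)."""
    over = 0
    for cmd, meas in zip(commanded, measured):
        over = over + 1 if abs(cmd - meas) > threshold else 0
        if over >= persistence:
            return True  # persistent residual: declare the actuator faulty
    return False

# A stuck actuator: commands ramp up while the measurement freezes at 1.0 deg.
cmds = [0.5 * k for k in range(20)]
meas = [min(c, 1.0) for c in cmds]
print(detect_actuator_fault(cmds, meas))  # True
```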
Abstract:
The design of fault-tolerant systems is gaining importance in large domains of embedded applications where design constraints are as important as reliability. New software techniques, based on the selective application of redundancy, have shown remarkable fault coverage with reduced costs and overheads. However, the large number of different solutions provided by these techniques, and the costly process of assessing their reliability, make design space exploration a very difficult and time-consuming task. This paper proposes the integration of a multi-objective optimization tool with a software hardening environment to perform an automatic design space exploration in the search for the best trade-offs between reliability, cost, and performance. The first tool is driven by a genetic algorithm which can simultaneously fulfill many design goals thanks to the use of the NSGA-II multi-objective algorithm. The second is a compiler-based infrastructure that automatically produces selectively protected (hardened) versions of the software and generates accurate overhead reports and fault coverage estimations. The advantages of our proposal are illustrated by means of a complex and detailed case study involving a typical embedded application, the AES (Advanced Encryption Standard).
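To make the trade-off search concrete, here is a hedged sketch (not the authors' tool) of the heart of such an exploration: candidate hardening configurations are scored on fault coverage and runtime overhead, and only non-dominated candidates survive, which is the selection principle NSGA-II builds on. The objective functions below are illustrative placeholders.

```python
import random

def evaluate(config):
    """Hypothetical objectives for a hardening configuration: protecting a
    larger fraction of instructions raises coverage but also overhead."""
    protected = sum(config) / len(config)
    coverage = protected ** 0.5          # diminishing returns (illustrative)
    overhead = 1.0 + 1.5 * protected     # runtime slowdown factor
    return coverage, overhead

def dominates(a, b):
    """a dominates b if it is no worse in both objectives and strictly
    better in at least one (maximise coverage, minimise overhead)."""
    (ca, oa), (cb, ob) = a, b
    return ca >= cb and oa <= ob and (ca > cb or oa < ob)

rng = random.Random(1)
population = [[rng.random() < 0.5 for _ in range(32)] for _ in range(64)]
scored = [(evaluate(c), c) for c in population]
front = [s for s in scored if not any(dominates(t[0], s[0]) for t in scored)]
print(len(front), "non-dominated configurations")
```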
Abstract:
Switched reluctance motor (SRM) drives are a competitive technology for traction motor drives. This paper proposes a novel and flexible SRM fault-tolerant topology with fault-diagnosis, fault-tolerance, and advanced control functions. The converter is composed of a single-phase bridge and a relay network, based on the traditional asymmetrical half-bridge driving topology. When the SRM drive system is subjected to fault conditions, including open-circuit and short-circuit faults, the proposed converter starts its fault-diagnosis procedure to locate the fault. Based on the relay network, the faulty part can be bypassed by the single-phase bridge arm, while the single-phase bridge arm and the healthy part of the converter form a fault-tolerant topology to sustain the driving operation. A fault-tolerant control strategy is developed to decrease the influence of the fault. Furthermore, the proposed fault-tolerant strategy can be applied to both three-phase 12/8 and four-phase 8/6 SRMs. Simulation results in MATLAB/Simulink and experiments on a three-phase 12/8 SRM and a four-phase 8/6 SRM validate the effectiveness of the proposed strategy, which may have significant economic implications in traction drive systems.
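The diagnosis procedure itself is not detailed in the abstract; the sketch below illustrates one common current-signature heuristic (my assumption, not the paper's method): a phase whose current stays near zero while excited suggests an open circuit, while a current far above the reference suggests a short.

```python
def diagnose_phase(excited, current, i_ref, open_tol=0.05, short_factor=2.0):
    """Classify one SRM phase from a current sample taken while the phase is
    excited. Thresholds are illustrative placeholders, not paper values."""
    if not excited:
        return "not excited"
    if current < open_tol * i_ref:
        return "open-circuit fault"   # excited but essentially no current
    if current > short_factor * i_ref:
        return "short-circuit fault"  # current far beyond the reference
    return "healthy"

for i in (0.0, 9.8, 25.0):
    print(diagnose_phase(True, i, i_ref=10.0))
```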
Abstract:
While robots gradually become a part of our daily lives, they already play vital roles in many critical operations. Some of these critical tasks include surgeries, battlefield operations, and tasks that take place in hazardous environments or distant locations, such as space missions. In most of these tasks, remotely controlled robots are used instead of autonomous robots. This special area of robotics is called teleoperation. Teleoperation systems must be reliable when used in critical tasks; hence, all of the subsystems must be dependable even under a subsystem or communication line failure. These systems are categorized as unilateral or bilateral teleoperation. A special type of bilateral teleoperation is force-reflecting teleoperation, which is further investigated as limited- and unlimited-workspace teleoperation. The teleoperation systems configured in this study are tested both in numerical simulations and in experiments. A new method, Virtual Rapid Robot Prototyping, is introduced to create system models rapidly and accurately. This method is then extended to configure experimental setups in which actual master systems work with system models of the slave robots, accompanied by virtual reality screens, as well as with the actual slaves. Fault-tolerant design and modeling of the master and slave systems are also addressed at different levels to prevent subsystem failure. Teleoperation controllers are designed to compensate for instabilities due to communication time delays. Modifications to the existing controllers are proposed to configure a controller that remains reliable under communication line failures. Position/force controllers are also introduced for master and/or slave robots. Later, controller architecture changes are discussed in order to make these controllers dependable even in systems experiencing communication problems. The customary and proposed controllers for teleoperation systems are tested in numerical simulations on single- and multi-DOF teleoperation systems. Experimental studies are then conducted on seven different systems, covering limited- and unlimited-workspace teleoperation, to verify and improve the simulation studies. Experiments with the proposed controllers were successful relative to the customary controllers. Overall, by employing the fault-tolerance features and the proposed controllers, it is possible to design and configure a more reliable teleoperation system, which allows these systems to be used in a wider range of critical missions.
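The dissertation's specific controllers are not given in the abstract. A standard delay-tolerant technique from the teleoperation literature, shown here purely for illustration, is the wave-variable transformation, which encodes velocity and force so the communication channel stays passive for any constant delay:

```python
import math

def to_wave(velocity, force, b=1.0):
    """Encode (velocity, force) as forward/backward wave variables u, v.
    b is the characteristic wave impedance (illustrative value)."""
    u = (b * velocity + force) / math.sqrt(2 * b)
    v = (b * velocity - force) / math.sqrt(2 * b)
    return u, v

def from_wave(u, v, b=1.0):
    """Recover (velocity, force) from wave variables on the receiving side."""
    velocity = (u + v) / math.sqrt(2 * b)
    force = (u - v) * math.sqrt(b / 2)
    return velocity, force

u, v = to_wave(0.3, 1.2)
print(from_wave(u, v))  # recovers (0.3, 1.2) up to floating-point error
```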
Abstract:
This letter presents an FPGA implementation of a fault-tolerant Hopfield Neural Network (HNN). The robustness of this circuit against Single Event Upsets (SEUs) and Single Event Transients (SETs) has been evaluated. Results show the fault tolerance of the proposed design, compared to a previous non-fault-tolerant implementation and a solution based on triple modular redundancy (TMR) of a standard HNN design.
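As a rough illustration of the TMR baseline the letter compares against (a generic software sketch, not the authors' FPGA circuit), a triple-modular-redundant neuron update runs three replicas and majority-votes each output bit, so a single upset is masked:

```python
def majority(a, b, c):
    """Bitwise 2-of-3 vote: a single upset replica is outvoted."""
    return (a & b) | (a & c) | (b & c)

def hopfield_step(state, weights, thresholds):
    """One synchronous Hopfield update over binary states in {0, 1}."""
    n = len(state)
    return [1 if sum(weights[i][j] * state[j] for j in range(n)) >= thresholds[i]
            else 0 for i in range(n)]

# Three replicas of the same update; replica 2 suffers a simulated SEU.
w = [[0, 1], [1, 0]]
replicas = [hopfield_step([1, 0], w, [0, 0]) for _ in range(3)]
replicas[2][0] ^= 1  # bit flip injected into one replica
voted = [majority(*bits) for bits in zip(*replicas)]
print(voted)  # the upset is masked by the vote
```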
Abstract:
Ordinary desktop computers continue to obtain ever more resources – increased processing power, memory, network speed and bandwidth – yet these resources spend much of their time underutilised. Cycle stealing frameworks harness these resources so they can be used for high-performance computing. Traditionally, cycle stealing systems have used client-server based architectures, which place significant limits on their ability to scale and on the range of applications they can support. By applying a fully decentralised network model to cycle stealing, the limits of centralised models can be overcome. Using decentralised networks in this manner presents some difficulties which have not been encountered in their previous uses. Generally, decentralised applications do not require any significant fault tolerance guarantees. High-performance computing, on the other hand, requires very stringent guarantees to ensure correct results are obtained. Unfortunately, mechanisms developed for traditional high-performance computing cannot simply be translated, because of their reliance on a reliable storage mechanism. In the highly dynamic world of P2P computing this reliable storage is not available. As part of this research, a fault tolerance system has been created which provides considerable reliability without the need for persistent storage. As well as increased scalability, fully decentralised networks offer the ability for volunteers to communicate directly. This ability provides the possibility of supporting applications whose tasks require direct, message-passing style communication. Previous cycle stealing systems have only supported embarrassingly parallel applications and applications with limited forms of communication, so a new programming model has been developed which can support this style of communication within a cycle stealing context. In this thesis I present a fully decentralised cycle stealing framework. The framework addresses the problems of providing a reliable fault tolerance system and supporting direct communication between parallel tasks. The thesis includes a programming model for developing cycle stealing applications with direct inter-process communication and methods for optimising object locality on decentralised networks.
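One way to obtain reliability without persistent storage, sketched here as an assumption of mine rather than the thesis design, is to replicate each task's checkpoint in the memory of k peers and reschedule only when the worker and every replica holder have departed:

```python
import random

def place_replicas(task_id, peers, k=3, seed=0):
    """Choose k distinct peers to hold an in-memory copy of a task
    checkpoint. Deterministic per task so survivors can find them again."""
    rng = random.Random(task_id * 1000003 + seed)
    return rng.sample(sorted(peers), k)

def task_survives(holders, departed):
    """The checkpoint survives while at least one holder is still online."""
    return any(h not in departed for h in holders)

peers = set(range(20))
holders = place_replicas(task_id=42, peers=peers, k=3)
print(holders, task_survives(holders, departed={holders[0], holders[1]}))
```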
Abstract:
A distributed fuzzy system is a real-time fuzzy system in which the input, output and computation may be located on different networked computing nodes. The ability of a distributed software application, such as a distributed fuzzy system, to adapt to changes in the computing network at runtime can provide real-time performance improvement and fault tolerance. This paper introduces an Adaptable Mobile Component Framework (AMCF) that provides a distributed dataflow-based platform with a fine-grained level of runtime reconfigurability. The execution location of small fragments (possibly as little as a few machine-code instructions) of an AMCF application can be moved between different computing nodes at runtime. A case study is included that demonstrates the applicability of the AMCF to a distributed fuzzy system scenario involving multiple physical agents (such as autonomous robots). Using the AMCF, fuzzy systems can now be developed such that they can be distributed automatically across multiple computing nodes and are adaptable to runtime changes in the networked computing environment. This provides the opportunity to improve the performance of fuzzy systems deployed in scenarios where the computing environment is resource-constrained and volatile, such as multiple autonomous robots, smart environments and sensor networks.
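To convey the flavour of fine-grained relocation (a hypothetical sketch; the AMCF API is not given in the abstract), a dataflow fragment can be modelled as a fixed function plus a location field that the framework is free to rewrite at runtime:

```python
class Fragment:
    """A relocatable dataflow fragment: behaviour stays fixed, placement moves."""
    def __init__(self, name, fn, node):
        self.name, self.fn, self.node = name, fn, node

    def fire(self, *inputs):
        # In a real system this would execute on self.node; here we just tag it.
        return self.node, self.fn(*inputs)

# A toy fuzzification fragment (a ramp membership function) placed on a robot.
fuzzify = Fragment("fuzzify", lambda x: max(0.0, min(1.0, (x - 20) / 10)), "robot-1")
print(fuzzify.fire(27.5))        # ('robot-1', 0.75)
fuzzify.node = "base-station"    # runtime reconfiguration: migrate the fragment
print(fuzzify.fire(27.5))        # ('base-station', 0.75)
```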
Abstract:
The modern stage of development of automatic control and guidance systems for small reusable unmanned aerial vehicles (UAVs) is shaped by demanding requirements on the autonomy, accuracy and size of these systems. The contradictory nature of these requirements dictates a novel functional and algorithmic tight coupling of several different onboard sensors into one computational process based on methods of optimal filtering. Nowadays, data fusion of micro-electro-mechanical inertial measurement units, barometric pressure sensors, and signals of global navigation satellite system (GNSS) receivers is widely used in numerous strapdown inertial navigation systems (INS). However, such systems do not fully comply with requirements such as jamming immunity, fault tolerance, autonomy, and accuracy of navigation. At the same time, significant progress has recently been demonstrated by navigation systems that apply the correlation-extremal principle to optical data flow and digital terrain maps. This article proposes a new architecture of an automatic navigation management system (ANMS) for small UAVs, which combines the algorithms of a strapdown INS, satellite navigation and an optical navigation system.
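A minimal sketch of the kind of optimal-filter fusion the article describes, reduced to one dimension with hypothetical noise values (the article's actual filter is not specified in the abstract): predict position from the inertial rate, then correct with each satellite fix.

```python
def kalman_fuse(ins_rates, gnss_positions, dt=0.1, q=0.01, r=4.0):
    """1-D position estimate: predict with the INS velocity, correct with
    GNSS. q and r are illustrative process/measurement noise variances."""
    x, p = 0.0, 1.0  # state estimate and its variance
    track = []
    for rate, z in zip(ins_rates, gnss_positions):
        x, p = x + rate * dt, p + q          # predict from inertial data
        k = p / (p + r)                      # Kalman gain
        x, p = x + k * (z - x), (1 - k) * p  # correct with the satellite fix
        track.append(x)
    return track

print(kalman_fuse([1.0] * 5, [0.12, 0.19, 0.33, 0.41, 0.48])[-1])
```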
Abstract:
This paper presents an overview of the issues in precisely defining, specifying and evaluating the dependability of software, particularly in the context of computer-controlled process systems. Dependability is intended as a generic term embodying various quality factors and is useful for both software and hardware. While developments in quality assurance and reliability theories have proceeded mostly in independent directions for hardware and software systems, we present here the case for developing a unified framework of dependability—a facet of the operational effectiveness of modern technological systems, and develop a hierarchical systems model helpful in clarifying this view. In the second half of the paper, we survey the models and methods available for measuring and improving software reliability. The nature of software “bugs”, the failure history of the software system in the various phases of its lifecycle, reliability growth in the development phase, estimation of the number of errors remaining in the operational phase, and the complexity of the debugging process have all been considered to varying degrees of detail. We also discuss the notion of software fault tolerance, methods of achieving it, and the status of other measures of software dependability such as maintainability, availability and safety.
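As one concrete instance of the surveyed reliability-growth models, chosen by me for illustration rather than singled out by the paper, the Goel-Okumoto model takes the expected number of failures observed by time t to be m(t) = a(1 − e^(−bt)), so the expected number of errors remaining in the operational phase is a − m(t):

```python
import math

def goel_okumoto_remaining(a, b, t):
    """Expected residual errors under the Goel-Okumoto NHPP model:
    a = total expected errors, b = per-error detection rate (illustrative)."""
    m_t = a * (1.0 - math.exp(-b * t))  # expected failures seen by time t
    return a - m_t

# With a = 120 expected errors and b = 0.02 per test-hour, after 100 hours:
print(round(goel_okumoto_remaining(120, 0.02, 100), 1))  # ~16.2 remaining
```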
Abstract:
This paper reviews the notion of Byzantine-resilient distributed computing systems, the relevant protocols and their possible applications, as reported in the literature. The three agreement problems, namely the consensus problem, the interactive consistency problem, and the generals problem, are discussed. Various agreement protocols for the Byzantine generals problem are summarized in terms of their performance and level of fault tolerance. The three classes of Byzantine agreement protocols discussed are deterministic, randomized, and approximate agreement protocols. Finally, the application of Byzantine agreement protocols to clock synchronization is highlighted.
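To ground the generals problem, here is a compact sketch of the textbook Oral Messages algorithm OM(1) for four generals and one traitor (a generic illustration, not a protocol from the survey; the simulated traitor simply inverts what it forwards, whereas a real Byzantine node may also equivocate):

```python
from collections import Counter

def flip(v):
    return "retreat" if v == "attack" else "attack"

def om1(commander_value, traitor):
    """Oral Messages OM(1) for n = 4 generals tolerating m = 1 traitor.
    `traitor` is the index of the Byzantine node (0 = commander)."""
    lieutenants = [1, 2, 3]
    # Round 1: the commander sends its order to every lieutenant.
    received = {l: flip(commander_value) if traitor == 0 else commander_value
                for l in lieutenants}
    # Round 2: lieutenants relay what they received; each loyal lieutenant
    # then majority-votes over its own value plus the two relayed ones.
    decisions = {}
    for l in lieutenants:
        votes = [received[l]] + [flip(received[o]) if o == traitor else received[o]
                                 for o in lieutenants if o != l]
        decisions[l] = Counter(votes).most_common(1)[0][0]
    return decisions

print(om1("attack", traitor=3))  # loyal lieutenants 1 and 2 agree on "attack"
```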
Abstract:
There exist various proposals for building a functional, fault-tolerant large-scale quantum computer. Topological quantum computation is a more exotic proposal, which makes use of the properties of quasiparticles manifest only in certain two-dimensional systems. These so-called anyons exhibit topological degrees of freedom which, in principle, can be used to execute quantum computation with intrinsic fault tolerance. This feature is the main incentive to study topological quantum computation. The objective of this thesis is to provide an accessible introduction to the theory. The thesis considers the theory of anyons arising in two-dimensional quantum mechanical systems described by gauge theories based on so-called quantum double symmetries. The quasiparticles are shown to exhibit interactions and to carry quantum numbers which are both of a topological nature. In particular, it is found that the addition of the quantum numbers is not unique, and that the fusion of the quasiparticles is described by a non-trivial fusion algebra. It is discussed how this property can be used to encode quantum information in a manner which is intrinsically protected from decoherence, and how one could, in principle, perform quantum computation by braiding the quasiparticles. As an example of the general discussion, the particle spectrum and the fusion algebra of an anyon model based on the gauge group S_3 are explicitly derived. The fusion algebra is found to branch into multiple proper subalgebras, and the simplest one of them is chosen as a model for an illustrative demonstration. The different steps of a topological quantum computation are outlined and the computational power of the model is assessed. It turns out that the chosen model is not universal for quantum computation. However, because the objective was a demonstration of the theory with explicit calculations, none of the other, more complicated fusion subalgebras were considered. Studying their applicability to quantum computation could be a topic of further research.
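For readers unfamiliar with non-trivial fusion, the standard Fibonacci anyon model, used here only as the simplest possible illustration (the thesis itself works with the quantum double of S_3, whose fusion rules are more involved), has a single non-trivial charge τ whose fusion with itself has two outcomes:

```latex
% Fibonacci fusion rules: 1 is the vacuum charge, \tau the non-trivial anyon.
\begin{align*}
  1 \times 1 &= 1, &
  1 \times \tau &= \tau, &
  \tau \times \tau &= 1 + \tau .
\end{align*}
% The two fusion channels of \tau \times \tau make the joint state space of
% n anyons degenerate (its dimension grows like the Fibonacci numbers);
% topologically protected qubits are encoded in that degeneracy.
```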
Abstract:
Our main result is a new sequential method for the design of decentralized control systems. Controller synthesis is conducted on a loop-by-loop basis, and at each step the designer obtains an explicit characterization of the class C of all compensators for the loop being closed that results in closed-loop system poles being in a specified closed region D of the s-plane, instead of merely stabilizing the closed-loop system. Since one of the primary goals of control system design is to satisfy basic performance requirements that are often directly related to closed-loop pole location (bandwidth, percentage overshoot, rise time, settling time), this approach immediately allows the designer to focus on other concerns such as robustness and sensitivity. By considering only compensators from class C and seeking the optimum member of that set with respect to sensitivity or robustness, the designer has a clearly-defined limited optimization problem to solve without concern for loss of performance. A solution to the decentralized tracking problem is also provided. This design approach has the attractive features of expandability, the use of only 'local models' for controller synthesis, and fault tolerance with respect to certain types of failure.
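A toy instance of such a characterization, my example rather than one from the paper: for a single loop with plant P(s) = 1/(s+1), proportional compensators k, and D the half-plane left of −σ, the class C is an explicit interval of gains.

```latex
% Unit-feedback loop with plant P(s) = 1/(s+1) and gain compensator k:
% 1 + k\,P(s) = 0 has the single closed-loop pole s = -(1+k), so
\[
  s = -(1+k) \in D = \{\, s : \operatorname{Re}(s) < -\sigma \,\}
  \iff k > \sigma - 1,
  \qquad
  \mathcal{C} = \{\, k \in \mathbb{R} : k > \sigma - 1 \,\},
\]
% an explicit class over which sensitivity or robustness can then be optimised.
```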
Abstract:
Mobile ad-hoc networks (MANETs) have recently drawn significant research attention since they offer unique benefits and versatility with respect to bandwidth spatial reuse, intrinsic fault tolerance, and low-cost rapid deployment. This paper addresses the issue of delay-sensitive real-time data transport in this type of network. An effective QoS mechanism is thereby required for the speedy transport of real-time data. The QoS issue in MANETs is an open-ended problem. Various QoS measures are incorporated in the upper layers of the network, but few techniques address QoS in the MAC layer. There are quite a few QoS techniques in the MAC layer for infrastructure-based wireless networks. The goal, and the challenge, is to achieve QoS delivery and priority access for real-time traffic in an ad-hoc wireless environment, while maintaining democracy in the resource allocation. We propose a MAC layer protocol called the "FCP based FAMA protocol", which allocates channel resources to the needy in a more democratic way, by examining the requirements, malicious behavior and genuineness of each request. We have simulated both FAMA and FCP based FAMA and tested them in various MANET conditions. The simulation results clearly show an improvement in channel utilization and a decrease in the delay parameters in the latter case. Our new protocol outperforms the other QoS-aware MAC layer protocols.
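The abstract does not define FCP's exact rules, so the sketch below is a hypothetical illustration of the stated idea: grant the floor to real-time traffic first, but cap any one node's recent share so the allocation stays democratic.

```python
def grant_floor(requests, history, cap=0.4):
    """Pick the next node to get the channel. `requests` maps node -> class
    ('rt' for real-time, 'be' for best-effort); `history` maps node ->
    fraction of recent grants. The cap (illustrative) curbs greedy nodes."""
    eligible = {n: c for n, c in requests.items() if history.get(n, 0.0) < cap}
    if not eligible:                        # everyone is over cap: fall back
        eligible = dict(requests)
    rt = [n for n, c in eligible.items() if c == "rt"]
    pool = rt or list(eligible)             # real-time first, else best effort
    return min(pool, key=lambda n: history.get(n, 0.0))  # least-served wins

print(grant_floor({"a": "rt", "b": "rt", "c": "be"}, {"a": 0.5, "b": 0.1}))  # 'b'
```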