978 results for FAILURE DETECTION
Abstract:
A link failure in the path of a virtual circuit in a packet data network will lead to premature disconnection of the circuit by the end-points. A soft failure will result in degraded throughput over the virtual circuit. If these failures can be detected quickly and reliably, then appropriate rerouteing strategies can automatically reroute the virtual circuits that use the failed facility. In this paper, we develop a methodology for analysing and designing failure detection schemes for digital facilities. Based on errored second data, we develop a Markov model for the error and failure behaviour of a T1 trunk. The performance of a detection scheme is characterized by its false alarm probability and the detection delay. Using the Markov model, we analyse the performance of detection schemes that use physical layer or link layer information. The schemes basically rely upon detecting the occurrence of severely errored seconds (SESs). A failure is declared when a counter that is driven by the occurrence of SESs reaches a certain threshold. For hard failures, the design problem reduces to a proper choice of the threshold at which failure is declared, and of the connection reattempt parameters of the virtual circuit end-point session recovery procedures. For soft failures, the performance of a detection scheme depends, in addition, on how long and how frequent the error bursts are in a given failure mode. We also propose and analyse a novel Level 2 detection scheme that relies only upon anomalies observable at Level 2, i.e. CRC failures and idle-fill flag errors. Our results suggest that Level 2 schemes that perform as well as Level 1 schemes are possible.
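The counter-and-threshold rule mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not say how the counter behaves during seconds without an SES, so the up/down ("leaky") rule below, the step sizes, and the name `detect_failure` are assumptions made for illustration only, not the scheme analysed in the paper.

```python
from typing import Iterable, Optional


def detect_failure(ses_flags: Iterable[bool], threshold: int,
                   increment: int = 1, decrement: int = 1) -> Optional[int]:
    """Return the 0-based second at which failure is declared, or None.

    Each flag is True if that second was a severely errored second (SES).
    The up/down counting rule and its step sizes are assumptions of this
    sketch, not parameters taken from the paper.
    """
    counter = 0
    for second, is_ses in enumerate(ses_flags):
        if is_ses:
            counter += increment
        else:
            # Assumed decay: clean seconds drain the counter, never below 0.
            counter = max(0, counter - decrement)
        if counter >= threshold:
            return second
    return None


# A short error burst decays away; a sustained run of SESs trips the detector.
burst = [True, True, False, False, True, False, False, False]
hard_failure = [False, True, True, True, True, True]
print(detect_failure(burst, threshold=4))         # None
print(detect_failure(hard_failure, threshold=4))  # 4
```

In this sketch a lower threshold shortens the detection delay but makes a false alarm on a short burst more likely, which is the trade-off between false alarm probability and detection delay that the abstract describes.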
Abstract:
Future extreme-scale high-performance computing systems will be required to work under frequent component failures. The MPI Forum's User Level Failure Mitigation proposal has introduced an operation, MPI_Comm_shrink, to synchronize the alive processes on the list of failed processes, so that applications can continue to execute even in the presence of failures by adopting algorithm-based fault tolerance techniques. This MPI_Comm_shrink operation requires a fault tolerant failure detection and consensus algorithm. This paper presents and compares two novel failure detection and consensus algorithms. The proposed algorithms are based on Gossip protocols and are inherently fault-tolerant and scalable. The proposed algorithms were implemented and tested using the Extreme-scale Simulator. The results show that in both algorithms the number of Gossip cycles to achieve global consensus scales logarithmically with system size. The second algorithm also shows better scalability in terms of memory and network bandwidth usage and a perfect synchronization in achieving global consensus.
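As a rough illustration of why gossip-based dissemination tends to need a number of cycles that grows slowly with system size, the sketch below simulates a generic synchronous push-gossip protocol. It is not either of the two algorithms from the paper: the function name, the single initial detector, and the uniform random choice of one peer per cycle are all assumptions.

```python
import random
from typing import Set


def gossip_cycles_to_consensus(n: int, failed: Set[int], seed: int = 0) -> int:
    """Count push-gossip cycles until every alive process knows `failed`."""
    rng = random.Random(seed)
    alive = [p for p in range(n) if p not in failed]
    # Assumption: exactly one alive process has detected the failures at start.
    known = {p: set() for p in alive}
    known[alive[0]] = set(failed)
    cycles = 0
    while any(known[p] != failed for p in alive):
        cycles += 1
        # Each alive process pushes a snapshot of its list to one random peer.
        updates = [(rng.choice(alive), set(known[p])) for p in alive]
        # Merge after all pushes so the cycle behaves synchronously.
        for peer, suspected in updates:
            known[peer] |= suspected
    return cycles


for size in (64, 256, 1024):
    print(size, gossip_cycles_to_consensus(size, failed={3}))
```

Because the set of informed processes can at most double per cycle in this model, the cycle count cannot be smaller than the base-2 logarithm of the number of alive processes; the printed values show how close a random run comes to that bound.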
Abstract:
University Master's Degree in Intelligent Systems and Numerical Applications in Engineering (SIANI)
Abstract:
Peer-to-peer video streaming is becoming increasingly popular and widely used. For such applications, the performance metrics are:
- startup delay: the time between connecting and the start of stream playback (also called switching delay);
- playback delay: the time between the source sending the stream and a peer playing it back;
- time lag: the difference between the playback delays of two different peers.
However, today's P2P video streaming systems suffer from considerable delays, both at startup and during playback. A recent study of a well-known P2P streaming system showed that delays usually range between 10 and 60 seconds. The authors also observed that in some cases delays exceed 4 minutes. These are serious drawbacks for anyone who wants to watch live events or use interactive applications. Some studies have shown that these delays are a consequence of the unstructured nature of many P2P systems. Each stream is split into blocks that are exchanged among peers. Because the content is disseminated in an unstructured way, peers must continually exchange information with their neighbours before they can forward the blocks they have received. These solutions are extremely resilient to changes in the network, but they incur a considerable loss in performance, which makes it hard to achieve real-time broadcast. In this project we worked on a structured P2P video streaming system that has been shown to deliver very good results, with delays very close to optimal. In a structured P2P system each peer knows exactly which blocks to send and to which peers. Since the number of peers in the system may be large, each peer should operate with only limited knowledge of the system state. In addition, the system can handle arrivals and departures, including grouped ones, with only limited reorganization of the structure. Finally, in this project we designed and implemented a custom solution to detect and replace peers that are no longer able to cooperate. Here too, the goal was to minimize the amount of information exchanged among peers.
Abstract:
We investigate the problem of distributed sensors' failure detection in networks with a small number of defective sensors, whose measurements differ significantly from the neighbor measurements. We build on the sparse nature of the binary sensor failure signals to propose a novel distributed detection algorithm based on gossip mechanisms and on Group Testing (GT), where the latter has been used so far in centralized detection problems. The new distributed GT algorithm estimates the set of scattered defective sensors with a low complexity distance decoder from a small number of linearly independent binary messages exchanged by the sensors. We first consider networks with one defective sensor and determine the minimal number of linearly independent messages needed for its detection with high probability. We then extend our study to the multiple defective sensors detection by modifying appropriately the message exchange protocol and the decoding procedure. We show that, for small and medium sized networks, the number of messages required for successful detection is actually smaller than the minimal number computed theoretically. Finally, simulations demonstrate that the proposed method outperforms methods based on random walks in terms of both detection performance and convergence rate.
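The idea of locating one defective sensor from a few binary test outcomes with a distance decoder can be sketched as follows. This is a centralized toy version under stated assumptions: the test matrix is built deterministically from binary expansions rather than by the gossip exchanges the paper uses, and the names `build_test_matrix`, `run_tests` and `distance_decode` are hypothetical.

```python
from typing import List, Set


def build_test_matrix(n_sensors: int) -> List[List[int]]:
    """Row t, column s is 1 if sensor s takes part in test t.

    Assumption: column s is the binary expansion of s + 1, so all columns
    are distinct and non-zero. The paper instead forms tests by gossip.
    """
    n_tests = n_sensors.bit_length()
    return [[((s + 1) >> t) & 1 for s in range(n_sensors)]
            for t in range(n_tests)]


def run_tests(matrix: List[List[int]], defective: Set[int]) -> List[int]:
    """A test outcome is 1 if any participating sensor is defective."""
    return [int(any(row[s] for s in defective)) for row in matrix]


def distance_decode(matrix: List[List[int]], outcomes: List[int]) -> int:
    """Return the sensor whose column is closest in Hamming distance."""
    def distance(s: int) -> int:
        return sum(row[s] != y for row, y in zip(matrix, outcomes))
    return min(range(len(matrix[0])), key=distance)


matrix = build_test_matrix(10)              # 4 tests cover 10 sensors
outcomes = run_tests(matrix, defective={6})
print(distance_decode(matrix, outcomes))    # 6
```

With distinct columns, the outcome vector of a single defective sensor equals its own column exactly, so the minimum-distance choice is unique; handling several defective sensors would need a different matrix design and decoder, as the abstract notes.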
Abstract:
The popularity of the MapReduce programming model has increased interest in the research community in improving it. Among other directions, fault tolerance, and specifically failure detection, appears to be a crucial issue that has not yet been addressed satisfactorily. Motivated by this, I decided to devote my main research during this period to a prototype system architecture for the MapReduce framework with a new failure detection service, comprising both an analytical (theoretical) part and an implementation part. I am confident that this work will lead the way for further contributions to failure detection in NoSQL application frameworks and in cloud storage systems in general.
Abstract:
Background: The etiology of most premature ovarian failure (POF) cases is usually elusive. Although genetic causes clearly exist and a likely susceptible region of 8q22.3 has been discovered, no predominant explanation exists for POF. More recently, evidence has indicated that mutations in the NR5A1 gene could be causative for POF. We therefore screened for mutations in the NR5A1 gene in a large cohort of Chinese women with non-syndromic POF. Methods: Mutation screening of the NR5A1 gene was performed in 400 Han Chinese women with well-defined 46,XX idiopathic non-syndromic POF and 400 controls. Subsequently, functional characterization of the novel mutation identified was evaluated in vitro. Results: A novel heterozygous missense mutation [c.13T>G (p.Tyr5Asp)] in NR5A1 was identified in 1 of 384 patients (0.26%). This mutation impaired transcriptional activation of the Amh, Inhibin-a, Cyp11a1 and Cyp19a1 genes, as shown by transactivation assays. However, no dominant negative effect was observed, nor was there any impact on protein expression and nuclear localization. Conclusions: This novel mutation p.Tyr5Asp, in a novel non-domain region, is presumed to result in haploinsufficiency. Regardless, perturbation in NR5A1 is not a common explanation for POF in Chinese women.
Abstract:
Insulated gate bipolar transistor (IGBT) modules are important safety critical components in electrical power systems. Bond wire lift-off, a plastic deformation between wire bond and adjacent layers of a device caused by repeated power/thermal cycles, is the most common failure mechanism in IGBT modules. For the early detection and characterization of such failures, it is important to constantly detect or monitor the health state of IGBT modules, and the state of bond wires in particular. This paper introduces eddy current pulsed thermography (ECPT), a nondestructive evaluation technique, for the state detection and characterization of bond wire lift-off in IGBT modules. After the introduction of the experimental ECPT system, numerical simulation work is reported. The presented simulations are based on the 3-D electromagnetic-thermal coupling finite-element method and analyze transient temperature distribution within the bond wires. This paper illustrates the thermal patterns of bond wires using inductive heating with different wire statuses (lifted-off or well bonded) under two excitation conditions: nonuniform and uniform magnetic field excitations. Experimental results show that uniform excitation of healthy bonding wires, using a Helmholtz coil, provides the same eddy currents on each, while different eddy currents are seen on faulty wires. Both experimental and numerical results show that ECPT can be used for the detection and characterization of bond wires in power semiconductors through the analysis of the transient heating patterns of the wires. The main impact of this paper is that it is the first time electromagnetic induction thermography, so-called ECPT, has been employed on power/electronic devices. Because it is capable of contactless inspection of multiple wires in a single pass, it opens a wide field of investigation in power/electronic devices for failure detection, performance characterization, and health monitoring.
Abstract:
Failure detection is at the core of most fault tolerance strategies, but it often depends on reliable communication. We present new algorithms for failure detectors which are appropriate as components of a fault tolerance system that can be deployed in situations of adverse network conditions (such as loosely connected and administered computing grids). They pack redundancy into heartbeat messages, thereby improving on the robustness of the traditional protocols. Results from experimental tests conducted in a simulated environment with adverse network conditions show significant improvement over existing solutions.
Abstract:
In most materials, short stress waves are generated during the process of plastic deformation, phase transformation, crack formation and crack growth. These phenomena are applied in acoustic emission (AE) for the detection of material defects in a wide spectrum of areas, ranging from nondestructive testing for the detection of materials defects to monitoring of microseismical activity. AE technique is also used for defect source identification and for failure detection. AE waves consist of P waves (primary longitudinal waves), S waves (shear/transverse waves) and Rayleigh (surface) waves as well as reflected and diffracted waves. The propagation of AE waves in various modes has made the determination of source location difficult. In order to use acoustic emission technique for accurate identification of source, an understanding of wave propagation of the AE signals at various locations in a plate structure is essential. Furthermore, an understanding of wave propagation can also assist in sensor location for optimum detection of AE signals along with the characteristics of the source. In real life, as the AE signals radiate from the source it will result in stress waves. Unless the type of stress wave is known, it is very difficult to locate the source when using the classical propagation velocity equations. This paper describes the simulation of AE waves to identify the source location and its characteristics in steel plate as well as the wave modes. The finite element analysis (FEA) is used for the numerical simulation of wave propagation in thin plate. By knowing the type of wave generated, it is possible to apply the appropriate wave equations to determine the location of the source. For a single plate structure, the results show that the simulation algorithm is effective to simulate different stress waves.
Abstract:
In most materials, short stress waves are generated during the process of plastic deformation, phase transformation, crack formation and crack growth. These phenomena are applied in acoustic emission (AE) for the detection of material defects in a wide spectrum of areas, ranging from non-destructive testing for the detection of materials defects to monitoring of microseismic activity. The AE technique is also used for defect source identification and for failure detection. AE waves consist of P waves (primary/longitudinal waves), S waves (shear/transverse waves) and Rayleigh (surface) waves as well as reflected and diffracted waves. The propagation of AE waves in various modes has made the determination of source location difficult. In order to use the acoustic emission technique for accurate identification of source location, an understanding of wave propagation of the AE signals at various locations in a plate structure is essential. Furthermore, an understanding of wave propagation can also assist in sensor location for optimum detection of AE signals. In real life, as the AE signals radiate from the source it will result in stress waves. Unless the type of stress wave is known, it is very difficult to locate the source when using the classical propagation velocity equations. This paper describes the simulation of AE waves to identify the source location in steel plate as well as the wave modes. The finite element analysis (FEA) is used for the numerical simulation of wave propagation in thin plate. By knowing the type of wave generated, it is possible to apply the appropriate wave equations to determine the location of the source. For a single plate structure, the results show that the simulation algorithm is effective to simulate different stress waves.
Abstract:
Fundamental investigations in ultrasonics in India date back to the early 20th century. However, fundamental and applied research in the field of nondestructive evaluation (NDE) came much later. In the last four decades it has grown steadily in academic institutions, national laboratories and industry. Currently, commensurate with rapid industrial growth and realisation of the benefits of NDE, the activity is becoming much stronger, deeper, broader and more widespread. Acoustic Emission (AE) is a recent entry into the field of nondestructive evaluation. Pioneering efforts in India in AE were carried out at the Indian Institute of Science in the early 1970s. The nuclear industry was the first to utilise it. Current activity in AE in the country spans materials research, incipient failure detection, integrity evaluation of structures, fracture mechanics studies and rock mechanics. In this paper, we attempt to project the current scenario in ultrasonics and acoustic emission research in India.
Abstract:
Reliable messaging is a key component of mobile agent systems. Current research focuses on reliable one-to-one message delivery to mobile agents. However, how to implement a group communication system for mobile agents, a powerful building block that facilitates the development of fault-tolerant mobile agent systems, remains an open issue. In this paper, we propose a group communication system for mobile agents (GCS-MA), which includes totally ordered multicast and membership management functions. We divide a group of mobile agents into several agent clusters, and each agent cluster consists of all mobile agents residing in the same sub-network and is managed by a special module, named coordinator. Then, all coordinators form a ring-based overlay for interchanging messages between clusters. We present a token-based algorithm, an intra-cluster messaging algorithm and an inter-cluster migration algorithm to achieve atomicity and total ordering properties of multicast messages, by building a membership protocol on top of the clustering and failure detection mechanisms. Performance issues of the proposed system have been analysed through simulations. We also describe the application of the proposed system in the context of the service cooperation middleware (SCM) project.
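Abstract:
Failure detection is a fundamental reliability technique for distributed systems; it detects the liveness of a running system in a timely manner. As the mainstream middleware in network distributed computing environments, Web application servers (WAS) need to provide a good detection mechanism and must also meet adaptability requirements. Adaptive failure detection requires that a failure detector be able to change the quality of detection dynamically, according to changes in application requirements and the system environment. This paper first presents a multi-layer failure detection model for WAS. Then, based on the quality-of-service specification of failure detectors, it proposes an adaptive failure detection algorithm and designs an adaptive failure detection framework for WAS. The framework meets the requirements of dynamically adjusting failure detection quality and flexibly integrating failure detectors. The work has been implemented in the OnceAS application server, and experiments and data on the OnceAS platform are presented.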
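To make the idea of adaptive failure detection concrete, the sketch below shows one common style of adaptive heartbeat detector: the timeout is re-estimated from recent heartbeat arrival times, and a safety margin acts as the tunable quality knob. This is a generic illustration, not the algorithm implemented in OnceAS; the class name, the window size, and the mean-plus-margin rule are assumptions.

```python
from collections import deque
from statistics import mean


class AdaptiveFailureDetector:
    """Suspects a peer when its heartbeat is later than an adaptive timeout."""

    def __init__(self, window: int = 100, safety_margin: float = 0.5):
        # Only the most recent `window` arrival times are kept (assumption).
        self.arrivals = deque(maxlen=window)
        self.safety_margin = safety_margin

    def heartbeat(self, now: float) -> None:
        """Record the arrival time of a heartbeat."""
        self.arrivals.append(now)

    def timeout(self) -> float:
        """Mean observed inter-arrival time plus the safety margin."""
        times = list(self.arrivals)
        gaps = [b - a for a, b in zip(times, times[1:])]
        return mean(gaps) + self.safety_margin

    def suspects(self, now: float) -> bool:
        if len(self.arrivals) < 2:
            return False  # not enough history to estimate a timeout
        return now - self.arrivals[-1] > self.timeout()


fd = AdaptiveFailureDetector(safety_margin=0.5)
for t in (0.0, 1.0, 2.0, 3.0):
    fd.heartbeat(t)
print(fd.suspects(3.4))  # False: 0.4 s of silence is within the 1.5 s timeout
print(fd.suspects(5.0))  # True: 2.0 s of silence exceeds the 1.5 s timeout
```

In this sketch, a larger `safety_margin` reduces wrong suspicions at the cost of slower detection, and a smaller one does the opposite, which is the kind of detection-quality trade-off an adaptive framework would adjust at runtime.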