988 results for FAILURE DETECTION


Relevance:

100.00% 100.00%

Publisher:

Abstract:

Future extreme-scale high-performance computing systems will be required to work under frequent component failures. The MPI Forum's User Level Failure Mitigation proposal has introduced an operation, MPI_Comm_shrink, to synchronize the alive processes on the list of failed processes, so that applications can continue to execute even in the presence of failures by adopting algorithm-based fault tolerance techniques. This MPI_Comm_shrink operation requires a fault tolerant failure detection and consensus algorithm. This paper presents and compares two novel failure detection and consensus algorithms. The proposed algorithms are based on Gossip protocols and are inherently fault-tolerant and scalable. The proposed algorithms were implemented and tested using the Extreme-scale Simulator. The results show that in both algorithms the number of Gossip cycles to achieve global consensus scales logarithmically with system size. The second algorithm also shows better scalability in terms of memory and network bandwidth usage and a perfect synchronization in achieving global consensus.
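
As a rough illustration of why gossip dissemination tends to need a number of cycles that grows slowly with system size, the sketch below simulates push gossip of a failed-process list. It is not either of the paper's two algorithms: the function name `gossip_cycles`, the rule that each process pushes to one random peer per cycle, and the fixed seed are all assumptions made for illustration.

```python
import random

def gossip_cycles(n, failed, seed=0):
    """Count push-gossip cycles until every alive process knows the failed set.

    Hypothetical model: in each cycle every alive process pushes what it
    knew at the start of the cycle to one uniformly chosen alive peer.
    """
    rng = random.Random(seed)
    failed = set(failed)
    alive = [p for p in range(n) if p not in failed]
    # Assumption: only one alive process has detected the failures initially.
    known = {p: set() for p in alive}
    known[alive[0]] = set(failed)
    cycles = 0
    while any(known[p] != failed for p in alive):
        cycles += 1
        # Snapshot so that information spreads at most one hop per cycle.
        snapshot = {p: set(s) for p, s in known.items()}
        for p in alive:
            peer = rng.choice(alive)
            known[peer] |= snapshot[p]
    return cycles

if __name__ == "__main__":
    for size in (16, 64, 256, 1024):
        print(size, gossip_cycles(size, {3, 7}))
```

Because the number of informed processes can at most double per cycle in this model, the cycle count is bounded below by the base-2 logarithm of the number of alive processes; running the loop at the bottom for growing sizes gives a feel for the growth rate.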

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Máster Universitario en Sistemas Inteligentes y Aplicaciones Numéricas en Ingeniería (SIANI)

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Peer-to-peer video streaming is becoming increasingly popular and widely used. For such applications, the performance metrics are:
- startup delay: the time between connecting and the start of stream playback (also called switching delay),
- playback delay: the time between the source sending the stream and a peer playing it back,
- time lag: the difference between the playback delays of two different peers.
However, today's P2P video streaming systems suffer from considerable delays, both at startup and during playback. A recent study of a well-known P2P streaming system showed that delays typically range from 10 to 60 seconds. The authors also observed that in some cases delays exceed 4 minutes. These are serious drawbacks for watching live events or using interactive applications. Some studies have shown that these delays are a consequence of the unstructured nature of many P2P systems. Each stream is split into blocks that are exchanged among peers. Because content is disseminated in an unstructured way, peers must continually exchange information with their neighbors before they can forward the blocks they receive. These solutions are extremely resilient to network changes, but they entail a considerable loss in performance, making it hard to achieve real-time broadcast. In this project we worked on a structured P2P video streaming system that has been shown to deliver very good results, with delays very close to the optimum. In a structured P2P system, each peer knows exactly which blocks to send and to which peers.
Since the number of peers in the system may be large, each peer should operate with only limited knowledge of the system state. In addition, the system can handle arrivals and departures, even in groups, with only limited reorganization of the structure. Finally, in this project we designed and implemented a custom solution to detect and replace peers that are no longer able to cooperate. Here too, the goal was to minimize the amount of information exchanged between peers.
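
The three performance metrics named in this abstract (startup delay, playback delay, time lag) can be written down as simple timestamp differences. The following is a minimal sketch; the function and parameter names are hypothetical, and all times are assumed to be in seconds on a shared clock.

```python
def startup_delay(connect_time, first_play_time):
    """Time between a peer connecting and the start of playback."""
    return first_play_time - connect_time

def playback_delay(source_send_time, peer_play_time):
    """Time between the source sending content and a peer playing it."""
    return peer_play_time - source_send_time

def time_lag(playback_delay_a, playback_delay_b):
    """Difference between the playback delays of two peers."""
    return abs(playback_delay_a - playback_delay_b)

if __name__ == "__main__":
    # Two peers playing the same block sent by the source at t = 100 s.
    delay_a = playback_delay(100.0, 130.0)
    delay_b = playback_delay(100.0, 142.0)
    print(startup_delay(5.0, 17.5), delay_a, delay_b, time_lag(delay_a, delay_b))
```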

Relevance:

100.00% 100.00%

Publisher:

Abstract:

We investigate the problem of distributed sensor failure detection in networks with a small number of defective sensors, whose measurements differ significantly from those of their neighbors. We build on the sparse nature of the binary sensor failure signals to propose a novel distributed detection algorithm based on gossip mechanisms and on Group Testing (GT), where the latter has so far been used in centralized detection problems. The new distributed GT algorithm estimates the set of scattered defective sensors with a low-complexity distance decoder from a small number of linearly independent binary messages exchanged by the sensors. We first consider networks with one defective sensor and determine the minimal number of linearly independent messages needed for its detection with high probability. We then extend our study to the detection of multiple defective sensors by appropriately modifying the message exchange protocol and the decoding procedure. We show that, for small and medium-sized networks, the number of messages required for successful detection is actually smaller than the minimal number computed theoretically. Finally, simulations demonstrate that the proposed method outperforms methods based on random walks in terms of both detection performance and convergence rate.
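
To make the idea of group testing with a distance decoder concrete, here is a small centralized sketch for the single-defective case. It is not the paper's distributed gossip protocol: the deterministic test matrix built from the binary expansion of sensor indices, and the names `build_test_matrix`, `run_tests` and `distance_decode`, are assumptions chosen so the example is easy to follow.

```python
def build_test_matrix(num_sensors):
    """Row t, column s is 1 when sensor s takes part in test t.

    Assumption: membership is bit t of (s + 1), so every column is
    distinct and no column is all zeros.
    """
    num_tests = num_sensors.bit_length()
    return [[((s + 1) >> t) & 1 for s in range(num_sensors)]
            for t in range(num_tests)]

def run_tests(matrix, defective):
    """A test is positive (1) if it includes at least one defective sensor."""
    return [int(any(row[s] for s in defective)) for row in matrix]

def distance_decode(matrix, outcomes):
    """Return the sensor whose column is closest (Hamming) to the outcomes."""
    num_sensors = len(matrix[0])

    def distance(s):
        return sum(row[s] != o for row, o in zip(matrix, outcomes))

    return min(range(num_sensors), key=distance)

if __name__ == "__main__":
    matrix = build_test_matrix(10)          # 4 tests cover 10 sensors
    outcomes = run_tests(matrix, [6])       # sensor 6 is defective
    print(distance_decode(matrix, outcomes))
```

With one defective sensor and noise-free outcomes, the outcome vector equals that sensor's column, so the decoder finds it at distance zero using a number of tests that grows only logarithmically with the number of sensors.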

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The popularity of the MapReduce programming model has increased the research community's interest in improving it. Among other directions, fault tolerance, and failure detection in particular, appears to be crucial but has not yet reached a satisfactory level. Motivated by this, I decided to devote my main research during this period to a prototype system architecture for the MapReduce framework with a new failure detection service, comprising both an analytical (theoretical) part and an implementation part. I am confident that this work should lead the way for further contributions in detecting failures in NoSQL application frameworks, and in cloud storage systems in general.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Background: The etiology of most premature ovarian failure (POF) cases is usually elusive. Although genetic causes clearly exist and a likely susceptibility region at 8q22.3 has been discovered, no predominant explanation exists for POF. More recently, evidence has indicated that mutations in the NR5A1 gene could be causative for POF. We therefore screened for mutations in the NR5A1 gene in a large cohort of Chinese women with non-syndromic POF. Methods: Mutation screening of the NR5A1 gene was performed in 400 Han Chinese women with well-defined 46,XX idiopathic non-syndromic POF and 400 controls. Subsequently, the novel mutation identified was functionally characterized in vitro. Results: A novel heterozygous missense mutation [c.13T>G (p.Tyr5Asp)] in NR5A1 was identified in 1 of 384 patients (0.26%). This mutation impaired transcriptional activation of the Amh, Inhibin-a, Cyp11a1 and Cyp19a1 genes, as shown by transactivation assays. However, no dominant negative effect was observed, nor was there an impact on protein expression or nuclear localization. Conclusions: This novel mutation, p.Tyr5Asp, in a novel non-domain region, is presumed to result in haploinsufficiency. Regardless, perturbation of NR5A1 is not a common explanation for POF in Chinese women.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Diffusion on networks is a convenient framework for describing transport systems of different kinds, from biological transport systems to urban mobility. The mathematical models are based on master equations that describe the diffusion processes by means of the weighted Laplacian matrix that connects the nodes. The link weights represent the coupling strength between the nodes. In this thesis we address the problem of localizing a single-edge failure that occurs in the network. An edge failure is understood as a sudden decrease in the edge's transport capacity. Only an incomplete observation of the dynamical state of the network is available. An optimal clustering procedure based on the correlation properties among the node states is proposed. The network dimensionality is then reduced by introducing a representative node for each cluster, whose dynamical state is observed. We check the efficiency of failure localization with our clustering method in comparison with more traditional techniques, using different graph configurations.
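
The diffusion model mentioned in this abstract can be sketched as the master equation dp/dt = -L p, where L is the weighted graph Laplacian. The code below is a minimal illustration under stated assumptions: an explicit Euler integrator, a four-node path graph, and an edge failure modeled as a drop in one link weight from 1.0 to 0.1. The thesis's clustering and localization procedure is not reproduced here, and the function names are hypothetical.

```python
def weighted_laplacian(n, edges):
    """Build L = D - W; edges maps (i, j) -> weight for an undirected graph."""
    lap = [[0.0] * n for _ in range(n)]
    for (i, j), w in edges.items():
        lap[i][j] -= w
        lap[j][i] -= w
        lap[i][i] += w
        lap[j][j] += w
    return lap

def diffuse(lap, state, dt, steps):
    """Explicit Euler integration of dp/dt = -L p."""
    p = list(state)
    n = len(p)
    for _ in range(steps):
        flow = [sum(lap[i][j] * p[j] for j in range(n)) for i in range(n)]
        p = [p[i] - dt * flow[i] for i in range(n)]
    return p

if __name__ == "__main__":
    healthy = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0}
    failed = {(0, 1): 1.0, (1, 2): 0.1, (2, 3): 1.0}   # edge (1, 2) degraded
    start = [1.0, 0.0, 0.0, 0.0]
    print(diffuse(weighted_laplacian(4, healthy), start, 0.01, 200))
    print(diffuse(weighted_laplacian(4, failed), start, 0.01, 200))
```

Comparing the two printed states shows how a weakened edge changes the node states downstream of it, which is the kind of signature a localization method would have to pick up from partial observations.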

Relevance:

70.00% 70.00%

Publisher:

Abstract:

Failure detection is at the core of most fault tolerance strategies, but it often depends on reliable communication. We present new failure detector algorithms that are suitable as components of a fault tolerance system deployed under adverse network conditions (such as loosely connected and loosely administered computing grids). They pack redundancy into heartbeat messages, thereby improving on the robustness of traditional protocols. Results from experimental tests conducted in a simulated environment with adverse network conditions show significant improvement over existing solutions.
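
One way to picture redundancy in heartbeat messages is for each heartbeat to carry the freshest counters the sender has seen for other nodes, so that a lost direct heartbeat can be compensated by a relayed one. The sketch below assumes exactly that scheme; it is not taken from the paper, and the class `HeartbeatDetector`, its methods and the timeout rule are hypothetical.

```python
class HeartbeatDetector:
    """Timeout-based failure detector with piggybacked heartbeat counters."""

    def __init__(self, node_id, timeout):
        self.node_id = node_id
        self.timeout = timeout
        self.counters = {node_id: 0}  # freshest counter known per node
        self.last_update = {}         # local time each counter last advanced

    def make_heartbeat(self):
        """Advance the local counter and return a copy of the whole table."""
        self.counters[self.node_id] += 1
        return dict(self.counters)

    def receive(self, message, now):
        """Merge a heartbeat; any counter that advanced refreshes that node."""
        for node, counter in message.items():
            if node != self.node_id and counter > self.counters.get(node, 0):
                self.counters[node] = counter
                self.last_update[node] = now

    def suspects(self, now):
        """Nodes whose counter has not advanced within the timeout."""
        return sorted(n for n, t in self.last_update.items()
                      if now - t > self.timeout)

if __name__ == "__main__":
    a, b, c = (HeartbeatDetector(name, timeout=3) for name in "abc")
    c.receive(a.make_heartbeat(), now=0)
    b.receive(a.make_heartbeat(), now=2)   # a's heartbeat to c is lost
    c.receive(b.make_heartbeat(), now=2)   # b relays a's fresh counter
    print(c.suspects(now=4))               # a is not suspected
    print(c.suspects(now=6))               # silence from everyone
```

In the usage example, node c never receives a's second heartbeat directly, yet it does not suspect a at time 4 because b's heartbeat carried a's newer counter.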

Relevance:

70.00% 70.00%

Publisher:

Abstract:

Insulated gate bipolar transistor (IGBT) modules are important safety-critical components in electrical power systems. Bond wire lift-off, a plastic deformation between the wire bond and adjacent layers of a device caused by repeated power/thermal cycles, is the most common failure mechanism in IGBT modules. For the early detection and characterization of such failures, it is important to constantly monitor the health state of IGBT modules, and the state of bond wires in particular. This paper introduces eddy current pulsed thermography (ECPT), a nondestructive evaluation technique, for the state detection and characterization of bond wire lift-off in IGBT modules. After the introduction of the experimental ECPT system, numerical simulation work is reported. The presented simulations are based on the 3-D electromagnetic-thermal coupling finite-element method and analyze the transient temperature distribution within the bond wires. This paper illustrates the thermal patterns of bond wires under inductive heating with different wire statuses (lifted-off or well bonded) and two excitation conditions: nonuniform and uniform magnetic field excitation. Experimental results show that uniform excitation of healthy bonding wires, using a Helmholtz coil, produces the same eddy currents in each, while different eddy currents are seen in faulty wires. Both experimental and numerical results show that ECPT can be used for the detection and characterization of bond wires in power semiconductors through the analysis of the transient heating patterns of the wires. The main impact of this paper is that it is the first time electromagnetic induction thermography, so-called ECPT, has been employed on power/electronic devices. Because ECPT allows contactless inspection of multiple wires in a single pass, it opens a wide field of investigation in power/electronic devices for failure detection, performance characterization, and health monitoring.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

The work presented in this dissertation concerns the conception, design and experimental realization of a fault-tolerant static power converter. Research on failure modes of power electronic converters, fault-tolerant converter topologies and fault detection methods, among other topics, was reviewed. With a view to designing a solution, the main failure modes were identified and analyzed for three proposed converter solutions with fault-tolerant topologies that include redundant elements in standby mode. Various technical aspects of the power and signal-routing circuits were analyzed, notably the need for dead times between the gate signals of IGBTs in the same leg, galvanic isolation between the various gate-drive stages, and the need to minimize the self-inductances between the DC capacitor and the arms of the power converter. To improve the reliability and operational safety of the fault-tolerant static power converter, an electronic circuit was designed to speed up the normal actuation of contactors, along with another circuit responsible for routing and inhibiting the gate signals. To apply the developed fault-tolerant static power converter in a drive with a DC motor, a control algorithm was implemented on a digital signal processing (DSP) board, with supervision and actuation of the system performed in real time for fault detection, contactor actuation, and control of the motor current and speed using a PWM command strategy. Tests were carried out in which, upon adequate fault detection, the system switches between power converter blocks. Experimental results obtained with the laboratory prototype are presented and discussed.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

This paper proposes a multifunctional architecture to implement field-programmable gate array (FPGA) controllers for power converters and presents a prototype for a pulsed power generator based on a solid-state Marx topology. The massively parallel nature of reconfigurable hardware platforms provides very high processing power and fast response times allowing the implementation of many subsystems in the same device. The prototype includes the controller, a failure detection system, an interface with a safety/emergency subsystem, a graphical user interface, and a virtual oscilloscope to visualize the generated pulse waveforms, using a single FPGA. The proposed architecture employs a modular design that can be easily adapted to other power converter topologies.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

The present research aims to contribute to the area of failure detection and diagnosis by proposing a new system architecture for Fault Detection and Isolation (FDI). The proposed architecture presents innovations in the way the monitored physical values are linked to the FDI system and, as a consequence, in the way failures are detected, isolated and classified. A search for mathematical tools able to satisfy the objectives of the proposed architecture pointed to the Kalman Filter and its derivatives, the EKF (Extended Kalman Filter) and the UKF (Unscented Kalman Filter). The first is efficient when the monitored process presents a linear relation between the physical values to be monitored and its output. The other two are suited to cases where the dynamics are nonlinear. A short comparison of features and abilities in the context of failure detection then concludes that the UKF is a better alternative than the EKF for the proposed FDI architecture in processes with nonlinear dynamics. The results shown at the end of the research refer to linear and nonlinear industrial processes. The efficiency of the proposed architecture can be observed in its application to simulated and real processes. Finally, the contributions of this thesis are presented at the end of the text.
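
The abstract does not say how the filter output is turned into a failure decision. A common approach, shown here purely as an assumption, is to test the normalized innovation (measurement minus prediction) of a Kalman filter against a threshold. The sketch uses a scalar linear filter with a constant-state model; the function name, noise variances and threshold are hypothetical values.

```python
import math

def kalman_fault_flags(measurements, q=1e-4, r=0.04, threshold=3.0):
    """Flag measurements whose normalized innovation exceeds the threshold.

    q is the process noise variance, r the measurement noise variance.
    The state is assumed constant between samples (scalar linear model).
    """
    x, p = measurements[0], 1.0   # initial state estimate and its variance
    flags = []
    for z in measurements[1:]:
        p = p + q                 # predict: the state carries over, variance grows
        s = p + r                 # innovation variance
        innovation = z - x
        faulty = abs(innovation) / math.sqrt(s) > threshold
        flags.append(faulty)
        if not faulty:
            # Update only with trusted data so a fault does not corrupt x.
            k = p / s
            x = x + k * innovation
            p = (1 - k) * p
    return flags

if __name__ == "__main__":
    readings = [1.0, 1.1, 0.9, 1.0, 1.05, 5.0, 1.0]
    print(kalman_fault_flags(readings))
```

For a nonlinear process the same innovation test would be applied to an EKF or UKF prediction instead; only the prediction and update steps change.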

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Real-time systems play an increasingly important role in our society. They are a fundamental component of control systems, which in turn are part of various engineering systems that are basic to industrial, military, communications, space and medical activities. Resource scheduling is a fundamental problem in building real-time systems. Its goal is to assign the available resources to tasks so that they meet their timing constraints. For a long time, the state of the art in scheduling methods was rudimentary. Today, priority-based scheduling methods have reached a level of maturity sufficient for use in industrial environments. However, there are open questions that may hinder their use. The main goal of this thesis is to study priority-based scheduling methods, identify the open questions, and develop protocols, guidelines and practical implementation schemes that facilitate their use in industrial systems. One open question is the lack of implementation schemes for some protocols on standardized kernels. The result has been the development of implementation schemes for periodic and sporadic real-time tasks, with timing failure detection, intertask communication, system execution mode changes, and failure handling by means of recovery groups. The schemes have been coded in Ada 9X, and guidelines are provided for analyzing the schedulability of a system developed on this basis. An additional result has been the identification of the minimal functionality needed to develop real-time systems with the listed characteristics. The ability to adapt to changes in the environment is a desirable feature of real-time systems. 
If these changes were not foreseen in the design phase, or if there are faulty modules, some tasks must be modified or added. The system is usually updated statically and installed after stopping its execution. However, there are systems whose operation cannot be halted without causing material or economic damage. An alternative is to design the system as a set of units that can be replaced without interfering with the execution of other units. To this end, a dynamic replacement protocol for hard real-time systems has been developed, and its compatibility with priority-based scheduling methods has been verified. Finally, a practical implementation scheme of the protocol has been developed.
---ABSTRACT---
Real-time systems are very important nowadays. They have become a relevant issue in the design of control systems, which are a basic component of several engineering systems in industrial, telecommunications, military, space and medical applications. Resource scheduling is a central issue in the development of real-time systems. Its purpose is to assign the available resources to the tasks in such a way that their deadlines are met. Historically, hand-crafted techniques were used to develop real-time systems. Recently, priority-based scheduling methods have reached a level of maturity sufficient for extensive use in industrial applications. However, some open questions may limit their usefulness. The main goal of this thesis is to study priority-based scheduling methods, to identify the remaining open questions, and to develop protocols, implementation templates and guidelines that make their use in industrial applications more feasible. One open question is the lack of implementation schemes, based on commercial real-time kernels, for some of the protocols. 
POSIX and Ada 9X have served to identify the services usually available. A set of implementation templates for periodic and sporadic tasks has been developed, with provision for timing failure detection, intertask communication, change of execution mode, and failure handling based on recovery groups. Those templates have been coded in Ada 9X. A set of guidelines for checking the schedulability of a system based on them is also provided. An additional result of this work is the identification of the minimal functionality required to develop real-time systems based on priority scheduling methods with the above characteristics. A desirable feature of real-time systems is their capacity to adapt to changes in the environment that cannot be entirely predicted during design, or to misbehaving software modules. Traditional maintenance is performed by stopping the whole system, installing the new application and finally resuming system execution. However, this approach cannot be applied to non-stop systems. An alternative is to design the system as a set of software units that can be dynamically replaced within its operating environment. With this goal in mind, a dynamic replacement protocol for hard real-time systems has been defined. Its compatibility with priority-based scheduling methods has been proved. Finally, an execution template of the protocol has been implemented.
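
The thesis's own schedulability guidelines are not given in this abstract. As an illustration of what such a check can look like for fixed-priority scheduling, the sketch below uses standard response-time analysis, iterating R = C_i + sum over higher-priority tasks j of ceil(R / T_j) * C_j until it converges. It assumes independent periodic tasks, deadlines equal to periods, no blocking, and integer time units; the function name and task format are hypothetical.

```python
import math

def response_times(tasks):
    """Worst-case response time per task under fixed-priority scheduling.

    tasks: list of (wcet, period) pairs ordered from highest to lowest
    priority, with deadline = period. Returns one entry per task: the
    response time, or None if the task would miss its deadline.
    """
    results = []
    for i, (c_i, t_i) in enumerate(tasks):
        r = c_i
        while True:
            # Interference from every higher-priority task released in [0, r).
            interference = sum(math.ceil(r / t_j) * c_j
                               for c_j, t_j in tasks[:i])
            r_next = c_i + interference
            if r_next > t_i:          # deadline exceeded: not schedulable
                results.append(None)
                break
            if r_next == r:           # fixed point reached
                results.append(r)
                break
            r = r_next
    return results

if __name__ == "__main__":
    print(response_times([(1, 4), (2, 6), (3, 12)]))
    print(response_times([(2, 4), (3, 5)]))
```

A task set passes the check when no entry is None. A timing failure detector of the kind the abstract mentions would flag, at run time, a task that overruns its deadline despite such an analysis.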