998 resultados para DISTRIBUTED FAILURE DETECTORS
Resumo:
Failure detection is at the core of most fault tolerance strategies, but it often depends on reliable communication. We present new algorithms for failure detectors which are appropriate as components of a fault tolerance system that can be deployed in situations of adverse network conditions (such as loosely connected and administered computing grids). It packs redundancy into heartbeat messages, thereby improving on the robustness of the traditional protocols. Results from experimental tests conducted in a simulated environment with adverse network conditions show significant improvement over existing solutions.
Resumo:
This paper is on homonymous distributed systems where processes are prone to crash failures and have no initial knowledge of the system membership (?homonymous? means that several processes may have the same identi?er). New classes of failure detectors suited to these systems are ?rst de?ned. Among them, the classes H? and H? are introduced that are the homonymous counterparts of the classes ? and ?, respectively. (Recall that the pair h?,?i de?nes the weakest failure detector to solve consensus.) Then, the paper shows how H? and H? can be implemented in homonymous systems without membership knowledge (under different synchrony requirements). Finally, two algorithms are presented that use these failure detectors to solve consensus in homonymous asynchronous systems where there is no initial knowledge ofthe membership. One algorithm solves consensus with hH?, H?i, while the other uses only H?, but needs a majority of correct processes. Observe that the systems with unique identi?ers and anonymous systems are extreme cases of homonymous systems from which follows that all these results also apply to these systems. Interestingly, the new failure detector class H? can be implemented with partial synchrony, while the analogous class A? de?ned for anonymous systems can not be implemented (even in synchronous systems). Hence, the paper provides us with the ?rst proof showing that consensus can be solved in anonymous systems with only partial synchrony (and a majority of correct processes).
Resumo:
The concept of unreliable failure detector was introduced by Chandra and Toueg as a mechanism that provides information about process failures. This mechanism has been used to solve several agreement problems, such as the consensus problem. In this paper, algorithms that implement failure detectors in partially synchronous systems are presented. First two simple algorithms of the weakest class to solve the consensus problem, namely the Eventually Strong class (⋄S), are presented. While the first algorithm is wait-free, the second algorithm is f-resilient, where f is a known upper bound on the number of faulty processes. Both algorithms guarantee that, eventually, all the correct processes agree permanently on a common correct process, i.e. they also implement a failure detector of the class Omega (Ω). They are also shown to be optimal in terms of the number of communication links used forever. Additionally, a wait-free algorithm that implements a failure detector of the Eventually Perfect class (⋄P) is presented. This algorithm is shown to be optimal in terms of the number of bidirectional links used forever.
Resumo:
The distributed computing models typically assume every process in the system has a distinct identifier (ID) or each process is programmed differently, which is named as eponymous system. In such kind of distributed systems, the unique ID is helpful to solve problems: it can be incorporated into messages to make them trackable (i.e., to or from which process they are sent) to facilitate the message transmission; several problems (leader election, consensus, etc.) can be solved without the information of network property in priori if processes have unique IDs; messages in the register of one process will not be overwritten by others process if this process announces; it is useful to break the symmetry. Hence, eponymous systems have influenced the distributed computing community significantly either in theory or in practice. However, every thing in the world has its own two sides. The unique ID also has disadvantages: it can leak information of the network(size); processes in the system have no privacy; assign unique ID is costly in bulk-production(e.g, sensors). Hence, homonymous system is appeared. If some processes share the same ID and programmed identically is called homonymous system. Furthermore, if all processes shared the same ID or have no ID is named as anonymous system. In homonymous or anonymous distributed systems, the symmetry problem (i.e., how to distinguish messages sent from which process) is the main obstacle in the design of algorithms. This thesis is aimed to propose different symmetry break methods (e.g., random function, counting technique, etc.) to solve agreement problem. Agreement is a fundamental problem in distributed computing including a family of abstractions. In this thesis, we mainly focus on the design of consensus, set agreement, broadcast algorithms in anonymous and homonymous distributed systems. Firstly, the fault-tolerant broadcast abstraction is studied in anonymous systems with reliable or fair lossy communication channels separately. Two classes of anonymous failure detectors AΘ and AP∗ are proposed, and both of them together with a already proposed failure detector ψ are implemented and used to enrich the system model to implement broadcast abstraction. Then, in the study of the consensus abstraction, it is proved the AΩ′ failure detector class is strictly weaker than AΩ and AΩ′ is implementable. The first implementation of consensus in anonymous asynchronous distributed systems augmented with AΩ′ and where a majority of processes does not crash. Finally, a general consensus problem– k-set agreement is researched and the weakest failure detector L used to solve it, in asynchronous message passing systems where processes may crash and recover, with homonyms (i.e., processes may have equal identities), and without a complete initial knowledge of the membership.
Resumo:
The distributed computing models typically assume every process in the system has a distinct identifier (ID) or each process is programmed differently, which is named as eponymous system. In such kind of distributed systems, the unique ID is helpful to solve problems: it can be incorporated into messages to make them trackable (i.e., to or from which process they are sent) to facilitate the message transmission; several problems (leader election, consensus, etc.) can be solved without the information of network property in priori if processes have unique IDs; messages in the register of one process will not be overwritten by others process if this process announces; it is useful to break the symmetry. Hence, eponymous systems have influenced the distributed computing community significantly either in theory or in practice. However, every thing in the world has its own two sides. The unique ID also has disadvantages: it can leak information of the network(size); processes in the system have no privacy; assign unique ID is costly in bulk-production(e.g, sensors). Hence, homonymous system is appeared. If some processes share the same ID and programmed identically is called homonymous system. Furthermore, if all processes shared the same ID or have no ID is named as anonymous system. In homonymous or anonymous distributed systems, the symmetry problem (i.e., how to distinguish messages sent from which process) is the main obstacle in the design of algorithms. This thesis is aimed to propose different symmetry break methods (e.g., random function, counting technique, etc.) to solve agreement problem. Agreement is a fundamental problem in distributed computing including a family of abstractions. In this thesis, we mainly focus on the design of consensus, set agreement, broadcast algorithms in anonymous and homonymous distributed systems. Firstly, the fault-tolerant broadcast abstraction is studied in anonymous systems with reliable or fair lossy communication channels separately. Two classes of anonymous failure detectors AΘ and AP∗ are proposed, and both of them together with a already proposed failure detector ψ are implemented and used to enrich the system model to implement broadcast abstraction. Then, in the study of the consensus abstraction, it is proved the AΩ′ failure detector class is strictly weaker than AΩ and AΩ′ is implementable. The first implementation of consensus in anonymous asynchronous distributed systems augmented with AΩ′ and where a majority of processes does not crash. Finally, a general consensus problem– k-set agreement is researched and the weakest failure detector L used to solve it, in asynchronous message passing systems where processes may crash and recover, with homonyms (i.e., processes may have equal identities), and without a complete initial knowledge of the membership.
Resumo:
Documento submetido para revisão pelos pares. A publicar em Journal of Parallel and Distributed Computing. ISSN 0743-7315
Resumo:
In classical distributed systems, each process has a unique identity. Today, new distributed systems have emerged where a unique identity is not always possible to be assigned to each process. For example, in many sensor networks a unique identity is not possible to be included in each device due to its small storage capacity, reduced computational power, or the huge number of devices to be identified. In these cases, we have to work with anonymous distributed systems where processes cannot be identified. Consensus cannot be solved in classical and anonymous asynchronous distributed systems where processes can crash. To bypass this impossibility result, failure detectors are added to these systems. It is known that ? is the weakest failure detector class for solving consensus in classical asynchronous systems when amajority of processes never crashes. Although A? was introduced as an anonymous version of ?, to find the weakest failure detector in anonymous systems to solve consensus when amajority of processes never crashes is nowadays an open question. Furthermore, A? has the important drawback that it is not implementable. Very recently, A? has been introduced as a counterpart of ? for anonymous systems. In this paper, we show that the A? failure detector class is strictly weaker than A? (i.e., A? provides less information about process crashes than A?). We also present in this paper the first implementation of A? (hence, we also show that A? is implementable), and, finally, we include the first implementation of consensus in anonymous asynchronous systems augmented with A? and where a majority of processes does not crash.
Resumo:
Dissertação apresentada na Faculdade de Ciências e Tecnologia da Universidade Nova de Lisboa para obtenção do grau de Mestre em Engenharia Electrotécnica e de Computadores
Resumo:
We analyze the failure process of a two-component system with widely different fracture strength in the framework of a fiber bundle model with localized load sharing. A fraction 0≤α≤1 of the bundle is strong and it is represented by unbreakable fibers, while fibers of the weak component have randomly distributed failure strength. Computer simulations revealed that there exists a critical composition αc which separates two qualitatively different behaviors: Below the critical point, the failure of the bundle is brittle, characterized by an abrupt damage growth within the breakable part of the system. Above αc, however, the macroscopic response becomes ductile, providing stability during the entire breaking process. The transition occurs at an astonishingly low fraction of strong fibers which can have importance for applications. We show that in the ductile phase, the size distribution of breaking bursts has a power law functional form with an exponent μ=2 followed by an exponential cutoff. In the brittle phase, the power law also prevails but with a higher exponent μ=92. The transition between the two phases shows analogies to continuous phase transitions. Analyzing the microstructure of the damage, it was found that at the beginning of the fracture process cracks nucleate randomly, while later on growth and coalescence of cracks dominate, which give rise to power law distributed crack sizes.
Resumo:
Magdeburg, Univ., Fak. für Mathematik, Diss., 2010
Resumo:
We investigate the problem of distributed sensors' failure detection in networks with a small number of defective sensors, whose measurements differ significantly from the neighbor measurements. We build on the sparse nature of the binary sensor failure signals to propose a novel distributed detection algorithm based on gossip mechanisms and on Group Testing (GT), where the latter has been used so far in centralized detection problems. The new distributed GT algorithm estimates the set of scattered defective sensors with a low complexity distance decoder from a small number of linearly independent binary messages exchanged by the sensors. We first consider networks with one defective sensor and determine the minimal number of linearly independent messages needed for its detection with high probability. We then extend our study to the multiple defective sensors detection by modifying appropriately the message exchange protocol and the decoding procedure. We show that, for small and medium sized networks, the number of messages required for successful detection is actually smaller than the minimal number computed theoretically. Finally, simulations demonstrate that the proposed method outperforms methods based on random walks in terms of both detection performance and convergence rate.
Resumo:
With the advent of cloud computing, many applications have embraced the ensuing paradigm shift towards modern distributed key-value data stores, like HBase, in order to benefit from the elastic scalability on offer. However, many applications still hesitate to make the leap from the traditional relational database model simply because they cannot compromise on the standard transactional guarantees of atomicity, isolation, and durability. To get the best of both worlds, one option is to integrate an independent transaction management component with a distributed key-value store. In this paper, we discuss the implications of this approach for durability. In particular, if the transaction manager provides durability (e.g., through logging), then we can relax durability constraints in the key-value store. However, if a component fails (e.g., a client or a key-value server), then we need a coordinated recovery procedure to ensure that commits are persisted correctly. In our research, we integrate an independent transaction manager with HBase. Our main contribution is a failure recovery middleware for the integrated system, which tracks the progress of each commit as it is flushed down by the client and persisted within HBase, so that we can recover reliably from failures. During recovery, commits that were interrupted by the failure are replayed from the transaction management log. Importantly, the recovery process does not interrupt transaction processing on the available servers. Using a benchmark, we evaluate the impact of component failure, and subsequent recovery, on application performance.
Resumo:
Due to manufacturing or damage process, brittle materials present a large number of micro-cracks which are randomly distributed. The lifetime of these materials is governed by crack propagation under the applied mechanical and thermal loadings. In order to deal with these kinds of materials, the present work develops a boundary element method (BEM) model allowing for the analysis of multiple random crack propagation in plane structures. The adopted formulation is based on the dual BEM, for which singular and hyper-singular integral equations are used. An iterative scheme to predict the crack growth path and crack length increment is proposed. This scheme enables us to simulate the localization and coalescence phenomena, which are the main contribution of this paper. Considering the fracture mechanics approach, the displacement correlation technique is applied to evaluate the stress intensity factors. The propagation angle and the equivalent stress intensity factor are calculated using the theory of maximum circumferential stress. Examples of multi-fractured domains, loaded up to rupture, are considered to illustrate the applicability of the proposed method. (C) 2011 Elsevier Ltd. All rights reserved.
Resumo:
Purpose: To test the hypothesis that ruptured abdominal aortic aneurysms (AAA) are globally weaker than unruptured ones. Methods: Four ruptured and seven unruptured AAA specimens were harvested whole from fresh cadavers during autopsies performed over an 18-month period. Multiple regionally distributed longitudinally oriented rectangular strips were cut from each AAA specimen for a total of 77 specimen strips. Strips were subjected to uniaxial extension until failure. Sections from approximately the strongest and weakest specimen strips were studied histologically and histochemically. From the load-extension data, failure tension, failure stress and failure strain were calculated. Rupture site characteristics such as location, arc length of rupture and orientation of rupture were also documented. Results: The failure tension, a measure of the tissue mechanical caliber was remarkably similar between ruptured and unruptured AAA (group mean +/- standard deviation of within-subject means: 11.2 +/- 2.3 versus 11.6 +/- 3.6 N/cin; p=0.866 by mixed model ANOVA). In post-hoc analysis, there was little difference between the groups in other measures of tissue mechanical caliber as well such as failure stress (95 +/- 28 versus 98 +/- 23 N/cm(2); p=0.870), failure strain (0.39 +/- 0.09 versus 0.36 +/- 0.09; p=0.705), wall thickness (1.7 +/- 0.4 versus 1.5 +/- 0.4 mm; p=0.470), and % coverage of collagen within tissue cross section (49.6 +/- 12.9% versus 60.8 +/- 9.6%; p=0.133). In the four ruptured AAA, primary rupture sites were on the lateral quadrants (two on left; one on left-posterior; one on right). Remarkably, all rupture lines had a longitudinal orientation and ranged from 1 to 6 cm in length. Conclusion: The findings are not consistent with the hypothesis that ruptured aortic aneurysms are globally weaker than unruptured ones. (C) 2011 Elsevier Ltd. All rights reserved.
Resumo:
Background Heart failure and diabetes often occur simultaneously in patients, but the prognostic value of glycemia in chronic heart failure is debatable. We evaluated the role of glycemia on prognosis of heart failure. Methods Outpatients with chronic heart failure from the Long-term Prospective Randomized Controlled Study Using Repetitive Education at Six-Month Intervals and Monitoring for Adherence in Heart Failure Outpatients (REMADHE) trial were grouped according to the presence of diabetes and level of glycemia. All-cause mortality/heart transplantation and unplanned hospital admission were evaluated. Results Four hundred fifty-six patients were included (135 [29.5%] female, 124 [27.2%] with diabetes mellitus, age of w50.2 +/- 11.4 years, and left-ventricle ejection fraction of 34.7% +/- 10.5%). During follow-up (3.6 +/- 2.2 years), 27 (5.9%) patients were submitted to heart transplantation and 202 (44.2%) died; survival was similar in patients with and without diabetes mellitus. When patients with and without diabetes were categorized according to glucose range (glycemia <= 100 mg/dL [5.5 mmol/L]), as well as when distributed in quintiles of glucose, the survival was significantly worse among patients with lower levels of glycemia. This finding persisted in Cox proportional hazards regression model that included gender, etiology, left ventricle ejection fraction, left ventricle diastolic diameter, creatinine level and beta-blocker therapy, and functional status (hazard ratio 1.45, 95% CI 1.09-1.69, P = .039). No difference regarding unplanned hospital admission was found. Conclusion We report on an inverse association between glycemia and mortality in outpatients with chronic heart failure. These results point to a new pathophysiologic understanding of the interactions between diabetes mellitus, hyperglycemia, and heart disease. (Am Heart J 2010; 159: 90-7.)