992 resultados para DISTRIBUTED FAILURE DETECTORS
Resumo:
Failure detection is at the core of most fault tolerance strategies, but it often depends on reliable communication. We present new algorithms for failure detectors which are appropriate as components of a fault tolerance system that can be deployed in situations of adverse network conditions (such as loosely connected and administered computing grids). It packs redundancy into heartbeat messages, thereby improving on the robustness of the traditional protocols. Results from experimental tests conducted in a simulated environment with adverse network conditions show significant improvement over existing solutions.
Resumo:
This paper is on homonymous distributed systems where processes are prone to crash failures and have no initial knowledge of the system membership (?homonymous? means that several processes may have the same identi?er). New classes of failure detectors suited to these systems are ?rst de?ned. Among them, the classes H? and H? are introduced that are the homonymous counterparts of the classes ? and ?, respectively. (Recall that the pair h?,?i de?nes the weakest failure detector to solve consensus.) Then, the paper shows how H? and H? can be implemented in homonymous systems without membership knowledge (under different synchrony requirements). Finally, two algorithms are presented that use these failure detectors to solve consensus in homonymous asynchronous systems where there is no initial knowledge ofthe membership. One algorithm solves consensus with hH?, H?i, while the other uses only H?, but needs a majority of correct processes. Observe that the systems with unique identi?ers and anonymous systems are extreme cases of homonymous systems from which follows that all these results also apply to these systems. Interestingly, the new failure detector class H? can be implemented with partial synchrony, while the analogous class A? de?ned for anonymous systems can not be implemented (even in synchronous systems). Hence, the paper provides us with the ?rst proof showing that consensus can be solved in anonymous systems with only partial synchrony (and a majority of correct processes).
Resumo:
122 p.
Resumo:
Small failures should only disrupt a small part of a network. One way to do this is by marking the surrounding area as untrustworthy --- circumscribing the failure. This can be done with a distributed algorithm using hierarchical clustering and neighbor relations, and the resulting circumscription is near-optimal for convex failures.
Resumo:
The concept of unreliable failure detector was introduced by Chandra and Toueg as a mechanism that provides information about process failures. This mechanism has been used to solve several agreement problems, such as the consensus problem. In this paper, algorithms that implement failure detectors in partially synchronous systems are presented. First two simple algorithms of the weakest class to solve the consensus problem, namely the Eventually Strong class (⋄S), are presented. While the first algorithm is wait-free, the second algorithm is f-resilient, where f is a known upper bound on the number of faulty processes. Both algorithms guarantee that, eventually, all the correct processes agree permanently on a common correct process, i.e. they also implement a failure detector of the class Omega (Ω). They are also shown to be optimal in terms of the number of communication links used forever. Additionally, a wait-free algorithm that implements a failure detector of the Eventually Perfect class (⋄P) is presented. This algorithm is shown to be optimal in terms of the number of bidirectional links used forever.
Resumo:
The distributed computing models typically assume every process in the system has a distinct identifier (ID) or each process is programmed differently, which is named as eponymous system. In such kind of distributed systems, the unique ID is helpful to solve problems: it can be incorporated into messages to make them trackable (i.e., to or from which process they are sent) to facilitate the message transmission; several problems (leader election, consensus, etc.) can be solved without the information of network property in priori if processes have unique IDs; messages in the register of one process will not be overwritten by others process if this process announces; it is useful to break the symmetry. Hence, eponymous systems have influenced the distributed computing community significantly either in theory or in practice. However, every thing in the world has its own two sides. The unique ID also has disadvantages: it can leak information of the network(size); processes in the system have no privacy; assign unique ID is costly in bulk-production(e.g, sensors). Hence, homonymous system is appeared. If some processes share the same ID and programmed identically is called homonymous system. Furthermore, if all processes shared the same ID or have no ID is named as anonymous system. In homonymous or anonymous distributed systems, the symmetry problem (i.e., how to distinguish messages sent from which process) is the main obstacle in the design of algorithms. This thesis is aimed to propose different symmetry break methods (e.g., random function, counting technique, etc.) to solve agreement problem. Agreement is a fundamental problem in distributed computing including a family of abstractions. In this thesis, we mainly focus on the design of consensus, set agreement, broadcast algorithms in anonymous and homonymous distributed systems. Firstly, the fault-tolerant broadcast abstraction is studied in anonymous systems with reliable or fair lossy communication channels separately. Two classes of anonymous failure detectors AΘ and AP∗ are proposed, and both of them together with a already proposed failure detector ψ are implemented and used to enrich the system model to implement broadcast abstraction. Then, in the study of the consensus abstraction, it is proved the AΩ′ failure detector class is strictly weaker than AΩ and AΩ′ is implementable. The first implementation of consensus in anonymous asynchronous distributed systems augmented with AΩ′ and where a majority of processes does not crash. Finally, a general consensus problem– k-set agreement is researched and the weakest failure detector L used to solve it, in asynchronous message passing systems where processes may crash and recover, with homonyms (i.e., processes may have equal identities), and without a complete initial knowledge of the membership.
Resumo:
The distributed computing models typically assume every process in the system has a distinct identifier (ID) or each process is programmed differently, which is named as eponymous system. In such kind of distributed systems, the unique ID is helpful to solve problems: it can be incorporated into messages to make them trackable (i.e., to or from which process they are sent) to facilitate the message transmission; several problems (leader election, consensus, etc.) can be solved without the information of network property in priori if processes have unique IDs; messages in the register of one process will not be overwritten by others process if this process announces; it is useful to break the symmetry. Hence, eponymous systems have influenced the distributed computing community significantly either in theory or in practice. However, every thing in the world has its own two sides. The unique ID also has disadvantages: it can leak information of the network(size); processes in the system have no privacy; assign unique ID is costly in bulk-production(e.g, sensors). Hence, homonymous system is appeared. If some processes share the same ID and programmed identically is called homonymous system. Furthermore, if all processes shared the same ID or have no ID is named as anonymous system. In homonymous or anonymous distributed systems, the symmetry problem (i.e., how to distinguish messages sent from which process) is the main obstacle in the design of algorithms. This thesis is aimed to propose different symmetry break methods (e.g., random function, counting technique, etc.) to solve agreement problem. Agreement is a fundamental problem in distributed computing including a family of abstractions. In this thesis, we mainly focus on the design of consensus, set agreement, broadcast algorithms in anonymous and homonymous distributed systems. Firstly, the fault-tolerant broadcast abstraction is studied in anonymous systems with reliable or fair lossy communication channels separately. Two classes of anonymous failure detectors AΘ and AP∗ are proposed, and both of them together with a already proposed failure detector ψ are implemented and used to enrich the system model to implement broadcast abstraction. Then, in the study of the consensus abstraction, it is proved the AΩ′ failure detector class is strictly weaker than AΩ and AΩ′ is implementable. The first implementation of consensus in anonymous asynchronous distributed systems augmented with AΩ′ and where a majority of processes does not crash. Finally, a general consensus problem– k-set agreement is researched and the weakest failure detector L used to solve it, in asynchronous message passing systems where processes may crash and recover, with homonyms (i.e., processes may have equal identities), and without a complete initial knowledge of the membership.
Resumo:
In classical distributed systems, each process has a unique identity. Today, new distributed systems have emerged where a unique identity is not always possible to be assigned to each process. For example, in many sensor networks a unique identity is not possible to be included in each device due to its small storage capacity, reduced computational power, or the huge number of devices to be identified. In these cases, we have to work with anonymous distributed systems where processes cannot be identified. Consensus cannot be solved in classical and anonymous asynchronous distributed systems where processes can crash. To bypass this impossibility result, failure detectors are added to these systems. It is known that ? is the weakest failure detector class for solving consensus in classical asynchronous systems when amajority of processes never crashes. Although A? was introduced as an anonymous version of ?, to find the weakest failure detector in anonymous systems to solve consensus when amajority of processes never crashes is nowadays an open question. Furthermore, A? has the important drawback that it is not implementable. Very recently, A? has been introduced as a counterpart of ? for anonymous systems. In this paper, we show that the A? failure detector class is strictly weaker than A? (i.e., A? provides less information about process crashes than A?). We also present in this paper the first implementation of A? (hence, we also show that A? is implementable), and, finally, we include the first implementation of consensus in anonymous asynchronous systems augmented with A? and where a majority of processes does not crash.
Resumo:
硅微条探测器通过微电子工艺制作,易因沾污导致性能下降甚至失效;裸露的键合引线,也易因机械力形成隐性或显性失效。对上述现象的研究可用于修复、维护探测器并在设计和工艺流程中改进其性能。本文通过光学、电气手段分析其结构和制作工艺流程,根据沾污性质在不同条件下清洗探测器,中测后根据芯片图形、封装方式和电气要求修复探测器,最后采用同位素α能谱测试修复效果。对一块沾污后失效(无法加载偏压)的硅微条清洗后在大气环境,N面接地,P面加载负偏压条件下进行了测试,结果显示:170 V全耗尽,平均漏电流2.94μA,5.486 MeV的α峰能量分辨率约1.28%。失效键合所在条的另一面各条能谱观测到假峰,键合修复后消除。因沾污失效的硅微条探测器经过合适的清洗、修复,部分可以恢复性能,但清洗对表面和结构有损伤,须谨慎。另外,键合失效后,因信号不能引出导致的电荷积累会通过电容效应影响其它灵敏区。文章提示,探测器应存放于洁净,恒温,低湿度,避光,避强电磁干扰的环境,以提高能量和位置分辨率,并增加工作稳定性,延长使用寿命。
Resumo:
We investigate the problem of distributed sensors' failure detection in networks with a small number of defective sensors, whose measurements differ significantly from the neighbor measurements. We build on the sparse nature of the binary sensor failure signals to propose a novel distributed detection algorithm based on gossip mechanisms and on Group Testing (GT), where the latter has been used so far in centralized detection problems. The new distributed GT algorithm estimates the set of scattered defective sensors with a low complexity distance decoder from a small number of linearly independent binary messages exchanged by the sensors. We first consider networks with one defective sensor and determine the minimal number of linearly independent messages needed for its detection with high probability. We then extend our study to the multiple defective sensors detection by modifying appropriately the message exchange protocol and the decoding procedure. We show that, for small and medium sized networks, the number of messages required for successful detection is actually smaller than the minimal number computed theoretically. Finally, simulations demonstrate that the proposed method outperforms methods based on random walks in terms of both detection performance and convergence rate.
Resumo:
With the advent of cloud computing, many applications have embraced the ensuing paradigm shift towards modern distributed key-value data stores, like HBase, in order to benefit from the elastic scalability on offer. However, many applications still hesitate to make the leap from the traditional relational database model simply because they cannot compromise on the standard transactional guarantees of atomicity, isolation, and durability. To get the best of both worlds, one option is to integrate an independent transaction management component with a distributed key-value store. In this paper, we discuss the implications of this approach for durability. In particular, if the transaction manager provides durability (e.g., through logging), then we can relax durability constraints in the key-value store. However, if a component fails (e.g., a client or a key-value server), then we need a coordinated recovery procedure to ensure that commits are persisted correctly. In our research, we integrate an independent transaction manager with HBase. Our main contribution is a failure recovery middleware for the integrated system, which tracks the progress of each commit as it is flushed down by the client and persisted within HBase, so that we can recover reliably from failures. During recovery, commits that were interrupted by the failure are replayed from the transaction management log. Importantly, the recovery process does not interrupt transaction processing on the available servers. Using a benchmark, we evaluate the impact of component failure, and subsequent recovery, on application performance.
Resumo:
Background Failure to convey time-critical information to team members during surgery diminishes members’ perception of the dynamic information relevant to their task, and compromises shared situational awareness. This research reports the dialog around clinical decisions made by team members in the time-pressured and high-risk context of surgery, and the impact of these communications on shared situational awareness. Methods Fieldwork methods were used to capture the dynamic integration of individual and situational elements in surgery that provided the backdrop for clinical decisions. Nineteen semi structured interviews were performed with 24 participants from anaesthesia, surgery, and nursing in the operating rooms of a large metropolitan hospital in Queensland, Australia. Thematic analysis was used. Results: The domain “coordinating decisions in surgery” was generated from textual data. Within this domain, three themes illustrated the dialog of clinical decisions, i.e., synchronizing and strategizing actions, sharing local knowledge, and planning contingency decisions based on priority. Conclusion Strategies used to convey decisions that enhanced shared situational awareness included the use of “self-talk”, closed-loop communications, and “overhearing” conversations that occurred at the operating table. Behaviours’ that compromised a team’s shared situational awareness included tunnelling and fixating on one aspect of the situation.
Resumo:
Ambiguity resolution plays a crucial role in real time kinematic GNSS positioning which gives centimetre precision positioning results if all the ambiguities in each epoch are correctly fixed to integers. However, the incorrectly fixed ambiguities can result in large positioning offset up to several meters without notice. Hence, ambiguity validation is essential to control the ambiguity resolution quality. Currently, the most popular ambiguity validation is ratio test. The criterion of ratio test is often empirically determined. Empirically determined criterion can be dangerous, because a fixed criterion cannot fit all scenarios and does not directly control the ambiguity resolution risk. In practice, depending on the underlying model strength, the ratio test criterion can be too conservative for some model and becomes too risky for others. A more rational test method is to determine the criterion according to the underlying model and user requirement. Miss-detected incorrect integers will lead to a hazardous result, which should be strictly controlled. In ambiguity resolution miss-detected rate is often known as failure rate. In this paper, a fixed failure rate ratio test method is presented and applied in analysis of GPS and Compass positioning scenarios. A fixed failure rate approach is derived from the integer aperture estimation theory, which is theoretically rigorous. The criteria table for ratio test is computed based on extensive data simulations in the approach. The real-time users can determine the ratio test criterion by looking up the criteria table. This method has been applied in medium distance GPS ambiguity resolution but multi-constellation and high dimensional scenarios haven't been discussed so far. In this paper, a general ambiguity validation model is derived based on hypothesis test theory, and fixed failure rate approach is introduced, especially the relationship between ratio test threshold and failure rate is examined. In the last, Factors that influence fixed failure rate approach ratio test threshold is discussed according to extensive data simulation. The result shows that fixed failure rate approach is a more reasonable ambiguity validation method with proper stochastic model.
Resumo:
Background Many Australian cities experience large winter increases in deaths and hospitalisations. Flu outbreaks are only part of the problem and inadequate protection from cold weather is a key independent risk factor. Better home insulation has been shown to improve health during winter, but no study has examined whether better personal insulation improves health. Data and Methods We ran a randomised controlled trial of thermal clothing versus usual care. Subjects with heart failure (a group vulnerable to cold) were recruited from a public hospital in Brisbane in winter and followed-up at the end of winter. Those randomised to the intervention received two thermal hats and tops and a digital thermometer. The primary outcome was the number of days in hospital, with secondary outcomes of General Practitioner (GP) visits and self-rated health. Results The mean number of days in hospital per 100 winter days was 2.5 in the intervention group and 1.8 in the usual care group, with a mean difference of 0.7 (95% CI: –1.5, 5.4). The intervention group had 0.2 fewer GP visits on average (95% CI: –0.8, 0.3), and a higher self-rated health, mean improvement –0.3 (95% CI: –0.9, 0.3). The thermal tops were generally well used, but even in cold temperatures the hats were only worn by 30% of subjects. Conclusions Thermal clothes are a cheap and simple intervention, but further work needs to be done on increasing compliance and confirming the health and economic benefits of providing thermals to at-risk groups.
Resumo:
This paper presents a distributed communication based active power curtailment (APC) control scheme for grid connected photovoltaic (PV) systems to address voltage rise. A simple distribution feeder model is presented and simulated using MATLAB. The resource sharing based control scheme proposed is shown to be effective at reducing voltage rise during times of peak generation and low load. Simulations also show the even distribution of APC using simple communications. Simulations demonstrate the versatility of the proposed control method under major communication failure conditions. Further research may lead to possible applications in coordinated electric vehicle (EV) charging.