968 results for failure-prone systems


Relevance:

100.00% 100.00%

Publisher:

Abstract:

Resource provisioning is an important and challenging problem in large-scale distributed systems such as Cloud computing environments. Resource management issues such as Quality of Service (QoS) further exacerbate the resource provisioning problem. Furthermore, with the increasing functionality and complexity of Cloud computing, resource failures are inevitable. Therefore, the question we address in this paper is how to provision resources to applications in the presence of resource failures in a hybrid Cloud computing environment. To this end, we propose three Cloud resource provisioning policies, using workflow applications to drive the system workload. The proposed strategies take into account the workload model and the failure correlations to redirect requests to appropriate Cloud providers. Using real failure traces and workload models, we evaluated the performance and monetary cost of the proposed policies. The results of our experiments show that we can decrease the deadline violation rate of users' requests to as low as 20% at a limited cost on the Amazon public Cloud.

Relevance:

90.00% 90.00%

Publisher:

Abstract:

With the advent of the Internet, video over IP is gaining popularity. In such an environment, scalability and fault tolerance are the key issues. Existing video on demand (VoD) service systems are usually neither scalable nor tolerant of server faults, and hence cannot cope with multi-user, failure-prone networks such as the Internet. Current research on VoD often focuses on increasing the throughput and reliability of a single server, but rarely addresses the smooth provision of service during server and network failures. Reliable Server Pooling (RSerPool), which provides high availability by using multiple redundant servers as a single source point, can be a solution to overcome these failures. When a server fails, continuity of service is maintained by another server. To achieve transparent failover, efficient state sharing is an important requirement. In this paper, we present a simple, efficient and scalable approach in which the client itself transfers the state, using an extended cookie mechanism, which ensures that there is no noticeable disruption or change in video quality.
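The client-carried state idea described in this abstract can be sketched as follows. This is a minimal illustration and not the RSerPool API or the authors' cookie format: the names `VideoServer`, `play_chunk`, `encode_state_cookie` and the fields `position` and `served_by` are hypothetical, and the cookie is plain base64-encoded JSON with no integrity protection.

```python
import base64
import json


def encode_state_cookie(state):
    """Serialise session state into an opaque cookie string (hypothetical format)."""
    raw = json.dumps(state, sort_keys=True).encode()
    return base64.urlsafe_b64encode(raw).decode()


def decode_state_cookie(cookie):
    """Recover the session state from a cookie produced by encode_state_cookie."""
    return json.loads(base64.urlsafe_b64decode(cookie.encode()))


class VideoServer:
    """Hypothetical pool member; it keeps no per-client state of its own."""

    def __init__(self, name):
        self.name = name
        self.alive = True

    def serve(self, cookie):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        state = decode_state_cookie(cookie)
        state["position"] += 1           # one more chunk delivered
        state["served_by"] = self.name
        return encode_state_cookie(state)  # updated state goes back to the client


def play_chunk(pool, cookie):
    """Client side: try each pool member in order, handing over the cookie."""
    for server in pool:
        try:
            return server.serve(cookie)
        except ConnectionError:
            continue  # fail over to the next server, carrying the same state
    raise RuntimeError("no server available")
```

Because the client presents the latest cookie with every request, any surviving server in the pool can resume playback at the stored position without server-to-server state replication.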

Relevance:

90.00% 90.00%

Publisher:

Abstract:

Fire is a major disturbance process in many ecosystems world-wide, resulting in spatially and temporally dynamic landscapes. For populations occupying such environments, fire-induced landscape change is likely to influence population processes, and genetic patterns and structure among populations. The Mallee Emu-wren Stipiturus mallee is an endangered passerine whose global distribution is confined to fire-prone, semi-arid mallee shrublands in south-eastern Australia. This species, with poor capacity for dispersal, has undergone a precipitous reduction in distribution and numbers in recent decades. We used genetic analyses of 11 length-variable, nuclear loci to examine population structure and processes within this species, across its global range. Populations of the Mallee Emu-wren exhibited a low to moderate level of genetic diversity, and evidence of bottlenecks and genetic drift. Bayesian clustering methods revealed weak genetic population structure across the species' range. The direct effects of large fires, together with associated changes in the spatial and temporal patterns of suitable habitat, have the potential to cause population bottlenecks, serial local extinctions and subsequent recolonisation, all of which may interact to erode and homogenise genetic diversity in this species. Movement among temporally and spatially shifting habitat appears to maintain long-term genetic connectivity. A plausible explanation for the observed genetic patterns is that, following extensive fires, recolonisation exceeds in-situ survival as the primary driver of population recovery in this species. These findings suggest that dynamic, fire-dominated landscapes can drive genetic homogenisation of populations of species with low mobility and specialised habitat that otherwise would be expected to show strongly structured populations. Such effects must be considered when formulating management actions to conserve species in fire-prone systems.

Relevance:

90.00% 90.00%

Publisher:

Abstract:

Water systems in the Sultanate of Oman are inevitably exposed to a variety of natural and man-made hazards. Natural disasters, especially tropical cyclone Gonu in 2007, have caused immense damage to water supply systems in Oman. At the same time, water loss from leaks is a major operational problem. This research developed an integrated approach to identify and rank the risks to the water sources, transmission pipelines and distribution networks in Oman, and suggests appropriate mitigation measures. The system resilience was evaluated and an emergency response plan for the water supplies developed. The methodology involved mining the data held by the water supply utility for risk and resilience determination, and operational data to support calculations of non-revenue water (NRW). Risk factors were identified, ranked and scored at a stakeholder workshop, and the operational information required was principally gathered from interviews. Finally, an emergency response plan was developed by evaluating the risk and resilience factors. The risk analysis and assessment used a Coarse Risk Analysis (CRA) approach, and risk scores were generated using a simple risk matrix based on WHO recommendations. The likelihoods and consequences of a wide range of hazardous events were identified through a key workshop and subsequent questionnaires. The thesis proposes a method of translating the detailed risk evaluations into resilience scores through a methodology used in transportation networks. A water audit indicated that the percentage of NRW in Oman is greater than 35%, which is similar to other Gulf countries but high internationally. The principal strategy for managing NRW used in the research was the AWWA water audit method, which includes free-to-use software and was found to be easy to apply in Oman.
The research showed that risks to the main desalination processes can be controlled, but the risk due to feed water quality might remain high even after implementing mitigation measures, because the intake is close to an oil port with a significant risk of oil contamination and algal blooms. The most severe risks to transmission mains were found to be associated with pipe rather than pump failure. The systems in Oman were found to be moderately resilient: the resilience of desalination plants is reasonably high, but the transmission mains and pumping stations are very vulnerable. The integrated strategy developed in this study has wide applicability, particularly in the Gulf area, which may face risks from exceptional events and will be experiencing NRW. Other developing countries may also experience such risks, but with different magnitudes, and the risk evaluation tables could provide a useful format for further work.
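The two calculations mentioned in this abstract, risk-matrix scoring and the NRW percentage, can be illustrated with a short sketch. It assumes a 1 to 5 scale for both likelihood and consequence, with the score taken as their product, which is a common risk-matrix convention and not necessarily the exact matrix used in the thesis. NRW is computed as system input volume minus billed authorised consumption, as in the standard water balance. The function names and all numbers in the usage note are hypothetical.

```python
def risk_score(likelihood, consequence):
    """Score one hazardous event on an assumed 5 x 5 matrix (1 = lowest, 5 = highest)."""
    if not (1 <= likelihood <= 5 and 1 <= consequence <= 5):
        raise ValueError("likelihood and consequence must be between 1 and 5")
    return likelihood * consequence


def rank_hazards(hazards):
    """Rank hazards by descending risk score.

    hazards maps a hazard name to a (likelihood, consequence) pair.
    """
    scored = [(name, risk_score(l, c)) for name, (l, c) in hazards.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


def nrw_percent(system_input_volume, billed_authorized_consumption):
    """Non-revenue water as a percentage of system input volume."""
    if system_input_volume <= 0:
        raise ValueError("system input volume must be positive")
    lost = system_input_volume - billed_authorized_consumption
    return 100.0 * lost / system_input_volume
```

For example, with made-up inputs, `rank_hazards({"pipe failure": (4, 5), "pump failure": (2, 4)})` places pipe failure first with a score of 20, and `nrw_percent(1000, 640)` gives 36.0.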

Relevance:

80.00% 80.00%

Publisher:

Abstract:

Electricity markets are complex environments with very particular characteristics. A critical issue is the constant change they are subject to. This is a result of the restructuring of electricity markets, which was performed to increase competitiveness but also substantially increased the complexity and unpredictability of those markets. The constant growth in market unpredictability amplified the need for market participants to foresee market behaviour. The need to understand the market mechanisms, and how the interaction of the involved players affects market outcomes, contributed to the growing use of simulation tools. Multi-agent based software is particularly well suited to analysing dynamic and adaptive systems with complex interactions among their constituents, such as electricity markets. This dissertation presents ALBidS (Adaptive Learning strategic Bidding System), a multiagent system created to provide decision support to market negotiating players. This system is integrated with the MASCEM electricity market simulator, so that its advantage in supporting a market player can be tested using cases based on real market data. ALBidS considers several methodologies based on very distinct approaches to provide alternative suggestions for the best actions for the supported player to perform. The approach used as the player’s actual action is selected by reinforcement learning algorithms, which decide, for each situation, simulation circumstance and context, which proposed action has the highest chance of success. Some of the considered approaches are supported by a mechanism that creates profiles of competitor players.
These profiles are built according to their observed past actions and reactions when faced with specific situations, such as success and failure. The system’s context awareness and simulation circumstances analysis, both in terms of results performance and execution time adaptation, are complementary mechanisms, which endow ALBidS with further adaptation and learning capabilities.
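The idea of letting a reinforcement learning rule pick among competing bidding strategies can be sketched as below. This is not the ALBidS algorithm. It is a generic epsilon-greedy selector with an incremental value update, and the class name `StrategySelector`, the strategy names and the reward scale are all assumptions made for illustration.

```python
import random


class StrategySelector:
    """Keeps a running value estimate per strategy and picks the best one."""

    def __init__(self, strategies, learning_rate=0.2, epsilon=0.1, seed=None):
        self.values = {name: 0.0 for name in strategies}
        self.learning_rate = learning_rate
        self.epsilon = epsilon            # probability of exploring at random
        self.rng = random.Random(seed)

    def choose(self):
        """Explore with probability epsilon, otherwise exploit the best estimate."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.values))
        return max(self.values, key=self.values.get)

    def update(self, strategy, reward):
        """Move the estimate for one strategy towards the observed reward."""
        old = self.values[strategy]
        self.values[strategy] = old + self.learning_rate * (reward - old)
```

A caller would invoke `choose()` before each negotiation period, submit the bid proposed by that strategy, and then call `update()` with the observed outcome. Keeping one selector per context would be one simple way to make the choice context-dependent.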

Relevance:

80.00% 80.00%

Publisher:

Abstract:

With the motivation of seamlessly extending wireless sensor networks to the external environment, service-oriented architecture emerges as a promising solution. However, sensor nodes are failure-prone, which can render the whole wireless sensor network seriously faulty. When a particular node is faulty, the service on it should be migrated to substitute sensor nodes that are in a normal state. Currently, two kinds of approaches exist to identify the substitute sensor nodes. The most common approach is to prepare redundancy nodes, though the tasks involved in maintaining them, e.g., relocating the new node, place an extra burden on the wireless sensor network. More recently, approaches that do not use redundancy nodes have been emerging, but they select the substitute nodes purely from the sensor node's perspective, e.g., migrating the service of the faulty node to its nearest sensor node, and usually neglect the requirements of the application level. Even the few works that consider application-level needs operate at packet granularity and do not fit well at service granularity. In this paper, we aim to remove these limitations in wireless sensor networks with a service-oriented architecture. Instead of deploying redundancy nodes, the proposed mechanism replaces the faulty sensor node by considering similarity on the application level as well as on the sensor level. On the application level, we apply the Bloom filter for its high efficiency and low space cost. On the sensor level, we design an objective measure based on the coefficient of variation to evaluate and choose the substitute. © 2014 Springer Science+Business Media New York.
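The two building blocks named in this abstract, a Bloom filter for application-level matching and the coefficient of variation for sensor-level evaluation, can be sketched as follows. How the paper combines them is not stated here, so the selection rule is an assumption: a candidate is eligible if its Bloom filter reports every required service, and among eligible candidates the one whose sensor metrics have the lowest coefficient of variation is chosen. `BloomFilter`, `choose_substitute` and the example data are hypothetical.

```python
import hashlib
from statistics import mean, pstdev


class BloomFilter:
    """Compact set membership test with possible false positives, no false negatives."""

    def __init__(self, size_bits=256, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bit array stored in a single integer

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    m = mean(values)
    return pstdev(values) / m if m else float("inf")


def choose_substitute(required_services, candidates):
    """Pick a substitute node (assumed rule, see the lead-in).

    candidates maps a node name to (BloomFilter of its services, list of metrics).
    """
    eligible = {
        name: metrics
        for name, (services, metrics) in candidates.items()
        if all(service in services for service in required_services)
    }
    if not eligible:
        return None
    return min(eligible, key=lambda name: coefficient_of_variation(eligible[name]))
```

Because a Bloom filter can return false positives, a real deployment would presumably confirm the match with the chosen node before migrating the service.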

Relevance:

80.00% 80.00%

Publisher:

Abstract:

Ubiquitous computing raises new usability challenges that cut across design and development. We are particularly interested in environments enhanced with sensors, public displays and personal devices. How can prototypes be used to explore the users' mobility and interaction, both explicit and implicit, to access services within these environments? Because of the potential cost of development and design failure, these systems must be explored using early assessment techniques and versions of the systems that could be disruptive if deployed in the target environment. These techniques are required to evaluate alternative solutions before making the decision to deploy the system on location. This is crucial for a successful development that anticipates potential user problems and reduces the cost of redesign. This thesis reports on the development of a framework for the rapid prototyping and analysis of ubiquitous computing environments that facilitates the evaluation of design alternatives. It describes APEX, a framework that brings together an existing 3D Application Server with a modelling tool. APEX-based prototypes enable users to navigate a virtual world simulation of the envisaged ubiquitous environment. By this means users can experience many of the features of the proposed design. Prototypes and their simulations are generated in the framework to help the developer understand how the user might experience the system. These are supported through three different layers: a simulation layer (using a 3D Application Server); a modelling layer (using a modelling tool) and a physical layer (using external devices and real users). APEX allows the developer to move between these layers to evaluate different features. It supports exploration of user experience through observation of how users might behave with the system, as well as enabling exhaustive analysis based on models. The models support checking of properties based on patterns.
These patterns are based on ones that have been used successfully in interactive system analysis in other contexts. They help the analyst to generate and verify relevant properties. Where these properties fail, scenarios suggested by the failure provide an important aid to redesign.

Relevance:

80.00% 80.00%

Publisher:

Abstract:

Wireless sensor networks (WSNs) may be deployed in failure-prone environments, and WSN nodes easily fail due to unreliable wireless connections, malicious attacks and resource-constrained features. Nevertheless, if a WSN can tolerate the loss of up to k − 1 nodes while the remaining nodes stay connected, the network is called k-connected. The value of k is one of the most important indicators of a WSN's self-healing capability. Following a WSN design flow, this paper surveys resilience issues from the topology control and multi-path routing point of view. This paper provides a discussion on transmission and failure models, which have an important impact on research results. Afterwards, this paper reviews theoretical results and representative topology control approaches that guarantee WSNs to be k-connected at three different network deployment stages: pre-deployment, post-deployment and re-deployment. Multi-path routing protocols are discussed, and many NP-complete or NP-hard problems regarding topology control are identified. The challenging open issues are discussed at the end. This paper can serve as a guideline for designing resilient WSNs.
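The definition of k-connectivity given in this abstract can be checked directly on small topologies with a brute-force sketch: remove every possible set of up to k − 1 nodes and test whether the remaining nodes are still connected. The adjacency-dictionary representation and the function names are assumptions made for illustration, and the exhaustive search is exponential in k, so it only suits small examples.

```python
from itertools import combinations


def is_connected(adj, removed=frozenset()):
    """Return True if the nodes that were not removed form a connected graph."""
    nodes = [n for n in adj if n not in removed]
    if not nodes:
        return True
    seen = {nodes[0]}
    stack = [nodes[0]]
    while stack:  # depth-first search over surviving nodes
        u = stack.pop()
        for v in adj[u]:
            if v not in removed and v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == len(nodes)


def is_k_connected(adj, k):
    """Check that the network survives the loss of any k - 1 or fewer nodes."""
    if len(adj) <= k:
        return False  # usual convention: a k-connected graph has more than k nodes
    return all(
        is_connected(adj, frozenset(lost))
        for r in range(k)                 # removal sets of size 0 .. k - 1
        for lost in combinations(adj, r)
    )
```

For instance, a ring of five nodes built as `{i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}` is 2-connected but not 3-connected, since removing two non-adjacent nodes splits the ring.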

Relevance:

80.00% 80.00%

Publisher:

Abstract:

While robots are gradually becoming a part of our daily lives, they already play vital roles in many critical operations. Some of these critical tasks include surgeries, battlefield operations, and tasks that take place in hazardous environments or distant locations such as space missions. In most of these tasks, remotely controlled robots are used instead of autonomous robots. This special area of robotics is called teleoperation. Teleoperation systems must be reliable when used in critical tasks; hence, all of the subsystems must be dependable even under a subsystem or communication line failure. These systems are categorized as unilateral or bilateral teleoperation. A special type of bilateral teleoperation is described as force-reflecting teleoperation, which is further investigated as limited- and unlimited-workspace teleoperation. Teleoperation systems configured in this study are tested both in numerical simulations and experiments. A new method, Virtual Rapid Robot Prototyping, is introduced to create system models rapidly and accurately. This method is then extended to configure experimental setups with actual master systems working with system models of the slave robots accompanied with virtual reality screens as well as the actual slaves. Fault-tolerant design and modeling of the master and slave systems are also addressed at different levels to prevent subsystem failure. Teleoperation controllers are designed to compensate for instabilities due to communication time delays. Modifications to the existing controllers are proposed to configure a controller that is reliable in communication line failures. Position/force controllers are also introduced for master and/or slave robots. Later, controller architecture changes are discussed in order to make these controllers dependable even in systems experiencing communication problems.
The customary and proposed controllers for teleoperation systems are tested in numerical simulations on single- and multi-DOF teleoperation systems. Experimental studies are then conducted on seven different systems that included limited- and unlimited-workspace teleoperation to verify and improve simulation studies. Experiments of the proposed controllers were successful relative to the customary controllers. Overall, by employing the fault-tolerance features and the proposed controllers, it is possible to design and configure a more reliable teleoperation system, which allows these systems to be used in a wider range of critical missions.

Relevance:

50.00% 50.00%

Publisher:

Abstract:

This paper is on homonymous distributed systems where processes are prone to crash failures and have no initial knowledge of the system membership ("homonymous" means that several processes may have the same identifier). New classes of failure detectors suited to these systems are first defined. Among them, the classes HΩ and HΣ are introduced that are the homonymous counterparts of the classes Ω and Σ, respectively. (Recall that the pair ⟨Ω, Σ⟩ defines the weakest failure detector to solve consensus.) Then, the paper shows how HΩ and HΣ can be implemented in homonymous systems without membership knowledge (under different synchrony requirements). Finally, two algorithms are presented that use these failure detectors to solve consensus in homonymous asynchronous systems where there is no initial knowledge of the membership. One algorithm solves consensus with ⟨HΩ, HΣ⟩, while the other uses only HΩ, but needs a majority of correct processes. Observe that the systems with unique identifiers and anonymous systems are extreme cases of homonymous systems, from which it follows that all these results also apply to these systems. Interestingly, the new failure detector class HΩ can be implemented with partial synchrony, while the analogous class AΩ defined for anonymous systems cannot be implemented (even in synchronous systems). Hence, the paper provides us with the first proof showing that consensus can be solved in anonymous systems with only partial synchrony (and a majority of correct processes).

Relevance:

50.00% 50.00%

Publisher:

Abstract:

Distributed computing models typically assume that every process in the system has a distinct identifier (ID) or that each process is programmed differently; such a system is called an eponymous system. In this kind of distributed system, the unique ID helps to solve problems: it can be incorporated into messages to make them trackable (i.e., to or from which process they are sent) and so facilitate message transmission; several problems (leader election, consensus, etc.) can be solved without a priori information about network properties if processes have unique IDs; messages that a process announces in its register will not be overwritten by other processes; and it is useful for breaking symmetry. Hence, eponymous systems have significantly influenced the distributed computing community, both in theory and in practice. However, unique IDs also have disadvantages: they can leak information about the network (such as its size); processes in the system have no privacy; and assigning unique IDs is costly in bulk production (e.g., of sensors). Hence, homonymous systems have appeared. A system in which some processes share the same ID and are programmed identically is called a homonymous system. Furthermore, a system in which all processes share the same ID or have no ID is called an anonymous system. In homonymous or anonymous distributed systems, the symmetry problem (i.e., how to distinguish which process sent a message) is the main obstacle in the design of algorithms. This thesis aims to propose different symmetry-breaking methods (e.g., random functions, counting techniques, etc.) to solve the agreement problem. Agreement is a fundamental problem in distributed computing that includes a family of abstractions. In this thesis, we mainly focus on the design of consensus, set agreement and broadcast algorithms in anonymous and homonymous distributed systems.
Firstly, the fault-tolerant broadcast abstraction is studied in anonymous systems with reliable and with fair lossy communication channels separately. Two classes of anonymous failure detectors, AΘ and AP∗, are proposed, and both of them, together with an already proposed failure detector ψ, are implemented and used to enrich the system model to implement the broadcast abstraction. Then, in the study of the consensus abstraction, it is proved that the AΩ′ failure detector class is strictly weaker than AΩ and that AΩ′ is implementable. The first implementation of consensus in anonymous asynchronous distributed systems augmented with AΩ′, where a majority of processes do not crash, is presented. Finally, k-set agreement, a generalization of the consensus problem, is studied, together with the weakest failure detector L used to solve it, in asynchronous message-passing systems where processes may crash and recover, with homonyms (i.e., processes may have equal identities), and without complete initial knowledge of the membership.
