17 results for failure-prone systems
at Indian Institute of Science - Bangalore - India
Abstract:
With the advent of the Internet, video over IP is gaining popularity. In such an environment, scalability and fault tolerance are key issues. Existing video on demand (VoD) service systems are usually neither scalable nor tolerant of server faults and hence fail to cope with multi-user, failure-prone networks such as the Internet. Current research on VoD often focuses on increasing the throughput and reliability of a single server, but rarely addresses the smooth provision of service during server as well as network failures. Reliable Server Pooling (RSerPool), which provides high availability by presenting multiple redundant servers as a single source point, can overcome such failures: when a server fails, another server maintains continuity of service. To achieve transparent failover, efficient state sharing is an important requirement. In this paper, we present an elegant, simple, efficient and scalable approach in which the client itself transfers the state using an extended cookie mechanism, ensuring that there is no noticeable disruption or change in video quality.
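A minimal sketch of the client-carried-state idea: the client serializes its playback state into a cookie that any pool server can decode on failover. All field names and the encoding are illustrative assumptions, not the paper's actual cookie format.

```python
import base64
import json

# Hypothetical "extended cookie": the client itself carries the session
# state, so any server in the pool can resume the stream after a failover.
def make_state_cookie(stream_id: str, position_s: float, bitrate_kbps: int) -> str:
    state = {"stream": stream_id, "pos": position_s, "rate": bitrate_kbps}
    return base64.urlsafe_b64encode(json.dumps(state).encode()).decode()

def resume_from_cookie(cookie: str) -> dict:
    # A newly selected server decodes the cookie and resumes playback from
    # the recorded position instead of restarting the stream.
    return json.loads(base64.urlsafe_b64decode(cookie.encode()))

cookie = make_state_cookie("movie-42", position_s=1312.5, bitrate_kbps=1500)
print(resume_from_cookie(cookie))  # {'stream': 'movie-42', 'pos': 1312.5, 'rate': 1500}
```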
Abstract:
This article presents frequentist inference for accelerated life test data of series systems with independent log-normal component lifetimes. The means of the component log-lifetimes are assumed to depend on the stress variables through a linear stress translation function that can accommodate the standard stress translation functions in the literature. An expectation-maximization algorithm is developed to obtain the maximum likelihood estimates of the model parameters. The maximum likelihood estimates are then further refined by bootstrap, which is also used to infer the component and system reliability metrics at usage stresses. The developed methodology is illustrated by analyzing both a real and a simulated dataset. A simulation study is also carried out to judge the effectiveness of the bootstrap. It is found that, in this model, application of the bootstrap results in significant improvement over the simple maximum likelihood estimates.
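As a hedged illustration of the bootstrap refinement step, the sketch below fits a log-normal MLE to a single sample and bias-corrects it by parametric bootstrap; the full series-system ALT model of the abstract is not reproduced, and all numbers are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# For log-normal data, the MLEs of (mu, sigma) are the sample mean and the
# (biased, ddof=0) standard deviation of the log-lifetimes.
def lognormal_mle(x):
    logs = np.log(x)
    return logs.mean(), logs.std(ddof=0)

data = rng.lognormal(mean=2.0, sigma=0.5, size=40)
mu_hat, sig_hat = lognormal_mle(data)

# Parametric bootstrap: simulate from the fitted model, refit, and
# bias-correct the original estimate: theta_bc = 2*theta_hat - mean(theta*).
B = 2000
boot = np.array([lognormal_mle(rng.lognormal(mu_hat, sig_hat, size=data.size))
                 for _ in range(B)])
mu_bc = 2 * mu_hat - boot[:, 0].mean()
sig_bc = 2 * sig_hat - boot[:, 1].mean()
print(f"MLE:            mu={mu_hat:.3f}, sigma={sig_hat:.3f}")
print(f"Bias-corrected: mu={mu_bc:.3f}, sigma={sig_bc:.3f}")
```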
Abstract:
The stochastic version of Pontryagin's maximum principle is applied to determine an optimal maintenance policy for equipment subject to random deterioration. The deterioration of the equipment with age is modelled as a random process. The model is then generalized to include random catastrophic failure of the equipment. The optimal maintenance policy is derived for two special probability distributions of the equipment's time to failure, namely the exponential and Weibull distributions. Both the salvage value and the deterioration rate of the equipment are treated as state variables and the maintenance as a control variable. The result is illustrated by an example.
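For readers unfamiliar with the maximum principle, the block below recalls its general deterministic structure under one common sign convention; the paper's specific state equations, cost functional and stochastic extension are not reproduced here.

```latex
% General structure of the (deterministic) maximum principle: x collects
% the state variables (here, salvage value and deterioration rate), u is
% the maintenance control, and J is the objective to be maximized.
\begin{align*}
  \dot{x} &= f(x, u, t), \qquad
  J = \int_0^T f_0(x, u, t)\, dt, \\
  H(x, u, \lambda, t) &= f_0(x, u, t) + \lambda^{\top} f(x, u, t), \\
  \dot{\lambda} &= -\frac{\partial H}{\partial x}, \qquad
  u^*(t) = \arg\max_{u \in U} H\bigl(x^*(t), u, \lambda(t), t\bigr).
\end{align*}
```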
Abstract:
Our main result is a new sequential method for the design of decentralized control systems. Controller synthesis is conducted on a loop-by-loop basis, and at each step the designer obtains an explicit characterization of the class C of all compensators for the loop being closed that result in the closed-loop system poles lying in a specified closed region D of the s-plane, rather than merely stabilizing the closed-loop system. Since one of the primary goals of control system design is to satisfy basic performance requirements that are often directly related to closed-loop pole location (bandwidth, percentage overshoot, rise time, settling time), this approach immediately allows the designer to focus on other concerns such as robustness and sensitivity. By considering only compensators from class C and seeking the optimum member of that set with respect to sensitivity or robustness, the designer has a clearly defined, limited optimization problem to solve without concern for loss of performance. A solution to the decentralized tracking problem is also provided. This design approach has the attractive features of expandability, the use of only 'local models' for controller synthesis, and fault tolerance with respect to certain types of failure.
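A minimal sketch of the region-D membership test implicit in the abstract: compute the closed-loop poles and check them against a shifted damping cone. The plant, gain and region parameters below are illustrative assumptions.

```python
import numpy as np

# Accept a compensator only if all closed-loop poles lie in the region
# D = {s : Re(s) < -sigma, |Im(s)| <= m * |Re(s)|} (shift plus damping cone).
def poles_in_region(A_cl, sigma=0.5, m=2.0):
    poles = np.linalg.eigvals(A_cl)
    return all(p.real < -sigma and abs(p.imag) <= m * abs(p.real) for p in poles)

A = np.array([[0.0, 1.0], [-2.0, -3.0]])   # illustrative open-loop plant
B = np.array([[0.0], [1.0]])
K = np.array([[4.0, 2.0]])                 # candidate state-feedback gain

print(poles_in_region(A - B @ K))          # True: poles at -2 and -3 lie in D
```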
Abstract:
In a storage system where individual storage nodes are prone to failure, the redundant storage of data in a distributed manner across multiple nodes is a must to ensure reliability. Reed-Solomon codes possess the reconstruction property under which the stored data can be recovered by connecting to any k of the n nodes in the network across which data is dispersed. This property can be shown to lead to vastly improved network reliability over simple replication schemes. Also of interest in such storage systems is the minimization of the repair bandwidth, i.e., the amount of data needed to be downloaded from the network in order to repair a single failed node. Reed-Solomon codes perform poorly here as they require the entire data to be downloaded. Regenerating codes are a new class of codes which minimize the repair bandwidth while retaining the reconstruction property. This paper provides an overview of regenerating codes including a discussion on the explicit construction of optimum codes.
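The storage/repair-bandwidth trade-off described above has well-known extreme points (the MSR and MBR points in the standard regenerating-codes framework of Dimakis et al.); the sketch below evaluates them and contrasts them with Reed-Solomon repair, which downloads the whole file. The notation (B, k, d, alpha, beta, gamma) follows that literature; the numbers are illustrative.

```python
# A file of B symbols is stored across n nodes with alpha symbols each; a
# failed node is repaired by downloading beta symbols from each of d
# helper nodes, so the repair bandwidth is gamma = d * beta.
def msr_point(B, k, d):
    # Minimum-storage regenerating point: least storage per node.
    alpha = B / k
    gamma = B * d / (k * (d - k + 1))
    return alpha, gamma

def mbr_point(B, k, d):
    # Minimum-bandwidth regenerating point: least repair traffic.
    gamma = 2 * B * d / (k * (2 * d - k + 1))
    return gamma, gamma    # at the MBR point, alpha equals gamma

B, k, d = 1.0, 10, 14
print("MSR (alpha, gamma):", msr_point(B, k, d))   # (0.100, 0.280)
print("MBR (alpha, gamma):", mbr_point(B, k, d))   # (0.147, 0.147)
# Reed-Solomon repair would download the entire file (gamma = B = 1.0);
# both regenerating points are well below that.
```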
Abstract:
This paper applies naturally inspired optimization techniques, vector evaluated particle swarm optimization (VEPSO) and genetic algorithms (GA), to the design optimization of laminated composites. The minimum weight of the laminated composite is evaluated using different failure criteria: maximum stress (MS), Tsai-Wu (TW) and failure mechanism based (FMB) criteria. Minimum laminate weights are obtained for each failure criterion using VEPSO and GA under different combinations of loading. The study shows that VEPSO and GA predict almost the same minimum weight of the laminate for a given loading, while the minimum weights obtained under different failure criteria differ for some loading combinations. The comparison shows that FMBFC provides better results for all combinations of loading.
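As a hedged illustration of the swarm approach, the sketch below runs a plain single-objective PSO on a toy weight-minimization problem with a quadratic penalty standing in for a failure criterion; it is not the paper's VEPSO or its laminate model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem (an assumption, not the laminate model): minimize "weight"
# x1 + x2 subject to a "strength" requirement x1 * x2 >= 1, enforced by a
# quadratic penalty. The optimum is x1 = x2 = 1 with weight 2.
def fitness(x):
    weight = x[0] + x[1]
    violation = max(0.0, 1.0 - x[0] * x[1])   # failure-criterion surrogate
    return weight + 100.0 * violation ** 2

n, iters, dim = 30, 200, 2
pos = rng.uniform(0.1, 3.0, (n, dim))
vel = np.zeros((n, dim))
pbest, pbest_f = pos.copy(), np.array([fitness(p) for p in pos])
gbest = pbest[pbest_f.argmin()].copy()

for _ in range(iters):
    r1, r2 = rng.random((n, dim)), rng.random((n, dim))
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, 0.1, 3.0)
    f = np.array([fitness(p) for p in pos])
    improved = f < pbest_f
    pbest[improved], pbest_f[improved] = pos[improved], f[improved]
    gbest = pbest[pbest_f.argmin()].copy()

print("best design:", gbest, "weight:", gbest.sum())   # expect roughly (1, 1)
```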
Abstract:
We present a framework for performance evaluation of manufacturing systems subject to failure and repair. In particular, we determine the mean and variance of accumulated production over a specified time frame and show the usefulness of these results in system design and in evaluating operational policies for manufacturing systems. We extend this analysis to lead time as well. A detailed performability study is carried out for a generic model of a manufacturing system with centralized material handling. Several numerical results are presented, and the relevance of performability analysis in resolving system design issues is highlighted. Specific problems addressed include computing the distribution of total production over a shift period, determining the shift length necessary to deliver a given production target with a desired probability, and obtaining the distribution of Manufacturing Lead Time, all in the face of potential subsystem failures.
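A minimal Monte Carlo sketch of one performability question above: the distribution of production over a shift for a single machine alternating between exponential up and down times. All rates are assumptions, and the paper's analytical method is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(7)

# One 8-hour shift: produce at `rate` while up; up/down durations are
# exponential with the given MTBF and MTTR (illustrative values).
def shift_production(T=8.0, rate=100.0, mtbf=2.0, mttr=0.5):
    t, up, produced = 0.0, True, 0.0
    while t < T:
        dt = min(rng.exponential(mtbf if up else mttr), T - t)
        if up:
            produced += rate * dt
        t += dt
        up = not up
    return produced

samples = np.array([shift_production() for _ in range(20000)])
print(f"mean={samples.mean():.1f}, std={samples.std():.1f}")
print("P(production >= 600) =", (samples >= 600).mean())
```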
Abstract:
Dipolar systems, both liquids and solids, constitute a class of naturally abundant systems that are important in all branches of natural science. The study of orientational relaxation provides a powerful method for understanding the microscopic properties of these systems and, fortunately, there are many experimental tools to study orientational relaxation in the condensed phases. However, even after many years of intense research, our understanding of orientational relaxation in dipolar systems has remained largely imperfect. A major hurdle to a comprehensive understanding is the long-range and complex nature of dipolar interactions, which has also made reliable theoretical study extremely difficult. These difficulties have led to the development of continuum-model-based theories which, although they provide simple, elegant expressions for the quantities of interest, are mostly unsatisfactory as they totally neglect the molecularity of intermolecular interactions. The situation has improved in recent years because of renewed studies, led by computer simulations. In this review, we address some of the recent advances, with emphasis on the work done in our laboratory at Bangalore. The reasons for the failure of the continuum model, as revealed by recent Brownian dynamics simulations of the dipolar lattice, are discussed; the main reason is that the continuum model predicts too fast a decay of the torque-torque correlation function. On the other hand, a perturbative calculation based on Zwanzig's projection operator technique provides a fairly satisfactory description of the single-particle orientational dynamics for not too strongly polar dipolar systems. A recently developed molecular hydrodynamic theory that properly includes the effects of intermolecular orientational pair correlations provides an even better description of the single-particle orientational dynamics. We also discuss the rank dependence of the dielectric friction. The other topics reviewed here include dielectric relaxation and solvation dynamics, as they are intimately connected with orientational relaxation. Recent molecular dynamics simulations of the dipolar lattice are also discussed. The main theme of the present review is to understand the effects of intermolecular interactions on orientational relaxation. The presence of strong orientational pair correlations leads to strong coupling between the single-particle and collective dynamics. This coupling can give rise to rich dynamical properties, some of which are detailed here, while a major part remains unexplored.
Abstract:
Fork-join queueing systems offer a natural modelling paradigm for parallel processing systems and for assembly operations in automated manufacturing. The analysis of fork-join queueing systems has been an important subject of research in recent years. Existing analysis methodologies, both exact and approximate, assume that the servers are failure-free. In this study, we consider fork-join queueing systems in the presence of server failures and compute the cumulative distribution of performability with respect to the response time of such systems. For this, we employ a computational methodology that uses a recent technique based on randomization. We compare the performability of three different fork-join queueing models proposed in the literature: the distributed model, the centralized splitting model, and the split-merge model. The numerical results show that the centralized splitting model offers the highest levels of performability, followed by the distributed and split-merge models.
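The randomization (uniformization) technique mentioned above expresses transient CTMC probabilities as a Poisson mixture of DTMC powers; the sketch below implements that identity for a two-state up/down chain chosen purely for illustration (and suitable only for modest L*t, since the Poisson weights are computed directly).

```python
import numpy as np

# Uniformization: pi(t) = sum_n Poisson(L*t; n) * pi0 @ P^n, with
# P = I + Q/L and L >= max_i |q_ii|.
def transient(Q, pi0, t, tol=1e-12):
    L = max(-Q.diagonal())
    P = np.eye(len(pi0)) + Q / L
    weight = np.exp(-L * t)               # Poisson pmf at n = 0
    total, v, acc = weight, pi0.copy(), weight * pi0
    n = 0
    while total < 1.0 - tol:              # stop once the weights are exhausted
        n += 1
        v = v @ P
        weight *= L * t / n               # recursive Poisson pmf update
        total += weight
        acc = acc + weight * v
    return acc

# Two-state up/down chain (illustrative rates): up->down 0.1/h, down->up 1/h.
Q = np.array([[-0.1, 0.1], [1.0, -1.0]])
pi0 = np.array([1.0, 0.0])
print(transient(Q, pi0, t=5.0))           # ~[0.909, 0.091], matching closed form
```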
Abstract:
A link failure in the path of a virtual circuit in a packet data network will lead to premature disconnection of the circuit by the end-points. A soft failure will result in degraded throughput over the virtual circuit. If these failures can be detected quickly and reliably, then appropriate rerouting strategies can automatically reroute the virtual circuits that use the failed facility. In this paper, we develop a methodology for analysing and designing failure detection schemes for digital facilities. Based on errored-second data, we develop a Markov model for the error and failure behaviour of a T1 trunk. The performance of a detection scheme is characterized by its false alarm probability and its detection delay. Using the Markov model, we analyse the performance of detection schemes that use physical layer or link layer information. The schemes basically rely upon detecting the occurrence of severely errored seconds (SESs); a failure is declared when a counter driven by the occurrence of SESs reaches a certain threshold. For hard failures, the design problem reduces to a proper choice of the threshold at which failure is declared, and of the connection reattempt parameters of the virtual circuit end-point session recovery procedures. For soft failures, the performance of a detection scheme depends, in addition, on how long and how frequent the error bursts are in a given failure mode. We also propose and analyse a novel Level 2 detection scheme that relies only upon anomalies observable at Level 2, i.e. CRC failures and idle-fill flag errors. Our results suggest that Level 2 schemes that perform as well as Level 1 schemes are possible.
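A hedged sketch of the counter-based detection rule described above: the counter rises on each SES, falls (toward zero) otherwise, and failure is declared at a threshold K. The SES probabilities are invented for illustration and are not the paper's measured T1 statistics.

```python
import numpy as np

rng = np.random.default_rng(3)

# Each element of ses_seq marks whether that second was severely errored.
def detect(ses_seq, K=5):
    c = 0
    for t, ses in enumerate(ses_seq):
        c = c + 1 if ses else max(0, c - 1)
        if c >= K:
            return t                     # second at which failure is declared
    return None                          # no failure declared

normal = rng.random(3600) < 0.001        # ~1 SES per 1000 s on a healthy trunk
failed = rng.random(3600) < 0.8          # SES-dominated after a hard failure

print("false alarm in 1 h of healthy trunk:", detect(normal))   # almost surely None
print("detection delay after failure:", detect(failed), "s")    # a few seconds
```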
Abstract:
An integrated reservoir operation model is presented for developing effective operational policies for irrigation water management. In arid and semi-arid climates, owing to dynamic changes in hydroclimatic conditions within a season, a fixed cropping pattern with conventional operating policies may have considerable impact on the performance of the irrigation system and may affect the economics of the farming community. For optimal allocation of irrigation water in a season, effective mathematical models can guide water managers in proper decision making and consequently help reduce the adverse effects of water shortage and crop failure. This paper presents a multi-objective integrated reservoir operation model for a multi-crop irrigation system. To solve the multi-objective model, a recent swarm intelligence technique, elitist-mutated multi-objective particle swarm optimisation (EM-MOPSO), has been used and applied to a case study in India. The method evolves effective strategies for irrigation crop planning and reservoir operation policies, and thereby helps the farming community to improve crop benefits and water resource usage in the reservoir command area.
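Independent of the EM-MOPSO specifics, any multi-objective PSO needs a non-dominated (Pareto) filter; the sketch below shows that selection step on invented objective pairs (crop-benefit shortfall, water deficit), both to be minimized.

```python
# Keep only solutions not weakly dominated by any other candidate:
# q dominates p if q is no worse in every objective and differs from p.
def pareto_front(points):
    front = []
    for p in points:
        dominated = any(all(q[i] <= p[i] for i in range(len(p))) and q != p
                        for q in points)
        if not dominated:
            front.append(p)
    return front

candidates = [(3.0, 1.0), (2.0, 2.0), (1.0, 4.0), (2.5, 2.5)]
print(pareto_front(candidates))   # (2.5, 2.5) is dropped: (2.0, 2.0) dominates it
```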
Abstract:
This paper addresses the problem of how to select the optimal number of sensors and how to determine their placement in a given monitored area for multimedia surveillance systems. We propose to solve this problem by deriving a novel performance metric, a probability measure for accomplishing the task as a function of the set of sensors and their placement. This measure is then used to find the optimal set. The same measure can be used to analyze the degradation in the system's performance with respect to the failure of various sensors. We also build a surveillance system using the optimal set of sensors obtained with the proposed design methodology. Experimental results show the effectiveness of the proposed design methodology in selecting the optimal set of sensors and their placement.
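A minimal sketch in the spirit of the proposed metric, under hypothetical assumptions: each sensor i covers each monitored point j with a known probability p[i][j], sensors fail independently, and a sensor set is scored by the average probability that each point is covered by at least one selected sensor. Small sets are searched exhaustively.

```python
from itertools import combinations

# Average over monitored points of P(at least one selected sensor covers it).
def task_probability(selected, p, n_points):
    prob = 0.0
    for j in range(n_points):
        miss = 1.0
        for i in selected:
            miss *= 1.0 - p[i][j]        # independence assumption
        prob += 1.0 - miss
    return prob / n_points

# Illustrative coverage probabilities: 3 candidate sensors, 3 points.
p = {0: [0.9, 0.1, 0.0], 1: [0.2, 0.8, 0.3], 2: [0.0, 0.4, 0.9]}
best = max(combinations(p, 2), key=lambda s: task_probability(s, p, 3))
print("best 2-sensor set:", best, task_probability(best, p, 3))   # (0, 2), ~0.75
```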
Abstract:
A new scheme for minimizing handover failure probability in mobile cellular communication systems is presented. The scheme involves reassigning the priorities of handover requests enqueued in adjacent cells in order to release a channel for a handover request that is about to fail. Performance evaluation of the new scheme, carried out by computer simulation of a four-cell highway cellular system, shows a considerable reduction in handover failure probability.
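A much-simplified sketch of the urgency-driven idea: queue handover requests by time-to-failure and, when a channel is released, serve the request closest to failing rather than the oldest. The cross-cell priority reassignment of the actual scheme is not modelled, and all numbers are invented.

```python
import heapq

# Min-heap ordered by remaining time before the call would be dropped.
queue = []
for req_id, ttf in [("car-1", 4.0), ("car-2", 1.2), ("car-3", 2.5)]:
    heapq.heappush(queue, (ttf, req_id))

def on_channel_released():
    # Assign the freed channel to the most urgent request (priority
    # reassignment), not to the request that arrived first.
    ttf, req_id = heapq.heappop(queue)
    print(f"channel assigned to {req_id} ({ttf:.1f}s from failure)")

on_channel_released()   # car-2, the request about to fail
```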
Abstract:
Exascale systems of the future are predicted to have a mean time between failures (MTBF) of less than one hour. Malleable applications, in which the number of processors on which an application executes can be changed during execution, can exploit their malleability to better tolerate high failure rates. We present AdFT, an adaptive fault tolerance framework for long-running malleable applications that maximizes application performance in the presence of failures. The AdFT framework includes cost models for evaluating the benefits of various fault tolerance actions, including checkpointing, live migration and rescheduling, and makes runtime decisions that dynamically select fault tolerance actions at different points of application execution so as to maximize performance. Simulations with real and synthetic failure traces show that our approach outperforms existing fault tolerance mechanisms for malleable applications, yielding up to 23% improvement in application performance, and is effective even for petascale systems and beyond.
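As a hedged illustration of cost-model-driven decisions, the sketch below amortizes checkpoint overhead using Young's first-order interval approximation and chooses between checkpointing and live migration based on an assumed failure-warning horizon; none of this is AdFT's actual cost model.

```python
import math

# Young's first-order optimal checkpoint interval: tau ~ sqrt(2 * delta * MTBF),
# where delta is the cost of writing one checkpoint.
def checkpoint_overhead_fraction(delta, mtbf):
    tau = math.sqrt(2 * delta * mtbf)
    # Amortized loss: one checkpoint per interval (delta / tau) plus an
    # expected tau/2 of rework per failure, failures arriving at rate 1/MTBF.
    return delta / tau + tau / (2 * mtbf)

def pick_action(migrate_cost, warning_horizon):
    # If a failure warning arrives far enough ahead to migrate the job,
    # live migration avoids rework entirely; otherwise keep checkpointing.
    return "live-migrate" if warning_horizon > migrate_cost else "checkpoint"

print(f"checkpoint overhead: {checkpoint_overhead_fraction(delta=60, mtbf=3600):.1%}")
print(pick_action(migrate_cost=30, warning_horizon=120))   # live-migrate
```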