32 resultados para service-oriented grid computing systems
em Indian Institute of Science - Bangalore - Índia
Resumo:
In this paper we present the design of ``e-SURAKSHAK,'' a novel cyber-physical health care management system of Wireless Embedded Internet Devices (WEIDs) that sense vital health parameters. The system is capable of sensing body temperature, heart rate, oxygen saturation level and also allows noninvasive blood pressure (NIBP) measurement. End to end internet connectivity is provided by using 6LoWPAN based wireless network that uses the 802.15.4 radio. A service oriented architecture (SOA) 1] is implemented to extract meaningful information and present it in an easy-to-understand form to the end-user instead of raw data made available by sensors. A central electronic database and health care management software are developed. Vital health parameters are measured and stored periodically in the database. Further, support for real-time measurement of health parameters is provided through a web based GUI. The system has been implemented completely and demonstrated with multiple users and multiple WEIDs.
Resumo:
In this paper, we present a decentralized dynamic load scheduling/balancing algorithm called ELISA (Estimated Load Information Scheduling Algorithm) for general purpose distributed computing systems. ELISA uses estimated state information based upon periodic exchange of exact state information between neighbouring nodes to perform load scheduling. The primary objective of the algorithm is to cut down on the communication and load transfer overheads by minimizing the frequency of status exchange and by restricting the load transfer and status exchange within the buddy set of a processor. It is shown that the resulting algorithm performs almost as well as a perfect information algorithm and is superior to other load balancing schemes based on the random sharing and Ni-Hwang algorithms. A sensitivity analysis to study the effect of various design parameters on the effectiveness of load balancing is also carried out. Finally, the algorithm's performance is tested on large dimensional hypercubes in the presence of time-varying load arrival process and is shown to perform well in comparison to other algorithms. This makes ELISA a viable and implementable load balancing algorithm for use in general purpose distributed computing systems.
Resumo:
Simultaneous consideration of both performance and reliability issues is important in the choice of computer architectures for real-time aerospace applications. One of the requirements for such a fault-tolerant computer system is the characteristic of graceful degradation. A shared and replicated resources computing system represents such an architecture. In this paper, a combinatorial model is used for the evaluation of the instruction execution rate of a degradable, replicated resources computing system such as a modular multiprocessor system. Next, a method is presented to evaluate the computation reliability of such a system utilizing a reliability graph model and the instruction execution rate. Finally, this computation reliability measure, which simultaneously describes both performance and reliability, is applied as a constraint in an architecture optimization model for such computing systems. Index Terms-Architecture optimization, computation
Resumo:
This paper is aimed at reviewing the notion of Byzantine-resilient distributed computing systems, the relevant protocols and their possible applications as reported in the literature. The three agreement problems, namely, the consensus problem, the interactive consistency problem, and the generals problem have been discussed. Various agreement protocols for the Byzantine generals problem have been summarized in terms of their performance and level of fault-tolerance. The three classes of Byzantine agreement protocols discussed are the deterministic, randomized, and approximate agreement protocols. Finally, application of the Byzantine agreement protocols to clock synchronization is highlighted.
Resumo:
Effectiveness evaluation of aerospace fault-tolerant computing systems used in a phased-mission environment is rather tricky and difficult because of the interaction of its several degraded performance levels with the multiple objectives of the mission and the use environment. Part I uses an approach based on multiobjective phased-mission analysis to evaluate the effectiveness of a distributed avionics architecture used in a transport aircraft. Part II views the computing system as a multistate s-coherent structure. Lower bounds on the probabilities of accomplishing various levels of performance are evaluated.
Resumo:
An important issue in the design of a distributed computing system (DCS) is the development of a suitable protocol. This paper presents an effort to systematize the protocol design procedure for a DCS. Protocol design and development can be divided into six phases: specification of the DCS, specification of protocol requirements, protocol design, specification and validation of the designed protocol, performance evaluation, and hardware/software implementation. This paper describes techniques for the second and third phases, while the first phase has been considered by the authors in their earlier work. Matrix and set theoretic based approaches are used for specification of a DCS and for specification of the protocol requirements. These two formal specification techniques form the basis of the development of a simple and straightforward procedure for the design of the protocol. The applicability of the above design procedure has been illustrated by considering an example of a computing system encountered on board a spacecraft. A Petri-net based approach has been adopted to model the protocol. The methodology developed in this paper can be used in other DCS applications.
Resumo:
Precision, sophistication and economic factors in many areas of scientific research that demand very high magnitude of compute power is the order of the day. Thus advance research in the area of high performance computing is getting inevitable. The basic principle of sharing and collaborative work by geographically separated computers is known by several names such as metacomputing, scalable computing, cluster computing, internet computing and this has today metamorphosed into a new term known as grid computing. This paper gives an overview of grid computing and compares various grid architectures. We show the role that patterns can play in architecting complex systems, and provide a very pragmatic reference to a set of well-engineered patterns that the practicing developer can apply to crafting his or her own specific applications. We are not aware of pattern-oriented approach being applied to develop and deploy a grid. There are many grid frameworks that are built or are in the process of being functional. All these grids differ in some functionality or the other, though the basic principle over which the grids are built is the same. Despite this there are no standard requirements listed for building a grid. The grid being a very complex system, it is mandatory to have a standard Software Architecture Specification (SAS). We attempt to develop the same for use by any grid user or developer. Specifically, we analyze the grid using an object oriented approach and presenting the architecture using UML. This paper will propose the usage of patterns at all levels (analysis. design and architectural) of the grid development.
Resumo:
Service systems are labor intensive. Further, the workload tends to vary greatly with time. Adapting the staffing levels to the workloads in such systems is nontrivial due to a large number of parameters and operational variations, but crucial for business objectives such as minimal labor inventory. One of the central challenges is to optimize the staffing while maintaining system steady-state and compliance to aggregate SLA constraints. We formulate this problem as a parametrized constrained Markov process and propose a novel stochastic optimization algorithm for solving it. Our algorithm is a multi-timescale stochastic approximation scheme that incorporates a SPSA based algorithm for ‘primal descent' and couples it with a ‘dual ascent' scheme for the Lagrange multipliers. We validate this optimization scheme on five real-life service systems and compare it with a state-of-the-art optimization tool-kit OptQuest. Being two orders of magnitude faster than OptQuest, our scheme is particularly suitable for adaptive labor staffing. Also, we observe that it guarantees convergence and finds better solutions than OptQuest in many cases.
Resumo:
Grid-connected systems when put to use at the site would experience scenarios like voltage sag, voltage swell, frequency deviations and unbalance which are common in the real world grid. When these systems are tested at laboratory, these scenarios do not exist and an almost stiff voltage source is what is usually seen. But, to qualify the grid-connected systems to operate at the site, it becomes essential to test them under the grid conditions mentioned earlier. The grid simulator is a hardware that can be programmed to generate some of the typical conditions experienced by the grid-connected systems at site. It is an inverter that is controlled to act like a voltage source in series with a grid impedance. The series grid impedance is emulated virtually within the inverter control rather than through physical components, thus avoiding the losses and the need for bulky reactive components. This paper describes the design of a grid simulator. Control implementation issues are highlighted in the experimental results.
Resumo:
A number of companies are trying to migrate large monolithic software systems to Service Oriented Architectures. A common approach to do this is to first identify and describe desired services (i.e., create a model), and then to locate portions of code within the existing system that implement the described services. In this paper we describe a detailed case study we undertook to match a model to an open-source business application. We describe the systematic methodology we used, the results of the exercise, as well as several observations that throw light on the nature of this problem. We also suggest and validate heuristics that are likely to be useful in partially automating the process of matching service descriptions to implementations.
Resumo:
The move towards IT outsourcing is the first step towards an environment where compute infrastructure is treated as a service. In utility computing this IT service has to honor Service Level Agreements (SLA) in order to meet the desired Quality of Service (QoS) guarantees. Such an environment requires reliable services in order to maximize the utilization of the resources and to decrease the Total Cost of Ownership (TCO). Such reliability cannot come at the cost of resource duplication, since it increases the TCO of the data center and hence the cost per compute unit. We, in this paper, look into aspects of projecting impact of hardware failures on the SLAs and techniques required to take proactive recovery steps in case of a predicted failure. By maintaining health vectors of all hardware and system resources, we predict the failure probability of resources based on observed hardware errors/failure events, at runtime. This inturn influences an availability aware middleware to take proactive action (even before the application is affected in case the system and the application have low recoverability). The proposed framework has been prototyped on a system running HP-UX. Our offline analysis of the prediction system on hardware error logs indicate no more than 10% false positives. This work to the best of our knowledge is the first of its kind to perform an end-to-end analysis of the impact of a hardware fault on application SLAs, in a live system.
Resumo:
Distributed computing systems can be modeled adequately by Petri nets. The computation of invariants of Petri nets becomes necessary for proving the properties of modeled systems. This paper presents a two-phase, bottom-up approach for invariant computation and analysis of Petri nets. In the first phase, a newly defined subnet, called the RP-subnet, with an invariant is chosen. In the second phase, the selected RP-subnet is analyzed. Our methodology is illustrated with two examples viz., the dining philosophers' problem and the connection-disconnection phase of a transport protocol. We believe that this new method, which is computationally no worse than the existing techniques, would simplify the analysis of many practical distributed systems.
Resumo:
Many next-generation distributed applications, such as grid computing, require a single source to communicate with a group of destinations. Traditionally, such applications are implemented using multicast communication. A typical multicast session requires creating the shortest-path tree to a fixed number of destinations. The fundamental issue in multicasting data to a fixed set of destinations is receiver blocking. If one of the destinations is not reachable, the entire multicast request (say, grid task request) may fail. Manycasting is a generalized variation of multicasting that provides the freedom to choose the best subset of destinations from a larger set of candidate destinations. We propose an impairment-aware algorithm to provide manycasting service in the optical layer, specifically OBS. We compare the performance of our proposed manycasting algorithm with traditional multicasting and multicast with over provisioning. Our results show a significant improvement in the blocking probability by implementing optical-layer manycasting.
Resumo:
Long running multi-physics coupled parallel applications have gained prominence in recent years. The high computational requirements and long durations of simulations of these applications necessitate the use of multiple systems of a Grid for execution. In this paper, we have built an adaptive middleware framework for execution of long running multi-physics coupled applications across multiple batch systems of a Grid. Our framework, apart from coordinating the executions of the component jobs of an application on different batch systems, also automatically resubmits the jobs multiple times to the batch queues to continue and sustain long running executions. As the set of active batch systems available for execution changes, our framework performs migration and rescheduling of components using a robust rescheduling decision algorithm. We have used our framework for improving the application throughput of a foremost long running multi-component application for climate modeling, the Community Climate System Model (CCSM). Our real multi-site experiments with CCSM indicate that Grid executions can lead to improved application throughput for climate models.