843 resultados para Self-healing, autonomic computing, Web applications, fault tolerance, performance


Relevância:

100.00% 100.00%

Publicador:

Resumo:

The complexity of systems is considered an obstacle to the progress of the IT industry. Autonomic computing is presented as the alternative to cope with the growing complexity. It is a holistic approach, in which the systems are able to configure, heal, optimize, and protect by themselves. Web-based applications are an example of systems where the complexity is high. The number of components, their interoperability, and workload variations are factors that may lead to performance failures or unavailability scenarios. The occurrence of these scenarios affects the revenue and reputation of businesses that rely on these types of applications. In this article, we present a self-healing framework for Web-based applications (SHõWA). SHõWA is composed by several modules, which monitor the application, analyze the data to detect and pinpoint anomalies, and execute recovery actions autonomously. The monitoring is done by a small aspect-oriented programming agent. This agent does not require changes to the application source code and includes adaptive and selective algorithms to regulate the level of monitoring. The anomalies are detected and pinpointed by means of statistical correlation. The data analysis detects changes in the server response time and analyzes if those changes are correlated with the workload or are due to a performance anomaly. In the presence of per- formance anomalies, the data analysis pinpoints the anomaly. Upon the pinpointing of anomalies, SHõWA executes a recovery procedure. We also present a study about the detection and localization of anomalies, the accuracy of the data analysis, and the performance impact induced by SHõWA. Two benchmarking applications, exercised through dynamic workloads, and different types of anomaly were considered in the study. The results reveal that (1) the capacity of SHõWA to detect and pinpoint anomalies while the number of end users affected is low; (2) SHõWA was able to detect anomalies without raising any false alarm; and (3) SHõWA does not induce a significant performance overhead (throughput was affected in less than 1%, and the response time delay was no more than 2 milliseconds).

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Web service-based application is an architectural style, where a collection of Web services communicates to each other to execute processes. With the popularity increase of developing Web service-based application and once Web services may change, in terms of functional and non-functional Quality of Service (QoS), we need mechanisms to monitor, diagnose, and repair Web services into a Web Application. This work presents a description of self-healing architecture that deals with these mechanisms. Other contributions of this paper are using the proxy server to measure Web service QoS values and to employ some strategies to recovery the effects from misbehaved Web services. © 2008 IEEE.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper presents an analysis of the fault tolerance achieved by an autonomous, fully embedded evolvable hardware system, which uses a combination of partial dynamic reconfiguration and an evolutionary algorithm (EA). It demonstrates that the system may self-recover from both transient and cumulative permanent faults. This self-adaptive system, based on a 2D array of 16 (4×4) Processing Elements (PEs), is tested with an image filtering application. Results show that it may properly recover from faults in up to 3 PEs, that is, more than 18% cumulative permanent faults. Two fault models are used for testing purposes, at PE and CLB levels. Two self-healing strategies are also introduced, depending on whether fault diagnosis is available or not. They are based on scrubbing, fitness evaluation, dynamic partial reconfiguration and in-system evolutionary adaptation. Since most of these adaptability features are already available on the system for its normal operation, resource cost for self-healing is very low (only some code additions in the internal microprocessor core)

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Service-based architectures enable the development of new classes of Grid and distributed applications. One of the main capabilities provided by such systems is the dynamic and flexible integration of services, according to which services are allowed to be a part of more than one distributed system and simultaneously serve different applications. This increased flexibility in system composition makes it difficult to address classical distributed system issues such as fault-tolerance. While it is relatively easy to make an individual service fault-tolerant, improving fault-tolerance of services collaborating in multiple application scenarios is a challenging task. In this paper, we look at the issue of developing fault-tolerant service-based distributed systems, and propose an infrastructure to implement fault tolerance capabilities transparent to services.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We describe a novel approach to scheduling resolution by combining Autonomic Computing (AC), Multi-Agent Systems (MAS) and Nature Inspired Optimization Techniques (NIT). Autonomic Computing has emerged as paradigm aiming at embedding applications with a management structure similar to a central nervous system. A natural Autonomic Computing evolution in relation to Current Computing is to provide systems with Self-Managing ability with a minimum human interference. In this paper we envisage the use of Multi-Agent Systems paradigm for supporting dynamic and distributed scheduling in Manufacturing Systems with Autonomic properties, in order to reduce the complexity of managing systems and human interference. Additionally, we consider the resolution of realistic problems. The scheduling of a Cutting and Treatment Stainless Steel Sheet Line will be evaluated. Results show that proposed approach has advantages when compared with other scheduling systems.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The new generations of SRAM-based FPGA (field programmable gate array) devices are the preferred choice for the implementation of reconfigurable computing platforms intended to accelerate processing in real-time systems. However, FPGA's vulnerability to hard and soft errors is a major weakness to robust configurable system design. In this paper, a novel built-in self-healing (BISH) methodology, based on run-time self-reconfiguration, is proposed. A soft microprocessor core implemented in the FPGA is responsible for the management and execution of all the BISH procedures. Fault detection and diagnosis is followed by repairing actions, taking advantage of the dynamic reconfiguration features offered by new FPGA families. Meanwhile, modular redundancy assures that the system still works correctly

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Recent research in multi-agent systems incorporate fault tolerance concepts, but does not explore the extension and implementation of such ideas for large scale parallel computing systems. The work reported in this paper investigates a swarm array computing approach, namely 'Intelligent Agents'. A task to be executed on a parallel computing system is decomposed to sub-tasks and mapped onto agents that traverse an abstracted hardware layer. The agents intercommunicate across processors to share information during the event of a predicted core/processor failure and for successfully completing the task. The feasibility of the approach is validated by simulations on an FPGA using a multi-agent simulator, and implementation of a parallel reduction algorithm on a computer cluster using the Message Passing Interface.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Applications that exploit contextual information in order to adapt their behaviour to dynamically changing operating environments and user requirements are increasingly being explored as part of the vision of pervasive or ubiquitous computing. Despite recent advances in infrastructure to support these applications through the acquisition, interpretation and dissemination of context data from sensors, they remain prohibitively difficult to develop and have made little penetration beyond the laboratory. This situation persists largely due to a lack of appropriately high-level abstractions for describing, reasoning about and exploiting context information as a basis for adaptation. In this paper, we present our efforts to address this challenge, focusing on our novel approach involving the use of preference information as a basis for making flexible adaptation decisions. We also discuss our experiences in applying our conceptual and software frameworks for context and preference modelling to a case study involving the development of an adaptive communication application.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This work presents a theoretical-graph method of determining the fault tolerance degree of the computer network interconnections and nodes. Experimental results received from simulations of this method over a distributed computing network environment are also presented.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Electric vehicles (EVs) and hybrid electric vehicles (HEVs) can reduce greenhouse gas emissions while switched reluctance motor (SRM) is one of the promising motor for such applications. This paper presents a novel SRM fault-diagnosis and fault-tolerance operation solution. Based on the traditional asymmetric half-bridge topology for the SRM driving, the central tapped winding of the SRM in modular half-bridge configuration are introduced to provide fault-diagnosis and fault-tolerance functions, which are set idle in normal conditions. The fault diagnosis can be achieved by detecting the characteristic of the excitation and demagnetization currents. An SRM fault-tolerance operation strategy is also realized by the proposed topology, which compensates for the missing phase torque under the open-circuit fault, and reduces the unbalanced phase current under the short-circuit fault due to the uncontrolled faulty phase. Furthermore, the current sensor placement strategy is also discussed to give two placement methods for low cost or modular structure. Simulation results in MATLAB/Simulink and experiments on a 750-W SRM validate the effectiveness of the proposed strategy, which may have significant implications and improve the reliability of EVs/HEVs.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

To boost logic density and reduce per unit power consumption SRAM-based FPGAs manufacturers adopted nanometric technologies. However, this technology is highly vulnerable to radiation-induced faults, which affect values stored in memory cells, and to manufacturing imperfections. Fault tolerant implementations, based on Triple Modular Redundancy (TMR) infrastructures, help to keep the correct operation of the circuit. However, TMR is not sufficient to guarantee the safe operation of a circuit. Other issues like module placement, the effects of multi- bit upsets (MBU) or fault accumulation, have also to be addressed. In case of a fault occurrence the correct operation of the affected module must be restored and/or the current state of the circuit coherently re-established. A solution that enables the autonomous restoration of the functional definition of the affected module, avoiding fault accumulation, re-establishing the correct circuit state in real-time, while keeping the normal operation of the circuit, is presented in this paper.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

To increase the amount of logic available in SRAM-based FPGAs manufacturers are using nanometric technologies to boost logic density and reduce prices. However, nanometric scales are highly vulnerable to radiation-induced faults that affect values stored in memory cells. Since the functional definition of FPGAs relies on memory cells, they become highly prone to this type of faults. Fault tolerant implementations, based on triple modular redundancy (TMR) infrastructures, help to keep the correct operation of the circuit. However, TMR is not sufficient to guarantee the safe operation of a circuit. Other issues like the effects of multi-bit upsets (MBU) or fault accumulation, have also to be addressed. Furthermore, in case of a fault occurrence the correct operation of the affected module must be restored and the current state of the circuit coherently re-established. A solution that enables the autonomous correct restoration of the functional definition of the affected module, avoiding fault accumulation, re-establishing the correct circuit state in realtime, while keeping the normal operation of the circuit, is presented in this paper.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

It is imperative to accept that failures can and will occur, even in meticulously designed distributed systems, and design proper measures to counter those failures. Passive replication minimises resource consumption by only activating redundant replicas in case of failures, as typically providing and applying state updates is less resource demanding than requesting execution. However, most existing solutions for passive fault tolerance are usually designed and configured at design time, explicitly and statically identifying the most critical components and their number of replicas, lacking the needed flexibility to handle the runtime dynamics of distributed component-based embedded systems. This paper proposes a cost-effective adaptive fault tolerance solution with a significant lower overhead compared to a strict active redundancy-based approach, achieving a high error coverage with the minimum amount of redundancy. The activation of passive replicas is coordinated through a feedback-based coordination model that reduces the complexity of the needed interactions among components until a new collective global service solution is determined, improving the overall maintainability and robustness of the system.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Dissertação para obtenção do Grau de Mestre em Engenharia Informática

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The work reported in this paper is motivated by the fact that there is a need to apply autonomic computing concepts to parallel computing systems. Advancing on prior work based on intelligent cores [36], a swarm-array computing approach, this paper focuses on ‘Intelligent agents’ another swarm-array computing approach in which the task to be executed on a parallel computing core is considered as a swarm of autonomous agents. A task is carried to a computing core by carrier agents and is seamlessly transferred between cores in the event of a predicted failure, thereby achieving self-ware objectives of autonomic computing. The feasibility of the proposed swarm-array computing approach is validated on a multi-agent simulator.