Biblioteca Digital

As the complexity of computing systems grows, reliability and energy are two crucial challenges asking for holistic solutions. In this paper, we investigate the interplay among concurrency, power dissipation, energy consumption and voltage-frequency scaling for a key numerical kernel for the solution of sparse linear systems. Concretely, we leverage a task-parallel implementation of the Conjugate Gradient method, equipped with an state-of-the-art pre-conditioner embedded in the ILUPACK software, and target a low-power multi core processor from ARM.In addition, we perform a theoretical analysis on the impact of a technique like Near Threshold Voltage Computing (NTVC) from the points of view of increased hardware concurrency and error rate.

Veja mais

Evaluating fault tolerance on asymmetric multicore systems-on-chip using iso-metrics

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The end of Dennard scaling has promoted low power consumption into a firstorder concern for computing systems. However, conventional power conservation schemes such as voltage and frequency scaling are reaching their limits when used in performance-constrained environments. New technologies are required to break the power wall while sustaining performance on future processors. Low-power embedded processors and near-threshold voltage computing (NTVC) have been proposed as viable solutions to tackle the power wall in future computing systems. Unfortunately, these technologies may also compromise per-core performance and, in the case of NTVC, xreliability. These limitations would make them unsuitable for HPC systems and datacenters. In order to demonstrate that emerging low-power processing technologies can effectively replace conventional technologies, this study relies on ARM’s big.LITTLE processors as both an actual and emulation platform, and state-of-the-art implementations of the CG solver. For NTVC in particular, the paper describes how efficient algorithm-based fault tolerance schemes preserve the power and energy benefits of very low voltage operation.

Veja mais

Central-Tapped Node Linked Modular Fault-Tolerance Topology for SRM Applications

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Electric vehicles (EVs) and hybrid electric vehicles (HEVs) can reduce greenhouse gas emissions while switched reluctance motor (SRM) is one of the promising motor for such applications. This paper presents a novel SRM fault-diagnosis and fault-tolerance operation solution. Based on the traditional asymmetric half-bridge topology for the SRM driving, the central tapped winding of the SRM in modular half-bridge configuration are introduced to provide fault-diagnosis and fault-tolerance functions, which are set idle in normal conditions. The fault diagnosis can be achieved by detecting the characteristic of the excitation and demagnetization currents. An SRM fault-tolerance operation strategy is also realized by the proposed topology, which compensates for the missing phase torque under the open-circuit fault, and reduces the unbalanced phase current under the short-circuit fault due to the uncontrolled faulty phase. Furthermore, the current sensor placement strategy is also discussed to give two placement methods for low cost or modular structure. Simulation results in MATLAB/Simulink and experiments on a 750-W SRM validate the effectiveness of the proposed strategy, which may have significant implications and improve the reliability of EVs/HEVs.

Veja mais

Multi-μ: an Ada 95 based architecture for fault tolerance support of real-time systems

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper presents an architecture (Multi-μ) being implemented to study and develop software based fault tolerant mechanisms for Real-Time Systems, using the Ada language (Ada 95) and Commercial Off-The-Shelf (COTS) components. Several issues regarding fault tolerance are presented and mechanisms to achieve fault tolerance by software active replication in Ada 95 are discussed. The Multi-μ architecture, based on a specifically proposed Fault Tolerance Manager (FTManager), is then described. Finally, some considerations are made about the work being done and essential future developments.

Veja mais

Triad views of cognition in intelligent agents and intelligent cores for fault tolerance

Relevância:

100.00% 100.00%

Publicador:

Veja mais

Intelligent agents for fault tolerance: from multi-agent simulation to cluster-based implementation

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Recent research in multi-agent systems incorporate fault tolerance concepts, but does not explore the extension and implementation of such ideas for large scale parallel computing systems. The work reported in this paper investigates a swarm array computing approach, namely 'Intelligent Agents'. A task to be executed on a parallel computing system is decomposed to sub-tasks and mapped onto agents that traverse an abstracted hardware layer. The agents intercommunicate across processors to share information during the event of a predicted core/processor failure and for successfully completing the task. The feasibility of the approach is validated by simulations on an FPGA using a multi-agent simulator, and implementation of a parallel reduction algorithm on a computer cluster using the Message Passing Interface.

Veja mais

Transparent Fault Tolerance for Web Services based Architectures

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Service-based architectures enable the development of new classes of Grid and distributed applications. One of the main capabilities provided by such systems is the dynamic and flexible integration of services, according to which services are allowed to be a part of more than one distributed system and simultaneously serve different applications. This increased flexibility in system composition makes it difficult to address classical distributed system issues such as fault-tolerance. While it is relatively easy to make an individual service fault-tolerant, improving fault-tolerance of services collaborating in multiple application scenarios is a challenging task. In this paper, we look at the issue of developing fault-tolerant service-based distributed systems, and propose an infrastructure to implement fault tolerance capabilities transparent to services.

Veja mais

Fault tolerance in large scale systems: hybrid and distributed approaches

Relevância:

100.00% 100.00%

Publicador:

Veja mais

Progettazione, implementazione e valutazione di sistemi per la fault tolerance in simulazioni distribuite

Relevância:

100.00% 100.00%

Publicador:

Veja mais

Fault Tolerance Analysis and Self-Healing Strategy of Autonomous, Evolvable Hardware Systems

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper presents an analysis of the fault tolerance achieved by an autonomous, fully embedded evolvable hardware system, which uses a combination of partial dynamic reconfiguration and an evolutionary algorithm (EA). It demonstrates that the system may self-recover from both transient and cumulative permanent faults. This self-adaptive system, based on a 2D array of 16 (4×4) Processing Elements (PEs), is tested with an image filtering application. Results show that it may properly recover from faults in up to 3 PEs, that is, more than 18% cumulative permanent faults. Two fault models are used for testing purposes, at PE and CLB levels. Two self-healing strategies are also introduced, depending on whether fault diagnosis is available or not. They are based on scrubbing, fitness evaluation, dynamic partial reconfiguration and in-system evolutionary adaptation. Since most of these adaptability features are already available on the system for its normal operation, resource cost for self-healing is very low (only some code additions in the internal microprocessor core)

Veja mais

The Matrix Method of Determining the Fault Tolerance Degree of a Computer Network Topology

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This work presents a theoretical-graph method of determining the fault tolerance degree of the computer network interconnections and nodes. Experimental results received from simulations of this method over a distributed computing network environment are also presented.

Veja mais

Central-tapped node linked modular fault-tolerance topology for SRM applications

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Veja mais

821 resultados para Fault detection, fail-safety, fault tolerance, UAV

Filtro por publicador