64 results for Fault detection, fail-safety, fault tolerance, UAV


Relevance: 100.00%

Abstract:

Background: Large-scale biological jobs on high-performance computing systems require manual intervention if one or more of the computing cores on which they execute fail. This incurs not only the cost of maintaining the job but also the time taken to reinstate it, and it risks losing the data and progress the job accomplished before it failed. Approaches that can proactively detect computing core failures and relocate the affected core's job onto reliable cores can be a significant step towards automating fault tolerance. Method: This paper describes an experimental investigation into the use of multi-agent approaches for fault tolerance. Two approaches are studied, the first at the job level and the second at the core level. The approaches are investigated for single-core failure scenarios that can occur in the execution of parallel reduction algorithms on computer clusters. A third approach is proposed that incorporates multi-agent technology at both the job and the core level. Experiments are conducted in the context of genome searching, a popular computational biology application. Result: The key conclusion is that the proposed approaches are feasible for automating fault tolerance in high-performance computing systems with minimal human intervention. In a typical fault-tolerance experiment, centralised and decentralised checkpointing approaches on average add 90% to the actual time for executing the job. In the same experiment, the multi-agent approaches add only 10% to the overall execution time.
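The relocation step described in this abstract can be illustrated with a minimal Python sketch. It assumes a heartbeat-based failure detector (a reactive stand-in for the paper's proactive detection) and collapses the per-core agents into a single monitoring function for brevity; `Core`, `relocate`, the 2-second timeout and the least-loaded placement rule are all hypothetical and are not the paper's multi-agent design.

```python
from dataclasses import dataclass, field

@dataclass
class Core:
    """Hypothetical view of one computing core as seen by its monitoring agent."""
    core_id: int
    last_heartbeat: float                      # time of the last heartbeat received (s)
    jobs: list = field(default_factory=list)   # sub-jobs currently assigned to this core

def detect_failed(cores, now, timeout):
    """Presume a core has failed if no heartbeat arrived within `timeout` seconds."""
    return [c for c in cores if now - c.last_heartbeat > timeout]

def relocate(cores, now, timeout=2.0):
    """Move each sub-job from presumed-failed cores to the least-loaded healthy core.

    Returns the list of moves as (job, from_core_id, to_core_id)."""
    failed = detect_failed(cores, now, timeout)
    failed_ids = {c.core_id for c in failed}
    healthy = [c for c in cores if c.core_id not in failed_ids]
    if not healthy:
        raise RuntimeError("no healthy core left to take over the job")
    moves = []
    for dead in failed:
        while dead.jobs:
            job = dead.jobs.pop(0)
            # Least-loaded placement; ties go to the core listed first.
            target = min(healthy, key=lambda c: len(c.jobs))
            target.jobs.append(job)
            moves.append((job, dead.core_id, target.core_id))
    return moves
```

For example, with cores 0, 1 and 2 last heard from at t = 10.0, 6.5 and 9.8 s, calling `relocate(cores, now=10.0)` flags only core 1 and spreads its sub-jobs over cores 2 and 0, without restarting the whole job from a checkpoint.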

Relevance: 100.00%

Abstract:

As the complexity of computing systems grows, reliability and energy are two crucial challenges that call for holistic solutions. In this paper, we investigate the interplay among concurrency, power dissipation, energy consumption and voltage-frequency scaling for a key numerical kernel for the solution of sparse linear systems. Concretely, we leverage a task-parallel implementation of the Conjugate Gradient method, equipped with a state-of-the-art preconditioner embedded in the ILUPACK software, and target a low-power multicore processor from ARM. In addition, we perform a theoretical analysis of the impact of a technique like Near Threshold Voltage Computing (NTVC) from the points of view of increased hardware concurrency and error rate.
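For readers unfamiliar with the kernel, a minimal preconditioned Conjugate Gradient sketch in pure Python follows. It swaps in a simple Jacobi (diagonal) preconditioner in place of the ILUPACK multilevel incomplete-LU preconditioner, uses a small dense matrix for readability, and runs sequentially rather than task-parallel; the function names and the tolerance are illustrative assumptions.

```python
def matvec(A, x):
    """Dense matrix-vector product y = A x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def pcg(A, b, tol=1e-10, max_iter=1000):
    """Preconditioned CG for a symmetric positive definite A, Jacobi preconditioner M = diag(A).

    Returns (x, iterations)."""
    n = len(b)
    inv_diag = [1.0 / A[i][i] for i in range(n)]   # applying M^{-1} is a diagonal scaling
    x = [0.0] * n
    r = list(b)                                   # r = b - A x with x = 0
    z = [d * ri for d, ri in zip(inv_diag, r)]    # z = M^{-1} r
    p = list(z)
    rz = dot(r, z)
    for k in range(max_iter):
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if dot(r, r) ** 0.5 < tol:                # converged on the residual norm
            return x, k + 1
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter
```

On the 2x2 system A = [[4, 1], [1, 3]], b = [1, 2] this converges to x = [1/11, 7/11] in at most two iterations. In a real solver the matrix-vector product and the preconditioner application dominate the cost, which is why they are the natural targets for task parallelism and voltage-frequency tuning.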

Relevance: 100.00%

Abstract:

The end of Dennard scaling has promoted low power consumption to a first-order concern for computing systems. However, conventional power conservation schemes such as voltage and frequency scaling are reaching their limits when used in performance-constrained environments. New technologies are required to break the power wall while sustaining performance on future processors. Low-power embedded processors and near-threshold voltage computing (NTVC) have been proposed as viable solutions to tackle the power wall in future computing systems. Unfortunately, these technologies may also compromise per-core performance and, in the case of NTVC, reliability. These limitations would make them unsuitable for HPC systems and datacenters. To demonstrate that emerging low-power processing technologies can effectively replace conventional technologies, this study relies on ARM's big.LITTLE processors as both a real and an emulation platform, together with state-of-the-art implementations of the CG solver. For NTVC in particular, the paper describes how efficient algorithm-based fault tolerance schemes preserve the power and energy benefits of very low voltage operation.
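One common form of algorithm-based fault tolerance (ABFT) protects the matrix-vector product inside CG with a checksum: the entries of y = A x sum to (1^T A) x, so precomputed column sums of A let a silent error in y be detected at the cost of one extra dot product. The sketch below assumes this classic checksum scheme, which may differ from the schemes evaluated in the paper; `inject` is a hypothetical hook that simulates a soft error, and the relative tolerance is an illustrative choice.

```python
def column_checksums(A):
    """Precompute 1^T A, i.e. the sum of each column of A (done once per matrix)."""
    return [sum(col) for col in zip(*A)]

def matvec_with_check(A, x, checksums, inject=None, rtol=1e-9):
    """Compute y = A x and verify sum(y) == (1^T A) x within a relative tolerance.

    `inject` is a hypothetical (index, delta) pair used only to simulate a soft error.
    Returns (y, ok); ok is False when the checksum test detects corruption."""
    y = [sum(a * xi for a, xi in zip(row, x)) for row in A]
    if inject is not None:
        i, delta = inject
        y[i] += delta
    expected = sum(c * xi for c, xi in zip(checksums, x))
    ok = abs(sum(y) - expected) <= rtol * max(1.0, abs(expected))
    return y, ok
```

When `ok` is False the product can simply be recomputed, which is far cheaper than triplicating the whole solver; this is the kind of trade-off that lets low-voltage operation keep its energy savings despite a higher error rate.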

Relevance: 100.00%

Abstract:

Electric vehicles (EVs) and hybrid electric vehicles (HEVs) can reduce greenhouse gas emissions, and the switched reluctance motor (SRM) is one of the promising motors for such applications. This paper presents a novel SRM fault-diagnosis and fault-tolerant operation solution. Building on the traditional asymmetric half-bridge topology for SRM drives, the central-tapped windings of the SRM in a modular half-bridge configuration are introduced to provide fault-diagnosis and fault-tolerance functions, which remain idle under normal conditions. Fault diagnosis is achieved by detecting the characteristics of the excitation and demagnetization currents. The proposed topology also realizes an SRM fault-tolerant operation strategy, which compensates for the missing phase torque under an open-circuit fault and reduces the unbalanced phase current caused by the uncontrolled faulty phase under a short-circuit fault. Furthermore, the current sensor placement strategy is discussed, giving two placement methods: one for low cost and one for a modular structure. Simulation results in MATLAB/Simulink and experiments on a 750-W SRM validate the effectiveness of the proposed strategy, which may have significant implications for improving the reliability of EVs/HEVs.
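To make the diagnosis idea concrete, the following Python sketch classifies one phase from sampled currents in its excitation and demagnetization intervals. It is a hypothetical threshold heuristic, not the paper's detection method: the rules (no current build-up means open circuit; current that fails to decay means an uncontrolled, short-circuited phase) and the threshold fractions `open_frac` and `tail_frac` are assumptions chosen for illustration.

```python
def diagnose_phase(excitation, demagnetization, i_ref, open_frac=0.1, tail_frac=0.2):
    """Classify one SRM phase from sampled currents (A).

    excitation:      current samples while the phase is being excited
    demagnetization: current samples while the phase should be de-energising
    i_ref:           reference (expected) phase current
    Hypothetical rules:
      - open circuit: current never builds up during excitation
      - short circuit: current is still flowing at the end of demagnetization
    """
    if max(excitation) < open_frac * i_ref:
        return "open-circuit"
    if demagnetization[-1] > tail_frac * i_ref:
        return "short-circuit"
    return "healthy"
```

The appeal of current-based rules like these is that they reuse the phase current sensors the drive already needs for control, which ties in with the sensor placement trade-off (low cost versus modularity) discussed in the abstract.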

Relevance: 100.00%

Abstract:

In this paper, the authors present an approach to configuring a wafer-scale integration (WSI) chip. The approach, called 'WINNER', requires neither bus channels nor an external controller to configure the working processors. In addition, the technique is applicable to high-availability systems constructed using conventional methods. It can also be extended to arrays of arbitrary size and to any degree of fault tolerance simply by using an appropriate number of cells.

Relevance: 100.00%

Abstract:

In this paper, game theory is used to analyse the effect of a number of service failures during the execution of a grid orchestration. A service failure may be catastrophic in that it causes an entire orchestration to fail. Alternatively, a grid manager may use alternative services in the case of failure, allowing an orchestration to recover. A risk profile provides a means of modelling situations in a way that is neither overly optimistic nor overly pessimistic. Risk profiles are analysed using angel and daemon games. A risk profile can be assigned a valuation through an analysis of the structure of its associated Nash equilibria. Some structural properties of valuation functions that show their validity as a measure of risk are given. Two main cases are considered: the assessment of Orc expressions and the arrangement of a meeting using reputations.
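As an illustration of valuing a game through its equilibrium, the sketch below computes the value of a 2x2 zero-sum game, treating the angel as the maximising row player and the daemon as the minimising column player over a hypothetical integer payoff (for instance, the number of outputs an orchestration still delivers). It checks for a pure saddle point first and otherwise applies the standard mixed-strategy formula; it is not the paper's valuation function, and real risk profiles generally give larger games.

```python
from fractions import Fraction

def value_2x2(m):
    """Value of a 2x2 zero-sum game with integer payoffs m[row][col].

    The row player (angel) maximises and the column player (daemon) minimises."""
    (a, b), (c, d) = m
    lower = max(min(a, b), min(c, d))   # best guaranteed payoff for the maximiser
    upper = min(max(a, c), max(b, d))   # best guaranteed cap for the minimiser
    if lower == upper:
        return Fraction(lower)          # pure-strategy saddle point (Nash equilibrium)
    # No saddle point: both players mix, and the value follows the closed form.
    return Fraction(a * d - b * c, a + d - b - c)
```

For the payoff matrix [[3, 1], [2, 4]] there is no saddle point, both players mix 50/50, and the value is 5/2; for [[2, 3], [1, 4]] the entry 2 is a saddle point and the value is 2.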

Relevance: 100.00%

Abstract:

Reliability has emerged as a critical design constraint, especially in memories. Designers are going to great lengths to guarantee fault-free operation of the underlying silicon by adopting redundancy-based techniques, which essentially try to detect and correct every single error. However, such techniques come at the cost of large area, power and performance overheads, which lead many researchers to doubt their efficiency, especially for error-resilient systems where 100% accuracy is not always required. In this paper, we present an alternative method that focuses on confining the output error induced by any reliability issues. Focusing on memory faults, rather than correcting every single error, the proposed method exploits the statistical characteristics of the target application and replaces any erroneous data with the best available estimate of that data. To realize the proposed method, a RISC processor is augmented with custom instructions and special-purpose functional units. We apply the method on the enhanced processor by studying the statistical characteristics of the various algorithms involved in a popular multimedia application. Our experimental results show that, in contrast to state-of-the-art fault tolerance approaches, we are able to reduce runtime and area overhead by 71.3% and 83.3%, respectively.
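The replace-with-estimate idea can be sketched in a few lines of Python, assuming some detection mechanism (for example parity) has already flagged which samples are erroneous and that neighbouring samples are correlated, as in audio or image data. The neighbour-mean estimator and the name `repair` are hypothetical; the paper instead uses application-specific statistics implemented through custom instructions and functional units.

```python
def repair(samples, error_flags):
    """Replace each flagged sample with the mean of its nearest unflagged neighbours.

    samples:     sequence of numeric values read from memory
    error_flags: same-length sequence of booleans from a (hypothetical) error detector
    """
    n = len(samples)
    out = list(samples)
    for i, bad in enumerate(error_flags):
        if not bad:
            continue
        # Nearest valid value to the left and to the right, if any.
        left = next((samples[j] for j in range(i - 1, -1, -1) if not error_flags[j]), None)
        right = next((samples[j] for j in range(i + 1, n) if not error_flags[j]), None)
        neighbours = [v for v in (left, right) if v is not None]
        out[i] = sum(neighbours) / len(neighbours) if neighbours else 0
    return out
```

For example, a corrupted pixel value 255 between 12 and 16 is replaced by 14.0. The output is not bit-exact, but the error stays small, which is acceptable in error-resilient multimedia workloads and avoids the cost of full correction.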

Relevance: 100.00%

Abstract:

WebCom-G is a fledgling Grid Operating System, designed to provide independent service access through interoperability with existing middlewares. It offers an expressive programming model that automatically handles task synchronisation; load balancing, fault tolerance and task allocation are handled at the WebCom-G system level, without burdening the application writer. These characteristics, together with the ability of its computing model to mix evaluation strategies to match the characteristics of the geographically dispersed facilities and the overall problem-solving environment, make WebCom-G a promising grid middleware candidate.

Relevance: 100.00%

Abstract:

We show how the architecture of two recently reported bit-level systolic array circuits, a single-bit coefficient correlator and a multibit convolver, may be modified to incorporate unidirectional data flow. This feature has advantages in terms of chip cascadability, fault tolerance and possible wafer-scale integration.

Relevance: 100.00%

Abstract:

The exponential growth in user and application data calls for new means of providing fault tolerance and protection against data loss. High Performance Computing (HPC) storage systems, which are at the forefront of handling the data deluge, typically employ hardware RAID at the backend. However, such solutions are costly, do not ensure end-to-end data integrity, and can become a bottleneck during data reconstruction. In this paper, we design an innovative solution to achieve a flexible, fault-tolerant, and high-performance RAID-6 solution for a parallel file system (PFS). Our system uses low-cost, strategically placed GPUs, on both the client and server sides, to accelerate parity computation. In contrast to hardware-based approaches, we provide full control over the size, length and location of a RAID array on a per-file basis, end-to-end data integrity checking, and parallelization of RAID array reconstruction. We have deployed our system in conjunction with the widely used Lustre PFS, and show that our approach is feasible and imposes acceptable overhead.
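The parity work that such a system offloads to GPUs is, in the standard RAID-6 P+Q construction, a byte-wise XOR for P and a sum of g^i * D_i over GF(2^8) for Q, with generator g = 2 and reduction polynomial 0x11d (the convention used by Linux md). The CPU-only Python sketch below assumes that convention, since the paper's exact encoding is not given in this abstract, and shows single-block recovery from P; recovering two lost blocks with Q is omitted.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x^2 + 1 (0x11d)."""
    p = 0
    while b:
        if b & 1:
            p ^= a        # addition in GF(2^8) is XOR
        a <<= 1
        if a & 0x100:     # reduce when the degree reaches 8
            a ^= 0x11d
        b >>= 1
    return p

def raid6_parity(blocks):
    """Return (P, Q) for equal-length data blocks D_0 .. D_{n-1}.

    P = D_0 ^ D_1 ^ ... ;  Q = sum of g^i * D_i with g = 2, evaluated by Horner's rule."""
    length = len(blocks[0])
    p = bytearray(length)
    q = bytearray(length)
    for block in reversed(blocks):   # Horner: q = q * g + D_i, from the last block to D_0
        for k in range(length):
            q[k] = gf_mul(q[k], 2) ^ block[k]
            p[k] ^= block[k]
    return bytes(p), bytes(q)

def recover_one(blocks_with_gap, p):
    """Rebuild the single missing data block (given as None) from P and the survivors."""
    rebuilt = bytearray(p)
    for block in blocks_with_gap:
        if block is not None:
            for k in range(len(p)):
                rebuilt[k] ^= block[k]
    return bytes(rebuilt)
```

Every output byte depends only on the bytes at the same offset in each block, so the inner loop parallelises trivially across offsets. That independence is what makes GPUs a good fit for the parity computation.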

Relevance: 100.00%

Abstract:

Electric vehicles (EVs) and hybrid EVs are the way forward for green transportation and for establishing a low-carbon economy. This paper presents a split converter-fed four-phase switched reluctance motor (SRM) drive that realizes flexible integrated charging functions (from dc and ac sources). The machine features a central-tapped winding node, eight stator slots, and six rotor poles (8/6). In driving mode, the developed topology has the same characteristics as the traditional asymmetric bridge topology but better fault tolerance. The proposed system supports battery energy balance and on-board dc and ac charging. When connected to an ac power grid, the proposed topology offers the merits of a multilevel converter, and the charging current can be controlled by the improved hysteresis control. The energy flow between the two batteries is balanced by hysteresis control based on their state-of-charge conditions. Simulation results in MATLAB/Simulink and experiments on a 150-W prototype SRM validate the effectiveness of the proposed technologies, which may provide a solution to EV charging issues associated with significant infrastructure requirements.
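The basic principle behind hysteresis current control can be shown with a plain two-level controller in Python. This is the textbook scheme, not the paper's improved variant, and the plant is a hypothetical toy in which the current ramps up by a fixed step while the switch is on and down while it is off; the reference, band and ramp values are illustrative assumptions. State-of-charge balancing between the two batteries is not modelled.

```python
def hysteresis_step(i_measured, i_ref, band, switch_on):
    """Two-level hysteresis controller: return the new switch state."""
    if i_measured < i_ref - band:
        return True        # current below the band: switch on so it rises
    if i_measured > i_ref + band:
        return False       # current above the band: switch off so it decays
    return switch_on       # inside the band: keep the previous state

def simulate(i_ref=10.0, band=0.5, steps=200, rise=0.3, fall=0.2):
    """Toy plant: current ramps by +rise when on and -fall when off (never below zero)."""
    i, on, trace = 0.0, False, []
    for _ in range(steps):
        on = hysteresis_step(i, i_ref, band, on)
        i = max(i + (rise if on else -fall), 0.0)
        trace.append(i)
    return trace
```

After the initial ramp-up, the current stays between i_ref - band - fall and i_ref + band + rise (9.3 A to 10.8 A with the defaults). The switching frequency is not fixed but set by the band width and the ramp rates, which is the usual trade-off of this simple, robust controller.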

Relevance: 100.00%

Abstract:

Bridge weigh-in-motion (B-WIM), a system that uses strain sensors to calculate the weights of trucks passing over bridges, requires accurate axle location and speed information for effective performance. The success of a B-WIM system depends on the accuracy of the axle detection method. It is widely recognised that any form of axle detector on the road surface is not ideal for B-WIM applications, as it can disrupt traffic (Ojio & Yamada 2002; Zhao et al. 2005; Chatterjee et al. 2006). Sensors under the bridge, that is, Nothing-on-Road (NOR) B-WIM, can perform axle detection via data acquisition systems that detect a peak in strain as each axle passes. The method is often successful, although not all bridges are suitable for NOR B-WIM due to limitations of the system. Significant research has been carried out to further develop the method and the NOR algorithms, but beam-and-slab bridges with deep beams still present a challenge. With these bridges, the slabs are used for axle detection, but peaks in the slab strains are sensitive to the transverse position of wheels on the beam. This next-generation B-WIM research project extends the current B-WIM algorithm to the problem of axle detection and safety, thus overcoming the existing limitations of current state-of-the-art technology. In this paper, Finite Element Analysis was used to determine the critical locations for axle-detecting sensors and alternative strategies for axle detection, and the findings were then tested in the field. The site selected for testing was in Loughbrickland, Northern Ireland, along the A1 corridor connecting Belfast and Dublin. The structure is on a central route through the island of Ireland and has a high traffic volume, which made it an optimum location for the study.
Another major benefit of the chosen location was its close proximity to a self-operated weigh station. To determine the accuracy of the proposed B-WIM system and to develop a knowledge base of the traffic load on the structure, a pavement WIM system was also installed on the northbound lane on the approach to the structure. The bridge structure selected for this B-WIM research comprises 27 precast prestressed concrete Y4-beams and a cast in-situ concrete deck. The structure, a newly constructed integral bridge, spans 19 m and has a skew angle of 22.7°.
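The basic NOR principle, one strain peak per passing axle, can be sketched as follows. The sketch assumes two sensors a known distance apart along the traffic direction, idealised signals with one clean peak per axle, and a fixed sampling rate; the function names, the threshold and the 5 m / 100 Hz values in the usage note are hypothetical. Vehicle speed comes from the time lag of the first axle between the two sensors, and axle spacings come from the time between successive peaks at one sensor.

```python
def detect_peaks(signal, threshold):
    """Indices of local maxima in a strain signal that exceed `threshold`."""
    return [i for i in range(1, len(signal) - 1)
            if signal[i] > threshold
            and signal[i] >= signal[i - 1]
            and signal[i] > signal[i + 1]]

def speed_and_spacings(peaks_a, peaks_b, sensor_gap_m, fs_hz):
    """Estimate vehicle speed (m/s) and axle spacings (m).

    peaks_a, peaks_b: peak sample indices at the upstream (A) and downstream (B) sensor
    sensor_gap_m:     distance from sensor A to sensor B along the traffic direction
    fs_hz:            sampling rate of the data acquisition system
    """
    dt = (peaks_b[0] - peaks_a[0]) / fs_hz   # first axle travel time from A to B
    speed = sensor_gap_m / dt
    spacings = [speed * (j - i) / fs_hz for i, j in zip(peaks_a, peaks_a[1:])]
    return speed, spacings
```

With sensors 5 m apart sampled at 100 Hz, peaks at samples 10 and 40 on sensor A and 30 and 60 on sensor B give a speed of 25 m/s and an axle spacing of 7.5 m. The difficulty described above for deep-beam bridges is precisely that real slab-strain peaks are not this clean, since their size depends on the wheel's transverse position.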