916 resultados para Fault tolerant computing


Relevância:

100.00% 100.00%

Publicador:

Resumo:

"This project is funded in part by NASA grant NSG 1471."

Relevância:

100.00% 100.00%

Publicador:

Resumo:

"UILU-ENG 80 1742"--Cover.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

"August 1980."

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Recent research in multi-agent systems incorporate fault tolerance concepts. However, the research does not explore the extension and implementation of such ideas for large scale parallel computing systems. The work reported in this paper investigates a swarm array computing approach, namely ‘Intelligent Agents’. In the approach considered a task to be executed on a parallel computing system is decomposed to sub-tasks and mapped onto agents that traverse an abstracted hardware layer. The agents intercommunicate across processors to share information during the event of a predicted core/processor failure and for successfully completing the task. The agents hence contribute towards fault tolerance and towards building reliable systems. The feasibility of the approach is validated by simulations on an FPGA using a multi-agent simulator and implementation of a parallel reduction algorithm on a computer cluster using the Message Passing Interface.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We analyse the fault-tolerant parameters and topological properties of a hierarchical network of hypercubes. We take a close look at the Extended Hypercube (EH) and the Hyperweave (HW) architectures and also compare them with other popular architectures. These two architectures have low diameter and constant degree of connectivity making it possible to expand these networks without affecting the existing configuration. A scheme for incrementally expanding this network is also presented. We also look at the performance of the ASCEND/DESCEND class of algorithms on these architectures.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

With the advent of Internet, video over IP is gaining popularity. In such an environment, scalability and fault tolerance will be the key issues. Existing video on demand (VoD) service systems are usually neither scalable nor tolerant to server faults and hence fail to comply to multi-user, failure-prone networks such as the Internet. Current research areas concerning VoD often focus on increasing the throughput and reliability of single server, but rarely addresses the smooth provision of service during server as well as network failures. Reliable Server Pooling (RSerPool), being capable of providing high availability by using multiple redundant servers as single source point, can be a solution to overcome the above failures. During a possible server failure, the continuity of service is retained by another server. In order to achieve transparent failover, efficient state sharing is an important requirement. In this paper, we present an elegant, simple, efficient and scalable approach which has been developed to facilitate the transfer of state by the client itself, using extended cookie mechanism, which ensures that there is no noticeable change in disruption or the video quality.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The time division multiple access (TDMA) based channel access mechanisms perform better than the contention based channel access mechanisms, in terms of channel utilization, reliability and power consumption, specially for high data rate applications in wireless sensor networks (WSNs). Most of the existing distributed TDMA scheduling techniques can be classified as either static or dynamic. The primary purpose of static TDMA scheduling algorithms is to improve the channel utilization by generating a schedule of smaller length. But, they usually take longer time to schedule, and hence, are not suitable for WSNs, in which the network topology changes dynamically. On the other hand, dynamic TDMA scheduling algorithms generate a schedule quickly, but they are not efficient in terms of generated schedule length. In this paper, we propose a novel scheme for TDMA scheduling in WSNs, which can generate a compact schedule similar to static scheduling algorithms, while its runtime performance can be matched with those of dynamic scheduling algorithms. Furthermore, the proposed distributed TDMA scheduling algorithm has the capability to trade-off schedule length with the time required to generate the schedule. This would allow the developers of WSNs, to tune the performance, as per the requirement of prevalent WSN applications, and the requirement to perform re-scheduling. Finally, the proposed TDMA scheduling is fault-tolerant to packet loss due to erroneous wireless channel. The algorithm has been simulated using the Castalia simulator to compare its performance with those of others in terms of generated schedule length and the time required to generate the TDMA schedule. Simulation results show that the proposed algorithm generates a compact schedule in a very less time.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The proliferation of mobile computers and wireless networks requires the design of future distributed real-time applications to recognize and deal with the significant asymmetry between downstream and upstream communication capacities, and the significant disparity between server and client storage capacities. Recent research work proposed the use of Broadcast Disks as a scalable mechanism to deal with this problem. In this paper, we propose a new broadcast disks protocol, based on our Adaptive Information Dispersal Algorithm (AIDA). Our protocol is different from previous broadcast disks protocols in that it improves communication timeliness, fault-tolerance, and security, while allowing for a finer control of multiplexing of prioritized data (broadcast frequencies). We start with a general introduction of broadcast disks. Next, we propose broadcast disk organizations that are suitable for real-time applications. Next, we present AIDA and show its fault-tolerance and security properties. We conclude the paper with the description and analysis of AIDA-based broadcast disks organizations that achieve both timeliness and fault-tolerance, while preserving downstream communication capacity.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In recent years, reversible logic has emerged as one of the most important approaches for power optimization with its application in low power CMOS, quantum computing and nanotechnology. Low power circuits implemented using reversible logic that provides single error correction – double error detection (SEC-DED) is proposed in this paper. The design is done using a new 4 x 4 reversible gate called ‘HCG’ for implementing hamming error coding and detection circuits. A parity preserving HCG (PPHCG) that preserves the input parity at the output bits is used for achieving fault tolerance for the hamming error coding and detection circuits.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The K-Means algorithm for cluster analysis is one of the most influential and popular data mining methods. Its straightforward parallel formulation is well suited for distributed memory systems with reliable interconnection networks, such as massively parallel processors and clusters of workstations. However, in large-scale geographically distributed systems the straightforward parallel algorithm can be rendered useless by a single communication failure or high latency in communication paths. The lack of scalable and fault tolerant global communication and synchronisation methods in large-scale systems has hindered the adoption of the K-Means algorithm for applications in large networked systems such as wireless sensor networks, peer-to-peer systems and mobile ad hoc networks. This work proposes a fully distributed K-Means algorithm (EpidemicK-Means) which does not require global communication and is intrinsically fault tolerant. The proposed distributed K-Means algorithm provides a clustering solution which can approximate the solution of an ideal centralised algorithm over the aggregated data as closely as desired. A comparative performance analysis is carried out against the state of the art sampling methods and shows that the proposed method overcomes the limitations of the sampling-based approaches for skewed clusters distributions. The experimental analysis confirms that the proposed algorithm is very accurate and fault tolerant under unreliable network conditions (message loss and node failures) and is suitable for asynchronous networks of very large and extreme scale.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Future extreme-scale high-performance computing systems will be required to work under frequent component failures. The MPI Forum's User Level Failure Mitigation proposal has introduced an operation, MPI_Comm_shrink, to synchronize the alive processes on the list of failed processes, so that applications can continue to execute even in the presence of failures by adopting algorithm-based fault tolerance techniques. This MPI_Comm_shrink operation requires a fault tolerant failure detection and consensus algorithm. This paper presents and compares two novel failure detection and consensus algorithms. The proposed algorithms are based on Gossip protocols and are inherently fault-tolerant and scalable. The proposed algorithms were implemented and tested using the Extreme-scale Simulator. The results show that in both algorithms the number of Gossip cycles to achieve global consensus scales logarithmically with system size. The second algorithm also shows better scalability in terms of memory and network bandwidth usage and a perfect synchronization in achieving global consensus.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The verification of information flow properties of security devices is difficult because it involves the analysis of schematic diagrams, artwork, embedded software, etc. In addition, a typical security device has many modes, partial information flow, and needs to be fault tolerant. We propose a new approach to the verification of such devices based upon checking abstract information flow properties expressed as graphs. This approach has been implemented in software, and successfully used to find possible paths of information flow through security devices.