90 results for Fault tolerance
in Chinese Academy of Sciences Institutional Repositories Grid Portal
Abstract:
Reliable messaging is a key component of mobile agent systems. Current research focuses on reliable one-to-one message delivery to mobile agents. How to implement a group communication system for mobile agents, a powerful building block for developing fault-tolerant mobile agent systems, remains an open issue. In this paper, we propose a group communication system for mobile agents (GCS-MA), which includes totally ordered multicast and membership management functions. We divide a group of mobile agents into several agent clusters; each agent cluster consists of all mobile agents residing in the same sub-network and is managed by a special module called the coordinator. All coordinators then form a ring-based overlay for exchanging messages between clusters. We present a token-based algorithm, an intra-cluster messaging algorithm and an inter-cluster migration algorithm that achieve the atomicity and total ordering properties of multicast messages by building a membership protocol on top of the clustering and failure detection mechanisms. Performance issues of the proposed system have been analysed through simulations. We also describe the application of the proposed system in the context of the service cooperation middleware (SCM) project.
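The abstract does not give the details of the token-based algorithm, so the following is only a minimal sketch of the general idea of token-based total ordering on a ring of coordinators. The names `Coordinator` and `circulate_token` are hypothetical, and the model assumes no failures, no migration, and reliable delivery between coordinators. The token is reduced to a single counter: only the coordinator holding it may assign sequence numbers, and every coordinator delivers messages strictly in sequence order.

```python
from collections import deque


class Coordinator:
    """Hypothetical cluster coordinator (illustrative, not the GCS-MA design)."""

    def __init__(self, name):
        self.name = name
        self.pending = deque()    # messages submitted by agents in this cluster
        self.buffer = {}          # seq -> message, received but not yet deliverable
        self.delivered = []       # messages delivered in global sequence order
        self.next_to_deliver = 0  # next sequence number this coordinator may deliver

    def submit(self, msg):
        """An agent in this cluster asks for msg to be multicast."""
        self.pending.append(msg)

    def receive(self, seq, msg):
        """Buffer a sequenced message and deliver every message that is now in order."""
        self.buffer[seq] = msg
        while self.next_to_deliver in self.buffer:
            self.delivered.append(self.buffer.pop(self.next_to_deliver))
            self.next_to_deliver += 1


def circulate_token(ring, rounds=1):
    """Pass the token around the ring; the holder stamps and multicasts its pending messages."""
    next_seq = 0  # the token's only state in this sketch
    for _ in range(rounds):
        for holder in ring:
            while holder.pending:
                msg = holder.pending.popleft()
                seq, next_seq = next_seq, next_seq + 1
                for member in ring:
                    member.receive(seq, (holder.name, msg))
    return next_seq
```

Because sequence numbers come from a single circulating token, all coordinators end up with the same delivery order regardless of the order in which agents submitted their messages.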
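Abstract:
As the speed of computer chips keeps increasing, the threshold voltage of devices keeps falling, so transient faults caused by single-event upsets occur more and more easily. Computer systems in space are especially affected: under cosmic radiation, transient faults are more frequent and system reliability faces a more severe test. Two approaches are generally used to improve the reliability of computer systems: hardware-redundancy fault tolerance and software-redundancy fault tolerance. Compared with hardware fault tolerance, software fault tolerance is inexpensive, cost-effective and flexible to configure; its drawbacks are extra time and space overhead and the burden on programmers of writing additional fault-tolerance code. Compiler-based software fault-tolerance methods have appeared recently that insert redundant fault-tolerance logic automatically during compilation, but these methods still incur significant time and space overhead. How to reduce this overhead as far as possible while preserving fault-tolerance capability remains an open research problem. This thesis carries out further research and implementation in the direction of compiler-based fault tolerance. It proposes using the variable information in the source code to prune the redundant fault-tolerance logic, which reduces time and space overhead while preserving fault-tolerance capability and protects data in memory and registers. The specific contributions are: 1. A design blueprint for a fault-tolerant compilation environment, SCC, is proposed, setting out a long-term vision for a fault-tolerant compilation tool. 2. An instruction-level compiler-based fault detection method, VarBIFT, is proposed, which provides the ability to detect transient faults. With an average time overhead of only 0.0069 times and an average space overhead of 0.3620 times, it raises the combined probability that, when a transient fault occurs, the program either executes correctly or the fault is detected from 39.1% to 76.9% on average. 3. An instruction-level compiler-based fault recovery method, VarRIFT, is proposed, which provides the ability to recover correct data after a transient fault. With an average added time overhead of only 0.043 times and an average added space overhead of 0.69 times, it raises the probability that the program still executes correctly when a transient fault occurs from 44.8% to 78.7% on average. 4. Both methods, VarBIFT and VarRIFT, are implemented on the open-source compiler LCC. The implementation modifies only the intermediate logic that is independent of specific CPU instructions, so both implementations can easily be ported to other CPU architectures such as SPARC and MIPS. 5. A fault injection tool is developed and used to test the fault-tolerance capability of VarBIFT and VarRIFT.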
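The distinction between detection and recovery drawn in items 2 and 3 can be illustrated with the two classic redundancy patterns: duplicate-and-compare, and triplicate-and-vote. This is a generic sketch, not the VarBIFT or VarRIFT algorithms, whose variable-based pruning the abstract does not describe in detail. All function names are hypothetical, and the single-bit flip stands in for a transient fault in the way a fault injection tool might model it.

```python
class FaultDetected(Exception):
    """Raised when redundant copies of a value disagree."""


def flip_bit(value, bit):
    """Model a single-event upset by flipping one bit of an integer."""
    return value ^ (1 << bit)


def checked_read(primary, shadow):
    """Detection only: compare a value with its duplicate before using it."""
    if primary != shadow:
        raise FaultDetected(f"copies disagree: {primary} != {shadow}")
    return primary


def voted_read(a, b, c):
    """Recovery: return the majority value among three copies."""
    if a == b or a == c:
        return a
    if b == c:
        return b
    raise FaultDetected("no majority among the three copies")
```

Two copies are enough to notice that something went wrong, while a third copy is what allows the correct value to be recovered, which is broadly in line with the recovery method reporting a higher space overhead than the detection method.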
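Abstract:
To address the increasingly evident impact of space radiation on the correctness of embedded computer systems, VarBIFT, a compiler-level fault detection method based on directed acyclic graphs, is implemented on the LCC compiler, building on typical compiler-level fault-tolerance techniques. The method effectively protects against transient hardware faults caused by particle effects and can automatically generate fault-tolerant code for different target machines. Experimental results show that VarBIFT reduces the average segmentation fault rate of the source programs from 32.3% to 13.9% and the average erroneous output rate from 28.6% to 9.2%, with time and space overheads of only 0.7% and 36%.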
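Abstract:
An automatic compiler-based fault recovery method that resists transient faults is proposed. It uses the variable information in the source code to prune redundant error flows at the instruction level, is implemented on LCC, and achieves good fault-tolerance performance. Experimental results show that the method adds only 0.043 times time overhead and 0.69 times space overhead, outperforming other existing methods in time and space overhead.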
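Abstract:
A development method for implementing dual-machine fault tolerance with cluster technology is introduced. Based on an in-depth study of the operating mechanism of cluster technology, a technical scheme for a dual-machine fault-tolerant system based on a "layer" pattern is proposed, and a dual-machine fault-tolerant system is implemented on ordinary PC servers. The availability of the system is analyzed. In view of the structural characteristics of integrated power automation systems, functions such as heartbeat detection are improved. Experimental verification on a small substation automation system shows that the approach meets the needs of small and medium-sized integrated power automation systems well.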
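The abstract does not say how heartbeat detection was improved, so the sketch below shows only the baseline mechanism that such a dual-machine system relies on: the standby node takes over once a configurable number of heartbeats from the primary has been missed. The class name `StandbyNode` and its parameters are hypothetical, and timestamps are passed in explicitly so that the logic can be exercised without real timers.

```python
class StandbyNode:
    """Hypothetical standby server that promotes itself when the primary goes silent."""

    def __init__(self, interval, max_missed=3):
        self.interval = interval      # expected seconds between heartbeats
        self.max_missed = max_missed  # missed heartbeats tolerated before takeover
        self.last_beat = 0.0          # time of the most recent heartbeat
        self.active = False           # True once this node has taken over

    def on_heartbeat(self, now):
        """Record a heartbeat received from the primary at time now."""
        self.last_beat = now

    def check(self, now):
        """Take over if too many heartbeats were missed; return the current role flag."""
        missed = int((now - self.last_beat) // self.interval)
        if not self.active and missed >= self.max_missed:
            self.active = True  # this sketch never fails back to the primary
        return self.active
```

Requiring several missed heartbeats rather than one trades a slower takeover for fewer false failovers on a briefly congested link.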
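Abstract:
An improved Reed-Solomon (RS) error-correcting coding algorithm based on channel estimation is proposed. The algorithm adaptively adjusts the data redundancy of the coding system in real time according to changes in the interference that external conditions and the environment impose on the transmission channel. Simulation and a complete analysis confirm that the improved algorithm effectively improves the transmission efficiency of RS coding. Practical application further shows that its good performance and high fault tolerance suit the multiple transmission channels of the communication system, making it highly practical.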
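One way to picture redundancy that follows the channel is to pick the number of RS parity symbols from an estimated symbol error rate. The selection rule below is an assumption made for illustration, not the algorithm of the paper: it relies only on the fact that an RS code with 2t parity symbols corrects up to t symbol errors per block, and it assumes independent symbol errors. The function names, the block length of 255 and the target failure probability are all hypothetical choices.

```python
from math import comb


def block_failure_prob(n, t, p):
    """Probability that more than t of n symbols are in error (independent errors)."""
    ok = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(t + 1))
    return 1.0 - ok


def choose_parity(p, n=255, target=1e-6, max_t=64):
    """Smallest parity count 2t whose block failure probability meets the target."""
    for t in range(max_t + 1):
        if block_failure_prob(n, t, p) <= target:
            return 2 * t
    return 2 * max_t  # channel too poor for the target: use the maximum allowed
```

On a clean channel the rule spends no symbols on parity, and as the estimated error rate rises it adds parity two symbols at a time, which is the efficiency gain that adaptive redundancy aims for.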
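Abstract:
An internal communication bus system for an AUV based on the CAN protocol is designed. Through the modular and configurable design of its protocol converters, the system meets the AUV system's requirement for an open internal communication bus. The fault-handling capability built into the protocol converters and the design of an emergency event handling node provide strong safeguards for enhancing the reliability and fault tolerance of the AUV system and for preventing loss of the AUV in deep-sea working environments.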
Abstract:
Deformation twins and stacking faults have been observed in nanocrystalline Ni for the first time under uniaxial tensile test conditions. These partial-dislocation-mediated deformation mechanisms are enhanced at cryogenic test temperatures. Our observations highlight the effects of deformation conditions, temperature in particular, on deformation mechanisms in nanograins.
Abstract:
Generalized planar fault energy (GPFE) curves have been used to predict partial-dislocation-mediated processes in nanocrystalline materials, but their validity has not been evaluated experimentally. We report experimental observations of a large quantity of both stacking faults and twins in nc Ni deformed at relatively low stresses in a tensile test. The experimental findings indicate that the GPFE curves can reasonably explain the formation of stacking faults, but they alone were not able to adequately predict the propensity of deformation twinning.
Abstract:
The stress release model, a stochastic version of the elastic rebound theory, is applied to the large events from four synthetic earthquake catalogs generated by models with various levels of disorder in the distribution of fault zone strength (Ben-Zion, 1996). They include models with uniform properties (U), a Parkfield-type asperity (A), fractal brittle properties (F), and multi-size-scale heterogeneities (M). The results show that the degree of regularity or predictability in the assumed fault properties, based on both the Akaike information criterion and simulations, follows the order U, F, A, and M, which is in good agreement with that obtained by pattern recognition techniques applied to the full set of synthetic data. Data simulated from the best-fitting stress release models reproduce, both visually and in distributional terms, the main features of the original catalogs. The differences in character and in the quality of prediction between the four cases are shown to depend on two main aspects: the parameter controlling the sensitivity to departures from the mean stress level, and the frequency-magnitude distribution, which differs substantially between the four cases. In particular, it is shown that the predictability of the data is strongly affected by the form of the frequency-magnitude distribution, being greatly reduced if a pure Gutenberg-Richter form is assumed to hold out to high magnitudes.
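The abstract gives no formulas, so the sketch below uses the form of the stress release model commonly found in the literature, which is an assumption here: stress X(t) = X(0) + ρt − S(t) builds up linearly at rate ρ and drops by the accumulated release S(t) of past events, and the conditional intensity is λ(t) = exp(μ + νX(t)). The parameter ν plays the role of the sensitivity parameter mentioned above. The function names, the reference magnitude `m0` and the scaling exponent 0.75 for the stress drop are illustrative choices.

```python
import math


def stress_drop(magnitude, m0=6.0):
    """Stress released by one event, scaled as 10^(0.75 (M - m0)) (assumed scaling)."""
    return 10 ** (0.75 * (magnitude - m0))


def intensity(t, events, mu, nu, rho, x0=0.0, m0=6.0):
    """Conditional intensity exp(mu + nu * X(t)) given past (time, magnitude) events."""
    released = sum(stress_drop(m, m0) for (ti, m) in events if ti < t)
    x = x0 + rho * t - released  # current stress level X(t)
    return math.exp(mu + nu * x)
```

With ν = 0 the intensity is constant and the model reduces to a Poisson process, while larger ν makes the event rate respond more sharply to the stress level, which is where any predictability comes from.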