140 resultados para fault tolerant systems
em Instituto Politécnico do Porto, Portugal
Resumo:
To increase the amount of logic available to the users in SRAM-based FPGAs, manufacturers are using nanometric technologies to boost logic density and reduce costs, making its use more attractive. However, these technological improvements also make FPGAs particularly vulnerable to configuration memory bit-flips caused by power fluctuations, strong electromagnetic fields and radiation. This issue is particularly sensitive because of the increasing amount of configuration memory cells needed to define their functionality. A short survey of the most recent publications is presented to support the options assumed during the definition of a framework for implementing circuits immune to bit-flips induction mechanisms in memory cells, based on a customized redundant infrastructure and on a detection-and-fix controller.
Resumo:
This paper presents an architecture (Multi-μ) being implemented to study and develop software based fault tolerant mechanisms for Real-Time Systems, using the Ada language (Ada 95) and Commercial Off-The-Shelf (COTS) components. Several issues regarding fault tolerance are presented and mechanisms to achieve fault tolerance by software active replication in Ada 95 are discussed. The Multi-μ architecture, based on a specifically proposed Fault Tolerance Manager (FTManager), is then described. Finally, some considerations are made about the work being done and essential future developments.
Resumo:
On-chip debug (OCD) features are frequently available in modern microprocessors. Their contribution to shorten the time-to-market justifies the industry investment in this area, where a number of competing or complementary proposals are available or under development, e.g. NEXUS, CJTAG, IJTAG. The controllability and observability features provided by OCD infrastructures provide a valuable toolbox that can be used well beyond the debugging arena, improving the return on investment rate by diluting its cost across a wider spectrum of application areas. This paper discusses the use of OCD features for validating fault tolerant architectures, and in particular the efficiency of various fault injection methods provided by enhanced OCD infrastructures. The reference data for our comparative study was captured on a workbench comprising the 32-bit Freescale MPC-565 microprocessor, an iSYSTEM IC3000 debugger (iTracePro version) and the Winidea 2005 debugging package. All enhanced OCD infrastructures were implemented in VHDL and the results were obtained by simulation within the same fault injection environment. The focus of this paper is on the comparative analysis of the experimental results obtained for various OCD configurations and debugging scenarios.
Resumo:
Dependability is a critical factor in computer systems, requiring high quality validation & verification procedures in the development stage. At the same time, digital devices are getting smaller and access to their internal signals and registers is increasingly complex, requiring innovative debugging methodologies. To address this issue, most recent microprocessors include an on-chip debug (OCD) infrastructure to facilitate common debugging operations. This paper proposes an enhanced OCD infrastructure with the objective of supporting the verification of fault-tolerant mechanisms through fault injection campaigns. This upgraded on-chip debug and fault injection (OCD-FI) infrastructure provides an efficient fault injection mechanism with improved capabilities and dynamic behavior. Preliminary results show that this solution provides flexibility in terms of fault triggering and allows high speed real-time fault injection in memory elements
Resumo:
As electronic devices get smaller and more complex, dependability assurance is becoming fundamental for many mission critical computer based systems. This paper presents a case study on the possibility of using the on-chip debug infrastructures present in most current microprocessors to execute real time fault injection campaigns. The proposed methodology is based on a debugger customized for fault injection and designed for maximum flexibility, and consists of injecting bit-flip type faults on memory elements without modifying or halting the target application. The debugger design is easily portable and applicable to different architectures, providing a flexible and efficient mechanism for verifying and validating fault tolerant components.
Resumo:
To increase the amount of logic available in SRAM-based FPGAs manufacturers are using nanometric technologies to boost logic density and reduce prices. However, nanometric scales are highly vulnerable to radiation-induced faults that affect values stored in memory cells. Since the functional definition of FPGAs relies on memory cells, they become highly prone to this type of faults. Fault tolerant implementations, based on triple modular redundancy (TMR) infrastructures, help to keep the correct operation of the circuit. However, TMR is not sufficient to guarantee the safe operation of a circuit. Other issues like the effects of multi-bit upsets (MBU) or fault accumulation, have also to be addressed. Furthermore, in case of a fault occurrence the correct operation of the affected module must be restored and the current state of the circuit coherently re-established. A solution that enables the autonomous correct restoration of the functional definition of the affected module, avoiding fault accumulation, re-establishing the correct circuit state in realtime, while keeping the normal operation of the circuit, is presented in this paper.
Resumo:
Fault injection is frequently used for the verification and validation of the fault tolerant features of microprocessors. This paper proposes the modification of a common on-chip debugging (OCD) infrastructure to add fault injection capabilities and improve performance. The proposed solution imposes a very low logic overhead and provides a flexible and efficient mechanism for the execution of fault injection campaigns, being applicable to different target system architectures.
Resumo:
Classical lock-based concurrency control does not scale with current and foreseen multi-core architectures, opening space for alternative concurrency control mechanisms. The concept of transactions executing concurrently in isolation with an underlying mechanism maintaining a consistent system state was already explored in fault-tolerant and distributed systems, and is currently being explored by transactional memory, this time being used to manage concurrent memory access. In this paper we discuss the use of Software Transactional Memory (STM), and how Ada can provide support for it. Furthermore, we draft a general programming interface to transactional memory, supporting future implementations of STM oriented to real-time systems.
Resumo:
Este documento descreve um modelo de tolerância a falhas para sistemas de tempo-real distribuídos. A sugestão deste modelo tem como propósito a apresentação de uma solu-ção fiável, flexível e adaptável às necessidades dos sistemas de tempo-real distribuídos. A tolerância a falhas é um aspeto extremamente importante na construção de sistemas de tempo-real e a sua aplicação traz inúmeros benefícios. Um design orientado para a to-lerância a falhas contribui para um melhor desempenho do sistema através do melhora-mento de aspetos chave como a segurança, a confiabilidade e a disponibilidade dos sis-temas. O trabalho desenvolvido centra-se na prevenção, deteção e tolerância a falhas de tipo ló-gicas (software) e físicas (hardware) e assenta numa arquitetura maioritariamente basea-da no tempo, conjugada com técnicas de redundância. O modelo preocupa-se com a efi-ciência e os custos de execução. Para isso utilizam-se também técnicas tradicionais de to-lerância a falhas, como a redundância e a migração, no sentido de não prejudicar o tempo de execução do serviço, ou seja, diminuindo o tempo de recuperação das réplicas, em ca-so de ocorrência de falhas. Neste trabalho são propostas heurísticas de baixa complexida-de para tempo-de-execução, a fim de se determinar para onde replicar os componentes que constituem o software de tempo-real e de negociá-los num mecanismo de coordena-ção por licitações. Este trabalho adapta e estende alguns algoritmos que fornecem solu-ções ainda que interrompidos. Estes algoritmos são referidos em trabalhos de investiga-ção relacionados, e são utilizados para formação de coligações entre nós coadjuvantes. O modelo proposto colmata as falhas através de técnicas de replicação ativa, tanto virtual como física, com blocos de execução concorrentes. Tenta-se melhorar ou manter a sua qualidade produzida, praticamente sem introduzir overhead de informação significativo no sistema. O modelo certifica-se que as máquinas escolhidas, para as quais os agentes migrarão, melhoram iterativamente os níveis de qualidade de serviço fornecida aos com-ponentes, em função das disponibilidades das respetivas máquinas. Caso a nova configu-ração de qualidade seja rentável para a qualidade geral do serviço, é feito um esforço no sentido de receber novos componentes em detrimento da qualidade dos já hospedados localmente. Os nós que cooperam na coligação maximizam o número de execuções para-lelas entre componentes paralelos que compõem o serviço, com o intuito de reduzir atra-sos de execução. O desenvolvimento desta tese conduziu ao modelo proposto e aos resultados apresenta-dos e foi genuinamente suportado por levantamentos bibliográficos de trabalhos de in-vestigação e desenvolvimento, literaturas e preliminares matemáticos. O trabalho tem também como base uma lista de referências bibliográficas.
Resumo:
OCEANS, 2001. MTS/IEEE Conference and Exhibition (Volume:2 )
Resumo:
To boost logic density and reduce per unit power consumption SRAM-based FPGAs manufacturers adopted nanometric technologies. However, this technology is highly vulnerable to radiation-induced faults, which affect values stored in memory cells, and to manufacturing imperfections. Fault tolerant implementations, based on Triple Modular Redundancy (TMR) infrastructures, help to keep the correct operation of the circuit. However, TMR is not sufficient to guarantee the safe operation of a circuit. Other issues like module placement, the effects of multi- bit upsets (MBU) or fault accumulation, have also to be addressed. In case of a fault occurrence the correct operation of the affected module must be restored and/or the current state of the circuit coherently re-established. A solution that enables the autonomous restoration of the functional definition of the affected module, avoiding fault accumulation, re-establishing the correct circuit state in real-time, while keeping the normal operation of the circuit, is presented in this paper.
Resumo:
It is imperative to accept that failures can and will occur, even in meticulously designed distributed systems, and design proper measures to counter those failures. Passive replication minimises resource consumption by only activating redundant replicas in case of failures, as typically providing and applying state updates is less resource demanding than requesting execution. However, most existing solutions for passive fault tolerance are usually designed and configured at design time, explicitly and statically identifying the most critical components and their number of replicas, lacking the needed flexibility to handle the runtime dynamics of distributed component-based embedded systems. This paper proposes a cost-effective adaptive fault tolerance solution with a significant lower overhead compared to a strict active redundancy-based approach, achieving a high error coverage with the minimum amount of redundancy. The activation of passive replicas is coordinated through a feedback-based coordination model that reduces the complexity of the needed interactions among components until a new collective global service solution is determined, improving the overall maintainability and robustness of the system.
Resumo:
The system grounding method option has a direct influence on the overall performance of the entire medium voltage network as well as on the ground fault current magnitude. For any kind of grounding systems: ungrounded system, solidly and low impedance grounded and resonant grounded, we can find advantages and disadvantages. A thorough study is necessary to choose the most appropriate grounding protection system. The power distribution utilities justify their choices based on economic and technical criteria, according to the specific characteristics of each distribution network. In this paper we present a medium voltage Portuguese substation case study and a study of neutral system with Petersen coil, isolated neutral and impedance grounded.
Resumo:
This paper proposes a new architecture targeting real-time and reliable Distributed Computer-Controlled Systems (DCCS). This architecture provides a structured approach for the integration of soft and/or hard real-time applications with Commercial O -The-Shelf (COTS) components. The Timely Computing Base model is used as the reference model to deal with the heterogeneity of system components with respect to guaranteeing the timeliness of applications. The reliability and availability requirements of hard real-time applications are guaranteed by a software-based fault-tolerance approach.
Resumo:
Building reliable real-time applications on top of commercial off-the-shelf (COTS) components is not a straightforward task. Thus, it is essential to provide a simple and transparent programming model, in order to abstract programmers from the low-level implementation details of distribution and replication. However, the recent trend for incorporating pre-emptive multitasking applications in reliable real-time systems inherently increases its complexity. It is therefore important to provide a transparent programming model, enabling pre-emptive multitasking applications to be implemented without resorting to simultaneously dealing with both system requirements and distribution and replication issues. The distributed embedded architecture using COTS components (DEAR-COTS) architecture has been previously proposed as an architecture to support real-time and reliable distributed computer-controlled systems (DCCS) using COTS components. Within the DEAR-COTS architecture, the hard real-time subsystem provides a framework for the development of reliable real-time applications, which are the core of DCCS applications. This paper presents the proposed framework, and demonstrates how it can be used to support the transparent replication of software components.