8 results for Hazard, Interactive Failure, Repairable Systems, Asset Maintenance
at Universidad Politécnica de Madrid
Abstract:
In this paper we focus on the selection of safeguards in a fuzzy risk analysis and management methodology for information systems (IS). Assets are connected by dependency relationships, and a failure of one asset may affect other assets. After computing impact and risk indicators associated with previously identified threats, we identify and apply safeguards to reduce risks in the IS by minimizing the transmission probabilities of failures throughout the asset network. However, as safeguards have associated costs, the aim is to select the safeguards that minimize costs while keeping the risk within acceptable levels. To do this, we propose a dynamic programming-based method that incorporates simulated annealing to tackle optimization problems.
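The selection problem described above (minimize safeguard cost while keeping risk below a threshold) can be illustrated with a minimal sketch. It is not the method of the paper: it uses crisp numbers instead of fuzzy ones, plain simulated annealing without the dynamic programming stage, and a single scalar risk instead of an asset network. The safeguard data, the multiplicative risk model and all names below are hypothetical.

```python
import math
import random

# Hypothetical toy data: (cost, risk multiplier) for each candidate safeguard.
SAFEGUARDS = [(4.0, 0.5), (3.0, 0.7), (6.0, 0.3), (2.0, 0.9), (5.0, 0.6)]
BASE_RISK = 1.0   # assumed risk with no safeguard applied
MAX_RISK = 0.2    # assumed acceptable risk level


def risk(selection):
    """Residual risk, assuming each chosen safeguard scales risk independently."""
    r = BASE_RISK
    for chosen, (_, factor) in zip(selection, SAFEGUARDS):
        if chosen:
            r *= factor
    return r


def cost(selection):
    """Total cost of the chosen safeguards."""
    return sum(c for chosen, (c, _) in zip(selection, SAFEGUARDS) if chosen)


def anneal(steps=2000, t0=5.0, seed=0):
    """Search for a cheap feasible selection by flipping one safeguard at a time."""
    rng = random.Random(seed)
    current = [True] * len(SAFEGUARDS)  # start with every safeguard (feasible here)
    best = current[:]
    for k in range(steps):
        temp = t0 * (1 - k / steps) + 1e-9  # linear cooling schedule
        cand = current[:]
        i = rng.randrange(len(cand))
        cand[i] = not cand[i]
        if risk(cand) > MAX_RISK:
            continue  # infeasible neighbours are rejected outright
        delta = cost(cand) - cost(current)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current = cand
            if cost(current) < cost(best):
                best = current[:]
    return best
```

Because the search starts from a feasible selection and only accepts feasible moves, the returned selection always respects the risk threshold; whether it is the cheapest one depends on the cooling schedule and the number of steps.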
Abstract:
Cracking initiated at the surface of asphalt pavements is one of the most frequent and important modes of distress in bituminous pavements, as theoretical and experimental studies carried out over the last decade have shown. However, this failure mechanism has not been considered by traditional design methods for these pavements. The concept of long-life pavements rests on adequate monitoring of how these distresses progress in depth, and on intervening at the most appropriate moment to keep them confined as partial-depth cracks in the surface layer, which is the most easily accessible and repairable, so that the durability and serviceability of the pavement can be extended and its overall life-cycle costs reduced. Therefore, to select the optimal pavement maintenance strategy it is essential to have methodologies that enable precise in situ identification of top-down cracking, its monitoring and control, and that also allow a reliable, high-throughput determination of its depth and extent. This Doctoral Thesis presents the results of systematic laboratory and in situ research carried out to obtain data on top-down cracking in asphalt pavements and to study procedures for evaluating the depth of this type of crack using ultrasonic techniques. These results have shown that the proposed non-destructive methodology, which is fast to perform, low-cost and simple to implement (and has so far been used mainly on metal and concrete structures, owing to the difficulties introduced by the viscoelastic nature of bituminous materials), can be applied with sufficient reliability and repeatability to asphalt pavements.
The measurements are also independent of the total pavement thickness. In addition, the methodology resolves some of the frequent drawbacks of other methods for diagnosing pavement cracks, such as core extraction (a destructive, high-cost system with prolonged traffic interruptions) or other non-destructive techniques such as those based on deflection measurements or ground-penetrating radar, which are not precise enough for investigating surface cracks. To this end, several test campaigns were carried out on laboratory specimens, studying different empirical conditions such as different types of hot-mix asphalt (AC, SMA and PA), pavement thicknesses and bonding between layers, temperatures, surface textures, filling materials and water inside the cracks, sensor positions, and a wide range of possible crack depths. The methods used are based on taking several measurements of ultrasonic pulse velocity or transmission time on a single accessible face or surface of the material, so that a signal transmission coefficient can be obtained (relative or self-compensated measurements). The measurements were made at low excitation frequencies using two different ultrasonic devices, one fitted with dry point contact (DPC) transducers and the other with flat contact transducers coupled through a specially selected coupling material (CPC). This made it possible to overcome some of the traditional drawbacks of conventional transducers and to dispense with prior surface preparation. The self-calibration technique used eliminates systematic errors and the need for prior local calibration, demonstrating the potential of this technology.
The experimental results have been compared with simplified theoretical models that simulate the propagation of ultrasonic waves in these cracked bituminous materials; the models were previously derived analytically and have allowed the empirical data to be interpreted correctly. These models were subsequently calibrated with the laboratory results, and their generalized mathematical expressions and charts are provided for routine use in practical applications. The proposed models were evaluated through ultrasonic tests carried out in in situ campaigns, accompanied by extraction of pavement cores. The maximum average relative error in the estimated crack depth when applying these models did not exceed 13%, at a 95% confidence level, across all the tests performed. The in situ verification of the models has made it possible to establish the criteria and the necessary recommendations for their use on in-service pavements. The experience gained makes it possible to integrate this methodology into the survey techniques used for managing pavement maintenance.
Abstract
Surface-initiated cracking of asphalt pavements constitutes one of the most frequent and important types of distress that occur in flexible bituminous pavements, as has been clearly demonstrated in the technical and experimental studies done over the past decade. However, this failure mechanism has not been taken into consideration in traditional methods of flexible pavement design.
The concept of long-lasting pavements is based on adequate monitoring of the depth and extent of these deteriorations and on intervention at the most appropriate moment so as to contain them in the surface layer in the form of easily accessible and repairable partial-depth top-down cracks, thereby prolonging the durability and serviceability of the pavement and reducing the overall cost of its life cycle. Therefore, to select the optimal maintenance strategy for perpetual pavements, it becomes essential to have access to methodologies that enable precise on-site identification, monitoring and control of top-down propagated cracks and that also permit a reliable, high-performance determination of the extent and depth of cracking. This PhD Thesis presents the results of systematic laboratory and in situ research carried out to obtain information about top-down cracking in asphalt pavements and to study methods of depth evaluation of this type of cracking using ultrasonic techniques. These results have demonstrated that the proposed non-destructive methodology, which is cost-effective, fast and easy to implement (and has mainly been used to date for concrete and metal structures, due to the difficulties caused by the viscoelastic nature of bituminous materials), can be applied with sufficient reliability and repeatability to asphalt pavements. Measurements are also independent of the asphalt thickness. Furthermore, it resolves some of the common drawbacks of other methods used to evaluate pavement cracking, such as core extraction (a destructive and expensive procedure that requires prolonged traffic interruptions) and other non-destructive techniques, such as those based on deflection measurements or ground-penetrating radar, which are not sufficiently precise to measure surface cracks. To obtain these results, extensive tests were performed on laboratory specimens.
Different empirical conditions were studied, such as various types of hot bituminous mixtures (AC, SMA and PA), differing thicknesses of asphalt and adhesions between layers, varied temperatures, surface textures, filling materials and water within the crack, different sensor positions, as well as a wide range of possible crack depths. The methods employed in the study are based on a series of measurements of ultrasonic pulse velocities or transmission times over a single accessible side or surface of the material that make it possible to obtain a signal transmission coefficient (relative or auto-calibrated readings). Measurements were taken at low frequencies by two short-pulse ultrasonic devices: one equipped with dry point contact transducers (DPC) and the other with flat contact transducers that require a specially selected coupling material (CPC). In this way, some of the traditional drawbacks of conventional transducers were overcome and no prior preparation of the surfaces was required. The auto-compensating technique eliminated systematic errors and the need for previous local calibration, demonstrating the potential of this technology. The experimental results have been compared with simplified theoretical models that simulate ultrasonic wave propagation in cracked bituminous materials, which had been previously deduced using an analytical approach and have permitted the correct interpretation of the aforementioned empirical results. These models were subsequently calibrated using the laboratory results, providing generalized mathematical expressions and charts for routine use in practical applications. Through a series of on-site ultrasound test campaigns, accompanied by asphalt core extraction, it was possible to evaluate the proposed models, with differences between predicted crack depths and those measured in situ lower than 13% (with a confidence level of 95%).
Thereby, the criteria and the necessary recommendations for their implementation on in-service asphalt pavements have been established. The experience obtained through this study makes it possible to integrate this methodology into the evaluation techniques for pavement management systems.
Abstract:
This paper is on homonymous distributed systems where processes are prone to crash failures and have no initial knowledge of the system membership ("homonymous" means that several processes may have the same identifier). New classes of failure detectors suited to these systems are first defined. Among them, the classes HΩ and HΣ are introduced that are the homonymous counterparts of the classes Ω and Σ, respectively. (Recall that the pair ⟨Σ, Ω⟩ defines the weakest failure detector to solve consensus.) Then, the paper shows how HΩ and HΣ can be implemented in homonymous systems without membership knowledge (under different synchrony requirements). Finally, two algorithms are presented that use these failure detectors to solve consensus in homonymous asynchronous systems where there is no initial knowledge of the membership. One algorithm solves consensus with ⟨HΩ, HΣ⟩, while the other uses only HΩ, but needs a majority of correct processes. Observe that systems with unique identifiers and anonymous systems are extreme cases of homonymous systems, from which it follows that all these results also apply to these systems. Interestingly, the new failure detector class HΩ can be implemented with partial synchrony, while the analogous class AΩ defined for anonymous systems cannot be implemented (even in synchronous systems). Hence, the paper provides us with the first proof showing that consensus can be solved in anonymous systems with only partial synchrony (and a majority of correct processes).
Abstract:
The set agreement problem states that from n proposed values at most n-1 can be decided. Traditionally, this problem is solved using a failure detector in asynchronous systems where processes may crash but not recover, where processes have different identities, and where all processes initially know the membership. In this paper we study the set agreement problem and the weakest failure detector L used to solve it in asynchronous message passing systems where processes may crash and recover, with homonyms (i.e., processes may have equal identities) and without a complete initial knowledge of the membership.
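The statement of set agreement above can be made concrete with a small checker. This is only a sketch of the safety conditions as stated in the abstract (at most n-1 distinct decided values out of n proposals), plus the usual validity condition that decided values were actually proposed; it does not model failure detectors, crashes, recovery or homonyms. The function name and the list-based representation are assumptions made for illustration.

```python
def satisfies_set_agreement(proposed, decided):
    """Check the safety conditions of set agreement for one run.

    proposed: list with the value proposed by each of the n processes.
    decided:  list with the values decided by the processes that decided
              (crashed processes may simply be absent).
    """
    n = len(proposed)
    # Validity: every decided value must have been proposed by some process.
    validity = all(value in proposed for value in decided)
    # Agreement: at most n-1 distinct values are decided.
    agreement = len(set(decided)) <= n - 1
    return validity and agreement
```

For example, with proposals `[1, 2, 3]` the decisions `[1, 2, 2]` are acceptable, whereas `[1, 2, 3]` are not, because all three proposed values would have been decided.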
Abstract:
Decentralized rural electrification programmes using solar home systems have run into difficulties during the operation and maintenance phase, in many cases because the maintenance cost is underestimated owing to the decentralized character of the activity, and also because the reliability of the solar home system components is frequently unknown. This paper reports on the reliability study and cost characterization carried out in a large photovoltaic rural electrification programme in Morocco. The paper aims to determine the reliability features of the solar systems, focusing on in-field testing of batteries and photovoltaic modules. The degradation rates for batteries and PV modules have been extracted from the in-field experiments. In addition, the main costs related to the operation and maintenance activity have been identified with the aim of establishing the main factors that lead to the loss of sustainability in many rural electrification programmes.
Abstract:
Hybrid stepper motors are widely used in open-loop positioning applications. They are the actuators of choice for the collimators in the Large Hadron Collider, the largest particle accelerator at CERN. In this case the positioning requirements and the highly radioactive operating environment are unique. The latter forces the use of long cables, which act as transmission lines, to connect the motors to the drives, and also prevents the use of standard position sensors. However, reliable and precise operation of the collimators is critical for the machine, requiring the prevention of step loss in the motors and maintenance to be foreseen in case of mechanical degradation. In order to make the above possible, an approach is proposed for the application of an Extended Kalman Filter to a sensorless stepper motor drive, when the motor is separated from its drive by long cables. When long cables and high-frequency pulse-width-modulated control voltage signals are used together, the electrical signals differ greatly between the motor side and the drive side of the cable. Since in the considered case only drive-side data is available, it is necessary to estimate the motor-side signals. Modelling the entire cable and motor system in an Extended Kalman Filter is too computationally intensive for standard embedded real-time platforms. It is therefore proposed to divide the problem into an Extended Kalman Filter, based only on the motor model, and separate motor-side signal estimators, the combination of which is less demanding computationally. The effectiveness of this approach is shown in simulation. Then its validity is experimentally demonstrated via implementation in a DSP-based drive. A testbench to test its performance when driving an axis of a Large Hadron Collider collimator is presented along with the results achieved.
It is shown that the proposed method is capable of achieving position and load torque estimates which allow step loss to be detected and mechanical degradation to be evaluated without the need for physical sensors. These estimation algorithms often require a precise model of the motor, but the standard electrical model used for hybrid stepper motors is limited when currents high enough to saturate the magnetic circuit are present. New model extensions are proposed in order to have a more precise model of the motor independently of the current level, whilst maintaining a low computational cost. It is shown that a significant improvement in the model fit is achieved with these extensions, and their computational performance is compared to study the trade-off between model improvement and computation cost. The applicability of the proposed model extensions is demonstrated via their use in an Extended Kalman Filter running in real time for closed-loop current control and mechanical state estimation. An additional problem arises from the use of stepper motors. The mechanics of the collimators can wear due to the abrupt motion and torque profiles that the motors apply when used in the standard way, i.e. stepping in open loop. Closed-loop position control, more specifically Field Oriented Control, would allow smoother profiles that are gentler on the mechanics to be applied, but it requires position feedback. As mentioned already, the use of sensors in radioactive environments is very limited for reliability reasons. Sensorless control is a known option, but when the speed is very low or zero, as is the case most of the time for the motors used in the LHC collimators, the loss of observability prevents its use.
In order to allow the use of position sensors without reducing the long-term reliability of the whole system, the possibility of switching from closed loop to open loop is proposed and validated, allowing the use of closed-loop control when the position sensors function correctly and open-loop control when there is a sensor failure. A different approach to deal with the switched drive working with long cables is also presented. Switched-mode stepper motor drives tend to have poor performance or even fail completely when the motor is fed through a long cable, due to the high oscillations in the drive-side current. The design of a stepper motor output filter which solves this problem is thus proposed. A two-stage filter, one stage devoted to dealing with the differential mode and the other with the common mode, is designed and validated experimentally. With this filter the drive performance is greatly improved, achieving a positioning repeatability even better than with the drive working without a long cable; the radiated emissions are reduced and the overvoltages at the motor terminals are eliminated.
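As a reminder of what an Extended Kalman Filter iteration looks like, the following is a deliberately minimal scalar sketch. It is not the motor model or the estimator structure of the thesis: the state is a single position-like variable with a trivial motion model, the measurement is assumed to be the sine of that state, and the noise variances `q` and `r` are arbitrary illustrative values.

```python
import math


def ekf_step(x, p, z, u=0.0, q=1e-4, r=1e-2):
    """One predict/update cycle of a scalar EKF.

    Assumed process model:     x_k = x_{k-1} + u + noise (variance q)
    Assumed measurement model: z_k = sin(x_k) + noise    (variance r)
    """
    # Predict: the process model is linear, so its Jacobian is 1.
    x_pred = x + u
    p_pred = p + q
    # Update: linearize the measurement model around the prediction.
    h = math.cos(x_pred)               # Jacobian of sin(x) at x_pred
    s = h * p_pred * h + r             # innovation variance
    k = p_pred * h / s                 # Kalman gain
    x_new = x_pred + k * (z - math.sin(x_pred))
    p_new = (1.0 - k * h) * p_pred
    return x_new, p_new


def run(true_x=0.5, x0=0.2, p0=1.0, steps=100):
    """Feed noise-free measurements of a constant state and return the estimate."""
    x, p = x0, p0
    z = math.sin(true_x)
    for _ in range(steps):
        x, p = ekf_step(x, p, z)
    return x, p
```

Calling `run()` starts from a wrong initial guess and lets the filter pull the estimate towards the true value while the variance `p` shrinks. A motor application would replace the scalar state with a vector (currents, speed, position, load torque) and the sine with the motor's electrical equations.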
Abstract:
With 1,300 million people in the world lacking access to electricity (most of them in rural areas of impoverished countries), photovoltaic solar energy is a technically and economically viable solution for electrifying the most remote areas of the planet, which conventional electricity grids do not reach. Almost every country in the world has run some kind of rural photovoltaic electrification programme over the last 40 years, mainly the poorest countries, where millions of solar home systems (small photovoltaic systems for domestic use) have been installed through different financing models. Over this long period many barriers have been overcome, such as improving the quality of photovoltaic systems, reducing costs, optimizing system design and sizing, and making finance available to implement rural electrification programmes. As a result, decentralized rural electrification has recently undergone a change of scale, characterized by large programmes with thousands of solar home systems and long maintenance periods. Many of these large programmes are having limited success, since they generally start from assumptions and hypotheses that have been poorly checked against reality, which compromises the economic return needed to sustain the activity in the long term. In this scenario a new challenge arises: how to guarantee the sustainability of large photovoltaic rural electrification programmes. It is argued that the main cause of this lack of profitability is the unexpectedly high cost of the operation and maintenance phase. Key issues such as the structure of operation and maintenance costs or the reliability of the photovoltaic system components are not well characterized today.
This situation limits the ability to design maintenance structures capable of ensuring the sustainability and profitability of the operation and maintenance service in these programmes. This doctoral thesis aims to answer these questions. Several studies have been carried out on the basis of a large, real photovoltaic rural electrification programme conducted in Morocco, with more than 13,000 solar home systems installed. Based on this programme, an in-depth assessment of the reliability of the solar systems has been made from maintenance data collected over 5 years, comprising more than 80,000 inputs. The results have made it possible to establish the reliability functions of the equipment as it behaves under real operating conditions, the failure rates and the mean times to failure for the main components of the system; this is the first time results of this kind have been published in the field of photovoltaic rural electrification. The two main components of the solar home system, the battery and the photovoltaic module, have been analysed in the field through a sample of 41 systems from the Moroccan solar programme working under real conditions. The capacity degradation of the batteries was studied on the one hand, and the power degradation of the photovoltaic modules on the other. In the case of the batteries, the results have allowed the capacity degradation curve to be characterized, leading to a proposed new definition of the end-of-life threshold for batteries in rural electrification. Also based on the Moroccan solar programme, a study was carried out to characterize the real operation and maintenance costs from the programme's accounting database, recorded over 5 years.
The results of the study have made it possible to identify which costs weigh most heavily on the overall cost. Unit costs per installed system have been obtained, and the maintenance fees that users would need to pay to guarantee the profitability of operation and maintenance have been calculated. Finally, a mathematical optimization model is proposed for designing maintenance structures based on the results of the previous studies. The tool, built using mixed-integer linear programming, has been applied to the Moroccan programme in order to validate the proposed model.
ABSTRACT
With 1,300 million people worldwide deprived of access to electricity (mostly in rural environments), photovoltaic solar energy has proven to be a cost-effective solution and the only hope for electrifying the most remote inhabitants of the planet, where conventional electric grids do not reach because they are unaffordable. Almost all countries in the world have had some kind of rural photovoltaic electrification programme during the past 40 years, mainly the poorer countries, where, through different organizational models, millions of solar home systems (small photovoltaic systems for domestic use) have been installed. During this long period, many barriers have been overcome, such as quality enhancement, cost reduction, the optimization of design and sizing, and financial availability. Thanks to this, decentralized rural electrification has recently experienced a change of scale characterized by new programmes with thousands of solar home systems and long maintenance periods. Many of these large programmes are being developed with limited success, as they have generally been based on assumptions that do not correspond to reality, compromising the economic return that allows long-term activity. In this scenario a new challenge emerges: ensuring the sustainability of large programmes.
It is argued that the main cause of unprofitability is the unexpectedly high cost of the operation and maintenance of the solar systems. In fact, the lack of a paradigm in decentralized rural services has led many private companies to carry out decentralized electrification programmes blindly. Issues such as the operation and maintenance cost structure or the reliability of the solar home system components have still not been characterized. This situation does not allow an optimized maintenance structure to be designed to assure the sustainability and profitability of the operation and maintenance service. This PhD thesis aims to respond to these needs. Several studies have been carried out based on a real, large photovoltaic rural electrification programme carried out in Morocco with more than 13,000 solar home systems. An in-depth reliability assessment has been made from a 5-year maintenance database with more than 80,000 maintenance inputs. The results have allowed us to establish the real reliability functions, the failure rate and the mean time to failure of the main components of the system, reporting these findings for the first time in the field of rural electrification. In-field experiments on both the capacity degradation of batteries and the power degradation of photovoltaic modules have been carried out. During the experiments, the samples of batteries and modules were operating under real conditions, integrated into the solar home systems of the Moroccan programme. In the case of the batteries, the results have enabled us to propose a definition of the end of life of batteries in rural electrification. A cost assessment of the Moroccan experience based on a 5-year accounting database has been carried out to characterize the cost structure of the programme. The results have allowed the major costs of the photovoltaic electrification to be defined.
The overall cost ratio per installed system has been calculated together with the necessary fees that users would have to pay to make the operation and maintenance affordable. Finally, a mathematical optimization model has been proposed to design maintenance structures based on the previous study results. The tool has been applied to the Moroccan programme with the aim of validating the model.
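The reliability quantities named in this abstract (failure rate, mean time to failure, reliability function) can be illustrated with a short sketch. It assumes a constant failure rate, that is, an exponential lifetime model, which the abstract does not state; the thesis may well use other distributions. The records below are invented for illustration and are not data from the Moroccan programme.

```python
import math

# Hypothetical maintenance log: (years in operation, failed during observation?).
# Units that never failed are treated as censored at the end of observation.
RECORDS = [(5.0, False), (3.2, True), (4.1, True), (5.0, False), (1.7, True), (5.0, False)]


def failure_rate(records):
    """Failures per unit-year: number of failures over total operating time."""
    failures = sum(1 for _, failed in records if failed)
    exposure = sum(years for years, _ in records)
    return failures / exposure


def mttf(records):
    """Mean time to failure under the constant-rate assumption."""
    return 1.0 / failure_rate(records)


def reliability(t, records):
    """Probability that a unit survives beyond t years, R(t) = exp(-lambda * t)."""
    return math.exp(-failure_rate(records) * t)
```

With these invented records there are 3 failures over 24 unit-years, giving a rate of 0.125 failures per year and a mean time to failure of 8 years. Dividing failures by total operating time, rather than averaging only the observed failure times, keeps the units that never failed from being ignored.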
Abstract:
SRAM-based Field-Programmable Gate Arrays (FPGAs) are built on a configuration memory implemented in Static RAM (SRAM) technology. They have several features that make them very attractive for designing complex embedded systems. First, they have a low non-recurring engineering (NRE) cost, since the logic and routing elements are pre-implemented (the user design defines how they are connected). Also, unlike other FPGA technologies, they can be reconfigured (even in the field) an unlimited number of times. Moreover, Xilinx SRAM-based FPGAs support Dynamic Partial Reconfiguration (DPR), which allows the FPGA to be reconfigured without interrupting the application. Finally, they offer high logic density, high processing capacity and a rich set of macro-blocks. However, one drawback of this technology is its susceptibility to ionizing radiation, which increases with the degree of integration (smaller geometries, lower voltages and higher frequencies). This is a first-order concern for applications in highly radiative environments with high dependability requirements. This phenomenon causes long-term degradation and can also induce instantaneous faults, which may be reversible or cause irreversible damage. In SRAM-based FPGAs, radiation-induced faults can appear in two different architectural layers, which are physically superimposed on the silicon die. The Application Layer (or A-Layer) contains the user-defined hardware, and the Configuration Layer (or C-Layer) contains the configuration memory and its support circuitry. Faults in either layer can cause the system to fail, which may be more or less tolerable depending on the system's dependability requirements. In the general case, these faults must be managed in some way.
This thesis deals with fault management in SRAM-based FPGAs at system level, in the context of autonomous, dependable embedded systems operating in a radiative environment. The thesis focuses mainly on space applications, but the same principles can be applied to ground applications. The main differences between the two are the radiation level and the possibility of maintenance. The different techniques for A-Layer and C-Layer fault management are classified, and their implications for system dependability are analysed. Several architectures are proposed for both single-layer and dual-layer Fault Managers. For the latter, a novel, flexible and versatile architecture is proposed. It manages the two layers concurrently and in a coordinated way, and allows the level of redundancy to be balanced against dependability. To validate dynamic fault management techniques, two different solutions are developed. The first is a simulation framework for C-Layer Fault Managers, based on SystemC both as the modelling language and as the event-driven simulator. This framework and its associated methodology make it possible to explore the Fault Manager design space, decoupling its design from the development of the target FPGA. The framework includes models of both the FPGA C-Layer and the Fault Manager, which can interact at different abstraction levels (at the level of configuration frames and at the JTAG or SelectMAP physical level). The framework is configurable, scalable and versatile, and includes fault injection capabilities. Simulation results for some scenarios are presented and discussed. The second is a validation platform for Xilinx Virtex FPGA Fault Managers. The hardware platform hosts three Xilinx Virtex-4 FX12 FPGA Modules and two general-purpose 32-bit Microcontroller Unit (MCU) Modules.
The MCU Modules allow software-based C-Layer and A-Layer Fault Managers to be prototyped. Each FPGA Module implements an A-Layer Ethernet link (through an Ethernet switch) with one of the MCU Modules, and a C-Layer JTAG link with the other. In addition, the two MCU Modules exchange commands and data over an internal UART-type link. As with the simulation framework, fault injection capabilities are included. Test results for some scenarios are also presented and discussed. In summary, this thesis covers the complete process, from describing radiation-induced faults in SRAM-based FPGAs, through identifying and classifying fault management techniques and proposing Fault Manager architectures, to finally validating them by simulation and test. Future work relates mainly to the implementation of radiation-hardened System Fault Managers.
ABSTRACT
SRAM-based Field-Programmable Gate Arrays (FPGAs) are built on Static RAM (SRAM) technology configuration memory. They present a number of features that make them very convenient for building complex embedded systems. First of all, they benefit from low Non-Recurring Engineering (NRE) costs, as the logic and routing elements are pre-implemented (the user design defines their connection). Also, as opposed to other FPGA technologies, they can be reconfigured (even in the field) an unlimited number of times. Moreover, Xilinx SRAM-based FPGAs feature Dynamic Partial Reconfiguration (DPR), which allows the FPGA to be partially reconfigured without disrupting the application. Finally, they feature a high logic density, high processing capability and a rich set of hard macros. However, one limitation of this technology is its susceptibility to ionizing radiation, which increases with technology scaling (smaller geometries, lower voltages and higher frequencies).
This is a first-order concern for applications that operate in harsh radiation environments and require high dependability. Ionizing radiation leads to long-term degradation as well as instantaneous faults, which can in turn be reversible or produce irreversible damage. In SRAM-based FPGAs, radiation-induced faults can appear at two architectural layers, which are physically overlaid on the silicon die. The Application Layer (or A-Layer) contains the user-defined hardware, and the Configuration Layer (or C-Layer) contains the (volatile) configuration memory and its support circuitry. Faults at either layer can imply a system failure, which may be more or less tolerated depending on the dependability requirements. In the general case, such faults must be managed in some way. This thesis is about managing SRAM-based FPGA faults at system level, in the context of autonomous and dependable embedded systems operating in a radiative environment. The focus is mainly on space applications, but the same principles can be applied to ground applications. The main differences between them are the radiation level and the possibility for maintenance. The different techniques for A-Layer and C-Layer fault management are classified and their implications for system dependability are assessed. Several architectures are proposed, both for single-layer and dual-layer Fault Managers. For the latter, a novel, flexible and versatile architecture is proposed. It manages both layers concurrently in a coordinated way, and allows the redundancy level to be balanced against dependability. For the purpose of validating dynamic fault management techniques, two different solutions are developed. The first one is a simulation framework for C-Layer Fault Managers, based on SystemC as both modeling language and event-driven simulator. This framework and its associated methodology allow the Fault Manager design space to be explored, decoupling its design from the target FPGA development.
The framework includes models for both the FPGA C-Layer and the Fault Manager, which can interact at different abstraction levels (at configuration frame level and at JTAG or SelectMAP physical level). The framework is configurable, scalable and versatile, and includes fault injection capabilities. Simulation results for some scenarios are presented and discussed. The second one is a validation platform for Xilinx Virtex FPGA Fault Managers. The platform hosts three Xilinx Virtex-4 FX12 FPGA Modules and two general-purpose 32-bit Microcontroller Unit (MCU) Modules. The MCU Modules allow prototyping of software-based C-Layer and A-Layer Fault Managers. Each FPGA Module implements one A-Layer Ethernet link (through an Ethernet switch) with one of the MCU Modules, and one C-Layer JTAG link with the other. In addition, both MCU Modules exchange commands and data over an internal UART link. As in the simulation framework, fault injection capabilities are implemented. Test results for some scenarios are also presented and discussed. In summary, this thesis covers the whole process from describing the problem of radiation-induced faults in SRAM-based FPGAs, then identifying and classifying fault management techniques, then proposing Fault Manager architectures and finally validating them by simulation and test. The proposed future work is mainly related to the implementation of radiation-hardened System Fault Managers.