13 results for Software Fault Isolation
at Universidad Politécnica de Madrid
Abstract:
This paper proposes a new method for fault isolation in a class of continuous-time stochastic dynamical systems. The method is framed in the context of model-based analytical redundancy: a diagnostic observer generates a residual signal, which is then analyzed. Once a fault has been detected, and assuming some basic a priori knowledge about the set of possible failures in the plant, the isolation task is formulated as an on-line statistical classification problem. The proposed isolation scheme runs different hypothesis tests in parallel on a statistic of the residual signal, one test for each possible fault. The method is characterized by deriving, for the one-dimensional case, a sufficient isolability condition as well as an upper bound on the probability of missed isolation. Simulation examples illustrate the applicability of the proposed scheme.
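As a rough illustration of running one hypothesis test per candidate fault on a residual statistic, the following minimal Python sketch uses the sample mean of the residual and a two-sided z-test. It is not the method of the paper: the fault names, the hypothesised residual means, the known noise standard deviation `sigma`, and the threshold `z_crit` are all assumptions made for this example.

```python
import math
from statistics import fmean


def isolate_fault(residual, fault_means, sigma, z_crit=1.96):
    """Test every candidate fault against the same residual statistic.

    residual    : residual samples collected after a fault was detected
    fault_means : dict mapping a (hypothetical) fault name to the residual
                  mean expected under that fault (assumed known a priori)
    sigma       : assumed known standard deviation of the residual noise
    z_crit      : two-sided acceptance threshold (1.96 ~ 5% significance)
    Returns the names of the faults whose hypothesis is NOT rejected.
    """
    n = len(residual)
    sample_mean = fmean(residual)          # the statistic shared by all tests
    std_err = sigma / math.sqrt(n)
    accepted = []
    for name, mu in fault_means.items():   # one z-test per possible fault
        z = (sample_mean - mu) / std_err
        if abs(z) <= z_crit:
            accepted.append(name)
    return accepted


# Hypothetical data: residual samples scattered around 1.0
residual = [0.9, 1.1, 1.0, 1.2, 0.8, 1.0, 0.95, 1.05]
faults = {"sensor_bias": 1.0, "actuator_loss": -1.0, "leak": 3.0}
print(isolate_fault(residual, faults, sigma=0.2))
```

The fault is isolated when exactly one hypothesis survives. An empty list or several surviving names would correspond to a missed or ambiguous isolation in this simplified setting.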
Abstract:
This paper applies fault detection and isolation (FDI) schemes to the surveillance of emerging faults in an electrical circuit. The FDI problem is studied on a noisy nonlinear circuit, where both abrupt and incipient faults in the voltage source are considered. A rigorous analysis of fault detectability precedes the application of the fault detection (FD) scheme; the fault isolation (FI) phase is then accomplished with two alternative FI approaches, proposed as new extensions of that FD approach. Numerical simulations illustrate the applicability of these schemes.
Abstract:
The continuous increase in processors' computational power and the demand for additional functionality and services are motivating a change in the way embedded systems are built. Components with different criticality levels are allocated to the same processor, which gives rise to mixed-criticality systems. Partitioned systems are a way of preventing undesirable interference between components with different criticality levels. A hypervisor provides these partitions, or virtual machines, ensuring spatial, temporal and fault isolation between them. The purpose of this paper is to illustrate the development of a mixed-criticality system. The attitude control subsystem is used to show the different steps, which are supported by a toolset developed in the context of the MultiPARTES research project.
Abstract:
Many industries currently use or develop multi-system platforms, which integrate a growing number of complex systems. Centralized maintenance optimizes the maintenance of these platforms by integrating a system in charge of managing the maintenance of all the systems of the platform. This Master's thesis (Trabajo Fin de Máster, TFM) develops the centralized maintenance concept for complex systems, applicable to platforms made up of modular systems. It is motivated by the growing demand from the industries that use this type of platform, such as the aeronautic, railway and automotive industries. To that end, this TFM analyzes the state of the art of centralized maintenance systems in different industries, and describes the different types of system architecture, the applicable maintenance techniques, and the maintenance systems and techniques based on monitoring and self-diagnosis functions known as Built-In-Test Equipment (BITE). Additionally, this TFM includes the development and implementation of a model of a Centralized Maintenance Environment in LabVIEW. This environment consists of the model of a Standard System, the model of the Centralized Maintenance System, and the interfaces between them. The Centralized Maintenance System model integrates several functions for fault diagnosis and isolation. It also includes a function for the statistical analysis of the failure data stored by the system itself, in order to provide predictive maintenance capabilities to the systems of the environment. The Centralized Maintenance Environment model was implemented using TCP/IP communications, data modeling and storage in XML files, and automatic generation of HTML reports.
Abstract:
The Universidad Politécnica de Madrid is conducting research in the field of intelligent robotics, specifically on the use of unmanned aerial vehicles (UAVs). The long-term goal of this research is to develop systems capable of operating more autonomously in a wide range of situations. Within this framework, this final degree project focuses on the design and development of a supervision system for UAVs that aims to ease the monitoring of process execution and the inclusion of procedures that increase software fault tolerance. This document offers a review of the state of the art in robotics, with special emphasis on intelligent robotics, existing development methods, and the definition of the different frameworks for classifying autonomy. It also surveys the existing techniques for achieving greater software fault tolerance, several of which were selected for this work. Finally, the developed supervision system is described, first from a functional point of view and then in terms of the technical solution that was built.
Abstract:
A good, early fault detection and isolation system, along with efficient alarm management and fine sensor validation systems, is very important in today's complex process plants, especially in terms of safety enhancement and cost reduction. This paper presents a methodology for fault characterization. It is a self-learning approach developed in two phases. In an initial learning phase, simulation of the process units, both without faults and with different faults, lets the system detect, in an automated way, the key variables that characterize the faults. These are used in a second, on-line phase, where the key variables are monitored in order to diagnose possible faults. With this scheme, faults are diagnosed and isolated at an early stage, before the fault has turned into a failure.
Abstract:
This paper presents a new fault detection and isolation scheme for dealing with simultaneous additive and parametric faults. The new design integrates a system for additive fault detection based on Castillo and Zufiria, 2009, and a new parametric fault detection and isolation scheme inspired by Munz and Zufiria, 2008. It is shown that existing schemes do not behave correctly when additive and parametric faults occur simultaneously; a new integrated scheme is proposed to solve this problem. Computer simulation results are presented to confirm the theoretical studies.
Abstract:
We discuss experiences gained by porting a Software Validation Facility (SVF) and a satellite Central Software (CSW) to a platform with support for Time and Space Partitioning (TSP). The SVF and CSW are part of the EagleEye Reference mission of the European Space Agency (ESA). As a reference mission, EagleEye is a perfect candidate to evaluate practical aspects of developing satellite CSW for and on TSP platforms. The specific TSP platform we used consists of a simulated LEON3 CPU controlled by the XtratuM separation micro-kernel. On top of this, we run five separate partitions. Each partition runs its own real-time operating system or Ada run-time kernel, which in turn runs the application software of the CSW. We describe issues related to partitioning; inter-partition communication; scheduling; I/O; and fault detection, isolation, and recovery (FDIR).
Abstract:
It is an accepted fact in software engineering that software must undergo verification and validation during development to ascertain and improve its quality level. However, there are more techniques than a single developer can master, and it is still impossible to be certain that software is free of defects. It is therefore crucial for developers to be able to choose, from the available evaluation techniques, the one most suitable and most likely to yield the best quality results for each product. Although some knowledge is available on the strengths and weaknesses of software quality assurance techniques, not much is known yet about the relationship between different techniques or about how they behave in context. Objective: This research investigates the effectiveness of two testing techniques (equivalence class partitioning and decision coverage) and one review technique (code review by abstraction) in terms of their fault detection capability. The results will be used to strengthen the practical knowledge available on these techniques.
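To make the two testing techniques named above concrete, here is a minimal Python sketch. The function `shipping_fee`, its weight classes, and the chosen representative inputs are hypothetical and are not taken from the study; they only show how one input per equivalence class can also exercise both outcomes of every decision.

```python
def shipping_fee(weight_kg):
    """Hypothetical function under test: one invalid and three valid classes."""
    if weight_kg <= 0:            # decision 1
        raise ValueError("weight must be positive")
    if weight_kg <= 1:            # decision 2
        return 5
    if weight_kg <= 10:           # decision 3
        return 12
    return 30


# Equivalence class partitioning: one representative input per class.
representatives = {
    "invalid (<= 0)": -3,
    "light (0, 1]": 0.5,
    "medium (1, 10]": 4,
    "heavy (> 10)": 25,
}


def run_partition_tests():
    """Run the function once per class and record the observed outcome."""
    results = {}
    for label, value in representatives.items():
        try:
            results[label] = shipping_fee(value)
        except ValueError:
            results[label] = "rejected"
    return results


print(run_partition_tests())
```

With these four inputs, each of the three `if` decisions evaluates to both true and false at least once, so the same small suite also reaches full decision coverage for this function. In general the two criteria do not coincide, which is why they are evaluated separately.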
Abstract:
Existing seismic isolation systems are based on well-known and accepted physical principles, but they still have some functional drawbacks. As an attempt at improvement, the Roll-N-Cage (RNC) isolator has recently been proposed. It is designed to achieve a balance in controlling isolator displacement demands and structural accelerations. It provides in a single unit all the necessary functions of rigid vertical support, horizontal flexibility with enhanced stability, resistance to low service loads and minor vibration, and hysteretic energy dissipation. It is characterized by two unique features: a self-braking (buffer) mechanism and a self-recentering mechanism. This paper presents an advanced representation of the main and unique features of the RNC isolator using an available finite element code, SAP2000. The validity of the resulting SAP2000 model is then checked against experimental, numerical and analytical results. Next, the paper investigates the merits and demerits of activating the built-in buffer mechanism with respect to both structural pounding mitigation and isolation efficiency. The paper also addresses the passive alleviation of possible inner pounding within the RNC isolator, which may arise from the activation of its self-braking mechanism under severe excitations such as near-fault earthquakes. The results show that the finite element model closely matches and accurately predicts the overall behavior of the RNC isolator, with small errors. Moreover, the inherent buffer mechanism of the RNC isolator can mitigate or even eliminate direct structure-to-structure pounding under severe excitation when the separation gaps between adjacent structures are limited. In addition, increasing the inherent hysteretic damping of the RNC isolator can efficiently limit its peak displacement and the severity of any inner pounding that develops, and therefore alleviate or even eliminate the possible negative effects of the buffer mechanism on the overall response of the RNC-isolated structure.
Abstract:
With the ever-growing trend of smartphones and tablets, Android is becoming more popular every day. With more than one billion active users to date, Android is the leading technology in the smartphone arena. In addition, Android also runs on Android TV, Android smart watches and cars. In recent years, Android applications have therefore become one of the major development sectors in the software industry. As of mid-2013, the number of published applications on Google Play had exceeded one million, and the cumulative number of downloads was more than 50 billion. A 2013 survey also revealed that 71% of mobile application developers work on Android applications. Given the number of Android applications, it is evident that people rely on them daily, for tasks ranging from simple ones like keeping track of the weather to rather complex ones like managing one's bank accounts. Hence, like every other kind of code, Android code needs to be verified in order to work properly and achieve a certain confidence level. Because of the sheer number of applications, it is very hard to test Android applications manually, especially when they have to be verified for various versions of the OS and various device configurations, such as different screen sizes and different hardware availability. Hence, there has recently been a lot of work in the computer science community on developing testing methods for Android applications. Android attracts researchers because of its open source nature: research is more streamlined when the code for both the application and the platform is readily available to analyze. As a result, there has been a great deal of research on testing and static analysis of Android applications, much of it focused on test input generation.
Hence, there are now several testing tools available that focus on the automatic generation of test cases for Android applications. These tools differ from one another in the strategies and heuristics they use to generate test cases. But there is still very little work comparing these testing tools and the strategies they use. Recently, some research has been carried out in this regard, comparing the performance of various available tools with respect to code coverage, fault detection, ability to work on multiple platforms, and ease of use. It did so by running these tools on a total of 60 real-world Android applications. The results showed that the strategies used by the tools, although effective, also face limitations and hence have room for improvement. The purpose of this thesis is to extend this research in a more specific, attribute-oriented way. Attributes refer to the tasks that can be completed using the Android platform. An attribute can be anything from a basic system call for receiving an SMS to more complex tasks like sending the user from the current application to another one. The idea is to develop a benchmark for Android testing tools based on their performance with respect to these attributes, which allows the tools to be compared attribute by attribute. For example, if an application plays an audio file, will the testing tool be able to generate a test input that triggers the playback of that file? By using multiple applications that exercise different attributes, one can see which testing tool is more useful for which kinds of attributes. In this thesis, 9 attributes covering basic kinds of tasks are targeted for the assessment of three testing tools. Later, this can be extended to many more attributes and more testing tools.
The aim of this work is to show that this approach is effective and can be used on a much larger scale. One of the flagship features of this work, which also differentiates it from previous work, is that the applications used were all built specifically for this research. The reason is to analyze in isolation just the specific attribute that each application focuses on, and not let the tool get bottlenecked by something trivial that is not the main attribute under test. This means 9 applications, each focused on one specific attribute. The main contributions of this thesis are:
• A summary of the three existing testing tools and their respective techniques for automatic test input generation for Android applications.
• A detailed study of the usage of these testing tools on the 9 applications specially designed and developed for this study.
• An analysis of the results obtained in the study and a comparison of the performance of the selected tools.
Abstract:
SRAM-based Field-Programmable Gate Arrays (FPGAs) are built on a configuration memory of Static RAM (SRAM) technology. They present a number of features that make them very convenient for building complex embedded systems. First of all, they benefit from low Non-Recurring Engineering (NRE) costs, as the logic and routing elements are pre-implemented (the user design defines their connection). Also, as opposed to other FPGA technologies, they can be reconfigured (even in the field) an unlimited number of times. Moreover, Xilinx SRAM-based FPGAs feature Dynamic Partial Reconfiguration (DPR), which allows the FPGA to be partially reconfigured without disrupting the application. Finally, they feature a high logic density, high processing capability and a rich set of hard macros. However, one limitation of this technology is its susceptibility to ionizing radiation, which increases with technology scaling (smaller geometries, lower voltages and higher frequencies). This is a first-order concern for applications in harsh radiation environments that require high dependability. Ionizing radiation leads to long-term degradation as well as instantaneous faults, which can in turn be reversible or produce irreversible damage. In SRAM-based FPGAs, radiation-induced faults can appear at two architectural layers, which are physically overlaid on the silicon die. The Application Layer (or A-Layer) contains the user-defined hardware, and the Configuration Layer (or C-Layer) contains the (volatile) configuration memory and its support circuitry. Faults at either layer can cause a system failure, which may be more or less tolerable depending on the dependability requirements. In the general case, such faults must be managed in some way.
This thesis is about managing SRAM-based FPGA faults at system level, in the context of autonomous and dependable embedded systems operating in a radiative environment. The focus is mainly on space applications, but the same principles can be applied to ground applications. The main differences between them are the radiation level and the possibility of maintenance. The different techniques for A-Layer and C-Layer fault management are classified, and their implications for system dependability are assessed. Several architectures are proposed, both for single-layer and dual-layer Fault Managers. For the latter, a novel, flexible and versatile architecture is proposed. It manages both layers concurrently in a coordinated way, and allows the redundancy level and dependability to be balanced. For the purpose of validating dynamic fault management techniques, two different solutions are developed. The first one is a simulation framework for C-Layer Fault Managers, based on SystemC as modeling language and event-driven simulator. This framework and its associated methodology allow the Fault Manager design space to be explored, decoupling its design from the development of the target FPGA. The framework includes models for both the FPGA C-Layer and the Fault Manager, which can interact at different abstraction levels (at configuration frame level and at JTAG or SelectMAP physical level). The framework is configurable, scalable and versatile, and includes fault injection capabilities. Simulation results for some scenarios are presented and discussed. The second one is a validation platform for Xilinx Virtex FPGA Fault Managers. The hardware platform hosts three Xilinx Virtex-4 FX12 FPGA Modules and two general-purpose 32-bit Microcontroller Unit (MCU) Modules. The MCU Modules allow prototyping software-based C-Layer and A-Layer Fault Managers. Each FPGA Module implements one A-Layer Ethernet link (through an Ethernet switch) with one of the MCU Modules, and one C-Layer JTAG link with the other. In addition, both MCU Modules exchange commands and data over an internal UART link. As in the simulation framework, fault injection capabilities are implemented. Test results for some scenarios are also presented and discussed.
In summary, this thesis covers the whole process: describing the problem of radiation-induced faults in SRAM-based FPGAs, identifying and classifying fault management techniques, proposing Fault Manager architectures, and finally validating them by simulation and test. The proposed future work is mainly related to the implementation of radiation-hardened System Fault Managers.
Abstract:
This PhD thesis contributes to the problem of autonomic fault diagnosis in telecommunication networks. In today's telecommunication networks, operators perform diagnosis tasks manually. Those operations must be carried out by highly skilled network engineers, who have increasing difficulty properly managing the growth of those networks in size, complexity and heterogeneity. Moreover, the advent of the Future Internet has increased the demand in recent years for solutions that simplify and automate telecommunication network management. To collect the domain knowledge required to develop the proposed solutions and to ease their adoption by operators, an agile acceptance testing methodology is defined for multi-agent systems. This methodology focuses on simplifying communication between the different work groups involved in any software development project: stakeholders and developers. To contribute to solving the problem of autonomic fault diagnosis, an agent architecture capable of autonomously diagnosing faults in telecommunication networks is defined. That architecture extends the Belief-Desire-Intention (BDI) agent model with different diagnostic models that handle the different subtasks of the process. The proposed architecture combines different reasoning techniques to achieve its objective, using a structural model of the network, which uses ontology-based reasoning, and a causal model of faults, which uses Bayesian reasoning to properly handle the uncertainty of the diagnosis process. To ensure the suitability of the proposed architecture in complex and heterogeneous environments, an argumentation framework is defined that allows agents to perform fault diagnosis in federated domains. To apply this framework in a multi-agent system, a coordination protocol is defined, in which agents engage in dialogue until they reach a reliable conclusion for a specific diagnosis case. Future work comprises extending the agent architecture to address other management problems, such as self-discovery or self-optimisation; applying reputation techniques within the argumentation framework to improve the extensibility of the diagnostic system in federated domains; and applying the proposed agent architecture to emerging networking architectures, such as SDN, which offer new capabilities for controlling the network.
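The role of Bayesian reasoning in handling diagnostic uncertainty can be illustrated with a single application of Bayes' rule. This is only a sketch, not the causal model of the thesis: the candidate causes, the prior probabilities, and the likelihoods of the observed symptom (packet loss) are invented for the example.

```python
def posterior(priors, likelihoods):
    """Apply Bayes' rule over a set of mutually exclusive candidate causes.

    priors      : dict cause -> P(cause)            (hypothetical values)
    likelihoods : dict cause -> P(symptom | cause)  (hypothetical values)
    Returns dict cause -> P(cause | symptom).
    """
    joint = {cause: priors[cause] * likelihoods[cause] for cause in priors}
    evidence = sum(joint.values())         # P(symptom), the normalising term
    return {cause: p / evidence for cause, p in joint.items()}


# Hypothetical diagnosis case: packet loss has been observed on a link.
priors = {"link_down": 0.1, "congestion": 0.3, "no_fault": 0.6}
likelihoods = {"link_down": 0.9, "congestion": 0.5, "no_fault": 0.05}

beliefs = posterior(priors, likelihoods)
most_likely = max(beliefs, key=beliefs.get)
print(most_likely, round(beliefs[most_likely], 3))
```

A full causal model would chain many such updates across a network of related variables. The single-step version only shows how a rare but strongly indicative cause and a common but weakly indicative one are weighed against each other.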