55 resultados para HPC parallel computer architecture queues fault tolerance programmability ADAM
Resumo:
La presente tesis doctoral contribuye al problema del diagnóstico autonómico de fallos en redes de telecomunicación. En las redes de telecomunicación actuales, las operadoras realizan tareas de diagnóstico de forma manual. Dichas operaciones deben ser llevadas a cabo por ingenieros altamente cualificados que cada vez tienen más dificultades a la hora de gestionar debidamente el crecimiento exponencial de la red tanto en tamaño, complejidad y heterogeneidad. Además, el advenimiento del Internet del Futuro hace que la demanda de sistemas que simplifiquen y automaticen la gestión de las redes de telecomunicación se haya incrementado en los últimos años. Para extraer el conocimiento necesario para desarrollar las soluciones propuestas y facilitar su adopción por los operadores de red, se propone una metodología de pruebas de aceptación para sistemas multi-agente enfocada en simplificar la comunicación entre los diferentes grupos de trabajo involucrados en todo proyecto de desarrollo software: clientes y desarrolladores. Para contribuir a la solución del problema del diagnóstico autonómico de fallos, se propone una arquitectura de agente capaz de diagnosticar fallos en redes de telecomunicación de manera autónoma. Dicha arquitectura extiende el modelo de agente Belief-Desire- Intention (BDI) con diferentes modelos de diagnóstico que gestionan las diferentes sub-tareas del proceso. La arquitectura propuesta combina diferentes técnicas de razonamiento para alcanzar su propósito gracias a un modelo estructural de la red, que usa razonamiento basado en ontologías, y un modelo causal de fallos, que usa razonamiento Bayesiano para gestionar debidamente la incertidumbre del proceso de diagnóstico. Para asegurar la adecuación de la arquitectura propuesta en situaciones de gran complejidad y heterogeneidad, se propone un marco de argumentación que permite diagnosticar a agentes que estén ejecutando en dominios federados. Para la aplicación de este marco en un sistema multi-agente, se propone un protocolo de coordinación en el que los agentes dialogan hasta alcanzar una conclusión para un caso de diagnóstico concreto. Como trabajos futuros, se consideran la extensión de la arquitectura para abordar otros problemas de gestión como el auto-descubrimiento o la auto-optimización, el uso de técnicas de reputación dentro del marco de argumentación para mejorar la extensibilidad del sistema de diagnóstico en entornos federados y la aplicación de las arquitecturas propuestas en las arquitecturas de red emergentes, como SDN, que ofrecen mayor capacidad de interacción con la red. ABSTRACT This PhD thesis contributes to the problem of autonomic fault diagnosis of telecommunication networks. Nowadays, in telecommunication networks, operators perform manual diagnosis tasks. Those operations must be carried out by high skilled network engineers which have increasing difficulties to properly manage the growing of those networks, both in size, complexity and heterogeneity. Moreover, the advent of the Future Internet makes the demand of solutions which simplifies and automates the telecommunication network management has been increased in recent years. To collect the domain knowledge required to developed the proposed solutions and to simplify its adoption by the operators, an agile testing methodology is defined for multiagent systems. This methodology is focused on the communication gap between the different work groups involved in any software development project, stakeholders and developers. To contribute to overcoming the problem of autonomic fault diagnosis, an agent architecture for fault diagnosis of telecommunication networks is defined. That architecture extends the Belief-Desire-Intention (BDI) agent model with different diagnostic models which handle the different subtasks of the process. The proposed architecture combines different reasoning techniques to achieve its objective using a structural model of the network, which uses ontology-based reasoning, and a causal model, which uses Bayesian reasoning to properly handle the uncertainty of the diagnosis process. To ensure the suitability of the proposed architecture in complex and heterogeneous environments, an argumentation framework is defined. This framework allows agents to perform fault diagnosis in federated domains. To apply this framework in a multi-agent system, a coordination protocol is defined. This protocol is used by agents to dialogue until a reliable conclusion for a specific diagnosis case is reached. Future work comprises the further extension of the agent architecture to approach other managements problems, such as self-discovery or self-optimisation; the application of reputation techniques in the argumentation framework to improve the extensibility of the diagnostic system in federated domains; and the application of the proposed agent architecture in emergent networking architectures, such as SDN, which offers new capabilities of control for the network.
Resumo:
One of the most demanding needs in cloud computing and big data is that of having scalable and highly available databases. One of the ways to attend these needs is to leverage the scalable replication techniques developed in the last decade. These techniques allow increasing both the availability and scalability of databases. Many replication protocols have been proposed during the last decade. The main research challenge was how to scale under the eager replication model, the one that provides consistency across replicas. This thesis provides an in depth study of three eager database replication systems based on relational systems: Middle-R, C-JDBC and MySQL Cluster and three systems based on In-Memory Data Grids: JBoss Data Grid, Oracle Coherence and Terracotta Ehcache. Thesis explore these systems based on their architecture, replication protocols, fault tolerance and various other functionalities. It also provides experimental analysis of these systems using state-of-the art benchmarks: TPC-C and TPC-W (for relational systems) and Yahoo! Cloud Serving Benchmark (In- Memory Data Grids). Thesis also discusses three Graph Databases, Neo4j, Titan and Sparksee based on their architecture and transactional capabilities and highlights the weaker transactional consistencies provided by these systems. It discusses an implementation of snapshot isolation in Neo4j graph database to provide stronger isolation guarantees for transactions.
Resumo:
Abstract: Context aware applications, which can adapt their behaviors to changing environments, are attracting more and more attention. To simplify the complexity of developing applications, context aware middleware, which introduces context awareness into the traditional middleware, is highlighted to provide a homogeneous interface involving generic context management solutions. This paper provides a survey of state-of-the-art context aware middleware architectures proposed during the period from 2009 through 2015. First, a preliminary background, such as the principles of context, context awareness, context modelling, and context reasoning, is provided for a comprehensive understanding of context aware middleware. On this basis, an overview of eleven carefully selected middleware architectures is presented and their main features explained. Then, thorough comparisons and analysis of the presented middleware architectures are performed based on technical parameters including architectural style, context abstraction, context reasoning, scalability, fault tolerance, interoperability, service discovery, storage, security & privacy, context awareness level, and cloud-based big data analytics. The analysis shows that there is actually no context aware middleware architecture that complies with all requirements. Finally, challenges are pointed out as open issues for future work.
Resumo:
Distributed real-time embedded systems are becoming increasingly important to society. More demands will be made on them and greater reliance will be placed on the delivery of their services. A relevant subset of them is high-integrity or hard real-time systems, where failure can cause loss of life, environmental harm, or significant financial loss. Additionally, the evolution of communication networks and paradigms as well as the necessity of demanding processing power and fault tolerance, motivated the interconnection between electronic devices; many of the communications have the possibility of transferring data at a high speed. The concept of distributed systems emerged as systems where different parts are executed on several nodes that interact with each other via a communication network. Java’s popularity, facilities and platform independence have made it an interesting language for the real-time and embedded community. This was the motivation for the development of RTSJ (Real-Time Specification for Java), which is a language extension intended to allow the development of real-time systems. The use of Java in the development of high-integrity systems requires strict development and testing techniques. However, RTJS includes a number of language features that are forbidden in such systems. In the context of the HIJA project, the HRTJ (Hard Real-Time Java) profile was developed to define a robust subset of the language that is amenable to static analysis for high-integrity system certification. Currently, a specification under the Java community process (JSR- 302) is being developed. Its purpose is to define those capabilities needed to create safety critical applications with Java technology called Safety Critical Java (SCJ). However, neither RTSJ nor its profiles provide facilities to develop distributed realtime applications. This is an important issue, as most of the current and future systems will be distributed. The Distributed RTSJ (DRTSJ) Expert Group was created under the Java community process (JSR-50) in order to define appropriate abstractions to overcome this problem. Currently there is no formal specification. The aim of this thesis is to develop a communication middleware that is suitable for the development of distributed hard real-time systems in Java, based on the integration between the RMI (Remote Method Invocation) model and the HRTJ profile. It has been designed and implemented keeping in mind the main requirements such as the predictability and reliability in the timing behavior and the resource usage. iThe design starts with the definition of a computational model which identifies among other things: the communication model, most appropriate underlying network protocols, the analysis model, and a subset of Java for hard real-time systems. In the design, the remote references are the basic means for building distributed applications which are associated with all non-functional parameters and resources needed to implement synchronous or asynchronous remote invocations with real-time attributes. The proposed middleware separates the resource allocation from the execution itself by defining two phases and a specific threading mechanism that guarantees a suitable timing behavior. It also includes mechanisms to monitor the functional and the timing behavior. It provides independence from network protocol defining a network interface and modules. The JRMP protocol was modified to include two phases, non-functional parameters, and message size optimizations. Although serialization is one of the fundamental operations to ensure proper data transmission, current implementations are not suitable for hard real-time systems and there are no alternatives. This thesis proposes a predictable serialization that introduces a new compiler to generate optimized code according to the computational model. The proposed solution has the advantage of allowing us to schedule the communications and to adjust the memory usage at compilation time. In order to validate the design and the implementation a demanding validation process was carried out with emphasis in the functional behavior, the memory usage, the processor usage (the end-to-end response time and the response time in each functional block) and the network usage (real consumption according to the calculated consumption). The results obtained in an industrial application developed by Thales Avionics (a Flight Management System) and in exhaustive tests show that the design and the prototype are reliable for industrial applications with strict timing requirements. Los sistemas empotrados y distribuidos de tiempo real son cada vez más importantes para la sociedad. Su demanda aumenta y cada vez más dependemos de los servicios que proporcionan. Los sistemas de alta integridad constituyen un subconjunto de gran importancia. Se caracterizan por que un fallo en su funcionamiento puede causar pérdida de vidas humanas, daños en el medio ambiente o cuantiosas pérdidas económicas. La necesidad de satisfacer requisitos temporales estrictos, hace más complejo su desarrollo. Mientras que los sistemas empotrados se sigan expandiendo en nuestra sociedad, es necesario garantizar un coste de desarrollo ajustado mediante el uso técnicas adecuadas en su diseño, mantenimiento y certificación. En concreto, se requiere una tecnología flexible e independiente del hardware. La evolución de las redes y paradigmas de comunicación, así como la necesidad de mayor potencia de cómputo y de tolerancia a fallos, ha motivado la interconexión de dispositivos electrónicos. Los mecanismos de comunicación permiten la transferencia de datos con alta velocidad de transmisión. En este contexto, el concepto de sistema distribuido ha emergido como sistemas donde sus componentes se ejecutan en varios nodos en paralelo y que interactúan entre ellos mediante redes de comunicaciones. Un concepto interesante son los sistemas de tiempo real neutrales respecto a la plataforma de ejecución. Se caracterizan por la falta de conocimiento de esta plataforma durante su diseño. Esta propiedad es relevante, por que conviene que se ejecuten en la mayor variedad de arquitecturas, tienen una vida media mayor de diez anos y el lugar ˜ donde se ejecutan puede variar. El lenguaje de programación Java es una buena base para el desarrollo de este tipo de sistemas. Por este motivo se ha creado RTSJ (Real-Time Specification for Java), que es una extensión del lenguaje para permitir el desarrollo de sistemas de tiempo real. Sin embargo, RTSJ no proporciona facilidades para el desarrollo de aplicaciones distribuidas de tiempo real. Es una limitación importante dado que la mayoría de los actuales y futuros sistemas serán distribuidos. El grupo DRTSJ (DistributedRTSJ) fue creado bajo el proceso de la comunidad de Java (JSR-50) con el fin de definir las abstracciones que aborden dicha limitación, pero en la actualidad aun no existe una especificacion formal. El objetivo de esta tesis es desarrollar un middleware de comunicaciones para el desarrollo de sistemas distribuidos de tiempo real en Java, basado en la integración entre el modelo de RMI (Remote Method Invocation) y el perfil HRTJ. Ha sido diseñado e implementado teniendo en cuenta los requisitos principales, como la predecibilidad y la confiabilidad del comportamiento temporal y el uso de recursos. El diseño parte de la definición de un modelo computacional el cual identifica entre otras cosas: el modelo de comunicaciones, los protocolos de red subyacentes más adecuados, el modelo de análisis, y un subconjunto de Java para sistemas de tiempo real crítico. En el diseño, las referencias remotas son el medio básico para construcción de aplicaciones distribuidas las cuales son asociadas a todos los parámetros no funcionales y los recursos necesarios para la ejecución de invocaciones remotas síncronas o asíncronas con atributos de tiempo real. El middleware propuesto separa la asignación de recursos de la propia ejecución definiendo dos fases y un mecanismo de hebras especifico que garantiza un comportamiento temporal adecuado. Además se ha incluido mecanismos para supervisar el comportamiento funcional y temporal. Se ha buscado independencia del protocolo de red definiendo una interfaz de red y módulos específicos. También se ha modificado el protocolo JRMP para incluir diferentes fases, parámetros no funcionales y optimizaciones de los tamaños de los mensajes. Aunque la serialización es una de las operaciones fundamentales para asegurar la adecuada transmisión de datos, las actuales implementaciones no son adecuadas para sistemas críticos y no hay alternativas. Este trabajo propone una serialización predecible que ha implicado el desarrollo de un nuevo compilador para la generación de código optimizado acorde al modelo computacional. La solución propuesta tiene la ventaja que en tiempo de compilación nos permite planificar las comunicaciones y ajustar el uso de memoria. Con el objetivo de validar el diseño e implementación se ha llevado a cabo un exigente proceso de validación con énfasis en: el comportamiento funcional, el uso de memoria, el uso del procesador (tiempo de respuesta de extremo a extremo y en cada uno de los bloques funcionales) y el uso de la red (consumo real conforme al estimado). Los buenos resultados obtenidos en una aplicación industrial desarrollada por Thales Avionics (un sistema de gestión de vuelo) y en las pruebas exhaustivas han demostrado que el diseño y el prototipo son fiables para aplicaciones industriales con estrictos requisitos temporales.
Resumo:
Complexity has always been one of the most important issues in distributed computing. From the first clusters to grid and now cloud computing, dealing correctly and efficiently with system complexity is the key to taking technology a step further. In this sense, global behavior modeling is an innovative methodology aimed at understanding the grid behavior. The main objective of this methodology is to synthesize the grid's vast, heterogeneous nature into a simple but powerful behavior model, represented in the form of a single, abstract entity, with a global state. Global behavior modeling has proved to be very useful in effectively managing grid complexity but, in many cases, deeper knowledge is needed. It generates a descriptive model that could be greatly improved if extended not only to explain behavior, but also to predict it. In this paper we present a prediction methodology whose objective is to define the techniques needed to create global behavior prediction models for grid systems. This global behavior prediction can benefit grid management, specially in areas such as fault tolerance or job scheduling. The paper presents experimental results obtained in real scenarios in order to validate this approach.
Consolidation of a wsn and minimax method to rapidly neutralise intruders in strategic installations
Resumo:
Due to the sensitive international situation caused by still-recent terrorist attacks, there is a common need to protect the safety of large spaces such as government buildings, airports and power stations. To address this problem, developments in several research fields, such as video and cognitive audio, decision support systems, human interface, computer architecture, communications networks and communications security, should be integrated with the goal of achieving advanced security systems capable of checking all of the specified requirements and spanning the gap that presently exists in the current market. This paper describes the implementation of a decision system for crisis management in infrastructural building security. Specifically, it describes the implementation of a decision system in the management of building intrusions. The positions of the unidentified persons are reported with the help of a Wireless Sensor Network (WSN). The goal is to achieve an intelligent system capable of making the best decision in real time in order to quickly neutralise one or more intruders who threaten strategic installations. It is assumed that the intruders’ behaviour is inferred through sequences of sensors’ activations and their fusion. This article presents a general approach to selecting the optimum operation from the available neutralisation strategies based on a Minimax algorithm. The distances among different scenario elements will be used to measure the risk of the scene, so a path planning technique will be integrated in order to attain a good performance. Different actions to be executed over the elements of the scene such as moving a guard, blocking a door or turning on an alarm will be used to neutralise the crisis. This set of actions executed to stop the crisis is known as the neutralisation strategy. Finally, the system has been tested in simulations of real situations, and the results have been evaluated according to the final state of the intruders. In 86.5% of the cases, the system achieved the capture of the intruders, and in 59.25% of the cases, they were intercepted before they reached their objective.
Resumo:
Collaborative hardening and hardware redundancy are nowadays the most interesting solutions in terms of fault tolerance achieved and low extra cost imposed to the project budget. Thanks to the powerful and cheap digital devices that are available in the market, extra processing capabilities can be used for redundant tasks, not only in early data processing (sensed data) but also in routing and interfacing1
Resumo:
La Universidad Politécnica de Madrid está investigando en el campo de la robótica inteligente, concretamente con el empleo de vehículos aéreos no tripulados (UAV). El objetivo final que se persigue con las investigaciones en este campo es el desarrollo de sistemas capaces de operar de forma más autónoma en un amplio espectro de situaciones. Dentro de este marco, este trabajo fin de grado se centra en el desarrollo de un sistema de supervisión para UAVs que persigue facilitar la monitorización de la ejecución de los procesos y facilitar la inclusión de procedimientos para incrementar la tolerancia a los fallos software. A lo largo de esta memoria se ofrece una revisión del estado del arte en el ámbito de la robótica, haciendo especial hincapié en la robótica inteligente con los métodos de desarrollo existentes y la definición de los distintos marcos de clasificación de la autonomía. También se ofrece una vista a las distintas técnicas existentes para lograr una mayor tolerancia a los fallos software, de entre las que han sido seleccionadas varias de ellas en la realización de este trabajo. Finalmente se describe el sistema de supervisión desarrollado, explicando primero el sistema desde un punto de vista funcional para más adelante adentrarse en la solución técnica elaborada. ---ABSTRACT--- The Universidad Politécnica de Madrid is currently handling several investigations regarding AI robotics, some of them are actually directing their efforts into the use of unmanned aerial vehicles (UAV). The goal in the long term for this investigations is the accomplishment of systems capable of operating autonomously, regardless of the situation the robot is place at. From this perspective, this final degree project focuses on de design and development of a supervision system for UAV’s, which function is to ease the monitoring of executing processes and the inclusion of fault tolerant procedures. During the development of this document a state of the art revision is offered, in which a thorough description through development methods and autonomy definitions for AI robotics is made. It is also offered a look around the different existing techniques for achieving a greater software fault tolerance, from which some of them were chosen for the development of this project. Finally the developed supervision system is described, first from a pure functional perspective of what the system should do and latter with a description of the actual technical solutions developed for this system.
Resumo:
Abstract is not available
Resumo:
Any new hospital communication architecture has to support existing services, but at the same time new added features should not affect normal tasks. This article deals with issues regarding old and new systems’ interoperability, as well as the effect the human factor has in a deployed architecture. It also presents valuable information, which is a product of a real scenario. Tracking services are also tested in order to monitor and administer several medical resources.
Resumo:
This paper describes new improvements for BB-MaxClique (San Segundo et al. in Comput Oper Resour 38(2):571–581, 2011 ), a leading maximum clique algorithm which uses bit strings to efficiently compute basic operations during search by bit masking. Improvements include a recently described recoloring strategy in Tomita et al. (Proceedings of the 4th International Workshop on Algorithms and Computation. Lecture Notes in Computer Science, vol 5942. Springer, Berlin, pp 191–203, 2010 ), which is now integrated in the bit string framework, as well as different optimization strategies for fast bit scanning. Reported results over DIMACS and random graphs show that the new variants improve over previous BB-MaxClique for a vast majority of cases. It is also established that recoloring is mainly useful for graphs with high densities.
Resumo:
A good and early fault detection and isolation system along with efficient alarm management and fine sensor validation systems are very important in today¿s complex process plants, specially in terms of safety enhancement and costs reduction. This paper presents a methodology for fault characterization. This is a self-learning approach developed in two phases. An initial, learning phase, where the simulation of process units, without and with different faults, will let the system (in an automated way) to detect the key variables that characterize the faults. This will be used in a second (on line) phase, where these key variables will be monitored in order to diagnose possible faults. Using this scheme the faults will be diagnosed and isolated in an early stage where the fault still has not turned into a failure.
Resumo:
A highly parallel and scalable Deblocking Filter (DF) hardware architecture for H.264/AVC and SVC video codecs is presented in this paper. The proposed architecture mainly consists on a coarse grain systolic array obtained by replicating a unique and homogeneous Functional Unit (FU), in which a whole Deblocking-Filter unit is implemented. The proposal is also based on a novel macroblock-level parallelization strategy of the filtering algorithm which improves the final performance by exploiting specific data dependences. This way communication overhead is reduced and a more intensive parallelism in comparison with the existing state-of-the-art solutions is obtained. Furthermore, the architecture is completely flexible, since the level of parallelism can be changed, according to the application requirements. The design has been implemented in a Virtex-5 FPGA, and it allows filtering 4CIF (704 × 576 pixels @30 fps) video sequences in real-time at frequencies lower than 10.16 Mhz.
Resumo:
This article describes a new visual servo control and strategies that are used to carry out dynamic tasks by the Robotenis platform. This platform is basically a parallel robot that is equipped with an acquisition and processing system of visual information, its main feature is that it has a completely open architecture control, and planned in order to design, implement, test and compare control strategies and algorithms (visual and actuated joint controllers). Following sections describe a new visual control strategy specially designed to track and intercept objects in 3D space. The results are compared with a controller shown in previous woks, where the end effector of the robot keeps a constant distance from the tracked object. In this work, the controller is specially designed in order to allow changes in the tracking reference. Changes in the tracking reference can be used to grip an object that is under movement, or as in this case, hitting a hanging Ping-Pong ball. Lyapunov stability is taken into account in the controller design.
Resumo:
Systems relying on fixed hardware components with a static level of parallelism can suffer from an underuse of logical resources, since they have to be designed for the worst-case scenario. This problem is especially important in video applications due to the emergence of new flexible standards, like Scalable Video Coding (SVC), which offer several levels of scalability. In this paper, Dynamic and Partial Reconfiguration (DPR) of modern FPGAs is used to achieve run-time variable parallelism, by using scalable architectures where the size can be adapted at run-time. Based on this proposal, a scalable Deblocking Filter core (DF), compliant with the H.264/AVC and SVC standards has been designed. This scalable DF allows run-time addition or removal of computational units working in parallel. Scalability is offered together with a scalable parallelization strategy at the macroblock (MB) level, such that when the size of the architecture changes, MB filtering order is modified accordingly