41 resultados para Centralized and Distributed Multi-Agent Routing Schemas
Resumo:
Remote reprogramming capabilities are one of the major concerns in WSN platforms due to the limitations and constraints that low power wireless nodes poses, especially when energy efficiency during the reprogramming process is a critical factor for extending the battery life of the devices. Moreover, WSNs are based on low-rate protocols in which as greater the amount of data is sent, the more the possibility to lose packets during the transmitting process is. In order to overcome these limitations, in this work a novel on-the-fly reprogramming technique for modifying and updating the application running on the wireless sensor nodes is designed and implemented, based on a partial reprogramming mechanism that significantly reduces the size of the files to be downloaded to the nodes, therefore diminishing their power/time consumption. This powerful mechanism also addresses multi-experimental capabilities because it provides the possibility to download, manage, test and debug multiple applications into the wireless nodes, based on a memory map segmentation of the core. Being an on-the-fly reprogramming process, no additional resources to store and download the configuration file are needed.
Resumo:
Los avances en el hardware permiten disponer de grandes volúmenes de datos, surgiendo aplicaciones que deben suministrar información en tiempo cuasi-real, la monitorización de pacientes, ej., el seguimiento sanitario de las conducciones de agua, etc. Las necesidades de estas aplicaciones hacen emerger el modelo de flujo de datos (data streaming) frente al modelo almacenar-para-despuésprocesar (store-then-process). Mientras que en el modelo store-then-process, los datos son almacenados para ser posteriormente consultados; en los sistemas de streaming, los datos son procesados a su llegada al sistema, produciendo respuestas continuas sin llegar a almacenarse. Esta nueva visión impone desafíos para el procesamiento de datos al vuelo: 1) las respuestas deben producirse de manera continua cada vez que nuevos datos llegan al sistema; 2) los datos son accedidos solo una vez y, generalmente, no son almacenados en su totalidad; y 3) el tiempo de procesamiento por dato para producir una respuesta debe ser bajo. Aunque existen dos modelos para el cómputo de respuestas continuas, el modelo evolutivo y el de ventana deslizante; éste segundo se ajusta mejor en ciertas aplicaciones al considerar únicamente los datos recibidos más recientemente, en lugar de todo el histórico de datos. En los últimos años, la minería de datos en streaming se ha centrado en el modelo evolutivo. Mientras que, en el modelo de ventana deslizante, el trabajo presentado es más reducido ya que estos algoritmos no sólo deben de ser incrementales si no que deben borrar la información que caduca por el deslizamiento de la ventana manteniendo los anteriores tres desafíos. Una de las tareas fundamentales en minería de datos es la búsqueda de agrupaciones donde, dado un conjunto de datos, el objetivo es encontrar grupos representativos, de manera que se tenga una descripción sintética del conjunto. Estas agrupaciones son fundamentales en aplicaciones como la detección de intrusos en la red o la segmentación de clientes en el marketing y la publicidad. Debido a las cantidades masivas de datos que deben procesarse en este tipo de aplicaciones (millones de eventos por segundo), las soluciones centralizadas puede ser incapaz de hacer frente a las restricciones de tiempo de procesamiento, por lo que deben recurrir a descartar datos durante los picos de carga. Para evitar esta perdida de datos, se impone el procesamiento distribuido de streams, en concreto, los algoritmos de agrupamiento deben ser adaptados para este tipo de entornos, en los que los datos están distribuidos. En streaming, la investigación no solo se centra en el diseño para tareas generales, como la agrupación, sino también en la búsqueda de nuevos enfoques que se adapten mejor a escenarios particulares. Como ejemplo, un mecanismo de agrupación ad-hoc resulta ser más adecuado para la defensa contra la denegación de servicio distribuida (Distributed Denial of Services, DDoS) que el problema tradicional de k-medias. En esta tesis se pretende contribuir en el problema agrupamiento en streaming tanto en entornos centralizados y distribuidos. Hemos diseñado un algoritmo centralizado de clustering mostrando las capacidades para descubrir agrupaciones de alta calidad en bajo tiempo frente a otras soluciones del estado del arte, en una amplia evaluación. Además, se ha trabajado sobre una estructura que reduce notablemente el espacio de memoria necesario, controlando, en todo momento, el error de los cómputos. Nuestro trabajo también proporciona dos protocolos de distribución del cómputo de agrupaciones. Se han analizado dos características fundamentales: el impacto sobre la calidad del clustering al realizar el cómputo distribuido y las condiciones necesarias para la reducción del tiempo de procesamiento frente a la solución centralizada. Finalmente, hemos desarrollado un entorno para la detección de ataques DDoS basado en agrupaciones. En este último caso, se ha caracterizado el tipo de ataques detectados y se ha desarrollado una evaluación sobre la eficiencia y eficacia de la mitigación del impacto del ataque. ABSTRACT Advances in hardware allow to collect huge volumes of data emerging applications that must provide information in near-real time, e.g., patient monitoring, health monitoring of water pipes, etc. The data streaming model emerges to comply with these applications overcoming the traditional store-then-process model. With the store-then-process model, data is stored before being consulted; while, in streaming, data are processed on the fly producing continuous responses. The challenges of streaming for processing data on the fly are the following: 1) responses must be produced continuously whenever new data arrives in the system; 2) data is accessed only once and is generally not maintained in its entirety, and 3) data processing time to produce a response should be low. Two models exist to compute continuous responses: the evolving model and the sliding window model; the latter fits best with applications must be computed over the most recently data rather than all the previous data. In recent years, research in the context of data stream mining has focused mainly on the evolving model. In the sliding window model, the work presented is smaller since these algorithms must be incremental and they must delete the information which expires when the window slides. Clustering is one of the fundamental techniques of data mining and is used to analyze data sets in order to find representative groups that provide a concise description of the data being processed. Clustering is critical in applications such as network intrusion detection or customer segmentation in marketing and advertising. Due to the huge amount of data that must be processed by such applications (up to millions of events per second), centralized solutions are usually unable to cope with timing restrictions and recur to shedding techniques where data is discarded during load peaks. To avoid discarding of data, processing of streams (such as clustering) must be distributed and adapted to environments where information is distributed. In streaming, research does not only focus on designing for general tasks, such as clustering, but also in finding new approaches that fit bests with particular scenarios. As an example, an ad-hoc grouping mechanism turns out to be more adequate than k-means for defense against Distributed Denial of Service (DDoS). This thesis contributes to the data stream mining clustering technique both for centralized and distributed environments. We present a centralized clustering algorithm showing capabilities to discover clusters of high quality in low time and we provide a comparison with existing state of the art solutions. We have worked on a data structure that significantly reduces memory requirements while controlling the error of the clusters statistics. We also provide two distributed clustering protocols. We focus on the analysis of two key features: the impact on the clustering quality when computation is distributed and the requirements for reducing the processing time compared to the centralized solution. Finally, with respect to ad-hoc grouping techniques, we have developed a DDoS detection framework based on clustering.We have characterized the attacks detected and we have evaluated the efficiency and effectiveness of mitigating the attack impact.
Resumo:
Durante los últimos años, el imparable crecimiento de fuentes de datos biomédicas, propiciado por el desarrollo de técnicas de generación de datos masivos (principalmente en el campo de la genómica) y la expansión de tecnologías para la comunicación y compartición de información ha propiciado que la investigación biomédica haya pasado a basarse de forma casi exclusiva en el análisis distribuido de información y en la búsqueda de relaciones entre diferentes fuentes de datos. Esto resulta una tarea compleja debido a la heterogeneidad entre las fuentes de datos empleadas (ya sea por el uso de diferentes formatos, tecnologías, o modelizaciones de dominios). Existen trabajos que tienen como objetivo la homogeneización de estas con el fin de conseguir que la información se muestre de forma integrada, como si fuera una única base de datos. Sin embargo no existe ningún trabajo que automatice de forma completa este proceso de integración semántica. Existen dos enfoques principales para dar solución al problema de integración de fuentes heterogéneas de datos: Centralizado y Distribuido. Ambos enfoques requieren de una traducción de datos de un modelo a otro. Para realizar esta tarea se emplean formalizaciones de las relaciones semánticas entre los modelos subyacentes y el modelo central. Estas formalizaciones se denominan comúnmente anotaciones. Las anotaciones de bases de datos, en el contexto de la integración semántica de la información, consisten en definir relaciones entre términos de igual significado, para posibilitar la traducción automática de la información. Dependiendo del problema en el que se esté trabajando, estas relaciones serán entre conceptos individuales o entre conjuntos enteros de conceptos (vistas). El trabajo aquí expuesto se centra en estas últimas. El proyecto europeo p-medicine (FP7-ICT-2009-270089) se basa en el enfoque centralizado y hace uso de anotaciones basadas en vistas y cuyas bases de datos están modeladas en RDF. Los datos extraídos de las diferentes fuentes son traducidos e integrados en un Data Warehouse. Dentro de la plataforma de p-medicine, el Grupo de Informática Biomédica (GIB) de la Universidad Politécnica de Madrid, en el cuál realicé mi trabajo, proporciona una herramienta para la generación de las necesarias anotaciones de las bases de datos RDF. Esta herramienta, denominada Ontology Annotator ofrece la posibilidad de generar de manera manual anotaciones basadas en vistas. Sin embargo, aunque esta herramienta muestra las fuentes de datos a anotar de manera gráfica, la gran mayoría de usuarios encuentran difícil el manejo de la herramienta , y pierden demasiado tiempo en el proceso de anotación. Es por ello que surge la necesidad de desarrollar una herramienta más avanzada, que sea capaz de asistir al usuario en el proceso de anotar bases de datos en p-medicine. El objetivo es automatizar los procesos más complejos de la anotación y presentar de forma natural y entendible la información relativa a las anotaciones de bases de datos RDF. Esta herramienta ha sido denominada Ontology Annotator Assistant, y el trabajo aquí expuesto describe el proceso de diseño y desarrollo, así como algunos algoritmos innovadores que han sido creados por el autor del trabajo para su correcto funcionamiento. Esta herramienta ofrece funcionalidades no existentes previamente en ninguna otra herramienta del área de la anotación automática e integración semántica de bases de datos. ---ABSTRACT---Over the last years, the unstoppable growth of biomedical data sources, mainly thanks to the development of massive data generation techniques (specially in the genomics field) and the rise of the communication and information sharing technologies, lead to the fact that biomedical research has come to rely almost exclusively on the analysis of distributed information and in finding relationships between different data sources. This is a complex task due to the heterogeneity of the sources used (either by the use of different formats, technologies or domain modeling). There are some research proyects that aim homogenization of these sources in order to retrieve information in an integrated way, as if it were a single database. However there is still now work to automate completely this process of semantic integration. There are two main approaches with the purpouse of integrating heterogeneous data sources: Centralized and Distributed. Both approches involve making translation from one model to another. To perform this task there is a need of using formalization of the semantic relationships between the underlying models and the main model. These formalizations are also calles annotations. In the context of semantic integration of the information, data base annotations consist on defining relations between concepts or words with the same meaning, so the automatic translation can be performed. Depending on the task, the ralationships can be between individuals or between whole sets of concepts (views). This paper focuses on the latter. The European project p-medicine (FP7-ICT-2009-270089) is based on the centralized approach. It uses view based annotations and RDF modeled databases. The data retireved from different data sources is translated and joined into a Data Warehouse. Within the p-medicine platform, the Biomedical Informatics Group (GIB) of the Polytechnic University of Madrid, in which I worked, provides a software to create annotations for the RDF sources. This tool, called Ontology Annotator, is used to create annotations manually. However, although Ontology Annotator displays the data sources graphically, most of the users find it difficult to use this software, thus they spend too much time to complete the task. For this reason there is a need to develop a more advanced tool, which would be able to help the user in the task of annotating p-medicine databases. The aim is automating the most complex processes of the annotation and display the information clearly and easy understanding. This software is called Ontology Annotater Assistant and this book describes the process of design and development of it. as well as some innovative algorithms that were designed by the author of the work. This tool provides features that no other software in the field of automatic annotation can provide.
Resumo:
Distributed parallel execution systems speed up applications by splitting tasks into processes whose execution is assigned to different receiving nodes in a high-bandwidth network. On the distributing side, a fundamental problem is grouping and scheduling such tasks such that each one involves sufñcient computational cost when compared to the task creation and communication costs and other such practical overheads. On the receiving side, an important issue is to have some assurance of the correctness and characteristics of the code received and also of the kind of load the particular task is going to pose, which can be specified by means of certificates. In this paper we present in a tutorial way a number of general solutions to these problems, and illustrate them through their implementation in the Ciao multi-paradigm language and program development environment. This system includes facilities for parallel and distributed execution, an assertion language for specifying complex programs properties (including safety and resource-related properties), and compile-time and run-time tools for performing automated parallelization and resource control, as well as certification of programs with resource consumption assurances and efñcient checking of such certificates.
Resumo:
El principio de Teoría de Juegos permite desarrollar modelos estocásticos de patrullaje multi-robot para proteger infraestructuras criticas. La protección de infraestructuras criticas representa un gran reto para los países al rededor del mundo, principalmente después de los ataques terroristas llevados a cabo la década pasada. En este documento el termino infraestructura hace referencia a aeropuertos, plantas nucleares u otros instalaciones. El problema de patrullaje se define como la actividad de patrullar un entorno determinado para monitorear cualquier actividad o sensar algunas variables ambientales. En esta actividad, un grupo de robots debe visitar un conjunto de puntos de interés definidos en un entorno en intervalos de tiempo irregulares con propósitos de seguridad. Los modelos de partullaje multi-robot son utilizados para resolver este problema. Hasta el momento existen trabajos que resuelven este problema utilizando diversos principios matemáticos. Los modelos de patrullaje multi-robot desarrollados en esos trabajos representan un gran avance en este campo de investigación. Sin embargo, los modelos con los mejores resultados no son viables para aplicaciones de seguridad debido a su naturaleza centralizada y determinista. Esta tesis presenta cinco modelos de patrullaje multi-robot distribuidos e impredecibles basados en modelos matemáticos de aprendizaje de Teoría de Juegos. El objetivo del desarrollo de estos modelos está en resolver los inconvenientes presentes en trabajos preliminares. Con esta finalidad, el problema de patrullaje multi-robot se formuló utilizando conceptos de Teoría de Grafos, en la cual se definieron varios juegos en cada vértice de un grafo. Los modelos de patrullaje multi-robot desarrollados en este trabajo de investigación se han validado y comparado con los mejores modelos disponibles en la literatura. Para llevar a cabo tanto la validación como la comparación se ha utilizado un simulador de patrullaje y un grupo de robots reales. Los resultados experimentales muestran que los modelos de patrullaje desarrollados en este trabajo de investigación trabajan mejor que modelos de trabajos previos en el 80% de 150 casos de estudio. Además de esto, estos modelos cuentan con varias características importantes tales como distribución, robustez, escalabilidad y dinamismo. Los avances logrados con este trabajo de investigación dan evidencia del potencial de Teoría de Juegos para desarrollar modelos de patrullaje útiles para proteger infraestructuras. ABSTRACT Game theory principle allows to developing stochastic multi-robot patrolling models to protect critical infrastructures. Critical infrastructures protection is a great concern for countries around the world, mainly due to terrorist attacks in the last decade. In this document, the term infrastructures includes airports, nuclear power plants, and many other facilities. The patrolling problem is defined as the activity of traversing a given environment to monitoring any activity or sensing some environmental variables If this activity were performed by a fleet of robots, they would have to visit some places of interest of an environment at irregular intervals of time for security purposes. This problem is solved using multi-robot patrolling models. To date, literature works have been solved this problem applying various mathematical principles.The multi-robot patrolling models developed in those works represent great advances in this field. However, the models that obtain the best results are unfeasible for security applications due to their centralized and predictable nature. This thesis presents five distributed and unpredictable multi-robot patrolling models based on mathematical learning models derived from Game Theory. These multi-robot patrolling models aim at overcoming the disadvantages of previous work. To this end, the multi-robot patrolling problem was formulated using concepts of Graph Theory to represent the environment. Several normal-form games were defined at each vertex of a graph in this formulation. The multi-robot patrolling models developed in this research work have been validated and compared with best ranked multi-robot patrolling models in the literature. Both validation and comparison were preformed by using both a patrolling simulator and real robots. Experimental results show that the multirobot patrolling models developed in this research work improve previous ones in as many as 80% of 150 cases of study. Moreover, these multi-robot patrolling models rely on several features to highlight in security applications such as distribution, robustness, scalability, and dynamism. The achievements obtained in this research work validate the potential of Game Theory to develop patrolling models to protect infrastructures.
Resumo:
CIAO is an advanced programming environment supporting Logic and Constraint programming. It offers a simple concurrent kernel on top of which declarative and non-declarative extensions are added via librarles. Librarles are available for supporting the ISOProlog standard, several constraint domains, functional and higher order programming, concurrent and distributed programming, internet programming, and others. The source language allows declaring properties of predicates via assertions, including types and modes. Such properties are checked at compile-time or at run-time. The compiler and system architecture are designed to natively support modular global analysis, with the two objectives of proving properties in assertions and performing program optimizations, including transparently exploiting parallelism in programs. The purpose of this paper is to report on recent progress made in the context of the CIAO system, with special emphasis on the capabilities of the compiler, the techniques used for supporting such capabilities, and the results in the áreas of program analysis and transformation already obtained with the system.
Resumo:
The SESAR (Single European Sky ATM Research) program is an ambitious re-search and development initiative to design the future European air traffic man-agement (ATM) system. The study of the behavior of ATM systems using agent-based modeling and simulation tools can help the development of new methods to improve their performance. This paper presents an overview of existing agent-based approaches in air transportation (paying special attention to the challenges that exist for the design of future ATM systems) and, subsequently, describes a new agent-based approach that we proposed in the CASSIOPEIA project, which was developed according to the goals of the SESAR program. In our approach, we use agent models for different ATM stakeholders, and, in contrast to previous work, our solution models new collaborative decision processes for flow traffic management, it uses an intermediate level of abstraction (useful for simulations at larger scales), and was designed to be a practical tool (open and reusable) for the development of different ATM studies. It was successfully applied in three stud-ies related to the design of future ATM systems in Europe.
Resumo:
Runtime management of distributed information systems is a complex and costly activity. One of the main challenges that must be addressed is obtaining a complete and updated view of all the managed runtime resources. This article presents a monitoring architecture for heterogeneous and distributed information systems. It is composed of two elements: an information model and an agent infrastructure. The model negates the complexity and variability of these systems and enables the abstraction over non-relevant details. The infrastructure uses this information model to monitor and manage the modeled environment, performing and detecting changes in execution time. The agents infrastructure is further detailed and its components and the relationships between them are explained. Moreover, the proposal is validated through a set of agents that instrument the JEE Glassfish application server, paying special attention to support distributed configuration scenarios.
Resumo:
In recent years, applications in domains such as telecommunications, network security or large scale sensor networks showed the limits of the traditional store-then-process paradigm. In this context, Stream Processing Engines emerged as a candidate solution for all these applications demanding for high processing capacity with low processing latency guarantees. With Stream Processing Engines, data streams are not persisted but rather processed on the fly, producing results continuously. Current Stream Processing Engines, either centralized or distributed, do not scale with the input load due to single-node bottlenecks. Moreover, they are based on static configurations that lead to either under or over-provisioning. This Ph.D. thesis discusses StreamCloud, an elastic paralleldistributed stream processing engine that enables for processing of large data stream volumes. Stream- Cloud minimizes the distribution and parallelization overhead introducing novel techniques that split queries into parallel subqueries and allocate them to independent sets of nodes. Moreover, Stream- Cloud elastic and dynamic load balancing protocols enable for effective adjustment of resources depending on the incoming load. Together with the parallelization and elasticity techniques, Stream- Cloud defines a novel fault tolerance protocol that introduces minimal overhead while providing fast recovery. StreamCloud has been fully implemented and evaluated using several real word applications such as fraud detection applications or network analysis applications. The evaluation, conducted using a cluster with more than 300 cores, demonstrates the large scalability, the elasticity and fault tolerance effectiveness of StreamCloud. Resumen En los útimos años, aplicaciones en dominios tales como telecomunicaciones, seguridad de redes y redes de sensores de gran escala se han encontrado con múltiples limitaciones en el paradigma tradicional de bases de datos. En este contexto, los sistemas de procesamiento de flujos de datos han emergido como solución a estas aplicaciones que demandan una alta capacidad de procesamiento con una baja latencia. En los sistemas de procesamiento de flujos de datos, los datos no se persisten y luego se procesan, en su lugar los datos son procesados al vuelo en memoria produciendo resultados de forma continua. Los actuales sistemas de procesamiento de flujos de datos, tanto los centralizados, como los distribuidos, no escalan respecto a la carga de entrada del sistema debido a un cuello de botella producido por la concentración de flujos de datos completos en nodos individuales. Por otra parte, éstos están basados en configuraciones estáticas lo que conducen a un sobre o bajo aprovisionamiento. Esta tesis doctoral presenta StreamCloud, un sistema elástico paralelo-distribuido para el procesamiento de flujos de datos que es capaz de procesar grandes volúmenes de datos. StreamCloud minimiza el coste de distribución y paralelización por medio de una técnica novedosa la cual particiona las queries en subqueries paralelas repartiéndolas en subconjuntos de nodos independientes. Ademas, Stream- Cloud posee protocolos de elasticidad y equilibrado de carga que permiten una optimización de los recursos dependiendo de la carga del sistema. Unidos a los protocolos de paralelización y elasticidad, StreamCloud define un protocolo de tolerancia a fallos que introduce un coste mínimo mientras que proporciona una rápida recuperación. StreamCloud ha sido implementado y evaluado mediante varias aplicaciones del mundo real tales como aplicaciones de detección de fraude o aplicaciones de análisis del tráfico de red. La evaluación ha sido realizada en un cluster con más de 300 núcleos, demostrando la alta escalabilidad y la efectividad tanto de la elasticidad, como de la tolerancia a fallos de StreamCloud.
Resumo:
This paper proposes a new multi-objective estimation of distribution algorithm (EDA) based on joint modeling of objectives and variables. This EDA uses the multi-dimensional Bayesian network as its probabilistic model. In this way it can capture the dependencies between objectives, variables and objectives, as well as the dependencies learnt between variables in other Bayesian network-based EDAs. This model leads to a problem decomposition that helps the proposed algorithm to find better trade-off solutions to the multi-objective problem. In addition to Pareto set approximation, the algorithm is also able to estimate the structure of the multi-objective problem. To apply the algorithm to many-objective problems, the algorithm includes four different ranking methods proposed in the literature for this purpose. The algorithm is applied to the set of walking fish group (WFG) problems, and its optimization performance is compared with an evolutionary algorithm and another multi-objective EDA. The experimental results show that the proposed algorithm performs significantly better on many of the problems and for different objective space dimensions, and achieves comparable results on some compared with the other algorithms.
Resumo:
Abstract This work is focused on the problem of performing multi‐robot patrolling for infrastructure security applications in order to protect a known environment at critical facilities. Thus, given a set of robots and a set of points of interest, the patrolling task consists of constantly visiting these points at irregular time intervals for security purposes. Current existing solutions for these types of applications are predictable and inflexible. Moreover, most of the previous centralized and deterministic solutions and only few efforts have been made to integrate dynamic methods. Therefore, the development of new dynamic and decentralized collaborative approaches in order to solve the aforementioned problem by implementing learning models from Game Theory. The model selected in this work that includes belief‐based and reinforcement models as special cases is called Experience‐Weighted Attraction. The problem has been defined using concepts of Graph Theory to represent the environment in order to work with such Game Theory techniques. Finally, the proposed methods have been evaluated experimentally by using a patrolling simulator. The results obtained have been compared with previous available
Resumo:
There are many industries that use highly technological solutions to improve quality in all of their products. The steel industry is one example. Several automatic surface-inspection systems are used in the steel industry to identify various types of defects and to help operators decide whether to accept, reroute, or downgrade the material, subject to the assessment process. This paper focuses on promoting a strategy that considers all defects in an integrated fashion. It does this by managing the uncertainty about the exact position of a defect due to different process conditions by means of Gaussian additive influence functions. The relevance of the approach is in making possible consistency and reliability between surface inspection systems. The results obtained are an increase in confidence in the automatic inspection system and an ability to introduce improved prediction and advanced routing models. The prediction is provided to technical operators to help them in their decision-making process. It shows the increase in improvement gained by reducing the 40 % of coils that are downgraded at the hot strip mill because of specific defects. In addition, this technology facilitates an increase of 50 % in the accuracy of the estimate of defect survival after the cleaning facility in comparison to the former approach. The proposed technology is implemented by means of software-based, multi-agent solutions. It makes possible the independent treatment of information, presentation, quality analysis, and other relevant functions.
Resumo:
Wireless sensor networks are posed as the new communication paradigm where the use of small, low-complexity, and low-power devices is preferred over costly centralized systems. The spectra of potential applications of sensor networks is very wide, ranging from monitoring, surveillance, and localization, among others. Localization is a key application in sensor networks and the use of simple, efficient, and distributed algorithms is of paramount practical importance. Combining convex optimization tools with consensus algorithms we propose a distributed localization algorithm for scenarios where received signal strength indicator readings are used. We approach the localization problem by formulating an alternative problem that uses distance estimates locally computed at each node. The formulated problem is solved by a relaxed version using semidefinite relaxation technique. Conditions under which the relaxed problem yields to the same solution as the original problem are given and a distributed consensusbased implementation of the algorithm is proposed based on an augmented Lagrangian approach and primaldual decomposition methods. Although suboptimal, the proposed approach is very suitable for its implementation in real sensor networks, i.e., it is scalable, robust against node failures and requires only local communication among neighboring nodes. Simulation results show that running an additional local search around the found solution can yield performance close to the maximum likelihood estimate.
Resumo:
Despite that Critical Infrastructures (CIs) security and surveillance are a growing concern for many countries and companies, Multi Robot Systems (MRSs) have not been yet broadly used in this type of facilities. This dissertation presents a novel study of the challenges arisen by the implementation of this type of systems and proposes solutions to specific problems. First, a comprehensive analysis of different types of CIs has been carried out, emphasizing the influence of the different characteristics of the facilities in the design of a security and surveillance MRS. One of the most important needs for the surveillance of a CI is the detection of intruders. From a technical point of view this problem can be abstracted as equivalent to the Detection and Tracking of Mobile Objects (DATMO). This dissertation proposes algorithms to solve this specific problem in a CI environment. Using 3D range images of the environment as input data, two detection algorithms for ground robots have been developed. These detection algorithms provide a list of moving objects in the robot detection area. Direct image differentiation and computer vision techniques are used when the robot is static. Alternatively, multi-layer ground reconstructions are compared to detect the dynamic objects when the robot is moving. Since CIs usually spread over large areas, it is very useful to incorporate aerial vehicles in the surveillance MRS. Therefore, a moving object detection algorithm for aerial vehicles has been also developed. This algorithm compares the real optical flow obtained from a down-face oriented camera with an artificial optical flow computed using a RANSAC based homography matrix. Two tracking algorithms have been developed to follow the moving objects trajectories. These algorithms can efficiently handle occlusions and crossings, as well as exchange information among robots. The multirobot tracking can be applied to any type of communication structure: centralized, decentralized or a combination of both. Even more, the developed tracking algorithms are independent of the detection algorithms and could be potentially used with other detection procedures or even with static sensors, such as cameras. In addition, using the 3D point clouds available to the robots, a relative localization algorithm has been developed to improve the position estimation of a given robot with observations from other robots. All the developed algorithms have been extensively tested in different simulated CIs using the Webots robotics simulator. Furthermore, the algorithms have also been validated with real robots operating in real scenarios. In conclusion, this dissertation presents a multirobot approach to Critical Infrastructure Surveillance, mainly focusing on Detecting and Tracking Dynamic Objects.
Resumo:
Hoy en día, con la evolución continua y rápida de las tecnologías de la información y los dispositivos de computación, se recogen y almacenan continuamente grandes volúmenes de datos en distintos dominios y a través de diversas aplicaciones del mundo real. La extracción de conocimiento útil de una cantidad tan enorme de datos no se puede realizar habitualmente de forma manual, y requiere el uso de técnicas adecuadas de aprendizaje automático y de minería de datos. La clasificación es una de las técnicas más importantes que ha sido aplicada con éxito a varias áreas. En general, la clasificación se compone de dos pasos principales: en primer lugar, aprender un modelo de clasificación o clasificador a partir de un conjunto de datos de entrenamiento, y en segundo lugar, clasificar las nuevas instancias de datos utilizando el clasificador aprendido. La clasificación es supervisada cuando todas las etiquetas están presentes en los datos de entrenamiento (es decir, datos completamente etiquetados), semi-supervisada cuando sólo algunas etiquetas son conocidas (es decir, datos parcialmente etiquetados), y no supervisada cuando todas las etiquetas están ausentes en los datos de entrenamiento (es decir, datos no etiquetados). Además, aparte de esta taxonomía, el problema de clasificación se puede categorizar en unidimensional o multidimensional en función del número de variables clase, una o más, respectivamente; o también puede ser categorizado en estacionario o cambiante con el tiempo en función de las características de los datos y de la tasa de cambio subyacente. A lo largo de esta tesis, tratamos el problema de clasificación desde tres perspectivas diferentes, a saber, clasificación supervisada multidimensional estacionaria, clasificación semisupervisada unidimensional cambiante con el tiempo, y clasificación supervisada multidimensional cambiante con el tiempo. Para llevar a cabo esta tarea, hemos usado básicamente los clasificadores Bayesianos como modelos. La primera contribución, dirigiéndose al problema de clasificación supervisada multidimensional estacionaria, se compone de dos nuevos métodos de aprendizaje de clasificadores Bayesianos multidimensionales a partir de datos estacionarios. Los métodos se proponen desde dos puntos de vista diferentes. El primer método, denominado CB-MBC, se basa en una estrategia de envoltura de selección de variables que es voraz y hacia delante, mientras que el segundo, denominado MB-MBC, es una estrategia de filtrado de variables con una aproximación basada en restricciones y en el manto de Markov. Ambos métodos han sido aplicados a dos problemas reales importantes, a saber, la predicción de los inhibidores de la transcriptasa inversa y de la proteasa para el problema de infección por el virus de la inmunodeficiencia humana tipo 1 (HIV-1), y la predicción del European Quality of Life-5 Dimensions (EQ-5D) a partir de los cuestionarios de la enfermedad de Parkinson con 39 ítems (PDQ-39). El estudio experimental incluye comparaciones de CB-MBC y MB-MBC con los métodos del estado del arte de la clasificación multidimensional, así como con métodos comúnmente utilizados para resolver el problema de predicción de la enfermedad de Parkinson, a saber, la regresión logística multinomial, mínimos cuadrados ordinarios, y mínimas desviaciones absolutas censuradas. En ambas aplicaciones, los resultados han sido prometedores con respecto a la precisión de la clasificación, así como en relación al análisis de las estructuras gráficas que identifican interacciones conocidas y novedosas entre las variables. La segunda contribución, referida al problema de clasificación semi-supervisada unidimensional cambiante con el tiempo, consiste en un método nuevo (CPL-DS) para clasificar flujos de datos parcialmente etiquetados. Los flujos de datos difieren de los conjuntos de datos estacionarios en su proceso de generación muy rápido y en su aspecto de cambio de concepto. Es decir, los conceptos aprendidos y/o la distribución subyacente están probablemente cambiando y evolucionando en el tiempo, lo que hace que el modelo de clasificación actual sea obsoleto y deba ser actualizado. CPL-DS utiliza la divergencia de Kullback-Leibler y el método de bootstrapping para cuantificar y detectar tres tipos posibles de cambio: en las predictoras, en la a posteriori de la clase o en ambas. Después, si se detecta cualquier cambio, un nuevo modelo de clasificación se aprende usando el algoritmo EM; si no, el modelo de clasificación actual se mantiene sin modificaciones. CPL-DS es general, ya que puede ser aplicado a varios modelos de clasificación. Usando dos modelos diferentes, el clasificador naive Bayes y la regresión logística, CPL-DS se ha probado con flujos de datos sintéticos y también se ha aplicado al problema real de la detección de código malware, en el cual los nuevos ficheros recibidos deben ser continuamente clasificados en malware o goodware. Los resultados experimentales muestran que nuestro método es efectivo para la detección de diferentes tipos de cambio a partir de los flujos de datos parcialmente etiquetados y también tiene una buena precisión de la clasificación. Finalmente, la tercera contribución, sobre el problema de clasificación supervisada multidimensional cambiante con el tiempo, consiste en dos métodos adaptativos, a saber, Locally Adpative-MB-MBC (LA-MB-MBC) y Globally Adpative-MB-MBC (GA-MB-MBC). Ambos métodos monitorizan el cambio de concepto a lo largo del tiempo utilizando la log-verosimilitud media como métrica y el test de Page-Hinkley. Luego, si se detecta un cambio de concepto, LA-MB-MBC adapta el actual clasificador Bayesiano multidimensional localmente alrededor de cada nodo cambiado, mientras que GA-MB-MBC aprende un nuevo clasificador Bayesiano multidimensional. El estudio experimental realizado usando flujos de datos sintéticos multidimensionales indica los méritos de los métodos adaptativos propuestos. ABSTRACT Nowadays, with the ongoing and rapid evolution of information technology and computing devices, large volumes of data are continuously collected and stored in different domains and through various real-world applications. Extracting useful knowledge from such a huge amount of data usually cannot be performed manually, and requires the use of adequate machine learning and data mining techniques. Classification is one of the most important techniques that has been successfully applied to several areas. Roughly speaking, classification consists of two main steps: first, learn a classification model or classifier from an available training data, and secondly, classify the new incoming unseen data instances using the learned classifier. Classification is supervised when the whole class values are present in the training data (i.e., fully labeled data), semi-supervised when only some class values are known (i.e., partially labeled data), and unsupervised when the whole class values are missing in the training data (i.e., unlabeled data). In addition, besides this taxonomy, the classification problem can be categorized into uni-dimensional or multi-dimensional depending on the number of class variables, one or more, respectively; or can be also categorized into stationary or streaming depending on the characteristics of the data and the rate of change underlying it. Through this thesis, we deal with the classification problem under three different settings, namely, supervised multi-dimensional stationary classification, semi-supervised unidimensional streaming classification, and supervised multi-dimensional streaming classification. To accomplish this task, we basically used Bayesian network classifiers as models. The first contribution, addressing the supervised multi-dimensional stationary classification problem, consists of two new methods for learning multi-dimensional Bayesian network classifiers from stationary data. They are proposed from two different points of view. The first method, named CB-MBC, is based on a wrapper greedy forward selection approach, while the second one, named MB-MBC, is a filter constraint-based approach based on Markov blankets. Both methods are applied to two important real-world problems, namely, the prediction of the human immunodeficiency virus type 1 (HIV-1) reverse transcriptase and protease inhibitors, and the prediction of the European Quality of Life-5 Dimensions (EQ-5D) from 39-item Parkinson’s Disease Questionnaire (PDQ-39). The experimental study includes comparisons of CB-MBC and MB-MBC against state-of-the-art multi-dimensional classification methods, as well as against commonly used methods for solving the Parkinson’s disease prediction problem, namely, multinomial logistic regression, ordinary least squares, and censored least absolute deviations. For both considered case studies, results are promising in terms of classification accuracy as well as regarding the analysis of the learned MBC graphical structures identifying known and novel interactions among variables. The second contribution, addressing the semi-supervised uni-dimensional streaming classification problem, consists of a novel method (CPL-DS) for classifying partially labeled data streams. Data streams differ from the stationary data sets by their highly rapid generation process and their concept-drifting aspect. That is, the learned concepts and/or the underlying distribution are likely changing and evolving over time, which makes the current classification model out-of-date requiring to be updated. CPL-DS uses the Kullback-Leibler divergence and bootstrapping method to quantify and detect three possible kinds of drift: feature, conditional or dual. Then, if any occurs, a new classification model is learned using the expectation-maximization algorithm; otherwise, the current classification model is kept unchanged. CPL-DS is general as it can be applied to several classification models. Using two different models, namely, naive Bayes classifier and logistic regression, CPL-DS is tested with synthetic data streams and applied to the real-world problem of malware detection, where the new received files should be continuously classified into malware or goodware. Experimental results show that our approach is effective for detecting different kinds of drift from partially labeled data streams, as well as having a good classification performance. Finally, the third contribution, addressing the supervised multi-dimensional streaming classification problem, consists of two adaptive methods, namely, Locally Adaptive-MB-MBC (LA-MB-MBC) and Globally Adaptive-MB-MBC (GA-MB-MBC). Both methods monitor the concept drift over time using the average log-likelihood score and the Page-Hinkley test. Then, if a drift is detected, LA-MB-MBC adapts the current multi-dimensional Bayesian network classifier locally around each changed node, whereas GA-MB-MBC learns a new multi-dimensional Bayesian network classifier from scratch. Experimental study carried out using synthetic multi-dimensional data streams shows the merits of both proposed adaptive methods.