882 results for reinforcement learning, cryptography, machine learning, deep learning, Deep Q-Learning (DQN), AES
Abstract:
The use of wireless sensor and actuator networks in industry has been increasing over the past few years, bringing multiple benefits compared to wired systems, such as network flexibility and manageability. Such networks consist of a possibly large number of small, autonomous sensor and actuator devices with wireless communication capabilities. The data collected by sensors are sent, directly or through intermediary nodes along the network, to a base station called the sink node. Data routing in this environment is an essential matter, since it is closely tied to energy efficiency and thus to network lifetime. This work investigates the application of a routing technique based on Reinforcement Learning's Q-Learning algorithm to a wireless sensor network, using an NS-2 simulated environment. Several metrics, such as energy consumption, data packet delivery rates and delays, are used to validate the proposal by comparing it with other solutions in the literature.
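To make the idea of Q-Learning-based routing concrete, here is a minimal sketch on a toy graph. It is not the method evaluated in the work above: the function names `train_q_routing` and `best_route`, the per-link cost standing in for transmission energy, the toy topology and all parameter values are assumptions chosen for illustration, and no NS-2 simulation is involved. Each node keeps an estimate of the total cost of reaching the sink through each neighbor and refines it from the neighbor's own best estimate.

```python
import random


def train_q_routing(neighbors, sink, link_cost, episodes=500,
                    alpha=0.5, epsilon=0.1, seed=0):
    """Learn q[node][neighbor]: estimated total cost to reach the sink
    when `node` forwards a packet to `neighbor` (hypothetical setup)."""
    rng = random.Random(seed)
    q = {n: {m: 0.0 for m in nbrs} for n, nbrs in neighbors.items()}
    sources = [n for n in neighbors if n != sink]
    for _ in range(episodes):
        node = rng.choice(sources)          # a packet is born at a random node
        for _ in range(50):                 # hop limit per packet
            if node == sink:
                break
            nbrs = list(q[node])
            if rng.random() < epsilon:      # occasional exploration
                nxt = rng.choice(nbrs)
            else:                           # otherwise forward along the cheapest estimate
                nxt = min(nbrs, key=q[node].get)
            # The remaining cost is zero once the packet is at the sink.
            future = 0.0 if nxt == sink else min(q[nxt].values())
            target = link_cost[(node, nxt)] + future
            q[node][nxt] += alpha * (target - q[node][nxt])
            node = nxt
    return q


def best_route(q, source, sink, max_hops=50):
    """Follow the cheapest learned estimate greedily from source to sink."""
    route = [source]
    while route[-1] != sink and len(route) <= max_hops:
        here = route[-1]
        route.append(min(q[here], key=q[here].get))
    return route


if __name__ == "__main__":
    # Toy topology: A reaches sink S cheaply via B, expensively via C.
    neighbors = {"A": ["B", "C"], "B": ["A", "S"], "C": ["A", "S"], "S": []}
    link_cost = {("A", "B"): 1.0, ("B", "A"): 1.0, ("B", "S"): 1.0,
                 ("A", "C"): 1.0, ("C", "A"): 1.0, ("C", "S"): 5.0}
    q = train_q_routing(neighbors, "S", link_cost)
    print(best_route(q, "A", "S"))
```

Because costs are minimized rather than rewards maximized, the update uses `min` where textbook Q-Learning uses `max`. Starting every estimate at zero is optimistic for positive costs, which encourages each node to try all of its neighbors early on.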
Abstract:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Abstract:
A novel adaptive approach for glucose control in individuals with type 1 diabetes under sensor-augmented pump therapy is proposed. The controller is based on Actor-Critic (AC) learning and is inspired by the principles of reinforcement learning and optimal control theory. The main characteristics of the proposed controller are (i) simultaneous adjustment of both the insulin basal rate and the bolus dose, (ii) initialization based on clinical procedures, and (iii) real-time personalization. The effectiveness of the proposed algorithm in terms of glycemic control has been investigated in silico in adults, adolescents and children under open-loop and closed-loop approaches, using announced meals with uncertainties on the order of ±25% in the estimation of carbohydrates. The results show that glucose regulation is efficient in all three groups of patients, even with uncertainties in the carbohydrate content of the meal. The percentages in the A+B zones of the Control Variability Grid Analysis (CVGA) were 100% for adults and 93% for both adolescents and children. The AC-based controller seems to be a promising approach for the automatic adjustment of insulin infusion in order to improve glycemic control. After optimization of the algorithm, the controller will be tested in a clinical trial.
Abstract:
Dynamic systems, especially in real-life applications, are often determined by inter-/intra-variability, uncertainties and time-varying components. Physiological systems are probably the most representative example in which population variability, vital signal measurement noise and uncertain dynamics render their explicit representation and optimization a rather difficult task. Systems characterized by such challenges often require the use of adaptive algorithmic solutions able to perform an iterative structural and/or parametrical update process towards optimized behavior. Adaptive optimization presents the advantages of (i) individualization through learning of basic system characteristics, (ii) ability to follow time-varying dynamics and (iii) low computational cost. In this chapter, the use of online adaptive algorithms is investigated in two basic research areas related to diabetes management: (i) real-time glucose regulation and (ii) real-time prediction of hypo-/hyperglycemia. The applicability of these methods is illustrated through the design and development of an adaptive glucose control algorithm based on reinforcement learning and optimal control and an adaptive, personalized early-warning system for the recognition and alarm generation against hypo- and hyperglycemic events.
Abstract:
Go is an ancient, very complex game with simple rules that remains a challenge for AI. This work covers some neuroevolution techniques used in reinforcement learning and applied to Go, such as SANE (Symbiotic Adaptive Neuro-Evolution), and presents a variation of this method intended to evolve better game strategies. The SANE-based computer Go player is evolved against a known player, which creates problems such as determinism, for which co-evolution is proposed. Finally, an algorithm is introduced that co-evolves two populations of neurons to produce better computer Go players.
Abstract:
Technological advances in the field of Intelligent Transportation Systems (ITS) have led to the development of several advanced driver assistance systems (ADAS) that improve the experience and safety of all passengers, especially the driver. Most of these systems are designed to warn the driver about risk situations, such as involuntary lane departure or the proximity of obstacles on the road. Other ADAS go a step further and can cooperate with the driver in controlling the vehicle, or even relieve the driver of some tedious tasks. Examples include the electronic stability program (ESP), the anti-lock braking system (ABS), cruise control (CC) and the more recent assisted parking systems. Within this research line, the next step is the development of systems able to replace the human driver, driving a vehicle autonomously and with better performance than a human, thereby improving the safety and reliability of vehicles.
First, this dissertation presents a control architecture for vehicle automation. It is made up of several hardware and software components, grouped according to their main function. The design builds on previous work carried out by the AUTOPIA Program, with notable improvements in the efficiency, robustness and scalability of the system. Also noteworthy is the development of a localization algorithm for vehicles based on particle swarms. The proposal emulates the behaviour of biological swarms, and its performance is similar to that of the well-known particle filters. The method filters and fuses information from different on-board sensors, including a GPS (Global Positioning System) receiver, inertial measurement units (IMU), and data from the sensors fitted by the vehicle manufacturer, such as wheel speed and steering wheel position. This filtering algorithm handles the localization problem properly, which is critical for the development of autonomous driving systems.
The work also deals with the tuning of fuzzy controllers, a very time-consuming task when done manually, and studies the feasibility of applying learning and adaptation techniques to the design of vehicle controllers. First, Q-learning, a reinforcement learning method, is applied to generate a lateral fuzzy controller from scratch, without any prior knowledge. Subsequently, an online adaptation method for longitudinal control is presented. With this method, the final cruise controller is able to deal with unpredictable environmental disturbances, such as changes in road slope, wheel friction or occupants' weight.
Finally, the results of an autonomous driving experiment on real roads are presented. The experiment was carried out in June 2012, driving from San Lorenzo de El Escorial to the facilities of the Center for Automation and Robotics (CAR) in Arganda del Rey. The main goal of the demonstration was to validate the performance, robustness and viability of the proposed architecture for autonomous driving under more demanding conditions than those achievable on closed test tracks.
Abstract:
We apply diffusion strategies to propose a cooperative reinforcement learning algorithm in which agents in a network communicate with their neighbors to improve predictions about their environment. The algorithm is suitable for off-policy learning, even in large state spaces. We provide a mean-square-error performance analysis under constant step-sizes. The gain of cooperation, in the form of more stability and less bias and variance in the prediction error, is illustrated in the context of a classical model. We show that the improvement in performance is especially significant when the behavior policy of the agents is different from the target policy under evaluation.
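The cooperation pattern described above can be sketched with an adapt-then-combine diffusion step. The sketch below deliberately swaps in a simpler learner than the one in the abstract: it uses on-policy tabular TD(0) on a two-state chain, not the off-policy algorithm of the paper. The function name `diffusion_td`, the toy chain, the ring network, the uniform combination weights and all parameter values are assumptions made for illustration only.

```python
import random


def diffusion_td(neighbors, steps=4000, mu=0.05, gamma=0.5,
                 noise=0.2, seed=0):
    """Adapt-then-combine diffusion with tabular TD(0) (illustrative).

    Toy chain: state 0 -> 1 with mean reward 1, state 1 -> 0 with mean
    reward 0. Each agent observes its own noisy reward.
    """
    rng = random.Random(seed)
    agents = list(neighbors)
    w = {k: [0.0, 0.0] for k in agents}     # value estimates per agent
    state = {k: 0 for k in agents}
    for _ in range(steps):
        # Adapt: every agent takes a local TD(0) step with constant step-size mu.
        psi = {}
        for k in agents:
            s = state[k]
            s_next = 1 - s
            reward = (1.0 if s == 0 else 0.0) + rng.gauss(0.0, noise)
            delta = reward + gamma * w[k][s_next] - w[k][s]
            psi[k] = list(w[k])
            psi[k][s] += mu * delta
            state[k] = s_next
        # Combine: every agent averages the intermediate estimates of
        # itself and its neighbors (uniform weights, an assumption).
        for k in agents:
            group = [k] + list(neighbors[k])
            w[k] = [sum(psi[l][i] for l in group) / len(group)
                    for i in range(2)]
    return w


if __name__ == "__main__":
    ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
    for agent, values in diffusion_td(ring).items():
        print(agent, values)
```

For this chain the true values solve V(0) = 1 + 0.5 V(1) and V(1) = 0.5 V(0), giving V(0) = 4/3 and V(1) = 2/3, so the agents' estimates can be checked against them. The combine step averages out part of each agent's reward noise, which is the intuition behind the reduced variance mentioned in the abstract.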
Abstract:
The idea of giving a language to a group of robots or artificial agents has been the subject of intense study in recent decades. The first attempts focused on the emergence of vocabularies shared by convention within the group of robots. The advantages of a common lexicon are evident, and a language with a more complex structure, in which words can be combined, would be even more beneficial. Thus some proposals, including this work, aim at the emergence of a consensual language with a syntactic structure similar to that of human language. Taking human language as a model means adopting some of the hypotheses and theories put forward by disciplines such as philosophy, psychology and linguistics, among others. According to these theoretical approaches, language has a dual dimension, formal and functional. As for its formal dimension, it seems clear that language follows rules, so the use of a grammar has been considered essential for its representation, and also because grammars are a very simple and powerful device for easily generating symbolic structures. As for the functional dimension, perhaps the most influential theory of recent times, the Theory of Speech Acts, has been taken into account. This theory builds on Wittgenstein's idea that meaning lies in the use of language, to the extent that language is understood as a way of acting and behaving, ultimately as a form of life. With these premises in mind, this thesis experiments with computational models that allow a group of robots to reach a common language autonomously, simply through individual interactions among the robots in the form of language games. Specifically, three different language models are proposed:
• A model based on probabilistic grammars and reinforcement learning, in which interactions and language use are key to the emergence of the language. It uses a static generative grammar designed beforehand. The model is applied to two different groups: one formed exclusively by robots and another combining robots and a human, so that in the second case the learning process is supervised by the human.
• A model based on grammatical evolution, which makes it possible to study not only syntactic consensus but also questions about the genesis of language. It uses a universal grammar from which robots can evolve by themselves the most appropriate grammar for the linguistic situation they face at each moment.
• A model based on grammatical evolution and reinforcement learning, which takes aspects of the previous two and extends the robots' possibilities. It allows the robots to develop a language that adapts to dynamic linguistic situations that can change over time, and it also allows the imposition of order restrictions, which are very common in complex syntactic structures.
All models involve a decentralized and self-organized approach, so that none of the robots owns the language and all must cooperate in a coordinated manner to achieve syntactic consensus. In each case, experiments are presented to validate the proposed models, both in terms of success in the emergence of language and in relation to important parallel issues, such as human-machine interaction or the very genesis of language.
Abstract:
This paper presents a novel method for enabling a robot to determine the direction to a sound source through interacting with its environment. The method uses a new neural network, the Parameter-Less Self-Organizing Map algorithm, and reinforcement learning to achieve rapid and accurate response.
Abstract:
This paper reviews some basic issues and methods involved in using neural networks to respond in a desired fashion to a temporally-varying environment. Some popular network models and training methods are introduced. A speech recognition example is then used to illustrate the central difficulty of temporal data processing: learning to notice and remember relevant contextual information. Feedforward network methods are applicable to cases where this problem is not severe. The application of these methods is explained, and applications are discussed in the areas of pure mathematics, chemical and physical systems, and economic systems. A more powerful but less practical algorithm for temporal problems, the moving targets algorithm, is sketched and discussed. For completeness, a few remarks are made on reinforcement learning.
Abstract:
The problem of evaluating different learning rules and other statistical estimators is analysed. A new general theory of statistical inference is developed by combining Bayesian decision theory with information geometry. It is coherent and invariant. For each sample a unique ideal estimate exists and is given by an average over the posterior. An optimal estimate within a model is given by a projection of the ideal estimate. The ideal estimate is a sufficient statistic of the posterior, so practical learning rules are functions of the ideal estimator. If the sole purpose of learning is to extract information from the data, the learning rule must also approximate the ideal estimator. This framework is applicable to both Bayesian and non-Bayesian methods, with arbitrary statistical models, and to supervised, unsupervised and reinforcement learning schemes.
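The statement that the ideal estimate is a posterior average, and that the optimal estimate within a model is its projection, can be illustrated with one familiar special case. The derivation below assumes the loss is the Kullback-Leibler divergence from the true distribution to the estimate; that choice is made here for illustration and is not specified in the abstract. The notation (sample $z$, parameter $\theta$, posterior $P(\theta \mid z)$, ideal estimate $\hat{q}$, model $\mathcal{Q}$) is likewise an assumption.

```latex
\begin{align}
\hat{q}(x) &= \int p(x \mid \theta)\, P(\theta \mid z)\, d\theta ,\\
\mathbb{E}_{\theta \mid z}\!\left[ \mathrm{KL}(p_\theta \,\|\, q) \right]
  &= \mathbb{E}_{\theta \mid z}\!\left[ \mathrm{KL}(p_\theta \,\|\, \hat{q}) \right]
   + \mathrm{KL}(\hat{q} \,\|\, q) ,\\
q^{*} &= \arg\min_{q \in \mathcal{Q}} \mathrm{KL}(\hat{q} \,\|\, q) .
\end{align}
```

The first term on the right of the second line does not depend on $q$, so under this loss the unrestricted minimizer is the posterior average $\hat{q}$, and the best estimate inside a restricted model $\mathcal{Q}$ is the member closest to $\hat{q}$, which is the projection in the third line.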
Abstract:
Dynamic Optimization Problems (DOPs) have been widely studied using Evolutionary Algorithms (EAs). Yet, a clear and rigorous definition of DOPs is lacking in the Evolutionary Dynamic Optimization (EDO) community. In this paper, we propose a unified definition of DOPs based on the idea of multiple-decision-making discussed in the Reinforcement Learning (RL) community. We draw a connection between EDO and RL by arguing that both study DOPs according to our definition. We point out that existing EDO and RL research has mainly focused on some types of DOPs. A conceptualized benchmark problem, aimed at the systematic study of various DOPs, is then developed. Experimental studies on the benchmark reveal that EDO and RL methods are each specialized in certain types of DOPs and, more importantly, that new algorithms for DOPs can be developed by combining the strengths of both EDO and RL methods.
Abstract:
This paper introduces a new technique for optimizing the trading strategy of brokers that autonomously trade in retail and wholesale markets. Simultaneous optimization of retail and wholesale strategies has been considered by existing studies as intractable. Therefore, each of these strategies is optimized separately and their interdependence is generally ignored, with resulting broker agents not aiming for a globally optimal retail and wholesale strategy. In this paper, we propose a novel formalization, based on a semi-Markov decision process (SMDP), which globally and simultaneously optimizes retail and wholesale strategies. The SMDP is solved using hierarchical reinforcement learning (HRL) in multi-agent environments. To address the curse of dimensionality, which arises when applying SMDP and HRL to complex decision problems, we propose an efficient knowledge transfer approach. This enables the reuse of learned trading skills in order to speed up the learning in new markets, at the same time as making the broker transportable across market environments. The proposed SMDP-broker has been thoroughly evaluated in two well-established multi-agent simulation environments within the Trading Agent Competition (TAC) community. Analysis of controlled experiments shows that this broker can outperform the top TAC-brokers. Moreover, our broker is able to perform well in a wide range of environments by re-using knowledge acquired in previously experienced settings.
Abstract:
Smart grid technologies have given rise to a liberalised and decentralised electricity market, enabling energy providers and retailers to have a better understanding of the demand side and its response to pricing signals. This paper puts forward a reinforcement-learning-powered tool aiding an electricity retailer to define the tariff prices it offers, in a bid to optimise its retail strategy. In a competitive market, an energy retailer aims to simultaneously increase the number of contracted customers and its profit margin. We have abstracted the problem of deciding on a tariff price as faced by a retailer, as a semi-Markov decision problem (SMDP). A hierarchical reinforcement learning approach, MaxQ value function decomposition, is applied to solve the SMDP through interactions with the market. To evaluate our trading strategy, we developed a retailer agent (termed AstonTAC) that uses the proposed SMDP framework to act in an open multi-agent simulation environment, the Power Trading Agent Competition (Power TAC). An evaluation and analysis of the 2013 Power TAC finals show that AstonTAC successfully selects sell prices that attract as many customers as necessary to maximise the profit margin. Moreover, during the competition, AstonTAC was the only retailer agent performing well across all retail market settings.
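For readers unfamiliar with the MaxQ value function decomposition named above, its general form can be written as follows. This is the standard textbook statement for the optimal hierarchical policy, not the specific task hierarchy used by AstonTAC, which the abstract does not describe. The symbols are notational assumptions: $i$ is a parent task, $a$ a child task or primitive action invoked by $i$, $s'$ the state in which $a$ terminates, $\tau$ the number of time steps $a$ takes, and $\gamma$ the discount factor.

```latex
\begin{align}
Q(i, s, a) &= V(a, s) + C(i, s, a),\\
V(i, s) &=
\begin{cases}
\max_{a} Q(i, s, a) & \text{if } i \text{ is a composite task},\\
\mathbb{E}[\, r \mid s, i \,] & \text{if } i \text{ is a primitive action},
\end{cases}\\
C(i, s, a) &= \sum_{s', \tau} P(s', \tau \mid s, a)\, \gamma^{\tau}\, V(i, s') .
\end{align}
```

The value of doing $a$ inside task $i$ thus splits into the reward earned while $a$ runs, $V(a, s)$, and the completion term $C(i, s, a)$, the discounted value of finishing $i$ afterwards. The random duration $\tau$ in the discount is what makes the underlying problem a semi-Markov one.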
Abstract:
There are a great many approaches in artificial intelligence, some of them coming from biology and neurophysiology. In this paper we review and discuss many of them, organizing the discussion around autonomous agent research. We highlight three aspects in our classification: the type of abstraction applied to represent agent knowledge, the implementation of the hypothesis processing mechanism, and the allowed degree of freedom in behaviour and self-organization. Using this classification, many approaches in artificial intelligence are evaluated. We then summarize the ideas discussed and propose a series of general principles for building an autonomous adaptive agent.