984 results for Reinforcement-Learning


Relevance: 100.00%

Abstract:

On-line learning methods have been applied successfully in multi-agent systems to achieve coordination among agents. Learning in multi-agent systems implies a non-stationary scenario as perceived by the agents, since the behavior of other agents may change as they simultaneously learn how to improve their own actions. Non-stationary scenarios can be modeled as Markov Games, which can be solved using the Minimax-Q algorithm, a combination of Q-learning (a Reinforcement Learning (RL) algorithm that directly learns an optimal control policy) and the Minimax algorithm. However, finding optimal control policies with any RL algorithm (Q-learning and Minimax-Q included) can be very time consuming. To improve the learning time of Q-learning, we consider the QS-algorithm, in which a single experience can update more than one action value by means of a spreading function. In this paper, we propose the Minimax-QS algorithm, which combines the Minimax-Q algorithm and the QS-algorithm. We conduct a series of empirical evaluations of the algorithm in a simplified simulator of the soccer domain and show that even a very simple domain-dependent spreading function can improve the performance of the learning algorithm.
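As a rough illustration of the QS idea combined with a Minimax-Q-style target, the following minimal sketch updates not only the experienced state-action pair but also "similar" pairs, weighted by a spreading function. The grid-adjacency spreading function and all parameter values are illustrative assumptions, not the paper's exact formulation.

```python
def qs_update(Q, s, a, reward, v_next, states, actions,
              spread, alpha=0.1, gamma=0.9):
    """One QS-style update: the observed transition (s, a) also updates
    similar state-action pairs, weighted by a spreading function.

    Q      : dict mapping (state, action) -> value
    v_next : value of the successor state (for Minimax-Q this would be the
             minimax value of the stage game in that state)
    spread : function (s, a, s2, a2) -> similarity weight in [0, 1]
    """
    target = reward + gamma * v_next
    for s2 in states:
        for a2 in actions:
            sigma = spread(s, a, s2, a2)
            if sigma > 0.0:
                Q[(s2, a2)] += alpha * sigma * (target - Q[(s2, a2)])
    return Q


def grid_spread(s, a, s2, a2, decay=0.5):
    """Illustrative spreading function: full credit to the experienced pair,
    partial credit to pairs whose grid states are adjacent, same action only."""
    if a2 != a:
        return 0.0
    dist = abs(s[0] - s2[0]) + abs(s[1] - s2[1])
    return 1.0 if dist == 0 else (decay if dist == 1 else 0.0)
```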

Relevance: 100.00%

Abstract:

Shared attention is a very important type of communication among human beings. The term is sometimes reserved for its most complex form, constituted by a sequence of four steps: mutual gaze, gaze following, imperative pointing, and declarative pointing. Several approaches have been proposed in the Human-Robot Interaction area to solve part of the shared attention process; most of them address only the first two steps. Models based on temporal difference learning, neural networks, probabilistic methods, and reinforcement learning have been used in several of these works. In this article, we present a robotic architecture that provides a robot or agent with the capacity to learn mutual gaze, gaze following, and declarative pointing using a robotic head interacting with a caregiver. Three learning methods have been incorporated into this architecture, and their performance has been compared to find the most adequate one for use in a real experiment. The learning capabilities of the architecture have been analyzed by observing the robot interacting with a human in a controlled environment. The experimental results show that the robotic head is able to produce appropriate behavior and to learn from social interaction.

Relevance: 100.00%

Abstract:

This paper provides an improved NSGA-II (Non-Dominated Sorting Genetic Algorithm, version II) that incorporates a parameter-free self-tuning approach based on a reinforcement learning technique, called the Non-Dominated Sorting Genetic Algorithm Based on Reinforcement Learning (NSGA-RL). The proposed method is compared with the classical NSGA-II when applied to a satellite coverage problem. Furthermore, not only are the optimization results compared with those obtained by other multiobjective optimization methods, but the method also offers the advantage of avoiding time-consuming and complex parameter tuning.
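The abstract does not spell out the tuning mechanism, so the sketch below only illustrates the general idea of letting a small reinforcement learning agent choose genetic-operator rates each generation, rewarded by the improvement of a quality indicator such as hypervolume. The rate choices, reward, and the helpers run_generation and hypervolume are hypothetical placeholders, not the NSGA-RL scheme itself.

```python
import random

# (crossover probability, mutation probability) options the tuner can pick
RATE_CHOICES = [(0.7, 0.01), (0.8, 0.05), (0.9, 0.10)]


class ParameterTuner:
    """Bandit-style value learner that selects operator rates per generation."""

    def __init__(self, epsilon=0.1, alpha=0.2):
        self.q = [0.0] * len(RATE_CHOICES)   # estimated value of each choice
        self.epsilon, self.alpha = epsilon, alpha
        self.last = 0

    def choose(self):
        # Epsilon-greedy selection over the discrete rate choices.
        if random.random() < self.epsilon:
            self.last = random.randrange(len(RATE_CHOICES))
        else:
            self.last = max(range(len(RATE_CHOICES)), key=self.q.__getitem__)
        return RATE_CHOICES[self.last]

    def feedback(self, reward):
        # Incremental update of the chosen action's value estimate.
        self.q[self.last] += self.alpha * (reward - self.q[self.last])


# Usage inside a generational loop (run_generation and hypervolume are
# placeholders for the user's own NSGA-II implementation):
#   tuner = ParameterTuner()
#   for gen in range(max_generations):
#       pc, pm = tuner.choose()
#       population = run_generation(population, pc, pm)
#       tuner.feedback(hypervolume(population) - previous_hypervolume)
```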

Relevance: 100.00%

Abstract:

This thesis is concerned with the development of a function approximator and its use in methods for learning discrete and continuous actions: 1. A general function approximator, Locally Weighted Interpolating Growing Neural Gas (LWIGNG), is developed on the basis of a Growing Neural Gas (GNG). The topological neighborhood in the neuron structure is used to interpolate between neighboring neurons and to compute the approximation through local weighting. The capabilities of the approach, in particular with regard to changing target functions and changing input distributions, are demonstrated in several experiments. 2. For learning discrete actions, LWIGNG is combined with Q-learning to form the Q-LWIGNG method. For this purpose the underlying GNG algorithm has to be modified, because the input data in action learning arrive in a particular order. Q-LWIGNG achieves very good results on the pole-balancing and mountain-car problems, and good results on the acrobot problem. 3. For learning continuous actions, a REINFORCE algorithm is combined with LWIGNG to form the ReinforceGNG method. An actor-critic architecture is employed in order to learn from delayed rewards. LWIGNG approximates both the state-value function and the policy, which is represented by the situation-dependent parameters of a normal distribution. ReinforceGNG is successfully applied to learning movements for a simulated two-wheeled robot that has to intercept a rolling ball under certain conditions.
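A minimal sketch of the locally weighted interpolation idea behind LWIGNG is shown below, assuming the GNG unit positions, the values stored at the units, and the topological neighbor lists are already available; the inverse-distance kernel is an assumption, since the thesis's exact weighting scheme is not reproduced here.

```python
import numpy as np

def lwigng_predict(x, units, values, neighbors):
    """Illustrative locally weighted interpolation in the spirit of LWIGNG.

    x         : (d,)  query point
    units     : (N, d) array of GNG unit positions
    values    : (N,)  array of function values stored at the units
    neighbors : dict mapping unit index -> list of topologically adjacent units

    The prediction interpolates between the winning unit and its graph
    neighbors, weighted by inverse distance to the query point.
    """
    d = np.linalg.norm(units - x, axis=1)
    winner = int(np.argmin(d))
    idx = [winner] + list(neighbors.get(winner, []))
    w = 1.0 / (d[idx] + 1e-8)          # local inverse-distance weights
    return float(np.dot(w, values[idx]) / w.sum())
```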

Relevance: 100.00%

Abstract:

The discovery of binary dendritic events such as local NMDA spikes in dendritic subbranches led to the suggestion that dendritic trees could be computationally equivalent to a 2-layer network of point neurons, with a single output unit represented by the soma and input units represented by the dendritic branches. Although this interpretation endows a neuron with high computational power, it is functionally not clear why nature would have preferred the dendritic solution, with a single but complex neuron, over the network solution, with many but simple units. We show that the dendritic solution has a distinct advantage over the network solution when considering different learning tasks. Its key property is that the dendritic branches receive immediate feedback from the somatic output spike, while in the corresponding network architecture the feedback would require additional backpropagating connections to the input units. Assuming a reinforcement learning scenario, we formally derive a learning rule for the synaptic contacts on the individual dendritic trees which depends on the presynaptic activity, the local NMDA spikes, the somatic action potential, and a delayed reinforcement signal. We test the model for two scenarios: the learning of binary classifications and of precise spike timings. We show that the immediate feedback represented by the backpropagating action potential supplies the individual dendritic branches with enough information to efficiently adapt their synapses and to speed up the learning process.
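The abstract names the factors that enter the derived rule (presynaptic activity, the local NMDA spike, the somatic action potential, and a delayed reward). The following schematic sketch only combines those factors multiplicatively into an eligibility trace gated by the delayed reward; the paper's actual functional form is not reproduced here, and the learning rate is an assumption.

```python
def dendritic_update(w, pre, nmda_spike, soma_spike, reward, eta=0.01):
    """Schematic reward-modulated update for the synapses of one dendritic branch.

    w          : list of synaptic weights on this branch
    pre        : presynaptic activity for each synapse on the branch
    nmda_spike : 1.0 if a local NMDA spike occurred in this branch, else 0.0
    soma_spike : 1.0 if the soma fired (immediate backpropagating feedback)
    reward     : delayed scalar reinforcement signal
    """
    # Eligibility couples presynaptic activity with the local dendritic event
    # and the somatic output; the delayed reward then gates the weight change.
    eligibility = [p * nmda_spike * soma_spike for p in pre]
    return [wi + eta * reward * e for wi, e in zip(w, eligibility)]
```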

Relevance: 100.00%

Abstract:

We study synaptic plasticity in a complex neuronal cell model where NMDA-spikes can arise in certain dendritic zones. In the context of reinforcement learning, two kinds of plasticity rules are derived, zone reinforcement (ZR) and cell reinforcement (CR), which both optimize the expected reward by stochastic gradient ascent. For ZR, the synaptic plasticity response to the external reward signal is modulated exclusively by quantities which are local to the NMDA-spike initiation zone in which the synapse is situated. CR, in addition, uses nonlocal feedback from the soma of the cell, provided by mechanisms such as the backpropagating action potential. Simulation results show that, compared to ZR, the use of nonlocal feedback in CR can drastically enhance learning performance. We suggest that the availability of nonlocal feedback for learning is a key advantage of complex neurons over networks of simple point neurons, which have previously been found to be largely equivalent with regard to computational capability.
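To make the distinction between the two derived rules concrete, the sketch below contrasts them: in zone reinforcement (ZR) the update is gated only by quantities local to the NMDA-spike initiation zone plus the external reward, while in cell reinforcement (CR) the same local term is additionally modulated by nonlocal somatic feedback. This is a schematic illustration under those assumptions, not the paper's exact gradient expressions.

```python
def zr_update(w, pre, nmda_spike, reward, eta=0.01):
    # Zone reinforcement: only NMDA-zone-local quantities and the reward.
    return [wi + eta * reward * p * nmda_spike for wi, p in zip(w, pre)]


def cr_update(w, pre, nmda_spike, soma_feedback, reward, eta=0.01):
    # Cell reinforcement: the local term is further modulated by nonlocal
    # somatic feedback (e.g. the backpropagating action potential).
    return [wi + eta * reward * p * nmda_spike * soma_feedback
            for wi, p in zip(w, pre)]
```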

Relevance: 100.00%

Abstract:

The artificial pancreas is at the forefront of research towards automatic insulin infusion for patients with type 1 diabetes. Due to the high inter- and intra-patient variability of the diabetic population, the need for personalized approaches has been raised. This study presents an adaptive, patient-specific control strategy for glucose regulation based on reinforcement learning, and more specifically on the Actor-Critic (AC) learning approach. The control algorithm provides daily updates of the basal rate and insulin-to-carbohydrate (IC) ratio in order to optimize glucose regulation. A method for the automatic and personalized initialization of the control algorithm is designed based on the estimation of the transfer entropy (TE) between insulin and glucose signals. The algorithm has been evaluated in silico in adults, adolescents and children for 10 days. Three initialization scenarios, i) zero values, ii) random values and iii) TE-based values, have been comparatively assessed. The results show that with TE-based initialization the algorithm learns faster, reaching 98%, 90% and 73% in the A+B zones of the Control Variability Grid Analysis for adults, adolescents and children respectively after five days, compared to 95%, 78% and 41% for random initialization and 93%, 88% and 41% for zero initial values. Furthermore, in the case of children, the daily Low Blood Glucose Index decreases much faster when the TE-based tuning is applied. The results imply that automatic and personalized tuning based on TE reduces the learning period and improves the overall performance of the AC algorithm.
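As a rough illustration of a daily actor-critic adjustment of the two therapy parameters, the sketch below treats the basal rate and IC ratio as actor parameters and uses a linear critic over a daily glucose summary. The features, reward, exploration perturbation and learning rates are placeholders and this is not the paper's exact formulation or its TE-based initialization.

```python
import numpy as np

class DailyActorCritic:
    """Illustrative daily actor-critic tuning of basal rate (U/h) and
    insulin-to-carbohydrate ratio (g/U)."""

    def __init__(self, basal, ic_ratio, lr_actor=0.05, lr_critic=0.1, gamma=0.9):
        self.theta = np.array([basal, ic_ratio], dtype=float)  # actor parameters
        self.w = np.zeros(2)                                    # critic weights
        self.lr_a, self.lr_c, self.gamma = lr_actor, lr_critic, gamma

    def update(self, features, next_features, reward, exploration):
        # features: daily glucose summary, e.g. [fraction hypo, fraction hyper]
        # exploration: the random perturbation applied to theta on that day
        td_error = (reward + self.gamma * self.w @ next_features
                    - self.w @ features)
        self.w += self.lr_c * td_error * features            # critic step
        self.theta += self.lr_a * td_error * exploration      # actor step along
        return self.theta                                     # the explored direction
```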

Relevance: 100.00%

Abstract:

The objective of this thesis is to model processes from nature such as evolution and co-evolution, and to propose techniques that ensure these learning processes actually take place and are useful for solving complex problems such as the game of Go. Go is an ancient and very complex game with simple rules that remains a challenge for Artificial Intelligence. This dissertation covers approaches that have been applied to this problem and proposes solving it with competitive and cooperative co-evolutionary learning methods, together with other techniques proposed by the author. To study, implement, and validate these methods, several neural network structures were used, a freely available framework was employed, and many programs were coded. The proposed techniques were implemented by the author, many experiments were performed to find the configuration that best ensures co-evolutionary progress, and the results are discussed. In co-evolutionary learning processes, pathologies can be observed that hinder the progress of co-evolution. This dissertation introduces techniques to address pathologies such as loss of gradient, cycling dynamics, and forgetting. According to some authors, one way to mitigate these co-evolution pathologies is to introduce more diversity into the evolving populations. This thesis therefore proposes techniques to introduce more diversity, together with diversity measures for neural network structures that allow diversity to be monitored during co-evolution. The evolved genotype diversity is analyzed in terms of its impact on the global fitness of the evolved strategies and on their generalization. Additionally, a memory mechanism was introduced into the neural network structures to reinforce some strategies in the genes of the evolved neurons, so that good learned strategies are not forgotten. The dissertation also reviews works by other authors in which cooperative and competitive co-evolution have been applied. The Go board size used in this thesis is 9x9, but the approach can easily be scaled to larger boards. The author believes that the programs and techniques introduced in this dissertation can also be used in other domains.
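The thesis mentions diversity measures for neural network structures used to monitor co-evolution; as one illustrative example of such a measure (an assumption, since the thesis's own measures are not specified here), the sketch below computes the mean pairwise Euclidean distance between genotypes, where each genotype is the flattened weight vector of one network.

```python
import numpy as np

def genotype_diversity(population):
    """Mean pairwise Euclidean distance between genotypes.

    population : iterable of weight arrays (one per neural network); each is
                 flattened into a genotype vector. Higher values indicate a
                 more diverse (co-)evolving population.
    """
    genomes = np.array([np.ravel(g) for g in population])
    n = len(genomes)
    if n < 2:
        return 0.0
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += np.linalg.norm(genomes[i] - genomes[j])
    return 2.0 * total / (n * (n - 1))
```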