882 results for reinforcement learning, cryptography, machine learning, deep learning, Deep Q-Learning (DQN), AES


Relevance: 100.00%

Abstract:

We study synaptic plasticity in a complex neuronal cell model where NMDA-spikes can arise in certain dendritic zones. In the context of reinforcement learning, two kinds of plasticity rules are derived, zone reinforcement (ZR) and cell reinforcement (CR), which both optimize the expected reward by stochastic gradient ascent. For ZR, the synaptic plasticity response to the external reward signal is modulated exclusively by quantities which are local to the NMDA-spike initiation zone in which the synapse is situated. CR, in addition, uses nonlocal feedback from the soma of the cell, provided by mechanisms such as the backpropagating action potential. Simulation results show that, compared to ZR, the use of nonlocal feedback in CR can drastically enhance learning performance. We suggest that the availability of nonlocal feedback for learning is a key advantage of complex neurons over networks of simple point neurons, which have previously been found to be largely equivalent with regard to computational capability.
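Both rules are instances of REINFORCE-style stochastic gradient ascent on the expected reward. As a minimal sketch, assuming a single initiation zone modeled as a Bernoulli unit with sigmoidal spike probability (the toy task, names, and constants are illustrative, not the paper's neuron model):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n = 20
w = rng.normal(0.0, 0.1, n)  # synaptic weights within one NMDA-spike zone
eta = 0.05                   # learning rate

def trial(w):
    x = rng.binomial(1, 0.5, n).astype(float)        # presynaptic activity
    p = sigmoid(w @ x)                               # zone's spike probability
    s = float(rng.random() < p)                      # stochastic NMDA-spike (0/1)
    r = 1.0 if s == float(x.sum() > n / 2) else 0.0  # toy reward: spike iff majority active
    elig = (s - p) * x  # zone-local eligibility: d log Pr(s) / dw
    return r, elig

# Zone reinforcement (ZR): the reward modulates zone-local quantities only.
for _ in range(2000):
    r, elig = trial(w)
    w += eta * r * elig

# Cell reinforcement (CR) would additionally scale the update by a nonlocal
# somatic feedback factor (e.g. from the backpropagating action potential):
#     w += eta * r * somatic_feedback * elig
```

In this picture the two rules differ only in the extra multiplicative factor carrying nonlocal information from the soma.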

Relevance: 100.00%

Abstract:

The artificial pancreas is at the forefront of research toward automatic insulin infusion for patients with type 1 diabetes. Because of the high inter- and intra-patient variability of the diabetic population, personalized approaches are needed. This study presents an adaptive, patient-specific control strategy for glucose regulation based on reinforcement learning, specifically the Actor-Critic (AC) learning approach. The control algorithm provides daily updates of the basal rate and insulin-to-carbohydrate (IC) ratio in order to optimize glucose regulation. A method for the automatic and personalized initialization of the control algorithm is designed based on the estimation of the transfer entropy (TE) between the insulin and glucose signals. The algorithm has been evaluated in silico in adults, adolescents, and children over 10 days. Three initialization scenarios, i) zero values, ii) random values, and iii) TE-based values, have been comparatively assessed. The results show that with TE-based initialization the algorithm learns faster, reaching 98%, 90%, and 73% in the A+B zones of the Control Variability Grid Analysis for adults, adolescents, and children respectively after five days, compared to 95%, 78%, and 41% for random initialization and 93%, 88%, and 41% for zero initial values. Furthermore, in the case of children, the daily Low Blood Glucose Index decreases much faster when TE-based tuning is applied. These results imply that automatic, personalized tuning based on TE shortens the learning period and improves the overall performance of the AC algorithm.
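As a rough sketch of the kind of update involved, assuming a Gaussian policy over the two daily control parameters and a one-step TD error; the toy reward, constants, and names are illustrative, not the paper's patient model:

```python
import numpy as np

rng = np.random.default_rng(1)

theta = np.zeros(2)  # actor parameters: adjustments to basal rate and IC ratio
v = 0.0              # critic's value estimate for the daily decision state
alpha_actor, alpha_critic, sigma = 0.01, 0.1, 0.05

for day in range(100):
    action = theta + sigma * rng.normal(size=2)  # explore around the current policy
    # Stand-in for a glycaemic risk index; lower is better.
    risk = np.sum((action - np.array([0.3, -0.1])) ** 2)
    reward = -risk
    td_error = reward - v                        # one-step, undiscounted TD error
    v += alpha_critic * td_error                 # critic update
    theta += alpha_actor * td_error * (action - theta) / sigma**2  # actor update
```

In this sketch, TE-based initialization would amount to choosing theta's starting point from the patient's estimated insulin-glucose coupling rather than from zeros or random values.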

Relevance: 100.00%

Abstract:

The objective of this thesis is to model natural processes such as evolution and co-evolution, and to propose techniques that ensure these learning processes actually take place and are useful for solving complex problems such as the game of Go. Go is an ancient and highly complex game with simple rules that remains a challenge for artificial intelligence. This dissertation reviews approaches previously applied to the problem and proposes to address it with competitive and cooperative co-evolutionary learning methods, together with other techniques introduced by the author. To study, implement, and validate these methods, several neural network structures and a freely available framework were used, and many programs were written. The proposed techniques were implemented by the author, numerous experiments were performed to find the configuration that best ensures co-evolutionary progress, and the results are discussed. Co-evolutionary learning processes can exhibit pathologies that impede progress; this dissertation introduces techniques to address pathologies such as loss of gradient, cycling dynamics, and forgetting. According to several authors, one remedy for these pathologies is to introduce more diversity into the evolving populations. The thesis therefore proposes techniques for increasing diversity, along with diversity measures for neural network structures that allow diversity to be monitored during co-evolution. The evolved genotypic diversity is analyzed in terms of its impact on the global fitness of the evolved strategies and on their generalization. In addition, a memory mechanism is introduced into the neural network structures to reinforce certain strategies in the genes of the evolved neurons, so that good strategies, once learned, are not forgotten. The dissertation also surveys work by other authors in which cooperative and competitive co-evolution has been applied. The Go board size used in this thesis is 9x9, but the approach scales readily to larger boards. The author believes that the programs and techniques introduced in this dissertation can be applied to other domains.
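One standard remedy for the forgetting pathology mentioned above is an archive ("hall of fame") of past champions that remains in the evaluation pool. A minimal competitive co-evolution loop with such an archive, using a toy genome and game rather than the thesis's Go players:

```python
import random

random.seed(0)

def play(a, b):
    """Toy zero-sum game: the genome with the larger sum wins."""
    return 1 if sum(a) > sum(b) else 0

def mutate(g, rate=0.1):
    return [x + random.gauss(0, rate) for x in g]

pop = [[random.gauss(0, 1) for _ in range(5)] for _ in range(20)]
hall_of_fame = []

for gen in range(50):
    # Fitness: wins against current rivals plus archived past champions,
    # so strategies that beat old champions are never selected away.
    rivals = pop + hall_of_fame
    fitness = [sum(play(g, r) for r in rivals) for g in pop]
    ranked = [g for _, g in sorted(zip(fitness, pop), key=lambda t: -t[0])]
    hall_of_fame.append(ranked[0])        # remember this generation's champion
    parents = ranked[: len(pop) // 2]     # truncation selection
    pop = parents + [mutate(p) for p in parents]
```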

Relevance: 100.00%

Abstract:

We introduce a diffusion-based algorithm in which multiple agents cooperate to predict a common, global state-value function by sharing local estimates and local gradient information among neighbors. Our algorithm is a fully distributed implementation of gradient temporal-difference learning with linear function approximation, making it applicable to multi-agent settings. Simulations illustrate the benefit of cooperation in learning, as made possible by the proposed algorithm.
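A minimal sketch of the adapt-then-combine structure, assuming a TDC-style local gradient-TD step and a fixed doubly stochastic combination matrix over a ring of four agents; the transitions are synthetic and the details are illustrative rather than the paper's exact algorithm:

```python
import numpy as np

rng = np.random.default_rng(2)
n_agents, d, gamma = 4, 3, 0.9
alpha, beta = 0.05, 0.02

# Doubly stochastic combination matrix for a ring of four agents.
C = np.array([[0.50, 0.25, 0.00, 0.25],
              [0.25, 0.50, 0.25, 0.00],
              [0.00, 0.25, 0.50, 0.25],
              [0.25, 0.00, 0.25, 0.50]])

theta = np.zeros((n_agents, d))  # per-agent value-function weights
w = np.zeros((n_agents, d))      # per-agent auxiliary (correction) weights

for t in range(2000):
    psi = theta.copy()
    for k in range(n_agents):
        phi = rng.normal(size=d)       # feature vector of the current state
        phi_next = rng.normal(size=d)  # feature vector of the next state
        r = phi.sum()                  # stand-in reward
        delta = r + gamma * phi_next @ theta[k] - phi @ theta[k]
        # Local adapt step: gradient TD with a correction term (TDC).
        psi[k] = theta[k] + alpha * (delta * phi - gamma * (phi @ w[k]) * phi_next)
        w[k] += beta * (delta - phi @ w[k]) * phi
    theta = C @ psi                    # combine step: diffuse estimates to neighbors
```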

Relevance: 100.00%

Abstract:

The nucleus accumbens, a site within the ventral striatum, is best known for its prominent role in mediating the reinforcing effects of drugs of abuse such as cocaine, alcohol, and nicotine. Indeed, it is generally believed that this structure subserves motivated behaviors, such as feeding, drinking, sexual behavior, and exploratory locomotion, which are elicited by natural rewards or incentive stimuli. A basic rule of positive reinforcement is that motor responses will increase in magnitude and vigor if followed by a rewarding event. It is likely, therefore, that the nucleus accumbens may serve as a substrate for reinforcement learning. However, there is surprisingly little information concerning the neural mechanisms by which appetitive responses are learned. In the present study, we report that treatment of the nucleus accumbens core with the selective competitive N-methyl-d-aspartate (NMDA) antagonist 2-amino-5-phosphonopentanoic acid (AP-5; 5 nmol/0.5 μl bilaterally) impairs response-reinforcement learning in the acquisition of a simple lever-press task to obtain food. Once the rats learned the task, AP-5 had no effect, demonstrating the requirement of NMDA receptor-dependent plasticity in the early stages of learning. Infusion of AP-5 into the accumbens shell produced a much smaller impairment of learning. Additional experiments showed that AP-5 core-treated rats had normal feeding and locomotor responses and were capable of acquiring stimulus-reward associations. We hypothesize that stimulation of NMDA receptors within the accumbens core is a key process through which motor responses become established in response to reinforcing stimuli. Further, this mechanism may also play a critical role in the motivational and addictive properties of drugs of abuse.

Relevance: 100.00%

Abstract:

To solve multi-objective problems, multiple reward signals are often scalarized into a single value and further processed using established single-objective problem-solving techniques. While the field of multi-objective optimization has made many advances in applying scalarization techniques to obtain good solution trade-offs, the utility of these techniques in the multi-objective multi-agent learning domain has not yet been thoroughly investigated. Agents learn the value of their decisions by linearly scalarizing their reward signals at the local level, and acceptable system-wide behaviour results. However, the non-linear relationship between the weighting parameters of the scalarization function and the learned policy makes the discovery of system-wide trade-offs time-consuming. Our first contribution is a thorough analysis of well-known scalarization schemes within the multi-objective multi-agent reinforcement learning setup. The analysed approaches intelligently explore the weight space in order to find a wider range of system trade-offs. As our second contribution, we propose a novel adaptive-weight algorithm which interacts with the underlying local multi-objective solvers and allows for better coverage of the Pareto front. Our third contribution is the experimental validation of our approach by learning bi-objective policies in self-organising smart camera networks. We note that our algorithm (i) explores the objective space faster on many problem instances, (ii) obtains solutions that exhibit a larger hypervolume, and (iii) achieves a greater spread in the objective space.
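The core mechanism is easy to state concretely. A minimal sketch of local linear scalarization on a toy two-objective bandit (all names and numbers are illustrative): each weight vector collapses the reward vector to a scalar that a standard learner can process, and sweeping the weights traces out different trade-offs, which is exactly where the non-linear weight-to-policy mapping makes systematic coverage expensive.

```python
import numpy as np

rng = np.random.default_rng(3)
n_actions = 4
true_rewards = rng.random((n_actions, 2))  # two objectives per action

def run(weights, steps=2000, eps=0.1, alpha=0.1):
    """Epsilon-greedy bandit learner on linearly scalarized rewards."""
    q = np.zeros(n_actions)
    for _ in range(steps):
        a = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(q))
        r_vec = true_rewards[a] + 0.1 * rng.normal(size=2)
        r = weights @ r_vec              # linear scalarization of the reward vector
        q[a] += alpha * (r - q[a])
    return int(np.argmax(q))

# Sweeping the weight simplex recovers (some of) the trade-off policies.
for w0 in (0.0, 0.25, 0.5, 0.75, 1.0):
    best = run(np.array([w0, 1.0 - w0]))
    print(f"weights = ({w0:.2f}, {1 - w0:.2f}) -> preferred action {best}")
```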

Relevance: 100.00%

Abstract:

Traditional heuristic approaches to the Examination Timetabling Problem normally use a stochastic method during optimization to select the next examination to be considered for timetabling within the neighbourhood search process. This paper presents a technique whereby the stochastic method is augmented with information from a weighted list gathered during the initial adaptive construction phase, with the purpose of intelligently directing examination selection. In addition, a reinforcement learning technique has been adapted to identify the portions of the weighted list that are most effective in facilitating overall solution improvement. The technique is tested against the 2007 International Timetabling Competition datasets, with solutions generated within the time frame specified by the competition organizers. The results are better than those of the competition winner on seven of the twelve datasets, and competitive on the remaining five. This paper also shows experimentally how the use of reinforcement learning improves upon our previous technique.
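A minimal sketch of the selection mechanism described above, assuming the weighted list is split into a few portions whose effectiveness is learned online; the segment names, improvement probabilities, and update constants are illustrative:

```python
import random

random.seed(4)

segments = ["head", "middle", "tail"]  # portions of the weighted list
value = {s: 1.0 for s in segments}     # learned effectiveness per portion

def pick_segment():
    """Roulette-wheel choice of a list portion, biased by learned value."""
    r, acc = random.uniform(0, sum(value.values())), 0.0
    for s in segments:
        acc += value[s]
        if r <= acc:
            return s
    return segments[-1]

for step in range(1000):
    seg = pick_segment()  # choose where in the weighted list to draw the next exam
    # Stand-in for the neighbourhood move: did it improve the timetable?
    improved = random.random() < {"head": 0.5, "middle": 0.3, "tail": 0.1}[seg]
    # Reinforcement: reward portions whose selections yield improving moves.
    value[seg] = max(0.1, value[seg] + (0.1 if improved else -0.05))
```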

Relevance: 100.00%

Abstract:

That humans and animals learn from interaction with the environment is a foundational idea underlying nearly all theories of learning and intelligence. Learning that certain outcomes are associated with specific actions or stimuli (both internal and external) is at the very core of the capacity to adapt behaviour to environmental changes. In the present work, appetitive and aversive reinforcement learning paradigms have been used to investigate the fronto-striatal loops and behavioural correlates of adaptive and maladaptive reinforcement learning processes, aiming at a deeper understanding of how cortical and subcortical substrates interact with each other and with other brain systems to support learning. By combining a wide variety of neuroscientific approaches, including behavioural and psychophysiological methods, EEG, and neuroimaging techniques, these studies aim to clarify and advance knowledge of the neural bases and computational mechanisms of reinforcement learning, in both normal and neurologically impaired populations.