20 results for Reinforcement-Learning


Relevance: 70.00%

Abstract:

Given the complex structure of the brain, how can synaptic plasticity explain the learning and forgetting of associations when these are continuously changing? We address this question by studying different reinforcement learning rules in a multilayer network in order to reproduce monkey behavior in a visuomotor association task. Our model can only reproduce the learning performance of the monkey if the synaptic modifications depend on the pre- and postsynaptic activity, and if the intrinsic level of stochasticity is low. This favored learning rule is based on reward-modulated Hebbian synaptic plasticity and has the interesting property that learning performance does not substantially degrade when layers are added to the network, even for a complex problem.
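
The abstract does not spell out the update rule; as a rough sketch of the kind of reward-modulated Hebbian plasticity it refers to, the toy example below applies a three-factor update (reward prediction error x presynaptic activity x postsynaptic activity) with a low level of activation noise for exploration. The network sizes, the tanh units, the one-hot stimulus coding and all variable names are illustrative assumptions, not the authors' model.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative two-layer rate network (sizes are arbitrary).
n_in, n_hid, n_out = 10, 20, 4
W1 = rng.normal(0, 0.1, (n_hid, n_in))
W2 = rng.normal(0, 0.1, (n_out, n_hid))

eta = 0.05          # learning rate
noise_sigma = 0.05  # low intrinsic stochasticity, as favored in the abstract

def forward(x):
    # Stochastic forward pass; the noise drives exploration.
    h = np.tanh(W1 @ x + noise_sigma * rng.normal(size=n_hid))
    y = np.tanh(W2 @ h + noise_sigma * rng.normal(size=n_out))
    return h, y

def reward_modulated_hebb(x, h, y, reward, baseline):
    # Three-factor update: (reward - baseline) x presynaptic x postsynaptic.
    global W1, W2
    r = reward - baseline
    W2 += eta * r * np.outer(y, h)
    W1 += eta * r * np.outer(h, x)

# Toy visuomotor association task: stimulus index -> rewarded action index.
associations = {0: 2, 1: 0, 2: 3, 3: 1}
baseline = 0.0
for trial in range(2000):
    s = rng.integers(4)
    x = np.zeros(n_in)
    x[s] = 1.0
    h, y = forward(x)
    action = int(np.argmax(y))
    reward = 1.0 if action == associations[s] else 0.0
    reward_modulated_hebb(x, h, y, reward, baseline)
    baseline += 0.05 * (reward - baseline)   # running average reward

The running reward average plays the role of a baseline that turns the raw reward into a prediction error, a common choice in reward-modulated Hebbian rules.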

Relevance: 70.00%

Abstract:

Population coding is widely regarded as a key mechanism for achieving reliable behavioral decisions. We previously introduced reinforcement learning for population-based decision making by spiking neurons. Here we generalize population reinforcement learning to spike-based plasticity rules that take account of the postsynaptic neural code. We consider spike/no-spike, spike count and spike latency codes. The multi-valued and continuous-valued features in the postsynaptic code allow for a generalization of binary decision making to multi-valued decision making and continuous-valued action selection. We show that code-specific learning rules speed up learning for both the discrete classification and the continuous regression tasks. The suggested learning rules also speed up learning with increasing population size, in contrast to standard reinforcement learning rules. Continuous action selection is further shown to explain realistic learning speeds in the Morris water maze. Finally, we introduce action perturbation, as opposed to classical weight or node perturbation, as an exploration mechanism underlying reinforcement learning. Exploration in the action space greatly increases the speed of learning as compared to exploration in the neuron or weight space.
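
As a loose, non-spiking illustration of the contrast the abstract draws between action perturbation and weight perturbation, the sketch below trains a linear readout of a population code on a continuous regression task under both exploration schemes: with action perturbation the single noise source is credited along the input direction, whereas weight perturbation spreads credit over every synapse. The Gaussian tuning curves, the task, the population size and the learning parameters are assumptions for illustration, not the paper's spike-based rules.

import numpy as np

rng = np.random.default_rng(1)

n_pop = 100                 # population size (arbitrary)
w = np.zeros(n_pop)         # readout weights of the population code
eta, sigma = 0.1, 0.1
centers = np.linspace(-1, 1, n_pop)

def population_activity(x):
    # Toy population code: Gaussian tuning curves over the input variable.
    return np.exp(-0.5 * ((x - centers) / 0.1) ** 2)

def target(x):
    # Continuous-valued action to be learned (regression task).
    return np.sin(np.pi * x)

def train(perturb="action", steps=3000):
    global w
    w = np.zeros(n_pop)
    baseline = 0.0                                   # running average reward
    for _ in range(steps):
        x = rng.uniform(-1, 1)
        r = population_activity(x)
        if perturb == "action":
            noise = sigma * rng.normal()             # one noise source on the action
            a = w @ r + noise
            R = -(a - target(x)) ** 2
            w += eta * (R - baseline) * noise * r    # credit assigned along the input
        else:
            noise = sigma * rng.normal(size=n_pop)   # independent noise on every weight
            a = (w + noise) @ r
            R = -(a - target(x)) ** 2
            w += eta * (R - baseline) * noise        # credit spread over all weights
        baseline += 0.05 * (R - baseline)
    xs = np.linspace(-1, 1, 200)
    preds = np.array([w @ population_activity(xi) for xi in xs])
    return float(np.mean((preds - target(xs)) ** 2))

print("action perturbation, final MSE:", train("action"))
print("weight perturbation, final MSE:", train("weight"))

The intent is that the action-perturbation run should reach the lower error, mirroring the abstract's point that exploration in action space scales better with population size than exploration in weight or neuron space.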

Relevance: 70.00%

Abstract:

Recent modeling of spike-timing-dependent plasticity indicates that plasticity involves, besides pre- and postsynaptic firing times, a local dendritic potential as a third factor. We present a simple compartmental neuron model together with a non-Hebbian, biologically plausible learning rule for dendritic synapses where plasticity is modulated by these three factors. In functional terms, the rule seeks to minimize discrepancies between somatic firings and a local dendritic potential. Such prediction errors can arise in our model from stochastic fluctuations as well as from synaptic input that directly targets the soma. Depending on the nature of this direct input, our plasticity rule subserves supervised or unsupervised learning. When a reward signal modulates the learning rate, reinforcement learning results. Hence a single plasticity rule supports diverse learning paradigms.
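
A very compact sketch in the spirit of the rule described here, not a reproduction of the authors' compartmental model: dendritic weights are updated by the mismatch between somatic firing and the firing probability predicted from the local dendritic potential, times presynaptic activity, and a reward signal can scale the learning rate so that the same update performs reinforcement learning. The transfer function, the discrete time step and all parameter values are assumptions.

import numpy as np

rng = np.random.default_rng(2)

n_syn = 50                     # number of dendritic synapses (arbitrary)
w = rng.normal(0, 0.1, n_syn)  # dendritic synaptic weights
eta = 0.02

def phi(u):
    # Firing probability per time step as a function of the potential.
    return 1.0 / (1.0 + np.exp(-u))

for step in range(5000):
    x = (rng.random(n_syn) < 0.1).astype(float)     # presynaptic spikes
    v_dend = w @ x                                  # local dendritic potential
    # Somatic potential: dendritic drive plus direct somatic input and noise.
    u_soma = v_dend + 0.8 * np.sin(step / 50.0) + rng.normal(0, 0.5)
    spike = float(rng.random() < phi(u_soma))       # somatic firing (0 or 1)

    reward = 1.0   # replace with a task-dependent signal for reinforcement learning
    # Three-factor rule: (somatic spike - dendritic prediction) x presynaptic
    # activity, with the reward signal modulating the learning rate.
    w += eta * reward * (spike - phi(v_dend)) * x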

Relevance: 60.00%

Abstract:

A novel adaptive approach for glucose control in individuals with type 1 diabetes under sensor-augmented pump therapy is proposed. The controller is based on Actor-Critic (AC) learning and is inspired by the principles of reinforcement learning and optimal control theory. The main characteristics of the proposed controller are (i) simultaneous adjustment of both the insulin basal rate and the bolus dose, (ii) initialization based on clinical procedures, and (iii) real-time personalization. The effectiveness of the proposed algorithm in terms of glycemic control has been investigated in silico in adults, adolescents and children under open-loop and closed-loop approaches, using announced meals with uncertainties on the order of ±25% in the estimation of carbohydrates. The results show that glucose regulation is efficient in all three groups of patients, even with uncertainties in the level of carbohydrates in the meal. The percentages in the A+B zones of the Control Variability Grid Analysis (CVGA) were 100% for adults and 93% for both adolescents and children. The AC-based controller seems to be a promising approach for the automatic adjustment of insulin infusion in order to improve glycemic control. After optimization of the algorithm, the controller will be tested in a clinical trial.
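
The abstract does not give the update equations; the following toy sketch only shows the general Actor-Critic pattern it refers to, with one actor output for the basal rate and one for a bolus scaling factor, a critic trained by a temporal-difference error, and a reward that penalizes deviation from a glucose target. The one-compartment glucose update, the state features, the clipping ranges and all parameter values are invented for illustration and have no clinical validity.

import numpy as np

rng = np.random.default_rng(3)

n_feat = 3
v = np.zeros(n_feat)            # critic weights (state-value estimate)
theta = np.zeros((2, n_feat))   # actor weights: [basal rate, bolus factor]
gamma, alpha_critic, alpha_actor, sigma = 0.95, 0.05, 0.01, 0.1

def features(glucose, carbs):
    # Normalized state features (an illustrative choice).
    return np.array([(glucose - 120.0) / 100.0, carbs / 100.0, 1.0])

def reward(glucose):
    # Penalize distance from a 120 mg/dL target (a toy choice).
    return -abs(glucose - 120.0) / 100.0

def toy_glucose_step(glucose, carbs, insulin):
    # Crude, non-physiological glucose update, for demonstration only.
    return glucose + 2.0 * carbs - 30.0 * insulin + rng.normal(0, 5.0)

glucose = 150.0
for step in range(5000):
    carbs = rng.choice([0.0, 40.0, 70.0])          # announced meal (toy)
    x = features(glucose, carbs)

    # Actor: Gaussian exploration around the current basal rate and bolus factor.
    mean = theta @ x
    action = mean + sigma * rng.normal(size=2)
    basal, bolus_factor = np.clip(action, 0.0, 5.0)
    insulin = basal + bolus_factor * carbs / 10.0

    glucose_next = toy_glucose_step(glucose, carbs, insulin)
    glucose_next = float(np.clip(glucose_next, 40.0, 400.0))   # CGM-like range
    x_next = features(glucose_next, 0.0)

    # Critic: temporal-difference error on the state value.
    delta = reward(glucose_next) + gamma * (v @ x_next) - (v @ x)
    v += alpha_critic * delta * x
    # Actor: shift the action means in the direction the TD error favors.
    theta += alpha_actor * delta * np.outer(action - mean, x)

    glucose = glucose_next

Clamping the simulated glucose to a CGM-like range simply keeps the toy dynamics bounded; a real controller of this kind would be evaluated on a validated patient simulator, as the abstract describes.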

Relevance: 60.00%

Abstract:

Dynamic systems, especially in real-life applications, are often characterized by inter-/intra-variability, uncertainties and time-varying components. Physiological systems are probably the most representative example, in which population variability, vital-signal measurement noise and uncertain dynamics render their explicit representation and optimization a rather difficult task. Systems characterized by such challenges often require adaptive algorithmic solutions able to perform an iterative structural and/or parametric update process towards optimized behavior. Adaptive optimization presents the advantages of (i) individualization through learning of basic system characteristics, (ii) the ability to follow time-varying dynamics and (iii) low computational cost. In this chapter, the use of online adaptive algorithms is investigated in two basic research areas related to diabetes management: (i) real-time glucose regulation and (ii) real-time prediction of hypo-/hyperglycemia. The applicability of these methods is illustrated through the design and development of an adaptive glucose control algorithm based on reinforcement learning and optimal control, and an adaptive, personalized early-warning system for the recognition of, and alarm generation against, hypo- and hyperglycemic events.
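
As a minimal sketch of the adaptive early-warning idea (not the chapter's actual algorithm), the example below fits an autoregressive glucose predictor online with recursive least squares and a forgetting factor, and raises hypo-/hyperglycemia alarms when the roughly 30-minute-ahead prediction crosses fixed thresholds. The model order, the horizon, the forgetting factor, the thresholds and the simulated CGM trace are all assumptions.

import numpy as np

rng = np.random.default_rng(4)

order, horizon = 4, 6            # AR order; 6 samples ~ 30 min at a 5-min CGM rate
lam = 0.98                       # forgetting factor (speed of personalization)
w = np.zeros(order)              # AR coefficients, adapted online
P = np.eye(order) * 100.0        # RLS covariance matrix
HYPO, HYPER = 70.0, 180.0        # alarm thresholds in mg/dL

def rls_update(w, P, x, y):
    # Standard recursive least-squares step with forgetting factor lam.
    k = P @ x / (lam + x @ P @ x)
    w = w + k * (y - w @ x)
    P = (P - np.outer(k, x @ P)) / lam
    return w, P

# Simulated CGM trace (slow oscillation plus noise), purely for demonstration.
t = np.arange(600)
glucose = 130 + 60 * np.sin(2 * np.pi * t / 180) + rng.normal(0, 5, t.size)

for i in range(order - 1, glucose.size):
    x_now = glucose[i - order + 1:i + 1][::-1]     # most recent samples
    pred = w @ x_now                               # forecast "horizon" steps ahead
    if i > 50:                                     # allow a short adaptation period
        if pred < HYPO:
            print(f"t={i}: hypoglycemia warning (predicted {pred:.0f} mg/dL)")
        elif pred > HYPER:
            print(f"t={i}: hyperglycemia warning (predicted {pred:.0f} mg/dL)")
    # Adapt once the true value for an earlier forecast becomes available.
    if i >= order - 1 + horizon:
        x_past = glucose[i - horizon - order + 1:i - horizon + 1][::-1]
        w, P = rls_update(w, P, x_past, glucose[i])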