46 resultados para Game rule encoding

em BORIS: Bern Open Repository and Information System - Berna - Suiça


Relevância:

20.00% 20.00%

Publicador:

Resumo:

Learning by reinforcement is important in shaping animal behavior, and in particular in behavioral decision making. Such decision making is likely to involve the integration of many synaptic events in space and time. However, using a single reinforcement signal to modulate synaptic plasticity, as suggested in classical reinforcement learning algorithms, a twofold problem arises. Different synapses will have contributed differently to the behavioral decision, and even for one and the same synapse, releases at different times may have had different effects. Here we present a plasticity rule which solves this spatio-temporal credit assignment problem in a population of spiking neurons. The learning rule is spike-time dependent and maximizes the expected reward by following its stochastic gradient. Synaptic plasticity is modulated not only by the reward, but also by a population feedback signal. While this additional signal solves the spatial component of the problem, the temporal one is solved by means of synaptic eligibility traces. In contrast to temporal difference (TD) based approaches to reinforcement learning, our rule is explicit with regard to the assumed biophysical mechanisms. Neurotransmitter concentrations determine plasticity and learning occurs fully online. Further, it works even if the task to be learned is non-Markovian, i.e. when reinforcement is not determined by the current state of the system but may also depend on past events. The performance of the model is assessed by studying three non-Markovian tasks. In the first task, the reward is delayed beyond the last action with non-related stimuli and actions appearing in between. The second task involves an action sequence which is itself extended in time and reward is only delivered at the last action, as it is the case in any type of board-game. The third task is the inspection game that has been studied in neuroeconomics, where an inspector tries to prevent a worker from shirking. Applying our algorithm to this game yields a learning behavior which is consistent with behavioral data from humans and monkeys, revealing themselves properties of a mixed Nash equilibrium. The examples show that our neuronal implementation of reward based learning copes with delayed and stochastic reward delivery, and also with the learning of mixed strategies in two-opponent games.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Learning by reinforcement is important in shaping animal behavior. But behavioral decision making is likely to involve the integration of many synaptic events in space and time. So in using a single reinforcement signal to modulate synaptic plasticity a twofold problem arises. Different synapses will have contributed differently to the behavioral decision and, even for one and the same synapse, releases at different times may have had different effects. Here we present a plasticity rule which solves this spatio-temporal credit assignment problem in a population of spiking neurons. The learning rule is spike time dependent and maximizes the expected reward by following its stochastic gradient. Synaptic plasticity is modulated not only by the reward but by a population feedback signal as well. While this additional signal solves the spatial component of the problem, the temporal one is solved by means of synaptic eligibility traces. In contrast to temporal difference based approaches to reinforcement learning, our rule is explicit with regard to the assumed biophysical mechanisms. Neurotransmitter concentrations determine plasticity and learning occurs fully online. Further, it works even if the task to be learned is non-Markovian, i.e. when reinforcement is not determined by the current state of the system but may also depend on past events. The performance of the model is assessed by studying three non-Markovian tasks. In the first task the reward is delayed beyond the last action with non-related stimuli and actions appearing in between. The second one involves an action sequence which is itself extended in time and reward is only delivered at the last action, as is the case in any type of board-game. The third is the inspection game that has been studied in neuroeconomics. It only has a mixed Nash equilibrium and exemplifies that the model also copes with stochastic reward delivery and the learning of mixed strategies.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The prognosis of patients in whom pulmonary embolism (PE) is suspected but ruled out is poorly understood. We evaluated whether the initial assessment of clinical probability of PE could help to predict the prognosis for these patients.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The Pulmonary Embolism Rule-out Criteria (PERC) rule is a clinical diagnostic rule designed to exclude pulmonary embolism (PE) without further testing. We sought to externally validate the diagnostic performance of the PERC rule alone and combined with clinical probability assessment based on the revised Geneva score.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Synaptic strength depresses for low and potentiates for high activation of the postsynaptic neuron. This feature is a key property of the Bienenstock–Cooper–Munro (BCM) synaptic learning rule, which has been shown to maximize the selectivity of the postsynaptic neuron, and thereby offers a possible explanation for experience-dependent cortical plasticity such as orientation selectivity. However, the BCM framework is rate-based and a significant amount of recent work has shown that synaptic plasticity also depends on the precise timing of presynaptic and postsynaptic spikes. Here we consider a triplet model of spike-timing–dependent plasticity (STDP) that depends on the interactions of three precisely timed spikes. Triplet STDP has been shown to describe plasticity experiments that the classical STDP rule, based on pairs of spikes, has failed to capture. In the case of rate-based patterns, we show a tight correspondence between the triplet STDP rule and the BCM rule. We analytically demonstrate the selectivity property of the triplet STDP rule for orthogonal inputs and perform numerical simulations for nonorthogonal inputs. Moreover, in contrast to BCM, we show that triplet STDP can also induce selectivity for input patterns consisting of higher-order spatiotemporal correlations, which exist in natural stimuli and have been measured in the brain. We show that this sensitivity to higher-order correlations can be used to develop direction and speed selectivity.