25 results for temporal difference learning

in BORIS: Bern Open Repository and Information System - Bern - Switzerland


Relevance:

100.00%

Publisher:

Abstract:

Learning by reinforcement is important in shaping animal behavior, and in particular in behavioral decision making. Such decision making is likely to involve the integration of many synaptic events in space and time. However, when a single reinforcement signal is used to modulate synaptic plasticity, as suggested in classical reinforcement learning algorithms, a twofold problem arises. Different synapses will have contributed differently to the behavioral decision, and even for one and the same synapse, releases at different times may have had different effects. Here we present a plasticity rule which solves this spatio-temporal credit assignment problem in a population of spiking neurons. The learning rule is spike-time dependent and maximizes the expected reward by following its stochastic gradient. Synaptic plasticity is modulated not only by the reward, but also by a population feedback signal. While this additional signal solves the spatial component of the problem, the temporal one is solved by means of synaptic eligibility traces. In contrast to temporal difference (TD) based approaches to reinforcement learning, our rule is explicit with regard to the assumed biophysical mechanisms. Neurotransmitter concentrations determine plasticity and learning occurs fully online. Further, it works even if the task to be learned is non-Markovian, i.e. when reinforcement is not determined by the current state of the system but may also depend on past events. The performance of the model is assessed by studying three non-Markovian tasks. In the first task, the reward is delayed beyond the last action, with unrelated stimuli and actions appearing in between. The second task involves an action sequence which is itself extended in time, and reward is only delivered at the last action, as is the case in any type of board game. The third task is the inspection game that has been studied in neuroeconomics, where an inspector tries to prevent a worker from shirking.
Applying our algorithm to this game yields a learning behavior which is consistent with behavioral data from humans and monkeys, which themselves reveal properties of a mixed Nash equilibrium. The examples show that our neuronal implementation of reward-based learning copes with delayed and stochastic reward delivery, and also with the learning of mixed strategies in two-opponent games.
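
The role that the abstract above assigns to eligibility traces can be illustrated with a toy example. The sketch below is not the authors' plasticity rule: it assumes a single scalar weight, a hypothetical exponentially decaying trace with time constant `tau_e`, and a hypothetical learning rate `lr`, and it only shows how a trace lets a delayed reward be credited to earlier synaptic events.

```python
import math


def run_trial(events, reward_time, reward, tau_e=5.0, lr=0.1, dt=1.0, w=0.0):
    """Return the weight after one trial with a delayed reward.

    events      -- per-step coincidence signal (hypothetical stand-in for
                   correlated pre- and postsynaptic activity)
    reward_time -- index of the step at which the reward arrives
    reward      -- scalar reward delivered at reward_time
    tau_e, lr   -- assumed trace time constant and learning rate
    """
    trace = 0.0
    decay = math.exp(-dt / tau_e)  # per-step decay factor of the trace
    for t, event in enumerate(events):
        # The trace remembers past events but forgets them exponentially.
        trace = trace * decay + event
        if t == reward_time:
            # The reward gates plasticity: the weight change is the reward
            # multiplied by whatever is left of the trace.
            w += lr * reward * trace
    return w


if __name__ == "__main__":
    early = run_trial([1, 0, 0, 0], reward_time=3, reward=1.0)
    late = run_trial([0, 0, 1, 0], reward_time=3, reward=1.0)
    print(early, late)
```

In this toy setting an event closer to the reward receives more credit than an earlier one, and no weight change occurs without reward. The rule described in the abstract additionally uses a population feedback signal, which this sketch leaves out.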

Relevance:

100.00%

Publisher:

Abstract:

Learning by reinforcement is important in shaping animal behavior. But behavioral decision making is likely to involve the integration of many synaptic events in space and time. So in using a single reinforcement signal to modulate synaptic plasticity a twofold problem arises. Different synapses will have contributed differently to the behavioral decision and, even for one and the same synapse, releases at different times may have had different effects. Here we present a plasticity rule which solves this spatio-temporal credit assignment problem in a population of spiking neurons. The learning rule is spike time dependent and maximizes the expected reward by following its stochastic gradient. Synaptic plasticity is modulated not only by the reward but by a population feedback signal as well. While this additional signal solves the spatial component of the problem, the temporal one is solved by means of synaptic eligibility traces. In contrast to temporal difference based approaches to reinforcement learning, our rule is explicit with regard to the assumed biophysical mechanisms. Neurotransmitter concentrations determine plasticity and learning occurs fully online. Further, it works even if the task to be learned is non-Markovian, i.e. when reinforcement is not determined by the current state of the system but may also depend on past events. The performance of the model is assessed by studying three non-Markovian tasks. In the first task the reward is delayed beyond the last action with non-related stimuli and actions appearing in between. The second one involves an action sequence which is itself extended in time and reward is only delivered at the last action, as is the case in any type of board-game. The third is the inspection game that has been studied in neuroeconomics. It only has a mixed Nash equilibrium and exemplifies that the model also copes with stochastic reward delivery and the learning of mixed strategies.

Relevance:

100.00%

Publisher:

Abstract:

We present a model for plasticity induction in reinforcement learning which is based on a cascade of synaptic memory traces. In the cascade of these so-called eligibility traces, presynaptic input is first correlated with postsynaptic events, next with the behavioral decisions and finally with the external reinforcement. A population of leaky integrate-and-fire neurons endowed with this plasticity scheme is studied by simulation on different tasks. For operant conditioning with delayed reinforcement, learning succeeds even when the delay is so large that the delivered reward reflects the appropriateness, not of the immediately preceding response, but of a decision made earlier on in the stimulus-decision sequence. So the proposed model does not rely on the temporal contiguity between decision and pertinent reward and thus provides a viable means of addressing the temporal credit assignment problem. In the same task, learning speeds up with increasing population size, showing that the plasticity cascade simultaneously addresses the spatial problem of assigning credit to the different population neurons. Simulations on other tasks such as sequential decision making serve to highlight the robustness of the proposed scheme and, further, contrast its performance to that of temporal difference based approaches to reinforcement learning.

Relevance:

100.00%

Publisher:

Abstract:

In learning from trial and error, animals need to relate behavioral decisions to environmental reinforcement even though it may be difficult to assign credit to a particular decision when outcomes are uncertain or subject to delays. When considering the biophysical basis of learning, the credit-assignment problem is compounded because the behavioral decisions themselves result from the spatio-temporal aggregation of many synaptic releases. We present a model of plasticity induction for reinforcement learning in a population of leaky integrate-and-fire neurons which is based on a cascade of synaptic memory traces. Each synaptic cascade correlates presynaptic input first with postsynaptic events, next with the behavioral decisions and finally with external reinforcement. For operant conditioning, learning succeeds even when reinforcement is delivered with a delay so large that temporal contiguity between decision and pertinent reward is lost due to intervening decisions which are themselves subject to delayed reinforcement. This shows that the model provides a viable mechanism for temporal credit assignment. Further, learning speeds up with increasing population size, so the plasticity cascade simultaneously addresses the spatial problem of assigning credit to synapses in different population neurons. Simulations on other tasks, such as sequential decision making, serve to contrast the performance of the proposed scheme to that of temporal difference-based learning. We argue that, due to their comparative robustness, synaptic plasticity cascades are attractive basic models of reinforcement learning in the brain.

Relevance:

100.00%

Publisher:

Abstract:

Humans and animals face decision tasks in an uncertain multi-agent environment where an agent's strategy may change in time due to the co-adaptation of others' strategies. The neuronal substrate and the computational algorithms underlying such adaptive decision making, however, are largely unknown. We propose a population coding model of spiking neurons with a policy gradient procedure that successfully acquires optimal strategies for classical game-theoretical tasks. The suggested population reinforcement learning reproduces data from human behavioral experiments for the blackjack and the inspector game. It performs optimally according to a pure (deterministic) and mixed (stochastic) Nash equilibrium, respectively. In contrast, temporal-difference (TD) learning, covariance learning, and basic reinforcement learning fail to perform optimally for the stochastic strategy. Spike-based population reinforcement learning, shown to follow the stochastic reward gradient, is therefore a viable candidate to explain automated decision learning of a Nash equilibrium in two-player games.
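
As a reminder of what a mixed Nash equilibrium means in a game of the inspector type, the sketch below computes the fully mixed equilibrium of a 2x2 game from the standard indifference conditions. The payoff numbers are hypothetical and chosen only for illustration; they are not taken from the study, and the function name `mixed_equilibrium_2x2` is likewise an assumption.

```python
from fractions import Fraction


def mixed_equilibrium_2x2(A, B):
    """Fully mixed equilibrium of a 2x2 bimatrix game.

    A[i][j] is the row player's payoff and B[i][j] the column player's
    payoff when row i and column j are played. Returns (p, q), where p is
    the probability of row 0 and q the probability of column 0.
    """
    denom_q = A[0][0] - A[0][1] - A[1][0] + A[1][1]
    denom_p = B[0][0] - B[0][1] - B[1][0] + B[1][1]
    if denom_q == 0 or denom_p == 0:
        raise ValueError("indifference conditions are degenerate")
    # q makes the row player indifferent between its two rows.
    q = Fraction(A[1][1] - A[0][1], denom_q)
    # p makes the column player indifferent between its two columns.
    p = Fraction(B[1][1] - B[1][0], denom_p)
    if not (0 < p < 1 and 0 < q < 1):
        raise ValueError("no fully mixed equilibrium for these payoffs")
    return p, q


if __name__ == "__main__":
    # Hypothetical payoffs. Rows: worker works / shirks.
    # Columns: employer inspects / does not inspect.
    worker = [[2, 2], [0, 3]]
    employer = [[1, 2], [-1, -3]]
    p_work, q_inspect = mixed_equilibrium_2x2(worker, employer)
    print(p_work, q_inspect)
```

With these assumed payoffs every pure strategy pair gives one player a reason to deviate, so only the mixed equilibrium remains: the worker works with probability 2/3 and the employer inspects with probability 1/3.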

Relevance:

40.00%

Publisher:

Abstract:

Comments on an article by Kashima et al. (see record 2007-10111-001). In their target article, Kashima and colleagues try to show how a connectionist model conceptualization of the self is best suited to capture the self's temporal and socio-culturally contextualized nature. They propose a new model and, to support it, conduct computer simulations of psychological phenomena whose importance for the self has long been clear, even if not formally modeled, such as imitation and the learning of sequence and narrative. As explicated when we advocated connectionist models as a metaphor for self in Mischel and Morf (2003), we fully endorse the utility of such a metaphor, as these models have some of the processing characteristics necessary for capturing key aspects and functions of a dynamic cognitive-affective self-system. As elaborated in that chapter, we see as their principal strength that connectionist models can take account of multiple simultaneous processes without invoking a single central control. All outputs reflect a distributed pattern of activation across a large number of simple processing units, the nature of which depends on (and changes with) the connection weights between the links and the satisfaction of mutual constraints across these links (Rumelhart & McClelland, 1986). This allows a simple account for why certain input features will at times predominate, while others take over on other occasions. (PsycINFO Database Record (c) 2008 APA, all rights reserved)

Relevance:

40.00%

Publisher:

Abstract:

The causes of a greening trend detected in the Arctic using the normalized difference vegetation index (NDVI) are still poorly understood. Changes in NDVI are a result of multiple ecological and social factors that affect tundra net primary productivity. Here we use a 25 year time series of AVHRR-derived NDVI data (AVHRR: advanced very high resolution radiometer), climate analysis, a global geographic information database and ground-based studies to examine the spatial and temporal patterns of vegetation greenness on the Yamal Peninsula, Russia. We assess the effects of climate change, gas-field development, reindeer grazing and permafrost degradation. In contrast to the case for Arctic North America, there has not been a significant trend in summer temperature or NDVI, and much of the pattern of NDVI in this region is due to disturbances. There has been a 37% change in early-summer coastal sea-ice concentration, a 4% increase in summer land temperatures and a 7% change in the average time-integrated NDVI over the length of the satellite observations. Gas-field infrastructure is not currently extensive enough to affect regional NDVI patterns. The effect of reindeer is difficult to quantitatively assess because of the lack of control areas where reindeer are excluded. Many of the greenest landscapes on the Yamal are associated with landslides and drainage networks that have resulted from ongoing rapid permafrost degradation. A warming climate and enhanced winter snow are likely to exacerbate positive feedbacks between climate and permafrost thawing. We present a diagram that summarizes the social and ecological factors that influence Arctic NDVI. The NDVI should be viewed as a powerful monitoring tool that integrates the cumulative effect of a multitude of factors affecting Arctic land-cover change.

Relevance:

40.00%

Publisher:

Abstract:

OBJECTIVES: The objectives of the present study were to investigate temporal/spectral sound-feature processing in preschool children (4 to 7 years old) with peripheral hearing loss compared with age-matched controls. The results verified the presence of statistical learning, which was diminished in children with hearing impairments (HIs), and elucidated possible perceptual mediators of speech production. DESIGN: Perception and production of the syllables /ba/, /da/, /ta/, and /na/ were recorded in 13 children with normal hearing and 13 children with HI. Perception was assessed physiologically through event-related potentials (ERPs) recorded by EEG in a multifeature mismatch negativity paradigm and behaviorally through a discrimination task. Temporal and spectral features of the ERPs during speech perception were analyzed, and speech production was quantitatively evaluated using speech motor maximum performance tasks. RESULTS: Proximal to stimulus onset, children with HI displayed a difference in map topography, indicating diminished statistical learning. In later ERP components, children with HI exhibited reduced amplitudes specifically in the N2 and early parts of the late discriminative negativity components, which are associated with temporal and spectral control mechanisms. Abnormalities of speech perception were only subtly reflected in speech production, as the lone difference found in speech production studies was a mild delay in regulating speech intensity. CONCLUSIONS: In addition to previously reported deficits of sound-feature discriminations, the present study results reflect diminished statistical learning in children with HI, which plays an early and important, but so far neglected, role in phonological processing. Furthermore, the lack of corresponding behavioral abnormalities in speech production implies that impaired perceptual capacities do not necessarily translate into productive deficits.

Relevance:

30.00%

Publisher:

Abstract:

We investigated the contribution of postictal memory testing for lateralizing the epileptic focus and predicting memory outcome after surgery for temporal lobe epilepsy (TLE). Forty-five patients with TLE underwent interictal, postictal, and postoperative assessment of verbal and nonverbal memory. Surgery consisted of anterior temporal lobectomy (36), selective isolated amygdalohippocampectomy (6), or amygdalohippocampectomy coupled to lesionectomy (3). Postictal and postoperative but not interictal memory were significantly lower in left TLE than in right TLE. Nonverbal memory showed no significant difference in left TLE versus right TLE in all conditions. Postictal memory was significantly correlated with postoperative memory, but the effect disappeared when the lateralization of the focus was considered. Postictal verbal memory is a useful bedside tool that can help lateralize the epileptic focus. Larger studies are needed to further estimate its predictive value of the postoperative outcome.

Relevance:

30.00%

Publisher:

Abstract:

Sustainable natural resource use requires that multiple actors reassess their situation in a systemic perspective. This can be conceptualised as a social learning process between actors from rural communities and experts from outside organisations. A specifically designed workshop, oriented towards a systemic view of natural resource use and the enhancement of mutual learning between local and external actors, provided the background for evaluating the potentials and constraints of intensified social learning processes. Case studies in rural communities in India, Bolivia, Peru and Mali showed that changes in the narratives of the workshop participants followed a similar temporal sequence relatively independently of their specific contexts. Social learning processes were found to be more likely to succeed if they 1) opened new space for communicative action, allowing for an intersubjective re-definition of the present situation, and 2) contributed to rebalancing the relationships between social capital and social, emotional and cognitive competencies within and between local and external actors.

Relevance:

30.00%

Publisher:

Abstract:

This article provides a selective overview of the functional neuroimaging literature with an emphasis on emotional activation processes. Emotions are fast and flexible response systems that provide basic tendencies for adaptive action. From the range of involved component functions, we first discuss selected automatic mechanisms that control basic adaptational changes. Second, we illustrate how neuroimaging work has contributed to the mapping of the network components associated with basic emotion families (fear, anger, disgust, happiness), and secondary dimensional concepts that organise the meaning space for subjective experience and verbal labels (emotional valence, activity/intensity, approach/withdrawal, etc.). Third, results and methodological difficulties are discussed in view of our own neuroimaging experiments that investigated the component functions involved in emotional learning. The amygdala, prefrontal cortex, and striatum form a network of reciprocal connections that show topographically distinct patterns of activity as a correlate of up- and down-regulation processes during an emotional episode. Emotional modulations of other brain systems have attracted recent research interest. Emotional neuroimaging calls for more representative designs that highlight the modulatory influences of regulation strategies and socio-cultural factors responsible for inhibitory control and extinction. We conclude by emphasising the relevance of the temporal process dynamics of emotional activations that may provide improved prediction of individual differences in emotionality.

Relevance:

30.00%

Publisher:

Abstract:

The encoding of verbal stimuli elicits left-lateralized activation patterns within the medial temporal lobes in healthy adults. In our study, patients with left- and right-sided temporal lobe epilepsy (LTLE, RTLE) were investigated during the encoding and retrieval of word-pair associates using functional magnetic resonance imaging. Functional asymmetry of activation patterns in hippocampal, inferior frontal, and temporolateral neocortical areas associated with language functions was analyzed. Hippocampal activation patterns in patients with LTLE were more right-lateralized than those in patients with RTLE (P<0.05). There were no group differences with respect to lateralization in frontal or temporolateral regions of interest (ROIs). For both groups, frontal cortical activation patterns were significantly more left-lateralized than hippocampal patterns (P<0.05). For patients with LTLE, there was a strong trend toward a difference in functional asymmetry between the temporolateral and hippocampal ROIs (P=0.059). A graded effect of epileptic activity on laterality of the different regional activation patterns is discussed.

Relevance:

30.00%

Publisher:

Abstract:

Using functional magnetic resonance imaging during a verbal memory task, we investigated correlations of signal fluctuations within the hippocampus and ipsilateral frontal as well as temporal areas in temporal lobe epilepsy patients. Declarative memory abilities were additionally examined before and after temporal lobe epilepsy surgery. A significant difference exists in functional connectivity between patients whose mnemonic functions deteriorated and those who remained stable or improved. Univariate analyses showed significantly higher preoperative coupling between the hippocampus and Brodmann area 22 for the group that decreased in verbal learning. We suggest greater coupling to reflect higher functional network integrity. Postoperatively reduced learning ability in patients with higher preoperative coupling underlines the importance of hippocampal interaction with cortical areas for successful memory formation.