147 results for Reinforcement-Learning
Abstract:
Many species are able to learn to associate behaviours with rewards, as this gives fitness advantages in changing environments. Social interactions between population members may, however, require greater cognitive abilities than simple trial-and-error learning, in particular the capacity to make accurate hypotheses about the material payoff consequences of alternative action combinations. It is unclear in this context whether natural selection necessarily favours individuals that use information about payoffs associated with untried actions (hypothetical payoffs), as opposed to simple reinforcement of realized payoffs. Here, we develop an evolutionary model in which individuals are genetically determined to use either trial-and-error learning or learning based on hypothetical reinforcements, and ask which learning rule is evolutionarily stable under pairwise symmetric two-action stochastic repeated games played over the individual's lifetime. We analyse the learning dynamics on the behavioural timescale through stochastic approximation theory and simulations, and derive conditions under which trial-and-error learning outcompetes hypothetical reinforcement learning on the evolutionary timescale. This occurs in particular under repeated cooperative interactions with the same partner. By contrast, we find that hypothetical reinforcement learners tend to be favoured under random interactions, but stable polymorphisms can also arise in which trial-and-error learners are maintained at a low frequency. We conclude that specific game structures can select for trial-and-error learning even in the absence of costs of cognition, which illustrates that cost-free increased cognition can be counterselected under social interactions.
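To make the distinction concrete, here is a minimal Python sketch of the two learning rules contrasted above. It rests on several illustrative assumptions that are not taken from the study: a hypothetical symmetric 2x2 game (the Prisoner's Dilemma values in `PAYOFF`), logit choice with sensitivity `lam`, and a simple averaging update with a decreasing step size. All function names are hypothetical. A trial-and-error learner updates only the action it played, using the payoff it actually received. A hypothetical reinforcement learner also updates the action it did not play, using the payoff that action would have earned against the partner's observed action.

```python
import math
import random

# Hypothetical symmetric 2x2 game, PAYOFF[my_action][partner_action].
# Illustrative Prisoner's Dilemma values (assumption, not from the study):
# action 0 = cooperate, action 1 = defect.
PAYOFF = [[3.0, 0.0],
          [5.0, 1.0]]


def logit_choice_probs(attractions, lam):
    """Logit (softmax) choice probabilities; lam sets choice sensitivity."""
    m = max(attractions)  # subtract the max for numerical stability
    weights = [math.exp(lam * (a - m)) for a in attractions]
    total = sum(weights)
    return [w / total for w in weights]


def trial_and_error_update(attractions, my_action, partner_action, step):
    """Move only the played action toward its realized payoff."""
    new = list(attractions)
    realized = PAYOFF[my_action][partner_action]
    new[my_action] += step * (realized - new[my_action])
    return new


def hypothetical_update(attractions, my_action, partner_action, step):
    """Move every action toward the payoff it would have earned against the
    partner's observed action. my_action is unused; it is kept so that both
    rules share one signature and can be swapped freely."""
    return [a + step * (PAYOFF[k][partner_action] - a)
            for k, a in enumerate(attractions)]


def play_lifetime(rule_a, rule_b, rounds=200, lam=1.0, seed=0):
    """Repeated game between two learners paired for their whole lifetime;
    returns each player's mean payoff per round."""
    rng = random.Random(seed)
    att_a, att_b = [0.0, 0.0], [0.0, 0.0]
    total_a = total_b = 0.0
    for t in range(1, rounds + 1):
        step = 1.0 / (t + 1)  # decreasing step size
        act_a = 0 if rng.random() < logit_choice_probs(att_a, lam)[0] else 1
        act_b = 0 if rng.random() < logit_choice_probs(att_b, lam)[0] else 1
        total_a += PAYOFF[act_a][act_b]
        total_b += PAYOFF[act_b][act_a]
        att_a = rule_a(att_a, act_a, act_b, step)
        att_b = rule_b(att_b, act_b, act_a, step)
    return total_a / rounds, total_b / rounds
```

For example, `play_lifetime(trial_and_error_update, hypothetical_update)` pairs one learner of each type with the same partner for 200 rounds. The evolutionary model formalizes comparisons of lifetime payoffs across game structures. This sketch only illustrates the two update rules and does not reproduce that analysis.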
Abstract:
When individuals learn by trial-and-error, they perform randomly chosen actions and then reinforce those actions that led to a high payoff. However, individuals do not always have to physically perform an action in order to evaluate its consequences. Rather, they may be able to mentally simulate actions and their consequences without actually performing them. Such fictitious learners can select actions with high payoffs without going through long chains of trial-and-error learning. Here, we analyze the evolution of an n-dimensional cultural trait (or artifact) by learning, in a payoff landscape with a single optimum. We derive the stochastic learning dynamics of the distance to the optimum in trait space when choice between alternative artifacts follows the standard logit choice rule. We show that for both trial-and-error and fictitious learners, the learning dynamics stabilize at an approximate distance of $\sqrt{n/(2\lambda_e)}$ away from the optimum, where $\lambda_e$ is an effective learning performance parameter depending on the learning rule under scrutiny. Individual learners are thus unlikely to reach the optimum when traits are complex (n large), and so face a barrier to further improvement of the artifact. We show, however, that this barrier can be significantly reduced in a large population of learners performing payoff-biased social learning, in which case $\lambda_e$ becomes proportional to population size. Overall, our results illustrate the effects of errors in learning, levels of cognition, and population size on the evolution of complex cultural traits. © 2013 Elsevier Inc. All rights reserved.
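The logit choice rule and the stationary distance quoted above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' model:

- The payoff landscape is taken to be the negative squared distance to an optimum placed at the origin.
- A trial-and-error step is modeled as a logit choice between the current artifact and one randomly perturbed variant, with a hypothetical mutation scale `sigma`.
- All names are hypothetical.

`approx_equilibrium_distance` simply evaluates $\sqrt{n/(2\lambda_e)}$.

```python
import math
import random


def payoff(trait):
    """Assumed single-optimum landscape: negative squared distance to the
    optimum, placed at the origin of the n-dimensional trait space."""
    return -sum(x * x for x in trait)


def logit_choice(payoffs, lam, rng):
    """Return an index chosen with probability proportional to exp(lam * payoff)."""
    m = max(payoffs)  # subtract the max for numerical stability
    weights = [math.exp(lam * (p - m)) for p in payoffs]
    r = rng.random() * sum(weights)
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if r < acc:
            return i
    return len(weights) - 1


def trial_and_error_step(trait, lam, sigma, rng):
    """Try one perturbed variant of the artifact and keep either it or the
    current one according to the logit choice rule (hypothetical step)."""
    variant = [x + rng.gauss(0.0, sigma) for x in trait]
    options = [trait, variant]
    return options[logit_choice([payoff(o) for o in options], lam, rng)]


def approx_equilibrium_distance(n, lam_e):
    """Approximate stationary distance sqrt(n / (2 * lam_e)) to the optimum."""
    return math.sqrt(n / (2.0 * lam_e))
```

Because the distance grows as $\sqrt{n}$ and shrinks as $1/\sqrt{\lambda_e}$, quadrupling the trait dimension doubles it, while doubling $\lambda_e$ reduces it only by a factor of $\sqrt{2}$. According to the abstract, $\lambda_e$ becomes proportional to population size under payoff-biased social learning.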
Abstract:
In order to understand the development of non-genetically encoded actions during an animal's lifespan, it is necessary to analyze the dynamics and evolution of learning rules producing behavior. Owing to the intrinsically stochastic and frequency-dependent nature of learning dynamics, these rules are often studied in evolutionary biology via agent-based computer simulations. In this paper, we show that stochastic approximation theory can help to qualitatively understand learning dynamics and to formulate analytical models for the evolution of learning rules. We consider a population in which individuals interact repeatedly during their lifespan and the stage game they face fluctuates according to an environmental stochastic process. Individuals adjust their behavioral actions according to learning rules belonging to the class of experience-weighted attraction learning mechanisms, which includes standard reinforcement and Bayesian learning as special cases. We use stochastic approximation theory to derive differential equations governing action play probabilities, which turn out to have qualitative features of mutator-selection equations. We then perform agent-based simulations to find the conditions under which the deterministic approximation is closest to the original stochastic learning process for standard 2-action 2-player fluctuating games, where interaction between learning rules and preference reversal may occur. Finally, we analyze a simplified model for the evolution of learning in a producer-scrounger game, which shows that the exploration rate can interact in a non-intuitive way with other features of co-evolving learning rules. Overall, our analyses illustrate the usefulness of applying stochastic approximation theory in the study of animal learning.
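As an illustration of the experience-weighted attraction (EWA) class, the sketch below implements one update step in the parametrization of Camerer and Ho (1999). The study's own notation and parametrization may differ, so the parameter names `phi` (attraction decay), `delta` (weight on forgone payoffs) and `rho` (experience decay) are assumptions. Two settings recover familiar special cases:

- `delta = 0` and `rho = 0`, starting from experience 1, gives cumulative reinforcement of the played action only.
- `delta = 1` and `rho = phi` gives weighted fictitious play, a belief-based learning rule.

```python
def ewa_update(attractions, experience, my_action, partner_action,
               payoff, phi, delta, rho):
    """One experience-weighted attraction step (Camerer-Ho form, assumed).

    attractions: current attraction of each action
    experience:  experience weight N(t-1)
    payoff:      function payoff(action, partner_action) -> float
    Returns the new attractions and the new experience weight N(t).
    """
    new_experience = rho * experience + 1.0
    new_attractions = []
    for k, a in enumerate(attractions):
        # The played action gets full weight; forgone actions get weight delta.
        weight = 1.0 if k == my_action else delta
        numerator = phi * experience * a + weight * payoff(k, partner_action)
        new_attractions.append(numerator / new_experience)
    return new_attractions, new_experience


# Hypothetical 2x2 payoff table, used only to exercise the update.
TABLE = [[3.0, 0.0], [5.0, 1.0]]


def table_payoff(action, partner_action):
    return TABLE[action][partner_action]
```

Choice probabilities would then typically be obtained from the attractions through a logit rule. Stochastic approximation studies the expected motion of updates like this one when steps are small.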
Abstract:
Learning what to approach, and what to avoid, involves assigning value to environmental cues that predict positive and negative events. Studies in animals indicate that the lateral habenula encodes the previously learned negative motivational value of stimuli. However, involvement of the habenula in dynamic trial-by-trial aversive learning has not been assessed, and the functional role of this structure in humans remains poorly characterized, in part, due to its small size. Using high-resolution functional neuroimaging and computational modeling of reinforcement learning, we demonstrate positive habenula responses to the dynamically changing values of cues signaling painful electric shocks, which predict behavioral suppression of responses to those cues across individuals. By contrast, negative habenula responses to monetary reward cue values predict behavioral invigoration. Our findings show that the habenula plays a key role in an online aversive learning system and in generating associated motivated behavior in humans.
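The abstract does not specify its reinforcement learning model. A common choice in computational modeling of aversive and appetitive conditioning is a Rescorla-Wagner (delta-rule) update, sketched below as an assumption. The cue value carried into each trial would typically serve as the regressor for the dynamically changing cue value. The prediction error is the outcome minus that value. The learning rate `alpha` and the outcome coding (1 = shock, 0 = no shock) are hypothetical.

```python
def rescorla_wagner(outcomes, alpha, v0=0.0):
    """Trial-by-trial cue values and prediction errors under a delta rule.

    outcomes: outcome per cue presentation (e.g. 1 = shock, 0 = none)
    alpha:    learning rate in (0, 1]
    Returns (values, errors): the value held before each outcome, and the
    prediction error on that trial.
    """
    values, errors = [], []
    v = v0
    for r in outcomes:
        values.append(v)      # value signalled by the cue on this trial
        delta = r - v         # prediction error
        errors.append(delta)
        v += alpha * delta    # move the value toward the observed outcome
    return values, errors
```

With `outcomes = [1, 1, 0]` and `alpha = 0.5`, the cue values are 0, 0.5 and 0.75 and the prediction errors are 1, 0.5 and -0.75. This shows how the value tracks a contingency that changes partway through.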
Abstract:
Ecologically and evolutionarily oriented research on learning has traditionally been carried out on vertebrates and bees. While less sophisticated than those animals, fruit flies (Drosophila) are capable of several forms of learning, and have the advantage of a short generation time, which makes them an ideal system for experimental evolution studies. This review summarizes the insights into evolutionary questions about learning gained in the last decade from evolutionary experiments on Drosophila. These experiments demonstrate that Drosophila have the genetic potential to evolve substantially improved learning performance in ecologically relevant learning tasks. In at least one set of selected populations, the improved learning generalized to a task other than the one used to impose selection, involving a different behavior, different stimuli, and a different sensory channel for the aversive reinforcement. This improvement in learning ability was associated with a reduction in other fitness-related traits, such as larval competitive ability and lifespan, pointing to evolutionary trade-offs of improved learning. These trade-offs were confirmed by other evolutionary experiments in which a reduction in learning performance was observed as a correlated response to selection for tolerance to larval nutritional stress or for delayed aging. Such trade-offs could be one reason why fruit flies have not fully used up their evolutionary potential for learning ability. Finally, another evolutionary experiment with Drosophila provided the first direct evidence for the long-standing ideas that learning can, under some circumstances, accelerate and, in others, slow down genetically based evolutionary change. These results demonstrate the usefulness of fruit flies as a model system to address evolutionary questions about learning.
Abstract:
Background: One characteristic of post-traumatic stress disorder (PTSD) is an inability to adapt to a safe environment, i.e. to change behavior when predictions of adverse outcomes are not met. Recent studies have also indicated that PTSD patients have altered pain processing, with hyperactivation of the putamen and insula to aversive stimuli (Geuze et al., 2007). The present study examined neuronal responses to aversive and predicted aversive events. Methods: Twenty-four trauma-exposed non-PTSD controls and nineteen subjects with PTSD underwent fMRI during a partial reinforcement fear conditioning paradigm, with a mild electric shock as the unconditioned stimulus (UCS). Three conditions were analyzed: actual presentations of the UCS, events when a UCS was expected but omitted (CS+), and events when the UCS was neither expected nor delivered (CS-). Results: The UCS evoked significant alterations in the pain matrix, consisting of the brainstem, the midbrain, the thalamus, the insula, the anterior and middle cingulate, and the contralateral somatosensory cortex. PTSD subjects displayed bilaterally elevated putamen activity to the electric shock compared to controls. In trials when the UCS was expected but omitted, significant activations were observed in the brainstem, the midbrain, the anterior insula and the anterior cingulate. PTSD subjects displayed similar activations, but also elevated activations in the amygdala and the posterior insula. Conclusions: These results indicate altered fear and safety learning in PTSD; neuronal activations are further explored in terms of functional connectivity using psychophysiological interaction analyses.
Abstract:
The influence of proximal olfactory cues on place learning and memory was tested in two different spatial tasks. Rats were trained to find a hole leading to their home cage or a single food source in an array of petri dishes. The two apparatuses differed both in the type of reinforcement (return to the home cage or food reward) and in the local characteristics of the goal (masked holes or salient dishes). In both cases, the goal was in a fixed location relative to distant visual landmarks and could be marked by a local olfactory cue. Thus, the position of the goal was defined by two sets of redundant cues, each of which was sufficient to allow the discrimination of the goal location. These experiments were conducted with two strains of hooded rats (Long-Evans and PVG), which show different speeds of acquisition in place learning tasks. They revealed that the presence of an olfactory cue marking the goal facilitated learning of its location and that the facilitation persisted after the removal of the cue. Thus, the proximal olfactory cue appeared to potentiate learning and memory of the goal location relative to distant environmental cues. This facilitating effect was only detected when the expression of spatial memory was not already optimal, i.e., during the early phase of acquisition. It was not limited to a particular strain.
Abstract:
Orienting attention in space recruits fronto-parietal networks whose damage results in unilateral spatial neglect. However, attention orienting may also be governed by emotional and motivational factors, but it remains unknown whether these factors act through a modulation of the fronto-parietal attentional systems or through distinct neural pathways. Here we asked whether attentional orienting is affected by learning about the reward value of targets in a visual search task, in a spatially specific manner, and whether these effects are preserved in right-brain-damaged patients with left spatial neglect. We found that associating rewards with left-sided (but not right-sided) targets during search led to progressive exploration biases towards left space, in both healthy people and neglect patients. Such spatially specific biases occurred even without any conscious awareness of the asymmetric reward contingencies. These results show that reward-induced modulations of space representation are preserved despite a dysfunction of fronto-parietal networks associated with neglect, and therefore suggest that they may arise through spared subcortical networks directly acting on sensory processing and/or oculomotor circuits. These effects could be usefully exploited for potentiating rehabilitation strategies in neglect patients.
Abstract:
We conducted an experiment to assess the use of olfactory traces for spatial orientation in an open environment in rats, Rattus norvegicus. We trained rats to locate a food source at a fixed location from different starting points, in the presence or absence of visual information. A single food source was hidden in an array of 19 petri dishes regularly arranged in an open-field arena. Rats were trained to locate the food source either in white light (with full access to distant visuospatial information) or in darkness (without any visual information). In both cases, the goal was in a fixed location relative to the spatial frame of reference. The results of this experiment revealed that the presence of noncontrolled olfactory traces coherent with the spatial frame of reference enables rats to locate a unique position as accurately in darkness as with full access to visuospatial information. We hypothesize that the olfactory traces complement the use of other orientation mechanisms, such as path integration or the reliance on visuospatial information. This experiment demonstrates that rats can rely on olfactory traces for accurate orientation, and raises questions about the establishment of such traces in the absence of any other orientation mechanism. Copyright 1998 The Association for the Study of Animal Behaviour.
Abstract:
The principal objective of the present work was to explore parent-child relationships and family learning processes associated with anxiety disorders. To this end, families with and without an anxious family member (mother or child) were compared. In a first study, observation of mother-child interaction during a standard play situation revealed that mothers with panic disorder were more likely to display verbal control and criticism, and less likely to display sensitivity toward their children, than mothers without panic disorder. A second study examined family members' perceptions of family relationships and indicated that, compared to non-anxious adolescents, anxious adolescents were more prone to experience a diminished sense of individual autonomy in relation to their parents. Finally, a third study examined the effect of less direct learning experiences in the aetiology of anxiety. Results indicated that mothers with panic disorder were more likely than mothers without panic disorder to engage in panic-maintaining behaviour and to involve their children in this behaviour. Based on previous research showing a relationship between parental control, children's perception of control, and anxiety disorders, the present work not only adds further evidence supporting this link but also proposes a model summarizing current knowledge about family processes and the development of anxiety disorders. Two pathways are suggested through which anxiety may be transmitted across generations. Both pathways assign an important role to children's perception of control.
The idea is that when children are predisposed to interpret their parents' behaviour as beyond their control, they may be more prone to develop anxiety. As such, perceived control may represent a buffer between parental overcontrolling/overprotective behaviours and childhood anxiety disorder.
Abstract:
Locating new wind farms is of crucial importance for energy policies of the next decade. To select the new location, an accurate picture of the wind fields is necessary. However, characterizing wind fields is a difficult task, since the phenomenon is highly nonlinear and related to complex topographical features. In this paper, we propose both a nonparametric model to estimate wind speed at different time instants and a procedure to discover underrepresented topographic conditions, where new measuring stations could be added. Compared to space filling techniques, this last approach privileges optimization of the output space, thus locating new potential measuring sites through the uncertainty of the model itself.
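The abstract does not name its nonparametric model. As a hedged illustration of locating new stations "through the uncertainty of the model itself", the sketch below swaps in a simple technique with three parts:

- a Nadaraya-Watson kernel regressor on topographic feature vectors;
- an uncertainty score equal to the disagreement (variance) of a bootstrap committee of such regressors;
- a proposal step that picks the candidate site with the largest disagreement.

Function names, the Gaussian kernel, `bandwidth` and `n_models` are all assumptions.

```python
import math
import random


def kernel_regression(train_x, train_y, x, bandwidth):
    """Nadaraya-Watson estimate of wind speed at feature vector x
    (Gaussian kernel on, e.g., elevation, slope and curvature features)."""
    weights = []
    for xi in train_x:
        d2 = sum((a - b) ** 2 for a, b in zip(xi, x))
        weights.append(math.exp(-d2 / (2.0 * bandwidth ** 2)))
    total = sum(weights)
    if total == 0.0:  # x far from every station: fall back to the mean
        return sum(train_y) / len(train_y)
    return sum(w * y for w, y in zip(weights, train_y)) / total


def committee_variance(train_x, train_y, x, bandwidth, n_models, rng):
    """Variance across models fitted to bootstrap resamples of the stations."""
    n = len(train_x)
    preds = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        preds.append(kernel_regression([train_x[i] for i in idx],
                                       [train_y[i] for i in idx],
                                       x, bandwidth))
    mean = sum(preds) / n_models
    return sum((p - mean) ** 2 for p in preds) / n_models


def propose_site(train_x, train_y, candidates, bandwidth=1.0,
                 n_models=20, seed=0):
    """Index of the candidate site where the committee disagrees most."""
    rng = random.Random(seed)
    scores = [committee_variance(train_x, train_y, c, bandwidth, n_models, rng)
              for c in candidates]
    return max(range(len(candidates)), key=lambda i: scores[i])
```

A purely space-filling design looks only at station positions. This score, by contrast, also depends on the measured wind speeds through the fitted models.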
Abstract:
The aim of the present study was to assess the influence of local environmental olfactory cues on place learning in rats. We developed a new experimental design allowing the comparison of the use of local olfactory and visual cues in spatial and discrimination learning. We compared the effect of both types of cues on the discrimination of a single food source in an open-field arena. The goal was either in a fixed or in a variable location, and could be indicated by local olfactory and/or visual cues. The local cues enhanced the discrimination of the goal dish, whether it was in a fixed or in a variable location. However, we did not observe any overshadowing of the spatial information by the local olfactory or visual cue. Rats relied primarily on distant visuospatial information to locate the goal, neglecting local information when it was in conflict with the spatial information.