940 results for Reward (Theology)
Abstract:
The subiculum receives the output of hippocampal CA1 neurons and projects glutamatergic synapses onto the nucleus accumbens (NAc), the subicular-NAc pathway linking the memory and reward systems. It is unknown whether morphine withdrawal influences synaptic plasticity in
Abstract:
Traumatic events always lead to aversive emotional memory, i.e., fear memory. In contrast, positive events in daily life, such as sexual experiences, seem to reduce aversive memory after aversive events. Thus, we hypothesized that post-traumatic pleasurable ex
Abstract:
Perceptual learning improves perception through training. Perceptual learning improves with most stimulus types but fails when certain stimulus types are mixed during training (roving). This result is surprising because classical supervised and unsupervised neural network models can cope easily with roving conditions. What makes humans so inferior compared to these models? As experimental and conceptual work has shown, human perceptual learning is neither supervised nor unsupervised but reward-based learning. Reward-based learning suffers from the so-called unsupervised bias, i.e., to prevent synaptic "drift", the average reward has to be estimated exactly. However, this is impossible when two or more stimulus types with different rewards are presented during training (and the reward is estimated by a running average). For this reason, we propose that no learning occurs in roving conditions. However, roving hinders perceptual learning only for combinations of similar stimulus types but not for dissimilar ones. In this latter case, we propose that a critic can estimate the reward for each stimulus type separately. One implication of our analysis is that the critic cannot be located in the visual system. © 2011 Elsevier Ltd.
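The running-average argument above can be sketched in a few lines. The simulation below is illustrative only (the stimulus types, reward rates, and learning rate are invented for the example): it tracks the reward prediction error r − r̄ when two stimulus types with different mean rewards are interleaved. A single shared running average leaves a systematic per-type bias, whereas one critic per stimulus type removes it.

```python
import random

def run(separate_critics, n_trials=20000, alpha=0.01, seed=0):
    """Reward-based learning with a running-average critic (toy sketch).

    Stimulus types A and B have different mean rewards. With a single
    shared critic, the per-type prediction error (r - rbar) is
    systematically biased; per-type critics remove the bias.
    All parameters are hypothetical, for illustration only.
    """
    rng = random.Random(seed)
    mean_reward = {"A": 0.8, "B": 0.4}   # different reward rates per type
    if separate_critics:
        rbar = {"A": 0.5, "B": 0.5}      # one running average per type
    else:
        rbar = {"shared": 0.5}           # one running average for both
    errs = {"A": [], "B": []}
    for _ in range(n_trials):
        stim = rng.choice(["A", "B"])    # roving: types interleaved
        r = 1.0 if rng.random() < mean_reward[stim] else 0.0
        key = stim if separate_critics else "shared"
        err = r - rbar[key]              # reward prediction error
        rbar[key] += alpha * err         # running-average update
        errs[stim].append(err)
    # mean prediction error per stimulus type over the last half of trials
    return {s: sum(v[len(v) // 2:]) / (len(v) // 2) for s, v in errs.items()}
```

With the shared critic, the mean error settles near +0.2 for the richer type and −0.2 for the poorer one (the unsupervised bias); with separate critics, both means approach zero.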
Abstract:
Reinforcement learning techniques have been successfully used to maximise the expected cumulative reward of statistical dialogue systems. Typically, reinforcement learning is used to estimate the parameters of a dialogue policy which selects the system's responses based on the inferred dialogue state. However, the inference of the dialogue state itself depends on a dialogue model which describes the expected behaviour of a user when interacting with the system. Ideally, the parameters of this dialogue model should also be optimised to maximise the expected cumulative reward. This article presents two novel reinforcement learning algorithms for learning the parameters of a dialogue model. First, the Natural Belief Critic algorithm is designed to optimise the model parameters while the policy is kept fixed. This algorithm is suitable, for example, in systems using a handcrafted policy, perhaps prescribed by other design considerations. Second, the Natural Actor and Belief Critic algorithm jointly optimises both the model and the policy parameters. The algorithms are evaluated on a statistical dialogue system modelled as a Partially Observable Markov Decision Process in a tourist information domain. The evaluation is performed with a user simulator and with real users. The experiments indicate that model parameters estimated to maximise the expected reward function provide improved performance compared to the baseline handcrafted parameters. © 2011 Elsevier Ltd. All rights reserved.
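The paper's Natural Belief Critic and Natural Actor and Belief Critic algorithms are beyond a short sketch, but the plain policy-gradient idea they build on fits in a toy example: nudge a parameter along the gradient of the log-probability of the chosen action, weighted by the reward received. Everything below (the one-parameter Bernoulli "policy" and the reward scheme) is invented for illustration and is not the paper's algorithm.

```python
import math
import random

def reinforce_bernoulli(n_episodes=5000, alpha=0.1, seed=0):
    """Plain REINFORCE on a one-parameter toy 'policy' (not the paper's
    natural-gradient algorithms -- just the underlying idea).

    The policy picks action 1 with probability sigmoid(theta); action 1
    yields reward 1, action 0 yields reward 0, so the optimal policy
    drives theta upwards.
    """
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(n_episodes):
        p = 1.0 / (1.0 + math.exp(-theta))   # P(action = 1)
        a = 1 if rng.random() < p else 0
        r = float(a)                          # toy reward favours action 1
        grad_logpi = a - p                    # d/dtheta log pi(a; theta)
        theta += alpha * r * grad_logpi       # REINFORCE update
    return 1.0 / (1.0 + math.exp(-theta))     # final P(action = 1)
```

The returned action-1 probability climbs towards 1 as the reward-weighted updates accumulate; natural-gradient variants rescale the same gradient by the inverse Fisher information to speed convergence.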
Abstract:
Current commercial dialogue systems typically use hand-crafted grammars for Spoken Language Understanding (SLU) operating on the top one or two hypotheses output by the speech recogniser. These systems are expensive to develop and they suffer from significant degradation in performance when faced with recognition errors. This paper presents a robust method for SLU based on features extracted from the full posterior distribution of recognition hypotheses encoded in the form of word confusion networks. Following [1], the system uses SVM classifiers operating on n-gram features, trained on unaligned input/output pairs. Performance is evaluated on both an off-line corpus and on-line in a live user trial. It is shown that a statistical discriminative approach to SLU operating on the full posterior ASR output distribution can substantially improve performance both in terms of accuracy and overall dialogue reward. Furthermore, additional gains can be obtained by incorporating features from the previous system output. © 2012 IEEE.
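The n-gram features in question are expected counts taken over the confusion network's posterior distribution rather than over a single top hypothesis. A minimal sketch follows; the toy network and word posteriors are invented, and real confusion networks also contain epsilon/null arcs, which are ignored here.

```python
from itertools import product

def cn_ngram_features(confnet, n=2):
    """Expected n-gram counts from a word confusion network (sketch).

    `confnet` is a list of slots; each slot is a list of
    (word, posterior) pairs summing to ~1. The expected count of an
    n-gram is the sum, over aligned windows of n slots, of the product
    of the posteriors of its words. Epsilon arcs are ignored in this
    simplified version.
    """
    feats = {}
    for i in range(len(confnet) - n + 1):
        window = confnet[i:i + n]
        for combo in product(*window):        # one path through the window
            words = tuple(w for w, _ in combo)
            prob = 1.0
            for _, p in combo:
                prob *= p                     # path posterior
            feats[words] = feats.get(words, 0.0) + prob
    return feats
```

For a three-slot network where "book" has posterior 0.7 in the first slot and "a" has posterior 1.0 in the second, the bigram ("book", "a") gets expected count 0.7; a feature dictionary of this form can then be fed to a linear classifier such as an SVM.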
Abstract:
Humans are creatures of routine and habit. When faced with situations in which a default option is available, people show a consistent tendency to stick with the default. Why this occurs is unclear. To elucidate its neural basis, we used a novel gambling task in conjunction with functional magnetic resonance imaging. Behavioral results revealed that participants were more likely to choose the default card and felt enhanced emotional responses to outcomes after making the decision to switch. We show that increased tendency to switch away from the default during the decision phase was associated with decreased activity in the anterior insula; activation in this same area in reaction to "switching away from the default and losing" was positively related with experienced frustration. In contrast, decisions to choose the default engaged the ventral striatum, the same reward area as seen in winning. Our findings highlight aversive processes in the insula as underlying the default bias and suggest that choosing the default may be rewarding in itself.
Abstract:
Studies of human decision making emerge from two dominant traditions: learning theorists [1-3] study choices in which options are evaluated on the basis of experience, whereas behavioral economists and financial decision theorists study choices in which the key decision variables are explicitly stated. Growing behavioral evidence suggests that valuation based on these different classes of information involves separable mechanisms [4-8], but the relevant neuronal substrates are unknown. This is important for understanding the all-too-common situation in which choices must be made between alternatives that involve one or another kind of information. We studied behavior and brain activity while subjects made decisions between risky financial options, in which the associated utilities were either learned or explicitly described. We show a characteristic effect in subjects' behavior when comparing information acquired from experience with that acquired from description, suggesting that these kinds of information are treated differently. This behavioral effect was reflected neurally, and we show differential sensitivity to learned and described value and risk in brain regions commonly associated with reward processing. Our data indicate that, during decision making under risk, both behavior and the neural encoding of key decision variables are strongly influenced by the manner in which value information is presented.
Abstract:
The possibility that we will have to invest effort influences our future choice behavior. Indeed, deciding whether an action is actually worth taking is a key element in the expression of human apathy or inertia. There is a well-developed literature on brain activity related to the anticipation of effort, but how effort affects actual choice is less well understood. Furthermore, prior work is largely restricted to mental as opposed to physical effort or has confounded temporal with effortful costs. Here we investigated choice behavior and brain activity, using functional magnetic resonance imaging, in a study where healthy participants were required to make decisions between effortful gripping, where the factors of force (high and low) and reward (high and low) were varied, and a choice of merely holding a grip device for minimal monetary reward. Behaviorally, we show that force level influences the likelihood of choosing an effortful grip. We observed greater activity in the putamen when participants opted to grip an option with low effort compared with when they opted to grip an option with high effort. The results suggest that, over and above a nonspecific role in movement anticipation and salience, the putamen plays a crucial role in computations for choice that involves effort costs.
Abstract:
Reward processing is linked to specific neuromodulatory systems with a dopaminergic contribution to reward learning and motivational drive being well established. Neuromodulatory influences on hedonic responses to actual receipt of reward, or punishment, referred to as experienced utility are less well characterized, although a link to the endogenous opioid system is suggested. Here, in a combined functional magnetic resonance imaging-psychopharmacological investigation, we used naloxone to block central opioid function while subjects performed a gambling task associated with rewards and losses of different magnitudes, in which the mean expected value was always zero. A graded influence of naloxone on reward outcome was evident in an attenuation of pleasure ratings for larger reward outcomes, an effect mirrored in attenuation of brain activity to increasing reward magnitude in rostral anterior cingulate cortex. A more striking effect was seen for losses such that under naloxone all levels of negative outcome were rated as more unpleasant. This hedonic effect was associated with enhanced activity in anterior insula and caudal anterior cingulate cortex, areas implicated in aversive processing. Our data indicate that a central opioid system contributes to both reward and loss processing in humans and directly modulates the hedonic experience of outcomes.
Abstract:
People are alarmingly susceptible to manipulations that change both their expectations and experience of the value of goods. Recent studies in behavioral economics suggest such variability reflects more than mere caprice. People commonly judge options and prices in relative terms, rather than absolutely, and display strong sensitivity to exemplar and price anchors. We propose that these findings elucidate important principles about reward processing in the brain. In particular, relative valuation may be a natural consequence of adaptive coding of neuronal firing to optimise sensitivity across large ranges of value. Furthermore, the initial apparent arbitrariness of value may reflect the brain's attempts to optimally integrate diverse sources of value-relevant information in the face of perceived uncertainty. Recent findings in neuroscience support both accounts, and implicate regions in the orbitofrontal cortex, striatum, and ventromedial prefrontal cortex in the construction of value.
Abstract:
Studies on human monetary prediction and decision making emphasize the role of the striatum in encoding prediction errors for financial reward. However, less is known about how the brain encodes financial loss. Using Pavlovian conditioning of visual cues to outcomes that simultaneously incorporate the chance of financial reward and loss, we show that striatal activation reflects positively signed prediction errors for both. Furthermore, we show functional segregation within the striatum, with more anterior regions showing relative selectivity for rewards and more posterior regions for losses. These findings mirror the anteroposterior valence-specific gradient reported in rodents and endorse the role of the striatum in aversive motivational learning about financial losses, illustrating functional and anatomical consistencies with primary aversive outcomes such as pain.
Abstract:
The neural processes underlying empathy are a subject of intense interest within the social neurosciences. However, very little is known about how brain empathic responses are modulated by the affective link between individuals. We show here that empathic responses are modulated by learned preferences, a result consistent with economic models of social preferences. We engaged male and female volunteers in an economic game, in which two confederates played fairly or unfairly, and then measured brain activity with functional magnetic resonance imaging while these same volunteers observed the confederates receiving pain. Both sexes exhibited empathy-related activation in pain-related brain areas (fronto-insular and anterior cingulate cortices) towards fair players. However, these empathy-related responses were significantly reduced in males when observing an unfair person receiving pain. This effect was accompanied by increased activation in reward-related areas, correlated with an expressed desire for revenge. We conclude that in men (at least) empathic responses are shaped by valuation of other people's social behaviour, such that they empathize with fair opponents while favouring the physical punishment of unfair opponents, a finding that echoes recent evidence for altruistic punishment.
Abstract:
Termination of a painful or unpleasant event can be rewarding. However, whether the brain treats relief in a similar way as it treats natural reward is unclear, and the neural processes that underlie its representation as a motivational goal remain poorly understood. We used fMRI (functional magnetic resonance imaging) to investigate how humans learn to generate expectations of pain relief. Using a pavlovian conditioning procedure, we show that subjects experiencing prolonged experimentally induced pain can be conditioned to predict pain relief. This proceeds in a manner consistent with contemporary reward-learning theory (average reward/loss reinforcement learning), reflected by neural activity in the amygdala and midbrain. Furthermore, these reward-like learning signals are mirrored by opposite aversion-like signals in lateral orbitofrontal cortex and anterior cingulate cortex. This dual coding has parallels to 'opponent process' theories in psychology and promotes a formal account of prediction and expectation during pain.
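The "average reward/loss reinforcement learning" framing can be made concrete with differential (average-reward) TD learning, where the prediction error is δ = r − ρ + V(s′) − V(s) and ρ tracks the long-run average reward. In the toy simulation below (the state space, transition probabilities, and learning rates are all invented for illustration), a relief state with reward zero acquires a higher value than an ongoing pain state with reward −1, because zero beats the negative average.

```python
import random

def avg_reward_td(n_steps=20000, alpha=0.05, beta=0.01, seed=0):
    """Average-reward TD(0) on a toy pain/relief cycle (illustrative only).

    From PAIN (reward -1) there is a 10% chance per step of a one-step
    RELIEF state (reward 0), after which pain resumes. With
    delta = r - rho + V(s') - V(s), where rho is a running average of
    reward, RELIEF acquires a higher value than PAIN: relief is
    rewarding relative to ongoing pain.
    """
    rng = random.Random(seed)
    V = {"PAIN": 0.0, "RELIEF": 0.0}
    rho = 0.0
    s = "PAIN"
    for _ in range(n_steps):
        r = -1.0 if s == "PAIN" else 0.0
        s_next = "RELIEF" if (s == "PAIN" and rng.random() < 0.1) else "PAIN"
        rho += beta * (r - rho)                 # running-average reward
        delta = r - rho + V[s_next] - V[s]      # differential TD error
        V[s] += alpha * delta
        s = s_next
    return V, rho
```

Here ρ converges towards −10/11 ≈ −0.91 (about ten pain steps for each relief step), and V(RELIEF) − V(PAIN) approaches −ρ ≈ 0.91: the zero-reward relief state is valued like a reward against the negative baseline.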
Abstract:
The ability to use environmental stimuli to predict impending harm is critical for survival. Such predictions should be available as early as they are reliable. In pavlovian conditioning, chains of successively earlier predictors are studied in terms of higher-order relationships, and have inspired computational theories such as temporal difference learning. However, there is at present no adequate neurobiological account of how this learning occurs. Here, in a functional magnetic resonance imaging (fMRI) study of higher-order aversive conditioning, we describe a key computational strategy that humans use to learn predictions about pain. We show that neural activity in the ventral striatum and the anterior insula displays a marked correspondence to the signals for sequential learning predicted by temporal difference models. This result reveals a flexible aversive learning process ideally suited to the changing and uncertain nature of real-world environments. Taken with existing data on reward learning, our results suggest a critical role for the ventral striatum in integrating complex appetitive and aversive predictions to coordinate behaviour.
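The temporal-difference account of higher-order conditioning can be written down directly: with V(s) updated by δ = r + γV(s′) − V(s), value first accrues at the predictor closest to the outcome and then propagates back to earlier cues. A minimal deterministic sketch follows (the cue names, episode count, and learning rate are invented; the aversive outcome is coded as reward −1):

```python
def td_chain(n_episodes=500, alpha=0.1, gamma=1.0):
    """TD(0) value learning on a two-cue chain: CS1 -> CS2 -> outcome.

    The aversive outcome is coded as reward -1 at the terminal step.
    The prediction error at CS2 drives learning there first, then the
    CS1 -> CS2 transition itself generates a negative prediction error,
    so CS1 also comes to predict the outcome (higher-order conditioning).
    """
    V = {"CS1": 0.0, "CS2": 0.0}
    for _ in range(n_episodes):
        # transition CS1 -> CS2, no immediate outcome (r = 0)
        delta1 = 0.0 + gamma * V["CS2"] - V["CS1"]
        V["CS1"] += alpha * delta1
        # transition CS2 -> terminal aversive outcome (r = -1)
        delta2 = -1.0 + gamma * 0.0 - V["CS2"]
        V["CS2"] += alpha * delta2
    return V
```

After training, both cues carry a value near −1; early in training only the CS2 error is nonzero, illustrating how predictions become available at successively earlier cues.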
Abstract:
Statistical dialog systems (SDSs) are motivated by the need for a data-driven framework that reduces the cost of laboriously handcrafting complex dialog managers and that provides robustness against the errors created by speech recognizers operating in noisy environments. By including an explicit Bayesian model of uncertainty and by optimizing the policy via a reward-driven process, partially observable Markov decision processes (POMDPs) provide such a framework. However, exact model representation and optimization is computationally intractable. Hence, the practical application of POMDP-based systems requires efficient algorithms and carefully constructed approximations. This review article provides an overview of the current state of the art in the development of POMDP-based spoken dialog systems. © 1963-2012 IEEE.
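At the core of any POMDP-based dialog system is the belief update b′(s′) ∝ P(o|s′) Σ_s P(s′|s) b(s), maintained after every user turn. Below is a toy exact-update sketch; the two-state "tourist information" belief and all probabilities are invented, and practical SDSs factorise the state and approximate this update precisely because the exact computation is intractable at scale.

```python
def belief_update(belief, obs_likelihood, transition):
    """One exact POMDP belief update: b'(s') ~ P(o|s') * sum_s P(s'|s) b(s).

    `belief` maps states to probabilities, `transition[s][s2]` is the
    state transition model P(s2|s), and `obs_likelihood[s2]` is P(o|s2)
    for the observation just received. Toy example; real systems
    approximate this for tractability.
    """
    # prediction step: push the belief through the transition model
    predicted = {}
    for s, b in belief.items():
        for s2, p in transition[s].items():
            predicted[s2] = predicted.get(s2, 0.0) + p * b
    # correction step: weight by the observation likelihood, renormalise
    unnorm = {s2: obs_likelihood.get(s2, 0.0) * p for s2, p in predicted.items()}
    z = sum(unnorm.values())
    return {s2: p / z for s2, p in unnorm.items()}
```

For instance, a uniform belief over "want_hotel" and "want_restaurant" combined with an observation nine times more likely under the hotel goal yields a posterior of 0.9 on "want_hotel".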