29 results for Reward


Relevance: 10.00%

Abstract:

The possibility that we will have to invest effort influences our future choice behavior. Indeed, deciding whether an action is actually worth taking is a key element in the expression of human apathy or inertia. There is a well-developed literature on brain activity related to the anticipation of effort, but how effort affects actual choice is less well understood. Furthermore, prior work is largely restricted to mental as opposed to physical effort or has confounded temporal with effortful costs. Here we investigated choice behavior and brain activity, using functional magnetic resonance imaging, in a study where healthy participants were required to make decisions between effortful gripping, where the factors of force (high and low) and reward (high and low) were varied, and a choice of merely holding a grip device for minimal monetary reward. Behaviorally, we show that force level influences the likelihood of choosing an effortful grip. We observed greater activity in the putamen when participants opted to grip an option with low effort compared with when they opted to grip an option with high effort. The results suggest that, over and above a nonspecific role in movement anticipation and salience, the putamen plays a crucial role in computations for choice that involves effort costs.

Relevance: 10.00%

Abstract:

Reward processing is linked to specific neuromodulatory systems, with a dopaminergic contribution to reward learning and motivational drive being well established. Neuromodulatory influences on hedonic responses to actual receipt of reward, or punishment, referred to as experienced utility, are less well characterized, although a link to the endogenous opioid system is suggested. Here, in a combined functional magnetic resonance imaging-psychopharmacological investigation, we used naloxone to block central opioid function while subjects performed a gambling task associated with rewards and losses of different magnitudes, in which the mean expected value was always zero. A graded influence of naloxone on reward outcome was evident in an attenuation of pleasure ratings for larger reward outcomes, an effect mirrored in attenuation of brain activity to increasing reward magnitude in rostral anterior cingulate cortex. A more striking effect was seen for losses such that under naloxone all levels of negative outcome were rated as more unpleasant. This hedonic effect was associated with enhanced activity in anterior insula and caudal anterior cingulate cortex, areas implicated in aversive processing. Our data indicate that a central opioid system contributes to both reward and loss processing in humans and directly modulates the hedonic experience of outcomes.

Relevance: 10.00%

Abstract:

People are alarmingly susceptible to manipulations that change both their expectations and experience of the value of goods. Recent studies in behavioral economics suggest such variability reflects more than mere caprice. People commonly judge options and prices in relative terms, rather than absolutely, and display strong sensitivity to exemplar and price anchors. We propose that these findings elucidate important principles about reward processing in the brain. In particular, relative valuation may be a natural consequence of adaptive coding of neuronal firing to optimise sensitivity across large ranges of value. Furthermore, the initial apparent arbitrariness of value may reflect the brain's attempts to optimally integrate diverse sources of value-relevant information in the face of perceived uncertainty. Recent findings in neuroscience support both accounts, and implicate regions in the orbitofrontal cortex, striatum, and ventromedial prefrontal cortex in the construction of value.

Relevance: 10.00%

Abstract:

Studies on human monetary prediction and decision making emphasize the role of the striatum in encoding prediction errors for financial reward. However, less is known about how the brain encodes financial loss. Using Pavlovian conditioning of visual cues to outcomes that simultaneously incorporate the chance of financial reward and loss, we show that striatal activation reflects positively signed prediction errors for both. Furthermore, we show functional segregation within the striatum, with more anterior regions showing relative selectivity for rewards and more posterior regions for losses. These findings mirror the anteroposterior valence-specific gradient reported in rodents and endorse the role of the striatum in aversive motivational learning about financial losses, illustrating functional and anatomical consistencies with primary aversive outcomes such as pain.

Relevance: 10.00%

Abstract:

The neural processes underlying empathy are a subject of intense interest within the social neurosciences. However, very little is known about how brain empathic responses are modulated by the affective link between individuals. We show here that empathic responses are modulated by learned preferences, a result consistent with economic models of social preferences. We engaged male and female volunteers in an economic game, in which two confederates played fairly or unfairly, and then measured brain activity with functional magnetic resonance imaging while these same volunteers observed the confederates receiving pain. Both sexes exhibited empathy-related activation in pain-related brain areas (fronto-insular and anterior cingulate cortices) towards fair players. However, these empathy-related responses were significantly reduced in males when observing an unfair person receiving pain. This effect was accompanied by increased activation in reward-related areas, correlated with an expressed desire for revenge. We conclude that in men (at least) empathic responses are shaped by valuation of other people's social behaviour, such that they empathize with fair opponents while favouring the physical punishment of unfair opponents, a finding that echoes recent evidence for altruistic punishment.

Relevance: 10.00%

Abstract:

Termination of a painful or unpleasant event can be rewarding. However, whether the brain treats relief in the same way as it treats natural reward is unclear, and the neural processes that underlie its representation as a motivational goal remain poorly understood. We used fMRI (functional magnetic resonance imaging) to investigate how humans learn to generate expectations of pain relief. Using a Pavlovian conditioning procedure, we show that subjects experiencing prolonged experimentally induced pain can be conditioned to predict pain relief. This proceeds in a manner consistent with contemporary reward-learning theory (average reward/loss reinforcement learning), reflected by neural activity in the amygdala and midbrain. Furthermore, these reward-like learning signals are mirrored by opposite aversion-like signals in lateral orbitofrontal cortex and anterior cingulate cortex. This dual coding has parallels to 'opponent process' theories in psychology and promotes a formal account of prediction and expectation during pain.

Relevance: 10.00%

Abstract:

The ability to use environmental stimuli to predict impending harm is critical for survival. Such predictions should be available as early as they are reliable. In Pavlovian conditioning, chains of successively earlier predictors are studied in terms of higher-order relationships, and have inspired computational theories such as temporal difference learning. However, there is at present no adequate neurobiological account of how this learning occurs. Here, in a functional magnetic resonance imaging (fMRI) study of higher-order aversive conditioning, we describe a key computational strategy that humans use to learn predictions about pain. We show that neural activity in the ventral striatum and the anterior insula displays a marked correspondence to the signals for sequential learning predicted by temporal difference models. This result reveals a flexible aversive learning process ideally suited to the changing and uncertain nature of real-world environments. Taken with existing data on reward learning, our results suggest a critical role for the ventral striatum in integrating complex appetitive and aversive predictions to coordinate behaviour.
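
As an illustration of the temporal difference idea mentioned in this abstract, the following is a minimal sketch of tabular TD(0) on a two-cue chain (cue A, then cue B, then an outcome). It is not the model fitted in the study; the function name `td_chain`, the learning rate, the discount factor and the unit outcome magnitude are all assumptions chosen for illustration.

```python
def td_chain(n_trials=200, alpha=0.1, gamma=1.0, outcome=1.0):
    """Tabular TD(0) on a hypothetical chain: cue A -> cue B -> outcome.

    All parameter values are illustrative assumptions, not taken from the study.
    Returns the learned cue values and the prediction error at cue A per trial.
    """
    values = {"A": 0.0, "B": 0.0}
    errors_at_a = []
    for _ in range(n_trials):
        # Transition A -> B: nothing is delivered yet, so the prediction
        # error at A is driven only by the current value of B.
        delta_a = 0.0 + gamma * values["B"] - values["A"]
        values["A"] += alpha * delta_a
        # Transition B -> end of trial: the outcome arrives and the
        # terminal state has value 0.
        delta_b = outcome + gamma * 0.0 - values["B"]
        values["B"] += alpha * delta_b
        errors_at_a.append(delta_a)
    return values, errors_at_a


if __name__ == "__main__":
    learned, errors = td_chain()
    print(learned)
    print(errors[:3])
```

In this sketch the later cue (B) acquires value first, and the earlier cue (A) learns from B's value rather than from the outcome itself, which is the sense in which predictions propagate back along a chain of successively earlier predictors.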

Relevance: 10.00%

Abstract:

Statistical dialog systems (SDSs) are motivated by the need for a data-driven framework that reduces the cost of laboriously handcrafting complex dialog managers and that provides robustness against the errors created by speech recognizers operating in noisy environments. By including an explicit Bayesian model of uncertainty and by optimizing the policy via a reward-driven process, partially observable Markov decision processes (POMDPs) provide such a framework. However, exact model representation and optimization are computationally intractable. Hence, the practical application of POMDP-based systems requires efficient algorithms and carefully constructed approximations. This review article provides an overview of the current state of the art in the development of POMDP-based spoken dialog systems. © 1963-2012 IEEE.
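
The "explicit Bayesian model of uncertainty" in a POMDP is usually expressed as a belief update over hidden states. The sketch below shows the standard exact update for a small discrete state space; it is a generic textbook form, not the approximation used by any particular dialog system, and the function name `belief_update` and the two-goal example are hypothetical.

```python
def belief_update(belief, transition, observation_likelihood):
    """Exact discrete POMDP belief update for one fixed action and observation.

    belief[s]                   current probability of hidden state s
    transition[s][s2]           P(s2 | s, action)
    observation_likelihood[s2]  P(observation | s2, action)
    """
    n = len(belief)
    # Prediction step: push the belief through the transition model.
    predicted = [
        sum(belief[s] * transition[s][s2] for s in range(n)) for s2 in range(n)
    ]
    # Correction step: weight each state by how well it explains the observation.
    unnormalised = [observation_likelihood[s2] * predicted[s2] for s2 in range(n)]
    total = sum(unnormalised)
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return [u / total for u in unnormalised]


if __name__ == "__main__":
    # Hypothetical example: two user goals that do not change during the turn,
    # and a noisy recognizer result that favours the first goal.
    prior = [0.5, 0.5]
    stay = [[1.0, 0.0], [0.0, 1.0]]
    print(belief_update(prior, stay, [0.8, 0.2]))
```

The cost of this exact update grows with the square of the number of states, which is one reason practical systems of the kind reviewed here rely on approximations.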

Relevance: 10.00%

Abstract:

Animals repeat rewarded behaviors, but the physiological basis of reward-based learning has only been partially elucidated. On one hand, experimental evidence shows that the neuromodulator dopamine carries information about rewards and affects synaptic plasticity. On the other hand, the theory of reinforcement learning provides a framework for reward-based learning. Recent models of reward-modulated spike-timing-dependent plasticity have made first steps towards bridging the gap between the two approaches, but faced two problems. First, reinforcement learning is typically formulated in a discrete framework, ill-adapted to the description of natural situations. Second, biologically plausible models of reward-modulated spike-timing-dependent plasticity require precise calculation of the reward prediction error, yet it remains to be shown how this can be computed by neurons. Here we propose a solution to these problems by extending the continuous temporal difference (TD) learning of Doya (2000) to the case of spiking neurons in an actor-critic network operating in continuous time, and with continuous state and action representations. In our model, the critic learns to predict expected future rewards in real time. Its activity, together with actual rewards, conditions the delivery of a neuromodulatory TD signal to itself and to the actor, which is responsible for action choice. In simulations, we show that such an architecture can solve a Morris water-maze-like navigation task, in a number of trials consistent with reported animal performance. We also use our model to solve the acrobot and the cartpole problems, two complex motor control tasks. Our model provides a plausible way of computing reward prediction error in the brain. Moreover, the analytically derived learning rule is consistent with experimental evidence for dopamine-modulated spike-timing-dependent plasticity.
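
For orientation, the continuous-time TD error in the formulation of Doya (2000) that this abstract builds on can be written as follows. The symbols are assumptions introduced here: r(t) is the reward rate, V(t) the critic's value estimate, and τ the discounting time constant. How the spiking network in the paper computes this quantity is not reproduced here.

```latex
% Value of the current state under exponential discounting with time constant \tau
V(t) = \int_{t}^{\infty} e^{-(s-t)/\tau}\, r(s)\, ds

% Differentiating gives the self-consistency condition \dot{V}(t) = V(t)/\tau - r(t),
% so the continuous TD error, which vanishes when V is correct, is
\delta(t) = r(t) - \frac{1}{\tau} V(t) + \dot{V}(t)
```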

Relevance: 10.00%

Abstract:

Although it is widely believed that reinforcement learning is a suitable tool for describing behavioral learning, the mechanisms by which it can be implemented in networks of spiking neurons are not fully understood. Here, we show that different learning rules emerge from a policy gradient approach depending on which features of the spike trains are assumed to influence the reward signals, i.e., depending on which neural code is in effect. We use the framework of Williams (1992) to derive learning rules for arbitrary neural codes. For illustration, we present policy-gradient rules for three different example codes - a spike count code, a spike timing code and the most general "full spike train" code - and test them on simple model problems. In addition to classical synaptic learning, we derive learning rules for intrinsic parameters that control the excitability of the neuron. The spike count learning rule has structural similarities with established Bienenstock-Cooper-Munro rules. If the distribution of the relevant spike train features belongs to the natural exponential family, the learning rules have a characteristic shape that raises interesting prediction problems.
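
To make the policy-gradient idea concrete, here is a minimal sketch of a Williams (1992) style REINFORCE update for a single stochastic binary unit. It is far simpler than the spike-train codes treated in the paper: the unit, the reward rule (reward 1 whenever the unit fires), the learning rate and the function name `reinforce_bernoulli` are all illustrative assumptions.

```python
import math
import random


def reinforce_bernoulli(n_trials=500, alpha=0.5, seed=0):
    """REINFORCE for one Bernoulli-logistic unit with a single bias weight.

    Hypothetical task: the unit is rewarded (reward = 1) whenever it fires.
    Returns the final weight and the final firing probability.
    """
    rng = random.Random(seed)
    w = 0.0  # firing probability starts at 0.5
    for _ in range(n_trials):
        p = 1.0 / (1.0 + math.exp(-w))      # firing probability
        y = 1 if rng.random() < p else 0    # stochastic output
        reward = 1.0 if y == 1 else 0.0
        # For a Bernoulli output with logistic link, the gradient of the
        # log-likelihood with respect to w is (y - p).
        w += alpha * reward * (y - p)
    return w, 1.0 / (1.0 + math.exp(-w))


if __name__ == "__main__":
    weight, p_fire = reinforce_bernoulli()
    print(weight, p_fire)
```

The update multiplies the reward by the gradient of the log-probability of the emitted output; choosing a different output feature (count, timing, full train) changes that gradient term, which is the dependence on the neural code that the abstract describes.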

Relevance: 10.00%

Abstract:

Standard theories of decision-making involving delayed outcomes predict that people should defer a punishment, whilst advancing a reward. In some cases, such as pain, people seem to prefer to expedite punishment, implying that its anticipation carries a cost, often conceptualized as 'dread'. Despite empirical support for the existence of dread, whether and how it depends on prospective delay is unknown. Furthermore, it is unclear whether dread represents a stable component of value, or is modulated by biases such as framing effects. Here, we examine choices made between different numbers of painful shocks to be delivered faithfully at different time points up to 15 minutes in the future, as well as choices between hypothetical painful dental appointments at time points of up to approximately eight months in the future, to test alternative models for how future pain is disvalued. We show that future pain initially becomes increasingly aversive with increasing delay, but does so at a decreasing rate. This is consistent with a value model in which moment-by-moment dread increases up to the time of expected pain, such that dread becomes equivalent to the discounted expectation of pain. For a minority of individuals pain has maximum negative value at intermediate delay, suggesting that the dread function may itself be prospectively discounted in time. Framing an outcome as relief reduces the overall preference to expedite pain, which can be parameterized by reducing the rate of the dread-discounting function. Our data support an account of disvaluation for primary punishments such as pain, which differs fundamentally from existing models applied to financial punishments, in which dread exerts a powerful but time-dependent influence over choice.

Relevance: 10.00%

Abstract:

A partially observable Markov decision process has been proposed as a dialogue model that enables robustness to speech recognition errors and automatic policy optimisation using reinforcement learning (RL). However, conventional RL algorithms require a very large number of dialogues, necessitating a user simulator. Recently, Gaussian processes have been shown to substantially speed up the optimisation, making it possible to learn directly from interaction with human users. However, early studies have been limited to very low dimensional spaces and the learning has exhibited convergence problems. Here we investigate learning from human interaction using the Bayesian Update of Dialogue State system. This dynamic Bayesian network based system has an optimisation space covering more than one hundred features, allowing a wide range of behaviours to be learned. Using an improved policy model and a more robust reward function, we show that stable learning can be achieved that significantly outperforms a simulator trained policy. © 2013 IEEE.

Relevance: 10.00%

Abstract:

Mammalian studies show that frustration is experienced when goal-directed activity is blocked. Despite frustration's strongly negative role in health, aggression and social relationships, the neural mechanisms are not well understood. To address this we developed a task in which participants were blocked from obtaining a reward, an established method of producing frustration. Levels of experienced frustration were parametrically varied by manipulating the participants' motivation to obtain the reward prior to blocking. This was achieved by varying the participants' proximity to a reward and the amount of effort expended in attempting to acquire it. In experiment 1, we confirmed that proximity and expended effort independently enhanced participants' self-reported desire to obtain the reward, and their self-reported frustration and response vigor (key-press force) following blocking. In experiment 2, we used functional magnetic resonance imaging (fMRI) to show that both proximity and expended effort modulated brain responses to blocked reward in regions implicated in animal models of reactive aggression, including the amygdala, midbrain periaqueductal grey (PAG), insula and prefrontal cortex. Our findings suggest that frustration may serve an energizing function, translating unfulfilled motivation into aggressive-like surges via a cortical, amygdala and PAG network.

Relevance: 10.00%

Abstract:

Computer simulation experiments were performed to examine the effectiveness of OR- and comparative-reinforcement learning algorithms. In the simulation, human rewards were given as +1 and -1. Two models of human instruction, each determining which reward is given at every step of the instruction, were used. The results suggest that human instruction may combine characteristics of both model A and model B, and that the comparative-reinforcement learning algorithm can be expected to be the more effective for learning from human instruction.