975 results for REINFORCEMENT


Relevance: 20.00%

Publisher:

Abstract:

The objective of this study was to determine whether the responses of basal forebrain neurons are related to the cognitive processes necessary for the performance of behavioural tasks, or to the hedonic attributes of the reinforcers delivered to the monkey as…


Relevance: 20.00%

Publisher:

Abstract:

Reinforcement learning techniques have been successfully used to maximise the expected cumulative reward of statistical dialogue systems. Typically, reinforcement learning is used to estimate the parameters of a dialogue policy which selects the system's responses based on the inferred dialogue state. However, the inference of the dialogue state itself depends on a dialogue model which describes the expected behaviour of a user when interacting with the system. Ideally, the parameters of this dialogue model should also be optimised to maximise the expected cumulative reward. This article presents two novel reinforcement learning algorithms for learning the parameters of a dialogue model. First, the Natural Belief Critic algorithm is designed to optimise the model parameters while the policy is kept fixed. This algorithm is suitable, for example, in systems using a handcrafted policy, perhaps prescribed by other design considerations. Second, the Natural Actor and Belief Critic algorithm jointly optimises both the model and the policy parameters. The algorithms are evaluated on a statistical dialogue system modelled as a Partially Observable Markov Decision Process in a tourist information domain. The evaluation is performed with a user simulator and with real users. The experiments indicate that model parameters estimated to maximise the expected reward function provide improved performance compared to the baseline handcrafted parameters. © 2011 Elsevier Ltd. All rights reserved.
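
The Natural Belief Critic and Natural Actor and Belief Critic algorithms themselves are not reproduced here, but the underlying idea of natural-gradient policy optimisation can be sketched on a toy dialogue-like problem. Everything below (the summary-state and action spaces, the stub "user simulator", the reward) is illustrative and is not the system described in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)
N_STATES, N_ACTIONS = 5, 3                 # toy summary-state / system-action spaces
theta = np.zeros((N_STATES, N_ACTIONS))    # policy parameters

def policy(state):
    """Softmax policy over system actions for a given summary state."""
    p = np.exp(theta[state] - theta[state].max())
    return p / p.sum()

def simulate_episode(max_turns=10):
    """Stand-in for a user simulator: returns a list of (state, action, reward) turns."""
    turns, state = [], rng.integers(N_STATES)
    for _ in range(max_turns):
        action = rng.choice(N_ACTIONS, p=policy(state))
        reward = 1.0 if action == state % N_ACTIONS else -0.1   # toy reward
        turns.append((state, action, reward))
        state = rng.integers(N_STATES)
    return turns

def natural_gradient_update(episodes, lr=0.1, reg=1e-3):
    """Plain REINFORCE gradient, preconditioned by an estimated Fisher matrix."""
    global theta
    grad = np.zeros(theta.size)
    fisher = reg * np.eye(theta.size)
    for ep in episodes:
        episode_return = sum(r for _, _, r in ep)
        for s, a, _ in ep:
            score = np.zeros_like(theta)
            score[s] = -policy(s)
            score[s, a] += 1.0                 # d log pi(a|s) / d theta
            g = score.ravel()
            grad += episode_return * g
            fisher += np.outer(g, g)
    theta += lr * np.linalg.solve(fisher, grad).reshape(theta.shape)

for _ in range(50):
    natural_gradient_update([simulate_episode() for _ in range(20)])
```

Solving against the Fisher matrix, rather than stepping along the raw gradient, is what makes the update "natural"; the algorithms in the abstract apply this idea to the dialogue-model parameters (alone, or jointly with the policy), which the toy example above does not attempt.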

Relevance: 20.00%

Publisher:

Abstract:

The contribution described in this paper is an algorithm for learning nonlinear, reference-tracking control policies given no prior knowledge of the dynamical system and limited interaction with the system through the learning process. Concepts from the fields of reinforcement learning, Bayesian statistics and classical control have been brought together in the formulation of this algorithm, which can be viewed as a form of indirect self-tuning regulator. On the task of reference tracking using a simulated inverted pendulum, it was shown to yield generally improved performance over the best controller derived from the standard linear quadratic method, using only 30 s of total interaction with the system. Finally, the algorithm was shown to work on the simulated double pendulum, demonstrating its ability to solve nontrivial control tasks. © 2011 IEEE.
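
The paper's Bayesian, nonlinear approach is not reproduced here, but the "indirect self-tuning regulator" idea it generalises can be sketched in a few lines: interact briefly with the system, fit a dynamics model to the data, then derive a controller from the learned model. The pendulum parameters, the linear least-squares model, and the cost weights below are illustrative stand-ins, not the paper's method.

```python
import numpy as np

dt, g, l = 0.02, 9.81, 1.0

def pendulum_step(x, u):
    """True (unknown to the learner) inverted-pendulum dynamics, Euler-integrated."""
    theta, omega = x
    omega = omega + dt * ((g / l) * np.sin(theta) + u)
    theta = theta + dt * omega
    return np.array([theta, omega])

# 1. Collect a short batch of interaction data with random actions (~30 s at 50 Hz).
rng = np.random.default_rng(0)
X, U, Xn = [], [], []
x = np.array([0.1, 0.0])
for _ in range(1500):
    u = rng.uniform(-2.0, 2.0)
    xn = pendulum_step(x, u)
    X.append(x)
    U.append([u])
    Xn.append(xn)
    x = xn if abs(xn[0]) < 1.0 else np.array([0.1, 0.0])   # reset if the pole falls too far

# 2. "Indirect" step: fit a model x' ~= A x + B u to the data by least squares.
Z = np.hstack([np.array(X), np.array(U)])
W, *_ = np.linalg.lstsq(Z, np.array(Xn), rcond=None)
A, B = W[:2].T, W[2:].T

# 3. Derive a regulator from the learned model via the discrete Riccati recursion.
Q, R = np.diag([10.0, 1.0]), np.array([[0.1]])
P = Q.copy()
for _ in range(500):
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    P = Q + A.T @ P @ (A - B @ K)

# Apply the controller derived from the learned model to the true nonlinear system.
x = np.array([0.2, 0.0])
for _ in range(250):
    x = pendulum_step(x, float(-K @ x))
print("state after 5 s of control:", x)
```

The cited work replaces the linear least-squares model with a probabilistic dynamics model and learns a nonlinear policy, but the overall loop above is the "indirect" pattern being generalised.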

Relevance: 20.00%

Publisher:

Abstract:

Post-earthquake structural safety evaluations are currently performed manually by a team of certified inspectors and/or structural engineers. This process is time-consuming and costly, keeping owners and occupants from returning to their businesses and homes. Automating these evaluations would enable faster, and potentially more consistent, relief and response processes. A key step in such automation is the detection of exposed reinforcing steel. This paper presents a novel method of detecting exposed reinforcement in concrete columns for the purpose of advancing practices of structural and safety evaluation of buildings after earthquakes. Under this method, the binary image of the reinforcing area is first isolated using a state-of-the-art adaptive thresholding technique. Next, the ribbed regions of the reinforcement are detected by way of binary template matching. Finally, vertical and horizontal profiling are applied to the processed image in order to filter out any superfluous pixels and take into consideration the size of reinforcement bars in relation to that of the structural element within which they reside. The final result is the combined binary image disclosing only the regions containing rebar, overlaid on top of the original image. The method is tested on a set of images from the January 2010 earthquake in Haiti. Preliminary test results indicate that most exposed reinforcement could be properly detected in images of moderately-to-severely damaged concrete columns.
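
As a rough illustration of the three stages named above (adaptive thresholding, binary template matching, and vertical/horizontal profiling), the OpenCV sketch below shows how such a pipeline might be wired together. The file names, the rib template, and all thresholds are placeholders, and this is not the authors' implementation.

```python
import cv2
import numpy as np

image = cv2.imread("damaged_column.jpg")                   # hypothetical input photo
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# 1. Adaptive thresholding to obtain a binary image of candidate rebar regions.
binary = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                               cv2.THRESH_BINARY_INV, 31, 5)

# 2. Binary template matching against a small patch of rebar ribbing.
template = cv2.imread("rib_template.png", cv2.IMREAD_GRAYSCALE)   # hypothetical template
_, template_bin = cv2.threshold(template, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
scores = cv2.matchTemplate(binary, template_bin, cv2.TM_CCORR_NORMED)
th, tw = template_bin.shape
rib_mask = np.zeros_like(binary)
for y, x in zip(*np.where(scores > 0.8)):                  # placeholder match threshold
    rib_mask[y:y + th, x:x + tw] = 255

# 3. Vertical and horizontal profiling: discard rows/columns whose pixel counts are
#    too small, relative to the element, to correspond to bar-sized regions.
col_profile = rib_mask.sum(axis=0, dtype=np.int64) / 255
row_profile = rib_mask.sum(axis=1, dtype=np.int64) / 255
rib_mask[:, col_profile < 0.02 * rib_mask.shape[0]] = 0
rib_mask[row_profile < 0.02 * rib_mask.shape[1], :] = 0

# Overlay the surviving rebar regions on the original image.
overlay = image.copy()
overlay[rib_mask > 0] = (0, 0, 255)
cv2.imwrite("rebar_overlay.jpg", overlay)
```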

Relevance: 20.00%

Publisher:

Abstract:

The world is at the threshold of emerging technologies, where new systems in construction, materials, and civil and architectural design are poised to improve the built environment from a structural and construction perspective. Exciting developments, too many to name individually, take place yearly, affecting design considerations and construction practices. This edited book brings together modern methods and advances in structural engineering and construction, fulfilling the mission of the ISEC Conferences, which is to enhance communication and understanding between structural and construction engineers for the successful design and construction of engineering projects. The articles in this book are those accepted for publication and presentation at the 6th International Structural Engineering and Construction Conference in Zurich. The 6th ISEC Conference in Zurich, Switzerland, follows the overwhelming reception and success of the previous ISEC conferences in Las Vegas, USA in 2009; Melbourne, Australia in 2007; Shunan, Japan in 2005; Rome, Italy in 2003; and Honolulu, USA in 2001. Many topics are covered in this book, ranging from legal affairs and contracting, to innovations and risk analysis in infrastructure projects, analysis and design of structural systems, materials, architecture, and construction. The articles here are a lasting testimony to the excellent research being undertaken around the world. They provide a platform for the exchange of ideas, research efforts and networking in the structural engineering and construction communities. We congratulate and thank the authors for these articles, which were selected after intensive peer review, and our gratitude extends to all reviewers and members of the International Technical Committee. It is their combined contributions that have made this book a reality.

Relevance: 20.00%

Publisher:

Abstract:

The role dopamine plays in decision-making has important theoretical, empirical and clinical implications. Here, we examined its precise contribution by exploiting the lesion deficit model afforded by Parkinson's disease. We studied patients in a two-stage reinforcement learning task while they were ON and OFF dopamine replacement medication. Contrary to expectation, we found that dopaminergic drug state (ON or OFF) did not impact learning. Instead, the critical factor was drug state during the performance phase, with patients ON medication choosing correctly significantly more frequently than those OFF medication. This effect was independent of drug state during initial learning and appears to reflect a facilitation of generalization for learnt information. This inference is bolstered by our observation that neural activity in nucleus accumbens and ventromedial prefrontal cortex, measured during simultaneously acquired functional magnetic resonance imaging, represented learnt stimulus values during performance. This effect was expressed solely during the ON state, with activity in these regions correlating with better performance. Our data indicate that dopamine modulation of nucleus accumbens and ventromedial prefrontal cortex exerts a specific effect on choice behaviour distinct from pure learning. The findings are in keeping with substantial other evidence that certain aspects of learning are unaffected by dopamine lesions or depletion, and that dopamine plays a key role in performance that may be distinct from its role in learning.
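
One common way to formalise the distinction the abstract draws between learning and performance is a simple value-learning model in which acquisition updates option values, while a separate choice parameter governs how sharply those learnt values are expressed at test. The sketch below is purely illustrative (toy reward probabilities, a logistic choice rule, and a "beta" standing in for dopaminergic state at performance); it is not the authors' task or analysis.

```python
import numpy as np

rng = np.random.default_rng(1)
reward_prob = {"A": 0.8, "B": 0.2}          # hypothetical two-option task

def learn(alpha=0.3, n_trials=100):
    """Acquisition phase: option values updated from feedback."""
    q = {"A": 0.5, "B": 0.5}
    for _ in range(n_trials):
        choice = "A" if rng.random() < 0.5 else "B"          # sample both options
        reward = float(rng.random() < reward_prob[choice])
        q[choice] += alpha * (reward - q[choice])
    return q

def perform(q, beta, n_trials=100):
    """Performance phase: choices depend on the learnt values and on beta,
    a choice-sharpness parameter standing in for drug state at test."""
    correct = 0
    for _ in range(n_trials):
        p_a = 1.0 / (1.0 + np.exp(-beta * (q["A"] - q["B"])))   # softmax over two options
        choice = "A" if rng.random() < p_a else "B"
        correct += choice == "A"
    return correct / n_trials

q = learn()
print("accuracy with low beta (OFF-like): ", perform(q, beta=1.0))
print("accuracy with high beta (ON-like):", perform(q, beta=5.0))
```

In this toy model the values themselves are identical in both conditions; only how reliably they drive choice differs, which mirrors the learning-versus-performance dissociation reported above.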


Relevance: 20.00%

Publisher:

Abstract:

Animals repeat rewarded behaviors, but the physiological basis of reward-based learning has only been partially elucidated. On one hand, experimental evidence shows that the neuromodulator dopamine carries information about rewards and affects synaptic plasticity. On the other hand, the theory of reinforcement learning provides a framework for reward-based learning. Recent models of reward-modulated spike-timing-dependent plasticity have taken first steps towards bridging the gap between the two approaches, but have faced two problems. First, reinforcement learning is typically formulated in a discrete framework, ill-adapted to the description of natural situations. Second, biologically plausible models of reward-modulated spike-timing-dependent plasticity require precise calculation of the reward prediction error, yet it remains to be shown how this can be computed by neurons. Here we propose a solution to these problems by extending the continuous temporal difference (TD) learning of Doya (2000) to the case of spiking neurons in an actor-critic network operating in continuous time, and with continuous state and action representations. In our model, the critic learns to predict expected future rewards in real time. Its activity, together with actual rewards, conditions the delivery of a neuromodulatory TD signal to itself and to the actor, which is responsible for action choice. In simulations, we show that such an architecture can solve a Morris water-maze-like navigation task, in a number of trials consistent with reported animal performance. We also use our model to solve the acrobot and the cartpole problems, two complex motor control tasks. Our model provides a plausible way of computing reward prediction error in the brain. Moreover, the analytically derived learning rule is consistent with experimental evidence for dopamine-modulated spike-timing-dependent plasticity.
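
The spiking actor-critic network itself is beyond a short example, but the continuous TD error it builds on, delta(t) = r(t) - V(t)/tau + dV/dt from Doya (2000), can be sketched in discretised form on a toy one-dimensional navigation task. The features, learning rates, reward and task below are illustrative choices, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(0)
dt, tau, goal = 0.1, 1.0, 5.0
centres = np.linspace(-8.0, 8.0, 33)             # radial-basis features over position

def features(x):
    return np.exp(-0.5 * ((x - centres) / 0.5) ** 2)

w_v = np.zeros_like(centres)                     # critic weights (value function)
w_a = np.zeros_like(centres)                     # actor weights (mean action)

for episode in range(100):
    x = rng.uniform(-6.0, 6.0)
    for step in range(300):
        phi = features(x)
        action = w_a @ phi + rng.normal(0.0, 0.5)             # exploratory Gaussian action
        x_new = float(np.clip(x + dt * action, -8.0, 8.0))
        reward = -0.05 * (x_new - goal) ** 2                  # continuous reward signal
        v, v_new = w_v @ phi, w_v @ features(x_new)
        delta = reward - v / tau + (v_new - v) / dt           # continuous TD error, discretised
        w_v += 0.02 * delta * phi                             # critic update
        w_a += 0.02 * delta * (action - w_a @ phi) * phi      # actor update (Gaussian policy gradient)
        x = x_new

print("learned value at the goal vs. far from it:",
      w_v @ features(goal), w_v @ features(-5.0))
```

The same TD signal drives both the critic and the actor, which is the architectural point the abstract makes; the cited model delivers that signal as a neuromodulatory input to spiking neurons rather than to the linear units used here.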

Relevance: 20.00%

Publisher:

Abstract:

The tendency to make unhealthy choices is hypothesized to be related to an individual's temporal discount rate, the theoretical rate at which they devalue delayed rewards. Furthermore, a particular form of temporal discounting, hyperbolic discounting, has been proposed to explain why unhealthy behavior can occur despite healthy intentions. We examine these two hypotheses in turn. First, we systematically review studies which investigate whether discount rates can predict unhealthy behavior. These studies reveal that high discount rates for money (and in some instances food or drug rewards) are associated with several unhealthy behaviors and markers of health status, establishing discounting as a promising predictive measure. Second, we examine whether intention-incongruent unhealthy actions are consistent with hyperbolic discounting. We conclude that intention-incongruent actions are often triggered by environmental cues or changes in motivational state, whose effects are not parameterized by hyperbolic discounting. We propose a framework for understanding these state-based effects in terms of the interplay of two distinct reinforcement learning mechanisms: a "model-based" (or goal-directed) system and a "model-free" (or habitual) system. Under this framework, while discounting of delayed health may contribute to the initiation of unhealthy behavior, with repetition, many unhealthy behaviors become habitual; if health goals then change, habitual behavior can still arise in response to environmental cues. We propose that the burgeoning development of computational models of these processes will permit further identification of health decision-making phenotypes.
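
The role hyperbolic discounting plays in this argument can be made concrete with a toy calculation: under a hyperbolic curve, adding the same front-end delay to both options can reverse which option is preferred, whereas an exponential discounter never reverses. The amounts, delays, and discount parameters below are made up purely for illustration.

```python
def hyperbolic(amount, delay, k=1.0):
    """Hyperbolic discounting: value = amount / (1 + k * delay)."""
    return amount / (1.0 + k * delay)

def exponential(amount, delay, gamma=0.95):
    """Exponential discounting: value = amount * gamma ** delay."""
    return amount * gamma ** delay

small_soon = (5.0, 1.0)       # e.g. a small reward after 1 day
large_late = (10.0, 10.0)     # a larger reward after 10 days

for front_end in (0.0, 30.0):           # decide now vs. commit 30 days in advance
    for name, value in (("hyperbolic", hyperbolic), ("exponential", exponential)):
        ss = value(small_soon[0], small_soon[1] + front_end)
        ll = value(large_late[0], large_late[1] + front_end)
        print(f"{name:>11}, front-end delay {front_end:>4}: "
              f"prefers {'small-soon' if ss > ll else 'large-late'}")
```

With these illustrative parameters the hyperbolic agent prefers the large, late reward when both options are distant but switches to the small, immediate one as it approaches, while the exponential agent's preference never changes; the review's point is that such delay-based reversals alone cannot account for cue- and state-triggered lapses, which is where the model-based/model-free distinction comes in.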