11 results for Learning strategy

in the Cambridge University Engineering Department Publications Database


Relevance:

60.00%

Publisher:

Abstract:

Statistical dialogue models have required a large number of dialogues to optimise the dialogue policy, relying on the use of a simulated user. This results in a mismatch between training and live conditions, and in significant development costs for the simulator, thereby mitigating many of the claimed benefits of such models. Recent work on Gaussian process reinforcement learning has shown that learning can be substantially accelerated. This paper reports on an experiment to learn a policy for a real-world task directly from human interaction, using rewards provided by users. It shows that a usable policy can be learnt in just a few hundred dialogues without needing a user simulator, using a learning strategy that reduces the risk of taking bad actions. The paper also investigates adaptation behaviour when the system continues learning for several thousand dialogues and highlights the need for robustness to noisy rewards. © 2011 IEEE.
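The abstract does not give the algorithm's details, but a risk-aware Gaussian process policy of the kind described can be sketched as follows. The class name, belief features, and lower-confidence-bound action rule below are assumptions made for the sketch, not the paper's method.

```python
# Illustrative sketch only: per-action Gaussian process Q-estimates over belief
# features, with a lower-confidence-bound action rule to reduce the risk of
# taking bad actions while estimates are still uncertain.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

class GPDialoguePolicy:
    def __init__(self, n_actions, risk_weight=1.0):
        self.models = [GaussianProcessRegressor() for _ in range(n_actions)]
        self.data = [([], []) for _ in range(n_actions)]  # (belief features, user rewards)
        self.risk_weight = risk_weight                     # how strongly uncertainty is penalised

    def update(self, belief, action, reward):
        xs, ys = self.data[action]
        xs.append(belief)
        ys.append(reward)
        self.models[action].fit(np.array(xs), np.array(ys))

    def select_action(self, belief):
        scores = []
        for a, gp in enumerate(self.models):
            if not self.data[a][0]:            # untried action: neutral prior score
                scores.append(0.0)
                continue
            mean, std = gp.predict(np.array([belief]), return_std=True)
            scores.append(mean[0] - self.risk_weight * std[0])  # lower confidence bound
        return int(np.argmax(scores))
```

Scoring actions by a lower confidence bound rather than the posterior mean is one simple way to avoid actions whose value is still highly uncertain, which is the sense in which such a strategy reduces the risk of taking bad actions.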

Relevance:

30.00%

Publisher:

Abstract:

Although learning a motor skill, such as a tennis stroke, feels like a unitary experience, researchers who study motor control and learning break the processes involved into a number of interacting components. These components can be organized into four main groups. First, skilled performance requires the effective and efficient gathering of sensory information, such as deciding where and when to direct one's gaze around the court, and thus an important component of skill acquisition involves learning how best to extract task-relevant information. Second, the performer must learn key features of the task such as the geometry and mechanics of the tennis racket and ball, the properties of the court surface, and how the wind affects the ball's flight. Third, the player needs to set up different classes of control that include predictive and reactive control mechanisms that generate appropriate motor commands to achieve the task goals, as well as compliance control that specifies, for example, the stiffness with which the arm holds the racket. Finally, the successful performer can learn higher-level skills such as anticipating and countering the opponent's strategy and making effective decisions about shot selection. In this Primer we shall consider these components of motor learning using as an example how we learn to play tennis.
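For illustration only, a toy one-dimensional reach can make the predictive, reactive, and compliance components concrete: a feedforward command cancels a known load, while a spring-damper feedback term corrects errors, with the spring gain playing the role of limb stiffness. The dynamics and parameter values below are invented for clarity and are not taken from the Primer.

```python
# Toy 1-D reach: predictive (feedforward) + reactive (feedback) control, with
# 'stiffness' setting the compliance of the feedback correction. All values
# are illustrative.
def simulate_reach(target=1.0, steps=400, dt=0.01, mass=1.0,
                   stiffness=40.0, damping=12.0, load=-5.0):
    pos, vel = 0.0, 0.0
    for _ in range(steps):
        feedforward = -load                                     # predictive: cancel the known load
        feedback = stiffness * (target - pos) - damping * vel   # reactive spring-damper correction
        acc = (feedforward + feedback + load) / mass
        vel += acc * dt
        pos += vel * dt
    return pos

print(round(simulate_reach(), 3))  # the hand settles near the 1.0 target
```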

Relevance:

30.00%

Publisher:

Abstract:

Over the past decade, a variety of user models have been proposed for user-simulation-based reinforcement learning of dialogue strategies. However, the strategies learned with these models are rarely evaluated in actual user trials, and it remains unclear how the choice of user model affects the quality of the learned strategy. In particular, the degree to which strategies learned with a user model generalise to real user populations has not been investigated. This paper presents a series of experiments that qualitatively and quantitatively examine the effect of the user model on the learned strategy. Our results show that the performance and characteristics of the strategy are in fact highly dependent on the user model. Furthermore, a policy trained with a poor user model may appear to perform well when tested with the same model, but fail when tested with a more sophisticated user model. This raises significant doubts about the current practice of learning and evaluating strategies with the same user model. The paper further investigates a new technique for testing and comparing strategies directly on real human-machine dialogues, thereby avoiding any evaluation bias introduced by the user model. © 2005 IEEE.
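A minimal sketch of the cross-model evaluation idea, using invented toy user models, actions and rewards rather than the paper's simulators: a policy trained against one user model can look good when tested on that same model yet degrade when tested against a different one.

```python
# Hypothetical sketch of cross-model evaluation: a policy is trained against
# one toy simulated user and then tested against a different one.
import random

def simulated_user(cooperativeness):
    """A user model: probability that an 'ask' action yields a useful answer."""
    def respond(action):
        if action == "ask":
            return 1.0 if random.random() < cooperativeness else -1.0
        return 0.2  # 'confirm' is a safe but low-reward action
    return respond

def train_policy(user, episodes=5000, eps=0.1, alpha=0.05):
    q = {"ask": 0.0, "confirm": 0.0}
    for _ in range(episodes):
        a = random.choice(list(q)) if random.random() < eps else max(q, key=q.get)
        q[a] += alpha * (user(a) - q[a])
    return max(q, key=q.get)

def evaluate(action, user, episodes=2000):
    return sum(user(action) for _ in range(episodes)) / episodes

cooperative, uncooperative = simulated_user(0.9), simulated_user(0.3)
policy = train_policy(cooperative)
print(policy, evaluate(policy, cooperative))    # looks good on the training user model
print(policy, evaluate(policy, uncooperative))  # can fail on a different user model
```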

Relevance:

30.00%

Publisher:

Abstract:

The ability to use environmental stimuli to predict impending harm is critical for survival. Such predictions should be available as early as they are reliable. In Pavlovian conditioning, chains of successively earlier predictors are studied in terms of higher-order relationships, and have inspired computational theories such as temporal difference learning. However, there is at present no adequate neurobiological account of how this learning occurs. Here, in a functional magnetic resonance imaging (fMRI) study of higher-order aversive conditioning, we describe a key computational strategy that humans use to learn predictions about pain. We show that neural activity in the ventral striatum and the anterior insula displays a marked correspondence to the signals for sequential learning predicted by temporal difference models. This result reveals a flexible aversive learning process ideally suited to the changing and uncertain nature of real-world environments. Taken with existing data on reward learning, our results suggest a critical role for the ventral striatum in integrating complex appetitive and aversive predictions to coordinate behaviour.
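A minimal temporal difference sketch, with illustrative parameter values, showing the kind of sequential learning signal the fMRI activity was compared against: the prediction of the aversive outcome propagates back from the late cue (CS1) to the earlier cue (CS2).

```python
# Minimal temporal-difference (TD) sketch of higher-order aversive
# conditioning: CS2 precedes CS1, which precedes pain. Parameters are
# illustrative; the prediction error transfers the pain prediction
# back to the earliest reliable cue.
alpha, gamma = 0.1, 1.0
V = {"CS2": 0.0, "CS1": 0.0}
pain = -1.0  # aversive outcome treated as a negative reward

for trial in range(200):
    # at CS1 onset: error between CS2's prediction and what CS1 now predicts
    delta_cs2 = gamma * V["CS1"] - V["CS2"]
    V["CS2"] += alpha * delta_cs2
    # at the outcome: error between CS1's prediction and the pain received
    delta_cs1 = pain - V["CS1"]
    V["CS1"] += alpha * delta_cs1

print(V)  # both cues come to predict the impending pain (values near -1.0)
```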