25 results for Learning Performance
Abstract:
In the field of motor control, two hypotheses have been debated: whether the brain acquires internal models that generate accurate motor commands, or whether the brain avoids this by using the viscoelasticity of the musculoskeletal system. Recent observations of relatively low stiffness during trained movements support the existence of internal models. However, no study has revealed a decrease in viscoelasticity associated with learning that would imply improvement of internal models as well as synergy between the two hypothetical mechanisms. Previously observed decreases in electromyogram (EMG) might have other explanations, such as trajectory modifications that reduce joint torques. To circumvent such complications, we required strict trajectory control and examined only successful trials having identical trajectory and torque profiles. Subjects were asked to perform a hand movement in unison with a target moving along a specified and unusual trajectory, with shoulder and elbow in the horizontal plane at the shoulder level. To evaluate joint viscoelasticity during the learning of this movement, we proposed an index of muscle co-contraction around the joint (IMCJ). The IMCJ was defined as the summation of the absolute values of antagonistic muscle torques around the joint and computed from the linear relation between surface EMG and joint torque. The IMCJ during isometric contraction, as well as during movements, was confirmed to correlate well with joint stiffness estimated using the conventional method, i.e., applying mechanical perturbations. Accordingly, the IMCJ during the learning of the movement was computed for each joint of each trial using the estimated EMG-torque relationship. At the same time, the performance error for each trial was defined as the root mean square of the distance between the target and hand at each time step over the entire trajectory.
The time-series data of IMCJ and performance error were decomposed into long-term components that showed decreases in IMCJ in accordance with learning with little change in the trajectory and short-term interactions between the IMCJ and performance error. A cross-correlation analysis and impulse responses both suggested that higher IMCJs follow poor performances, and lower IMCJs follow good performances within a few successive trials. Our results support the hypothesis that viscoelasticity contributes more when internal models are inaccurate, while internal models contribute more after the completion of learning. It is demonstrated that the CNS regulates viscoelasticity on a short- and long-term basis depending on performance error and finally acquires smooth and accurate movements while maintaining stability during the entire learning process.
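The two per-trial quantities defined in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names, the array layout, and the single scalar `gain` used to map EMG to torque are assumptions made for the example (the abstract only states that the EMG-torque relation is linear).

```python
import numpy as np

def emg_to_torque(emg, gain):
    """Map surface EMG to muscle torque with a linear relation.
    The single scalar gain is a hypothetical simplification."""
    return gain * np.asarray(emg, dtype=float)

def imcj(flexor_torque, extensor_torque):
    """Index of muscle co-contraction around a joint: the sum of the
    absolute values of the antagonistic muscle torques, per time step."""
    flexor = np.abs(np.asarray(flexor_torque, dtype=float))
    extensor = np.abs(np.asarray(extensor_torque, dtype=float))
    return flexor + extensor

def rms_error(target_xy, hand_xy):
    """Root mean square of the target-hand distance over all time steps.
    Both inputs are arrays of shape (time_steps, 2)."""
    diff = np.asarray(target_xy, dtype=float) - np.asarray(hand_xy, dtype=float)
    distance = np.linalg.norm(diff, axis=1)
    return float(np.sqrt(np.mean(distance ** 2)))
```

Opposing torques cancel in the net joint torque but add in the IMCJ, which is why the index can track co-contraction even when the trajectory and net torque profiles are identical across trials.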
Abstract:
Motor task variation has been shown to be a key ingredient in skill transfer, retention, and structural learning. However, many studies only compare training of randomly varying tasks to either blocked or null training, and it is not clear how experiencing different nonrandom temporal orderings of tasks might affect the learning process. Here we study learning in human subjects who experience the same set of visuomotor rotations, evenly spaced between -60° and +60°, either in a random order or in an order in which the rotation angle changed gradually. We compared subsequent learning of three test blocks of +30°→-30°→+30° rotations. The groups that underwent either random or gradual training showed significant (P < 0.01) facilitation of learning in the test blocks compared with a control group who had not experienced any visuomotor rotations before. We also found that movement initiation times in the random group during the test blocks were significantly (P < 0.05) lower than for the gradual or the control group. When we fit a state-space model with fast and slow learning processes to our data, we found that the differences in performance in the test block were consistent with the gradual or random task variation changing the learning and retention rates of only the fast learning process. Such adaptation of learning rates may be a key feature of ongoing meta-learning processes. Our results therefore suggest that both gradual and random task variation can induce meta-learning and that random learning has an advantage in terms of shorter initiation times, suggesting less reliance on cognitive processes.
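A state-space model with fast and slow learning processes is commonly written as two states that each retain a fraction of their value and learn from the same trial error. The sketch below assumes that standard two-state form; the parameter values are illustrative placeholders, not the values fitted in the study.

```python
import numpy as np

def simulate_two_state(rotations, a_fast=0.6, b_fast=0.2, a_slow=0.99, b_slow=0.02):
    """Simulate adaptation to a sequence of visuomotor rotations (degrees).

    a_* are retention rates and b_* are learning rates (assumed values).
    Returns the net adaptation and the error on each trial.
    """
    rotations = np.asarray(rotations, dtype=float)
    output = np.zeros(len(rotations))
    errors = np.zeros(len(rotations))
    x_fast = 0.0
    x_slow = 0.0
    for n, rotation in enumerate(rotations):
        output[n] = x_fast + x_slow          # net adaptation on this trial
        errors[n] = rotation - output[n]     # error experienced on this trial
        # Both processes learn from the same error, at different rates.
        x_fast = a_fast * x_fast + b_fast * errors[n]
        x_slow = a_slow * x_slow + b_slow * errors[n]
    return output, errors

# Example: one block of +30 degree rotations.
baseline_output, baseline_errors = simulate_two_state(np.full(200, 30.0))
# Raising only the fast learning rate speeds up early error reduction.
boosted_output, boosted_errors = simulate_two_state(np.full(200, 30.0), b_fast=0.4)
```

In this framing, prior training that changes only `a_fast` and `b_fast` would alter how quickly errors fall early in a test block while leaving the slow process untouched.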
Abstract:
Information theoretic active learning has been widely studied for probabilistic models. For simple regression an optimal myopic policy is easily tractable. However, for other tasks and with more complex models, such as classification with nonparametric models, the optimal solution is harder to compute. Current approaches make approximations to achieve tractability. We propose an approach that expresses information gain in terms of predictive entropies, and apply this method to the Gaussian Process Classifier (GPC). Our approach makes minimal approximations to the full information theoretic objective. Our experimental performance compares favourably to many popular active learning algorithms, and has equal or lower computational complexity. We also compare well to decision theoretic approaches, which are privy to more information and require much more computational time. In addition, by further developing a reformulation of binary preference learning as a classification problem, we extend our algorithm to Gaussian Process preference learning.
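Expressed with predictive entropies, the information gain for a candidate input is the entropy of the averaged prediction minus the average entropy of the individual predictions. The sketch below estimates this by Monte Carlo for a binary classifier whose latent function value has a Gaussian posterior and a probit link. The sampling estimator, the function names, and the example inputs are assumptions for illustration; this is not the approximation used in the paper.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def binary_entropy(p):
    """Entropy in bits of a Bernoulli variable with success probability p."""
    p = np.clip(p, 1e-12, 1.0 - 1e-12)
    return -(p * np.log2(p) + (1.0 - p) * np.log2(1.0 - p))

def information_gain(latent_mean, latent_var, n_samples=20000, seed=0):
    """Monte Carlo estimate of H[mean prediction] - mean H[prediction].

    latent_mean, latent_var: posterior mean and variance of the latent
    function at each candidate input (assumed to be supplied by a GP).
    """
    rng = np.random.default_rng(seed)
    mean = np.atleast_1d(np.asarray(latent_mean, dtype=float))
    var = np.atleast_1d(np.asarray(latent_var, dtype=float))
    latent = mean + np.sqrt(var) * rng.standard_normal((n_samples, mean.size))
    prob = 0.5 * (1.0 + _erf(latent / math.sqrt(2.0)))  # probit link
    marginal_entropy = binary_entropy(prob.mean(axis=0))
    expected_entropy = binary_entropy(prob).mean(axis=0)
    return marginal_entropy - expected_entropy

# Three candidates: confident near the boundary, uncertain near the
# boundary, and confident far from the boundary.
gains = information_gain([0.0, 0.0, 3.0], [0.01, 4.0, 0.01])
best_candidate = int(np.argmax(gains))
```

The second candidate scores highest because the model is uncertain about the latent function there, not merely about the label; a point whose prediction is close to 0.5 with low latent variance yields little information.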
Abstract:
The unscented Kalman filter (UKF) is a widely used method in control and time series applications. The UKF suffers from arbitrary parameters necessary for sigma point placement, potentially causing it to perform poorly in nonlinear problems. We show how to treat sigma point placement in a UKF as a learning problem in a model based view. We demonstrate that learning to place the sigma points correctly from data can make sigma point collapse much less likely. Learning can result in a significant increase in predictive performance over default settings of the parameters in the UKF and other filters designed to avoid the problems of the UKF, such as the GP-ADF. At the same time, we maintain a lower computational complexity than the other methods. We call our method UKF-L. © 2011 Elsevier B.V.
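The "arbitrary parameters" mentioned here are usually the three scalars of the scaled unscented transform, often written alpha, beta, and kappa. The sketch below shows where they enter sigma point placement and weighting in that standard formulation. It does not implement the learning procedure of UKF-L, and the default values are placeholders chosen for the example.

```python
import numpy as np

def sigma_points(mean, cov, alpha=1.0, beta=0.0, kappa=1.0):
    """Sigma points and weights of the scaled unscented transform.

    Returns (points, mean_weights, cov_weights); points has shape
    (2n + 1, n) for an n-dimensional Gaussian.
    """
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    n = mean.size
    lam = alpha ** 2 * (n + kappa) - n
    # Columns of the Cholesky factor give the spread directions.
    scaled_sqrt = np.linalg.cholesky((n + lam) * cov)
    points = np.vstack([mean, mean + scaled_sqrt.T, mean - scaled_sqrt.T])
    mean_weights = np.full(2 * n + 1, 1.0 / (2.0 * (n + lam)))
    mean_weights[0] = lam / (n + lam)
    cov_weights = mean_weights.copy()
    cov_weights[0] += 1.0 - alpha ** 2 + beta
    return points, mean_weights, cov_weights

# Example: the weighted points reproduce the input mean and covariance.
example_mean = np.array([1.0, -2.0])
example_cov = np.array([[2.0, 0.5], [0.5, 1.0]])
points, wm, wc = sigma_points(example_mean, example_cov)
recovered_mean = wm @ points
centered = points - recovered_mean
recovered_cov = (wc[:, None] * centered).T @ centered
```

Because every choice of these scalars matches the first two moments of the input Gaussian, they differ only in how well the points probe the nonlinearity, which is what makes selecting them from data a meaningful learning problem.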
Abstract:
Reinforcement techniques have been successfully used to maximise the expected cumulative reward of statistical dialogue systems. Typically, reinforcement learning is used to estimate the parameters of a dialogue policy which selects the system's responses based on the inferred dialogue state. However, the inference of the dialogue state itself depends on a dialogue model which describes the expected behaviour of a user when interacting with the system. Ideally the parameters of this dialogue model should also be optimised to maximise the expected cumulative reward. This article presents two novel reinforcement algorithms for learning the parameters of a dialogue model. First, the Natural Belief Critic algorithm is designed to optimise the model parameters while the policy is kept fixed. This algorithm is suitable, for example, in systems using a handcrafted policy, perhaps prescribed by other design considerations. Second, the Natural Actor and Belief Critic algorithm jointly optimises both the model and the policy parameters. The algorithms are evaluated on a statistical dialogue system modelled as a Partially Observable Markov Decision Process in a tourist information domain. The evaluation is performed with a user simulator and with real users. The experiments indicate that model parameters estimated to maximise the expected reward function provide improved performance compared to the baseline handcrafted parameters. © 2011 Elsevier Ltd. All rights reserved.
Abstract:
Most previous work on trainable language generation has focused on two paradigms: (a) using a statistical model to rank a set of generated utterances, or (b) using statistics to inform the generation decision process. Both approaches rely on the existence of a handcrafted generator, which limits their scalability to new domains. This paper presents BAGEL, a statistical language generator which uses dynamic Bayesian networks to learn from semantically-aligned data produced by 42 untrained annotators. A human evaluation shows that BAGEL can generate natural and informative utterances from unseen inputs in the information presentation domain. Additionally, generation performance on sparse datasets is improved significantly by using certainty-based active learning, yielding ratings close to the human gold standard with a fraction of the data. © 2010 Association for Computational Linguistics.
Abstract:
The contribution described in this paper is an algorithm for learning nonlinear reference-tracking control policies given no prior knowledge of the dynamical system and limited interaction with the system through the learning process. Concepts from the fields of reinforcement learning, Bayesian statistics and classical control have been brought together in the formulation of this algorithm, which can be viewed as a form of indirect self-tuning regulator. On the task of reference tracking using a simulated inverted pendulum, it was shown to yield generally improved performance over the best controller derived from the standard linear quadratic method using only 30 s of total interaction with the system. Finally, the algorithm was shown to work on the simulated double pendulum, proving its ability to solve nontrivial control tasks. © 2011 IEEE.
Abstract:
The Masters programme in Engineering for Sustainable Development at Cambridge University explores a number of key themes, including dealing with: complexity, uncertainty, change, other disciplines, people, environmental limits, whole life costs, and trade-offs. This paper examines how these concepts are introduced and analyses the range of exercises and assignments which are designed to encourage students to test their own assumptions and abilities to develop competencies in these areas. Student performance against these tasks is discussed and student feedback is also presented, with a focus on how their awareness of the themes is raised through a range of activities.
Abstract:
Animals repeat rewarded behaviors, but the physiological basis of reward-based learning has only been partially elucidated. On one hand, experimental evidence shows that the neuromodulator dopamine carries information about rewards and affects synaptic plasticity. On the other hand, the theory of reinforcement learning provides a framework for reward-based learning. Recent models of reward-modulated spike-timing-dependent plasticity have made first steps towards bridging the gap between the two approaches, but faced two problems. First, reinforcement learning is typically formulated in a discrete framework, ill-adapted to the description of natural situations. Second, biologically plausible models of reward-modulated spike-timing-dependent plasticity require precise calculation of the reward prediction error, yet it remains to be shown how this can be computed by neurons. Here we propose a solution to these problems by extending the continuous temporal difference (TD) learning of Doya (2000) to the case of spiking neurons in an actor-critic network operating in continuous time, and with continuous state and action representations. In our model, the critic learns to predict expected future rewards in real time. Its activity, together with actual rewards, conditions the delivery of a neuromodulatory TD signal to itself and to the actor, which is responsible for action choice. In simulations, we show that such an architecture can solve a Morris water-maze-like navigation task, in a number of trials consistent with reported animal performance. We also use our model to solve the acrobot and the cartpole problems, two complex motor control tasks. Our model provides a plausible way of computing reward prediction error in the brain. Moreover, the analytically derived learning rule is consistent with experimental evidence for dopamine-modulated spike-timing-dependent plasticity.
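For reference, the continuous-time TD error in Doya (2000) is usually written as delta(t) = r(t) - V(t)/tau + dV/dt, where V is the critic's value estimate and tau is the discounting time constant. The sketch below evaluates that expression on sampled signals with a finite-difference derivative. It assumes this standard form and is a rate-based illustration only, not the spiking implementation described in the abstract.

```python
import numpy as np

def continuous_td_error(reward, value, dt, tau):
    """Continuous-time TD error on uniformly sampled signals.

    reward, value: arrays sampled every dt seconds.
    tau: time constant of reward discounting (assumed known).
    """
    reward = np.asarray(reward, dtype=float)
    value = np.asarray(value, dtype=float)
    dv_dt = np.gradient(value, dt)  # finite-difference estimate of dV/dt
    return reward - value / tau + dv_dt

# With a constant reward r, the exact discounted value is tau * r,
# so a critic holding that value produces zero TD error.
steps = 50
constant_reward = np.full(steps, 2.0)
correct_value = np.full(steps, 1.0)       # tau * r with tau = 0.5
delta_correct = continuous_td_error(constant_reward, correct_value, dt=0.01, tau=0.5)
# An overestimated value yields a negative error (prediction too high).
delta_over = continuous_td_error(constant_reward, 2.0 * correct_value, dt=0.01, tau=0.5)
```

The sign convention matches the discrete case: a positive error means outcomes are better than the critic predicted, and a negative error means they are worse.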
Abstract:
Successful motor performance requires the ability to adapt motor commands to task dynamics. A central question in movement neuroscience is how these dynamics are represented. Although it is widely assumed that dynamics (e.g., force fields) are represented in intrinsic, joint-based coordinates (Shadmehr R, Mussa-Ivaldi FA. J Neurosci 14: 3208-3224, 1994), recent evidence has questioned this proposal. Here we reexamine the representation of dynamics in two experiments. By testing generalization following changes in shoulder, elbow, or wrist configurations, the first experiment tested for extrinsic, intrinsic, or object-centered representations. No single coordinate frame accounted for the pattern of generalization. Rather, generalization patterns were better accounted for by a mixture of representations or by models that assumed local learning and graded, decaying generalization. A second experiment, in which we replicated the design of an influential study that had suggested encoding in intrinsic coordinates (Shadmehr and Mussa-Ivaldi 1994), yielded similar results. That is, we could not find evidence that dynamics are represented in a single coordinate system. Taken together, our experiments suggest that internal models do not employ a single coordinate system when generalizing and may well be represented as a mixture of coordinate systems, as a single system with local learning, or both.