12 results for Fieldwork Learning Framework

in Cambridge University Engineering Department Publications Database


Relevance: 40.00%

Abstract:

Animals repeat rewarded behaviors, but the physiological basis of reward-based learning has only been partially elucidated. On one hand, experimental evidence shows that the neuromodulator dopamine carries information about rewards and affects synaptic plasticity. On the other hand, the theory of reinforcement learning provides a framework for reward-based learning. Recent models of reward-modulated spike-timing-dependent plasticity have made first steps towards bridging the gap between the two approaches, but faced two problems. First, reinforcement learning is typically formulated in a discrete framework, ill-adapted to the description of natural situations. Second, biologically plausible models of reward-modulated spike-timing-dependent plasticity require precise calculation of the reward prediction error, yet it remains to be shown how this can be computed by neurons. Here we propose a solution to these problems by extending the continuous temporal difference (TD) learning of Doya (2000) to the case of spiking neurons in an actor-critic network operating in continuous time, and with continuous state and action representations. In our model, the critic learns to predict expected future rewards in real time. Its activity, together with actual rewards, conditions the delivery of a neuromodulatory TD signal to itself and to the actor, which is responsible for action choice. In simulations, we show that such an architecture can solve a Morris water-maze-like navigation task, in a number of trials consistent with reported animal performance. We also use our model to solve the acrobot and the cartpole problems, two complex motor control tasks. Our model provides a plausible way of computing reward prediction error in the brain. Moreover, the analytically derived learning rule is consistent with experimental evidence for dopamine-modulated spike-timing-dependent plasticity.
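Doya's (2000) continuous TD framework defines the TD error as δ(t) = r(t) − V(t)/τ + dV/dt, where τ is the time constant for discounting future rewards. The sketch below is a minimal, non-spiking illustration, assuming a fixed time step `dt` and a backward-difference estimate of dV/dt (both assumptions, not the paper's spiking actor-critic implementation). It shows that the error vanishes when the critic's prediction is self-consistent and equals the reward when the critic predicts nothing.

```python
import numpy as np


def continuous_td_error(rewards, values, dt, tau):
    """Discretized continuous-time TD error (after Doya, 2000):

        delta(t) = r(t) - V(t) / tau + dV/dt

    dV/dt is approximated by a backward difference, so the first sample,
    which has no predecessor, is dropped from the output.
    """
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)
    dv_dt = np.diff(values) / dt
    return rewards[1:] - values[1:] / tau + dv_dt


# Constant reward r = 1 with tau = 2: the exact value is V = r * tau = 2,
# so the TD error vanishes; an untrained critic (V = 0) sees delta = r instead.
dt, tau = 0.01, 2.0
r = np.ones(100)
delta_exact = continuous_td_error(r, np.full(100, 2.0), dt, tau)
delta_naive = continuous_td_error(r, np.zeros(100), dt, tau)
```

In the model described above, a signal of this kind is what the critic broadcasts to itself and to the actor as the neuromodulatory TD signal.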

Relevance: 30.00%

Abstract:

This work addresses the problem of estimating the optimal value function in a Markov Decision Process from observed state-action pairs. We adopt a Bayesian approach to inference, which allows both the model to be estimated and predictions about actions to be made in a unified framework, providing a principled approach to mimicry of a controller on the basis of observed data. A new Markov chain Monte Carlo (MCMC) sampler is devised for simulation from the posterior distribution over the optimal value function. The sampler includes a parameter expansion step, which is shown to be essential for its good convergence properties. As an illustration, the method is applied to learning a human controller.
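The quantity being inferred here is the optimal value function V*. When the MDP is fully known, V* can be computed directly by value iteration; the paper's setting is the inverse one, where V* must be inferred from observed behaviour, but the forward computation shows what is being estimated. The two-state MDP below is hypothetical, and the array layout (`P[a, s, s']`, `R[s, a]`) is an assumption of this sketch.

```python
import numpy as np


def value_iteration(P, R, gamma, tol=1e-10, max_iter=10_000):
    """Compute V* and a greedy policy for a finite MDP.

    P[a, s, s2] : transition probabilities P(s2 | s, a)
    R[s, a]     : expected immediate reward
    """
    V = np.zeros(R.shape[0])
    for _ in range(max_iter):
        # Bellman optimality backup: Q[s, a] = R[s, a] + gamma * sum_s2 P[a, s, s2] V[s2]
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new
    return V, Q.argmax(axis=1)


# Hypothetical MDP: action 0 stays put, action 1 switches state.
P = np.array([[[1.0, 0.0], [0.0, 1.0]],   # action 0
              [[0.0, 1.0], [1.0, 0.0]]])  # action 1
R = np.array([[0.0, 1.0],                 # rewards in state 0 for actions 0, 1
              [2.0, 0.0]])                # rewards in state 1 for actions 0, 1
V_star, greedy = value_iteration(P, R, gamma=0.9)
# V_star is approximately [19, 20]; greedy policy is [1, 0]
```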

Relevance: 30.00%

Abstract:

A novel framework is provided for very fast model-based reinforcement learning in continuous state and action spaces. It requires probabilistic models that explicitly characterize their levels of confidence. Within the framework, flexible, non-parametric models are used to describe the world based on previously collected experience. The framework is demonstrated on the cart-pole problem in a setting where very limited prior knowledge about the task was provided. Learning progressed rapidly, and a good policy was found after only a small number of iterations.

Relevance: 30.00%

Abstract:

Learning is often understood as an organism's gradual acquisition of the association between a given sensory stimulus and the correct motor response. Mathematically, this corresponds to regressing a mapping between the set of observations and the set of actions. Recently, however, it has been shown both in cognitive and motor neuroscience that humans are not only able to learn particular stimulus-response mappings, but are also able to extract abstract structural invariants that facilitate generalization to novel tasks. Here we show how such structure learning can enhance facilitation in a sensorimotor association task performed by human subjects. Using regression and reinforcement learning models we show that the observed facilitation cannot be explained by these basic models of learning stimulus-response associations. We show, however, that the observed data can be explained by a hierarchical Bayesian model that performs structure learning. In line with previous results from cognitive tasks, this suggests that hierarchical Bayesian inference might provide a common framework to explain both the learning of specific stimulus-response associations and the learning of abstract structures that are shared by different task environments.
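The benefit of learning shared structure can be illustrated with a minimal two-level Gaussian model. This is an assumption for illustration, not the hierarchical model fitted in the study: each task's stimulus-response parameter θ is drawn from a shared distribution N(μ, τ²), and each trial gives a noisy observation of θ. Estimating μ from earlier tasks yields a prior that lets a new task be learned from fewer trials, one simple form of facilitation. All numbers below are hypothetical.

```python
import numpy as np


def posterior_mean(y, prior_mean, prior_var, noise_var):
    """Posterior mean of a task parameter theta under a Gaussian prior
    theta ~ N(prior_mean, prior_var) and observations y_i ~ N(theta, noise_var)."""
    y = np.asarray(y, dtype=float)
    precision = 1.0 / prior_var + len(y) / noise_var
    return (prior_mean / prior_var + y.sum() / noise_var) / precision


# Hypothetical: three previously learned tasks share a mapping parameter near 2.0.
previous_tasks = np.array([1.9, 2.1, 2.0])
mu_hat = previous_tasks.mean()   # empirical-Bayes estimate of the shared (structural) mean
tau2, sigma2 = 0.04, 1.0         # assumed between-task and trial-noise variances

# A single noisy trial on a new task whose true parameter is 2.2.
first_trial = [3.0]
with_structure = posterior_mean(first_trial, mu_hat, tau2, sigma2)
without_structure = float(np.mean(first_trial))  # plain per-task regression estimate
```

With only one trial, the structured estimate stays close to the true value, while the per-task estimate inherits the full trial noise.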

Relevance: 30.00%

Abstract:

The industrial landscape is becoming increasingly complex and dynamic, with innovative technologies stimulating the emergence of new industries and business models. This paper presents a preliminary framework for mapping industrial emergence, based on roadmapping principles, in order to understand the nature and characteristics of such phenomena. The focus at this stage is on historical examples of industrial emergence, with the preliminary framework based on observations from 20 'quick scan' maps, one of which is used to illustrate the framework. The learning from these historical cases, combined with further industrial consultation and literature review, will be used to develop practical methods for strategy and policy application. The paper concludes by summarising key learning points and further work needed to achieve these outcomes. © 2009 PICMET.

Relevance: 30.00%

Abstract:

In this paper, we aim to reconstruct free-form 3D models from a single view by learning prior knowledge of a specific class of objects. Instead of heuristically proposing specific regularities and defining parametric models, as in previous research, we learn our shape prior directly from existing 3D models under a framework based on the Gaussian Process Latent Variable Model (GPLVM). The major contributions of the paper are: 1) a probabilistic framework for prior-based reconstruction that requires no object-specific heuristics and can be easily generalized to handle various categories of 3D objects, and 2) an attempt at automatic reconstruction of more complex 3D shapes, such as human bodies, from 2D silhouettes only. Qualitative and quantitative experimental results on both synthetic and real data demonstrate the efficacy of the new approach. ©2009 IEEE.
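A GPLVM shape prior of the kind described above is normally fitted by iteratively optimising latent points under a nonlinear (e.g. RBF) kernel, which this sketch does not do. The simplest special case, a linear kernel K = XXᵀ + σ²I, has a closed-form maximum-likelihood solution (the dual of probabilistic PCA): X = U_q(Λ_q − σ²I)^{1/2}, with U_q and Λ_q the top-q eigenvectors and eigenvalues of YYᵀ/D. The data here are random vectors standing in for shapes; the function names and noise level are assumptions.

```python
import numpy as np


def linear_gplvm(Y, q, noise_var):
    """Maximum-likelihood latent points for a GPLVM with linear kernel
    K = X X^T + noise_var * I (the dual of probabilistic PCA)."""
    Y = Y - Y.mean(axis=0)                    # centre the data
    N, D = Y.shape
    S = Y @ Y.T / D                           # N x N inner-product matrix
    eigvals, eigvecs = np.linalg.eigh(S)      # eigenvalues in ascending order
    idx = np.argsort(eigvals)[::-1][:q]       # indices of the top-q eigenpairs
    lam, U = eigvals[idx], eigvecs[:, idx]
    scale = np.sqrt(np.maximum(lam - noise_var, 0.0))
    return U * scale                          # X = U_q (Lambda_q - noise_var I)^{1/2}


def log_likelihood(X, Y, noise_var):
    """GPLVM log-likelihood: product over the D output columns of N(y_d | 0, K)."""
    Y = Y - Y.mean(axis=0)
    N, D = Y.shape
    K = X @ X.T + noise_var * np.eye(N)
    _, logdet = np.linalg.slogdet(K)
    quad = np.trace(np.linalg.solve(K, Y @ Y.T))
    return -0.5 * D * logdet - 0.5 * quad - 0.5 * N * D * np.log(2 * np.pi)


# Hypothetical data: 20 "shapes", each a 30-D vector generated from 2 latent factors.
rng = np.random.default_rng(0)
Z = rng.standard_normal((20, 2))
W = rng.standard_normal((2, 30))
Y = Z @ W + 0.1 * rng.standard_normal((20, 30))

X_ml = linear_gplvm(Y, q=2, noise_var=0.1)
X_perturbed = X_ml + 0.1 * rng.standard_normal(X_ml.shape)
```

Perturbing the closed-form latent points lowers the likelihood, which is a quick sanity check that they are a maximum.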

Relevance: 30.00%

Abstract:

The partially observable Markov decision process (POMDP) provides a popular framework for modelling spoken dialogue. This paper describes how the expectation propagation (EP) algorithm can be used to learn the parameters of the POMDP user model. Various special probability factors applicable to this task are presented, which allow the parameters to be learned when the structure of the dialogue is complex. No annotations are required, neither of the true dialogue state nor of the true semantics of user utterances. Parameters optimised using the proposed techniques are shown to improve performance in both offline transcription experiments and simulated dialogue management. ©2010 IEEE.
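In a POMDP dialogue manager, the learned user-model parameters feed into the belief update b'(s') ∝ P(o | s', a) Σ_s P(s' | s, a) b(s). The sketch below shows only this standard exact update for a small discrete POMDP, not EP itself; the arrays `T` and `O` roughly play the role of the parameters that EP would learn, and the two-goal dialogue and its probabilities are hypothetical.

```python
import numpy as np


def belief_update(belief, action, observation, T, O):
    """Exact Bayes filter for a discrete POMDP.

    T[a, s, s2] = P(s2 | s, a)   (state transition model)
    O[a, s2, o] = P(o | s2, a)   (observation model)
    """
    predicted = belief @ T[action]                       # sum_s b(s) P(s2 | s, a)
    unnormalised = O[action][:, observation] * predicted
    return unnormalised / unnormalised.sum()


# Hypothetical two-goal dialogue: the user's goal does not change when the system
# asks (action 0), and the recogniser reports the correct goal 80% of the time.
T = np.array([np.eye(2)])
O = np.array([[[0.8, 0.2],
               [0.2, 0.8]]])
b = belief_update(np.array([0.5, 0.5]), action=0, observation=0, T=T, O=O)
# b is [0.8, 0.2]
```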

Relevance: 30.00%

Abstract:

Innovation policies play an important role throughout the development process of emerging industries. However, existing policy studies view the process as a black box and fail to capture the policy-industry interactions that occur throughout it. This paper aims to develop an integrated technology roadmapping tool to facilitate a better understanding of policy heterogeneity at the different stages of new energy industries in China. Through a case study of the Chinese wind energy equipment manufacturing industry, this paper elaborates the dynamics between policy and the growth process of the industry. Further, this paper generalizes some China-specific characteristics of the policy-industry interactions. As a practical output, this study proposes a policy-technology roadmapping framework that maps policy-market-product-technology interactions, in response to the need to analyze and plan the development of new industries in emerging economies (e.g. China). This paper will be of interest to policy makers, strategists, investors, and industrial experts. © 2011 IEEE.

Relevance: 30.00%

Abstract:

Innovation policies play an important role throughout the development process of emerging industries in China. Existing policy and industry studies view the emergence process as a black box and fail to understand how the impact of policy varies along that process. This paper aims to develop a multi-dimensional roadmapping tool to better analyse the dynamics between policy and industrial growth for new industries in China. By reviewing the emergence process of the Chinese wind turbine industry, this paper elaborates how policy and other factors influence the industry's emergence along this path. Further, this paper generalises some China-specific characteristics of the policy-industry dynamics. As a practical output, this study proposes a roadmapping framework that generalises some patterns of policy-industry interactions for the emergence process of new industries in China. This paper will be of interest to policy makers, strategists, investors and industrial experts. Copyright © 2013 Inderscience Enterprises Ltd.

Relevance: 30.00%

Abstract:

The code provided here originally demonstrated the main algorithms from Rasmussen and Williams: Gaussian Processes for Machine Learning. It has since grown to allow more likelihood functions, further inference methods and a flexible framework for specifying GPs.
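The GPML toolbox itself is MATLAB/Octave code, and the Python sketch below is not its interface. It follows the Cholesky-based prediction algorithm from Rasmussen and Williams (Algorithm 2.1: predictive mean, predictive variance and log marginal likelihood) for 1-D inputs with a squared-exponential kernel. The hyperparameter values `ell`, `sf2` and `sn2` are arbitrary assumptions.

```python
import numpy as np


def sq_exp_kernel(A, B, ell=1.0, sf2=1.0):
    """Squared-exponential covariance between 1-D input arrays A and B."""
    d2 = (A[:, None] - B[None, :]) ** 2
    return sf2 * np.exp(-0.5 * d2 / ell**2)


def gp_predict(X, y, Xs, ell=1.0, sf2=1.0, sn2=0.01):
    """GP regression via a Cholesky factorisation of K + sn2 * I.

    Returns the predictive mean and latent-function variance at Xs,
    and the log marginal likelihood of the training targets y.
    """
    n = len(X)
    K = sq_exp_kernel(X, X, ell, sf2)
    L = np.linalg.cholesky(K + sn2 * np.eye(n))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))   # (K + sn2 I)^{-1} y
    Ks = sq_exp_kernel(X, Xs, ell, sf2)
    mean = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = sf2 - np.sum(v**2, axis=0)                     # k(x*, x*) = sf2 for this kernel
    lml = -0.5 * y @ alpha - np.sum(np.log(np.diag(L))) - 0.5 * n * np.log(2 * np.pi)
    return mean, var, lml


X = np.array([-1.0, 0.0, 1.0])
y = np.sin(X)
mean, var, lml = gp_predict(X, y, np.array([0.0, 10.0]))
```

Near the training inputs the predictive variance is small; far from them it reverts to the prior variance `sf2`.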

Relevance: 30.00%

Abstract:

We consider the inverse reinforcement learning problem, that is, the problem of learning from, and then predicting or mimicking, a controller based on state/action data. We propose a statistical model for such data, derived from the structure of a Markov decision process. Adopting a Bayesian approach to inference, we show how latent variables of the model can be estimated, and how predictions about actions can be made, in a unified framework. A new Markov chain Monte Carlo (MCMC) sampler is devised for simulation from the posterior distribution. The sampler includes a parameter expansion step, which is shown to be essential for its good convergence properties. As an illustration, the method is applied to learning a human controller.
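One common way to turn MDP action values into a statistical model of observed state/action data is a Boltzmann (softmax) policy, π(a | s) ∝ exp(β Q(s, a)). This is an assumption for illustration; the abstract does not specify the paper's likelihood. A posterior sampler would explore Q (or the value function) under a likelihood like the one below; the MCMC sampler and its parameter expansion step are not shown, and the action values are hand-picked.

```python
import numpy as np


def boltzmann_policy(Q, beta):
    """pi(a | s) proportional to exp(beta * Q[s, a]); larger beta means a more nearly optimal controller."""
    z = beta * np.asarray(Q, dtype=float)
    z -= z.max(axis=1, keepdims=True)   # subtract row max for numerical stability
    p = np.exp(z)
    return p / p.sum(axis=1, keepdims=True)


def log_likelihood(Q, beta, states, actions):
    """Log-probability of observed (state, action) pairs under the Boltzmann policy."""
    pi = boltzmann_policy(Q, beta)
    return float(np.sum(np.log(pi[states, actions])))


# Hypothetical 2-state, 2-action example: in state 0, action 0 is three times as likely.
Q = np.array([[np.log(3.0), 0.0],
              [0.0, 0.0]])
ll = log_likelihood(Q, beta=1.0, states=[0, 0, 1], actions=[0, 1, 0])
```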

Relevance: 30.00%

Abstract:

The tendency to make unhealthy choices is hypothesized to be related to an individual's temporal discount rate, the theoretical rate at which they devalue delayed rewards. Furthermore, a particular form of temporal discounting, hyperbolic discounting, has been proposed to explain why unhealthy behavior can occur despite healthy intentions. We examine these two hypotheses in turn. First, we systematically review studies that investigate whether discount rates can predict unhealthy behavior. These studies reveal that high discount rates for money (and in some instances food or drug rewards) are associated with several unhealthy behaviors and markers of health status, establishing discounting as a promising predictive measure. Second, we examine whether intention-incongruent unhealthy actions are consistent with hyperbolic discounting. We conclude that intention-incongruent actions are often triggered by environmental cues or changes in motivational state, whose effects are not parameterized by hyperbolic discounting. We propose a framework for understanding these state-based effects in terms of the interplay of two distinct reinforcement learning mechanisms: a "model-based" (or goal-directed) system and a "model-free" (or habitual) system. Under this framework, while discounting of delayed health may contribute to the initiation of unhealthy behavior, with repetition, many unhealthy behaviors become habitual; if health goals then change, habitual behavior can still arise in response to environmental cues. We propose that the burgeoning development of computational models of these processes will permit further identification of health decision-making phenotypes.
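The reason hyperbolic discounting, V = A / (1 + kD), is invoked for intention-incongruent behavior is that it produces preference reversals. Viewed from a distance, a larger-later reward is preferred (the healthy intention), but as the smaller-sooner reward becomes imminent, the preference flips. Exponential discounting, V = A·exp(−kD), cannot do this, because the ratio of the two discounted values does not depend on how far away both options are. The reward sizes, delays and k below are hypothetical.

```python
import math


def hyperbolic(amount, delay, k):
    """Hyperbolic discounting: V = A / (1 + k D)."""
    return amount / (1.0 + k * delay)


def exponential(amount, delay, k):
    """Exponential discounting: V = A exp(-k D)."""
    return amount * math.exp(-k * delay)


def choose(discount, front_delay, k, small=50.0, large=100.0, gap=10.0):
    """Choose between a smaller-sooner reward at front_delay and a
    larger-later reward arriving `gap` time units after it."""
    v_small = discount(small, front_delay, k)
    v_large = discount(large, front_delay + gap, k)
    return "smaller-sooner" if v_small > v_large else "larger-later"


# Hypothetical k = 0.2 per time unit for both discount functions.
for d in (0.0, 20.0):
    print(d, choose(hyperbolic, d, 0.2), choose(exponential, d, 0.2))
# 0.0 smaller-sooner smaller-sooner
# 20.0 larger-later smaller-sooner
```

Only the hyperbolic chooser switches as the front delay shrinks from 20 to 0. As the abstract notes, this parameterization still does not capture cue- or state-triggered lapses.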