8 resultados para learning work

em Cambridge University Engineering Department Publications Database


Relevância:

30.00% 30.00%

Publicador:

Resumo:

This work addresses the problem of estimating the optimal value function in a Markov Decision Process from observed state-action pairs. We adopt a Bayesian approach to inference, which allows both the model to be estimated and predictions about actions to be made in a unified framework, providing a principled approach to mimicry of a controller on the basis of observed data. A new Markov chain Monte Carlo (MCMC) sampler is devised for simulation from theposterior distribution over the optimal value function. This step includes a parameter expansion step, which is shown to be essential for good convergence properties of the MCMC sampler. As an illustration, the method is applied to learning a human controller.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The use of L1 regularisation for sparse learning has generated immense research interest, with successful application in such diverse areas as signal acquisition, image coding, genomics and collaborative filtering. While existing work highlights the many advantages of L1 methods, in this paper we find that L1 regularisation often dramatically underperforms in terms of predictive performance when compared with other methods for inferring sparsity. We focus on unsupervised latent variable models, and develop L1 minimising factor models, Bayesian variants of "L1", and Bayesian models with a stronger L0-like sparsity induced through spike-and-slab distributions. These spike-and-slab Bayesian factor models encourage sparsity while accounting for uncertainty in a principled manner and avoiding unnecessary shrinkage of non-zero values. We demonstrate on a number of data sets that in practice spike-and-slab Bayesian methods outperform L1 minimisation, even on a computational budget. We thus highlight the need to re-assess the wide use of L1 methods in sparsity-reliant applications, particularly when we care about generalising to previously unseen data, and provide an alternative that, over many varying conditions, provides improved generalisation performance.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This article presents a novel algorithm for learning parameters in statistical dialogue systems which are modeled as Partially Observable Markov Decision Processes (POMDPs). The three main components of a POMDP dialogue manager are a dialogue model representing dialogue state information; a policy that selects the system's responses based on the inferred state; and a reward function that specifies the desired behavior of the system. Ideally both the model parameters and the policy would be designed to maximize the cumulative reward. However, while there are many techniques available for learning the optimal policy, no good ways of learning the optimal model parameters that scale to real-world dialogue systems have been found yet. The presented algorithm, called the Natural Actor and Belief Critic (NABC), is a policy gradient method that offers a solution to this problem. Based on observed rewards, the algorithm estimates the natural gradient of the expected cumulative reward. The resulting gradient is then used to adapt both the prior distribution of the dialogue model parameters and the policy parameters. In addition, the article presents a variant of the NABC algorithm, called the Natural Belief Critic (NBC), which assumes that the policy is fixed and only the model parameters need to be estimated. The algorithms are evaluated on a spoken dialogue system in the tourist information domain. The experiments show that model parameters estimated to maximize the expected cumulative reward result in significantly improved performance compared to the baseline hand-crafted model parameters. The algorithms are also compared to optimization techniques using plain gradients and state-of-the-art random search algorithms. In all cases, the algorithms based on the natural gradient work significantly better. © 2011 ACM.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Perceptual learning improves perception through training. Perceptual learning improves with most stimulus types but fails when . certain stimulus types are mixed during training (roving). This result is surprising because classical supervised and unsupervised neural network models can cope easily with roving conditions. What makes humans so inferior compared to these models? As experimental and conceptual work has shown, human perceptual learning is neither supervised nor unsupervised but reward-based learning. Reward-based learning suffers from the so-called unsupervised bias, i.e., to prevent synaptic " drift" , the . average reward has to be exactly estimated. However, this is impossible when two or more stimulus types with different rewards are presented during training (and the reward is estimated by a running average). For this reason, we propose no learning occurs in roving conditions. However, roving hinders perceptual learning only for combinations of similar stimulus types but not for dissimilar ones. In this latter case, we propose that a critic can estimate the reward for each stimulus type separately. One implication of our analysis is that the critic cannot be located in the visual system. © 2011 Elsevier Ltd.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Most previous work on trainable language generation has focused on two paradigms: (a) using a statistical model to rank a set of generated utterances, or (b) using statistics to inform the generation decision process. Both approaches rely on the existence of a handcrafted generator, which limits their scalability to new domains. This paper presents BAGEL, a statistical language generator which uses dynamic Bayesian networks to learn from semantically-aligned data produced by 42 untrained annotators. A human evaluation shows that BAGEL can generate natural and informative utterances from unseen inputs in the information presentation domain. Additionally, generation performance on sparse datasets is improved significantly by using certainty-based active learning, yielding ratings close to the human gold standard with a fraction of the data. © 2010 Association for Computational Linguistics.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The contribution described in this paper is an algorithm for learning nonlinear, reference tracking, control policies given no prior knowledge of the dynamical system and limited interaction with the system through the learning process. Concepts from the field of reinforcement learning, Bayesian statistics and classical control have been brought together in the formulation of this algorithm which can be viewed as a form of indirect self tuning regulator. On the task of reference tracking using a simulated inverted pendulum it was shown to yield generally improved performance on the best controller derived from the standard linear quadratic method using only 30 s of total interaction with the system. Finally, the algorithm was shown to work on the simulated double pendulum proving its ability to solve nontrivial control tasks. © 2011 IEEE.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper discusses innovations in curriculum development in the Department of Engineering at the University of Cambridge as a participant in the Teaching for Learning Network (TFLN), a teaching and learning development initiative funded by the Cambridge-MIT Institute a pedagogic collaboration and brokerage network. A year-long research and development project investigated the practical experiences through which students traditionally explore engineering disciplines, apply and extend the knowledge gained in lectures and other settings, and begin to develop their professional expertise. The research project evaluated current practice in these sessions and developed an evidence-base to identify requirements for new activities, student support and staff development. The evidence collected included a novel student 'practice-value' survey highlighting effective practice and areas of concern, classroom observation of practicals, semi-structured interviews with staff, a student focus group and informal discussions with staff. Analysis of the data identified three potentially 'high-leverage' strategies for improvement: development of a more integrated teaching framework, within which practical work could be contextualised in relation to other learning; a more transparent and integrated conceptual framework where theory and practice were more closely linked; development of practical work more reflective of the complex problems facing professional engineers. This paper sets out key elements of the evidence collected and the changes that have been informed by this evidence and analysis, leading to the creation of a suite of integrated practical sessions carefully linked to other course elements and reinforcing central concepts in engineering, accompanied by a training and support programme for teaching staff.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The purpose of this research was to investigate the extent to which prior technological experience of products is related to age, and if this has implications for the success of subsequent product interaction. The contribution of this work is to provide the design community with new knowledge and a greater awareness of the diversity of user needs, and particularly the needs and skills of older people. The focus of this paper is to present how individual's mental models of products and interaction were developed through experiential learning; what new knowledge was acquired, and how this contributed to the development of mental models and product understanding. © 2013 Springer-Verlag Berlin Heidelberg.