Digital Library

78 results for Policy convergence

in Cambridge University Engineering Department Publications Database

On-line policy optimisation of Bayesian spoken dialogue systems via human interaction

Relevance:

30.00%

Abstract:

A partially observable Markov decision process has been proposed as a dialogue model that enables robustness to speech recognition errors and automatic policy optimisation using reinforcement learning (RL). However, conventional RL algorithms require a very large number of dialogues, necessitating a user simulator. Recently, Gaussian processes have been shown to substantially speed up the optimisation, making it possible to learn directly from interaction with human users. However, early studies have been limited to very low-dimensional spaces and the learning has exhibited convergence problems. Here we investigate learning from human interaction using the Bayesian Update of Dialogue State system. This dynamic Bayesian network-based system has an optimisation space covering more than one hundred features, allowing a wide range of behaviours to be learned. Using an improved policy model and a more robust reward function, we show that stable learning can be achieved that significantly outperforms a simulator-trained policy. © 2013 IEEE.

Convergence properties of a regularization scheme for mathematical programs with complementarity constraints

Relevance:

20.00%

Local convergence of SQP methods for mathematical programs with equilibrium constraints

Relevance:

20.00%

PILCO: A model-based and data-efficient approach to policy search

Relevance:

20.00%

Entrepreneurship and innovation policy

Relevance:

20.00%

Entrepreneurship and innovation policy

Relevance:

20.00%

A pilot study on the emergence of university-level innovation policy in the UK

Relevance:

20.00%

Impact of ramp-up on the optimal reconfiguration policy for modern production systems

Relevance:

20.00%

Convergence of the auxiliary particle implementation of the PHD filter

Relevance:

20.00%

Abstract:

Optimal Bayesian multi-target filtering is in general computationally impractical owing to the high dimensionality of the multi-target state. The Probability Hypothesis Density (PHD) filter propagates the first moment of the multi-target posterior distribution. While this reduces the dimensionality of the problem, the PHD filter still involves intractable integrals in many cases of interest. Several authors have proposed Sequential Monte Carlo (SMC) implementations of the PHD filter. However, these implementations are the equivalent of the Bootstrap Particle Filter, and the latter is well known to be inefficient. Drawing on ideas from the Auxiliary Particle Filter (APF), an SMC implementation of the PHD filter which employs auxiliary variables to enhance its efficiency was proposed by Whiteley et al. Numerical examples were presented for two scenarios, including a challenging nonlinear observation model, to support the claim. This paper studies the theoretical properties of this auxiliary particle implementation. $\mathbb{L}_p$ error bounds are established from which almost sure convergence follows.

On convergence conditions for rendezvous

Relevance:

20.00%

Policy-based management in ad hoc networks using geographic routing

Relevance:

20.00%

Technology management and broadband competition policy

Relevance:

20.00%

On the convergence of a two timescale stochastic optimisation algorithm for optimal observer trajectory planning

Relevance:

20.00%

Convergence of the SMC implementation of the PHD filter

Relevance:

20.00%

A policy gradient method for semi-Markov decision processes with application to call admission control

Relevance:

20.00%
