988 resultados para Denumerable-markov-processes


Relevância:

40.00% 40.00%

Publicador:

Resumo:

We develop a simulation based algorithm for finite horizon Markov decision processes with finite state and finite action space. Illustrative numerical experiments with the proposed algorithm are shown for problems in flow control of communication networks and capacity switching in semiconductor fabrication.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

We develop an online actor-critic reinforcement learning algorithm with function approximation for a problem of control under inequality constraints. We consider the long-run average cost Markov decision process (MDP) framework in which both the objective and the constraint functions are suitable policy-dependent long-run averages of certain sample path functions. The Lagrange multiplier method is used to handle the inequality constraints. We prove the asymptotic almost sure convergence of our algorithm to a locally optimal solution. We also provide the results of numerical experiments on a problem of routing in a multi-stage queueing network with constraints on long-run average queue lengths. We observe that our algorithm exhibits good performance on this setting and converges to a feasible point.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

We introduce and study a class of non-stationary semi-Markov decision processes on a finite horizon. By constructing an equivalent Markov decision process, we establish the existence of a piecewise open loop relaxed control which is optimal for the finite horizon problem.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

We present a novel multi-timescale Q-learning algorithm for average cost control in a Markov decision process subject to multiple inequality constraints. We formulate a relaxed version of this problem through the Lagrange multiplier method. Our algorithm is different from Q-learning in that it updates two parameters - a Q-value parameter and a policy parameter. The Q-value parameter is updated on a slower time scale as compared to the policy parameter. Whereas Q-learning with function approximation can diverge in some cases, our algorithm is seen to be convergent as a result of the aforementioned timescale separation. We show the results of experiments on a problem of constrained routing in a multistage queueing network. Our algorithm is seen to exhibit good performance and the various inequality constraints are seen to be satisfied upon convergence of the algorithm.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

This work addresses the problem of estimating the optimal value function in a Markov Decision Process from observed state-action pairs. We adopt a Bayesian approach to inference, which allows both the model to be estimated and predictions about actions to be made in a unified framework, providing a principled approach to mimicry of a controller on the basis of observed data. A new Markov chain Monte Carlo (MCMC) sampler is devised for simulation from theposterior distribution over the optimal value function. This step includes a parameter expansion step, which is shown to be essential for good convergence properties of the MCMC sampler. As an illustration, the method is applied to learning a human controller.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

In this work the state of the art of the automatic dialogue strategy management using Markov decision processes (MDP) with reinforcement learning (RL) is described. Partially observable Markov decision processes (POMDP) are also described. To test the validity of these methods, two spoken dialogue systems have been developed. The first one is a spoken dialogue system for weather forecast providing, and the second one is a more complex system for train information. With the first system, comparisons between a rule-based system and an automatically trained system have been done, using a real corpus to train the automatic strategy. In the second system, the scalability of these methods when used in larger systems has been tested.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Given a spectral density matrix or, equivalently, a real autocovariance sequence, the author seeks to determine a finite-dimensional linear time-invariant system which, when driven by white noise, will produce an output whose spectral density is approximately PHI ( omega ), and an approximate spectral factor of PHI ( omega ). The author employs the Anderson-Faurre theory in his analysis.