An actor-critic algorithm with function approximation for discounted cost constrained Markov decision processes
| Date(s) | 01/12/2010 |
|---|---|
| Abstract | We develop in this article the first actor-critic reinforcement learning algorithm with function approximation for a problem of control under multiple inequality constraints. We consider the infinite-horizon discounted cost framework in which both the objective and the constraint functions are suitable expected policy-dependent discounted sums of certain sample path functions. We apply the Lagrange multiplier method to handle the inequality constraints. Our algorithm makes use of multi-timescale stochastic approximation and incorporates a temporal difference (TD) critic and an actor that performs a gradient search in the space of policy parameters using efficient simultaneous perturbation stochastic approximation (SPSA) gradient estimates. We prove the asymptotic almost sure convergence of our algorithm to a locally optimal policy. A minimal sketch of this update structure follows the record below. |
| Format | application/pdf |
| Identifier | http://eprints.iisc.ernet.in/36331/1/An_actor.pdf ; Bhatnagar, Shalabh (2010) An actor-critic algorithm with function approximation for discounted cost constrained Markov decision processes. In: Systems & Control Letters, 59 (12). pp. 760-766. |
| Publisher | Elsevier Science B.V. |
| Relation | http://dx.doi.org/10.1016/j.sysconle.2010.08.013 ; http://eprints.iisc.ernet.in/36331/ |
| Keywords | #Computer Science & Automation (Formerly, School of Automation) |
| Type | Journal Article (Peer Reviewed) |
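
To make the structure described in the abstract concrete, here is a minimal Python sketch of a Lagrangian actor-critic of this general kind: a tabular TD(0) critic (a special case of linear function approximation), a Gibbs (softmax) policy whose parameters are updated with a two-sided SPSA gradient estimate of the Lagrangian cost, and a projected ascent step on the Lagrange multiplier. The toy MDP, step-size schedules, single constraint, and the nested (rather than concurrent multi-timescale) critic evaluation are illustrative assumptions; this is not the exact construction or convergence setup of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Hypothetical toy constrained MDP; sizes, costs and bound are illustrative ---
nS, nA, gamma = 5, 3, 0.9
P = rng.dirichlet(np.ones(nS), size=(nS, nA))   # P[s, a] = next-state distribution
g = rng.random((nS, nA))                        # single-stage objective cost
h = rng.random((nS, nA))                        # single-stage constraint cost
alpha = 0.5 / (1.0 - gamma)                     # bound on the discounted constraint cost

def policy(theta, s):
    """Gibbs (softmax) randomized policy with parameters theta[s, a]."""
    z = np.exp(theta[s] - theta[s].max())
    return z / z.sum()

def td_value(theta, stage_cost, n_steps=1000, lr=0.05):
    """Tabular TD(0) critic (a special case of linear function approximation):
    estimates the discounted value of stage_cost under the policy theta and
    returns its average over states as a scalar performance surrogate."""
    v = np.zeros(nS)
    s = 0
    for _ in range(n_steps):
        a = rng.choice(nA, p=policy(theta, s))
        s2 = rng.choice(nS, p=P[s, a])
        v[s] += lr * (stage_cost(s, a) + gamma * v[s2] - v[s])  # TD(0) update
        s = s2
    return v.mean()

# --- Actor (SPSA) and Lagrange multiplier updates on slower timescales ---
theta = np.zeros((nS, nA))
lam = 0.0                                    # Lagrange multiplier for the constraint
for n in range(1, 101):
    a_n = 1.0 / n ** 0.6                     # actor step size
    c_n = 1.0 / n                            # multiplier step size (slowest)
    delta = 0.1 / n ** 0.25                  # SPSA perturbation size

    # Two-sided SPSA estimate of the gradient of the Lagrangian cost
    # G(theta) + lam * H(theta) with respect to the policy parameters.
    Delta = rng.choice([-1.0, 1.0], size=theta.shape)
    lagr = lambda s, a: g[s, a] + lam * h[s, a]
    J_plus = td_value(theta + delta * Delta, lagr)
    J_minus = td_value(theta - delta * Delta, lagr)
    grad_hat = (J_plus - J_minus) / (2.0 * delta * Delta)

    theta -= a_n * grad_hat                  # actor: descent step on the Lagrangian

    # Multiplier: projected ascent on the estimated constraint violation.
    H_hat = td_value(theta, lambda s, a: h[s, a])
    lam = max(0.0, lam + c_n * (H_hat - alpha))

print("final multiplier:", lam)
```

The nested loop above re-runs the critic for each perturbed parameter, which mimics the timescale separation (critic fastest, actor slower, multiplier slowest) only loosely; the paper's algorithm instead runs all three updates concurrently on a single sample path with decreasing step sizes of different orders.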