REGAL: a regularization based algorithm for reinforcement learning in weakly communicating MDPs
Date(s) | 2009
Abstract |
We provide an algorithm that achieves the optimal regret rate in an unknown weakly communicating Markov Decision Process (MDP). The algorithm proceeds in episodes where, in each episode, it picks a policy using regularization based on the span of the optimal bias vector. For an MDP with S states and A actions whose optimal bias vector has span bounded by H, we show a regret bound of Õ(HS√(AT)). We also relate the span to various diameter-like quantities associated with the MDP, demonstrating how our results improve on previous regret bounds. |
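The regularizer in the abstract penalizes the span of the bias vector, sp(h) = max_s h(s) − min_s h(s). A minimal sketch of this quantity, with made-up bias values for illustration (the `bias` data below is hypothetical, not from the paper):

```python
def span(h):
    """Span semi-norm of a bias vector h: max over states minus min over states."""
    return max(h) - min(h)

# Hypothetical optimal bias values for a 4-state MDP.
bias = [0.0, 1.5, 3.0, 2.0]
print(span(bias))  # prints 3.0
```

In the paper's bound, H upper-bounds this span; MDPs whose bias vectors have small span admit the tighter Õ(HS√(AT)) regret guarantee.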
Identifier | |
Relation |
http://www.cs.mcgill.ca/~uai2009/index.html Bartlett, Peter L. & Tewari, Ambuj (2009) REGAL: a regularization based algorithm for reinforcement learning in weakly communicating MDPs. In Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence (UAI 2009), McGill University, Montreal. |
Rights |
Copyright 2009 [please consult the authors] |
Source |
Faculty of Science and Technology; Mathematical Sciences |
Keywords | #080100 ARTIFICIAL INTELLIGENCE AND IMAGE PROCESSING #algorithm, optimal regret rate, Markov Decision Process (MDP) |
Type |
Conference Paper |