New algorithms of the Q-learning type
Date(s) |
01/04/2008 |
Abstract |
We propose two Q-learning algorithms based on the two-timescale stochastic approximation methodology. The first updates the Q-values of all feasible state–action pairs at each instant, while the second updates the Q-values of states with actions chosen according to the ‘current’ randomized policy updates. A proof of convergence of both algorithms is presented. Finally, numerical experiments applying the proposed algorithms to a routing problem in communication networks are reported for a few different settings. |
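The first scheme mentioned in the abstract — updating the Q-values of all feasible state–action pairs at every instant — can be sketched as follows. This is a minimal, deterministic, single-timescale illustration of a synchronous Q-value sweep, not the paper's two-timescale algorithm; the toy MDP (transition tensor `P`, reward matrix `R`), the constant step size, and all numerical values are illustrative assumptions.

```python
import numpy as np

# Hypothetical toy MDP (illustrative only, not from the paper):
# 3 states, 2 actions; P[s, a, s'] are transition probabilities,
# R[s, a] are one-step rewards.
rng = np.random.default_rng(0)
n_states, n_actions = 3, 2
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))
gamma = 0.9  # discount factor

Q = np.zeros((n_states, n_actions))
alpha = 0.5  # constant step size, a simplification of the paper's
             # diminishing two-timescale step-size schedules
for _ in range(500):
    # Synchronous sweep: every feasible (s, a) pair is updated at once
    # toward its Bellman target R(s, a) + gamma * E[max_a' Q(s', a')].
    target = R + gamma * P @ Q.max(axis=1)
    Q += alpha * (target - Q)
```

With an expected (noise-free) update and a constant step size, each sweep is a contraction, so `Q` approaches the fixed point of the Bellman optimality operator; the stochastic, sample-based versions studied in the paper replace `target` with noisy estimates and use diminishing step sizes on two timescales.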
Format |
application/pdf |
Identifier |
http://eprints.iisc.ernet.in/26525/1/0.pdf Bhatnagar, Shalabh and Babu, K Mohan (2008) New algorithms of the Q-learning type. In: Automatica, 44 (4). pp. 1111-1119. |
Publisher |
Elsevier Science |
Relation |
http://www.sciencedirect.com/science?_ob=ArticleURL&_udi=B6V21-4RDBF7S-5&_user=512776&_coverDate=04%2F30%2F2008&_rdoc=1&_fmt=high&_orig=search&_sort=d&_docanchor=&view=c&_acct=C000025298&_version=1&_urlVersion=0&_userid=512776&md5=10fed5ae90cf13b79ab5e319 http://eprints.iisc.ernet.in/26525/ |
Keywords | Computer Science & Automation (Formerly, School of Automation) |
Type |
Journal Article (Peer Reviewed) |