New algorithms of the Q-learning type


Author(s): Bhatnagar, Shalabh; Babu, K Mohan
Date(s)

01/04/2008

Abstract

We propose two algorithms for Q-learning that use the two-timescale stochastic approximation methodology. The first of these updates Q-values of all feasible state–action pairs at each instant while the second updates Q-values of states with actions chosen according to the ‘current’ randomized policy updates. A proof of convergence of the algorithms is shown. Finally, numerical experiments using the proposed algorithms on an application of routing in communication networks are presented in a few different settings.
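To make the two-timescale idea concrete, the following is a minimal sketch and not the algorithm from the paper. It assumes a hypothetical two-state toy problem (`toy_step`), a softmax parameterization of the randomized policy (`theta`), and step sizes `a_n` and `b_n` chosen so that `b_n / a_n` tends to zero. Which iterate runs on the faster timescale, and how the policy is updated, are assumptions made for illustration and may differ from the paper. Like the first algorithm described in the abstract, the sketch updates the Q-values of all state–action pairs at each instant.

```python
import math
import random


def softmax(prefs):
    """Turn a list of preferences into a probability distribution."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def toy_step(state, action, rng):
    """Hypothetical 2-state problem: action 0 stays put at cost about 1,
    action 1 switches state at cost about 0 (both with small noise)."""
    noise = rng.uniform(-0.1, 0.1)
    if action == 0:
        return 1.0 + noise, state
    return 0.0 + noise, 1 - state


def two_timescale_q(num_states=2, num_actions=2, gamma=0.9, iters=5000, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * num_actions for _ in range(num_states)]      # Q-values (costs)
    theta = [[0.0] * num_actions for _ in range(num_states)]  # policy preferences
    for n in range(iters):
        a_n = 1.0 / (n + 1) ** 0.6  # faster step size, used for Q-values (assumed)
        b_n = 1.0 / (n + 1)         # slower step size, used for the policy (assumed)
        pi = [softmax(row) for row in theta]
        # Expected cost-to-go of each state under the current randomized policy.
        v = [sum(p * qv for p, qv in zip(pi[s], q[s])) for s in range(num_states)]
        # Update the Q-value of every state-action pair from one simulated transition.
        for s in range(num_states):
            for a in range(num_actions):
                cost, nxt = toy_step(s, a, rng)
                q[s][a] += a_n * (cost + gamma * v[nxt] - q[s][a])
        # Slowly shift probability toward actions with below-average cost.
        for s in range(num_states):
            v_s = sum(p * qv for p, qv in zip(pi[s], q[s]))
            for a in range(num_actions):
                theta[s][a] += b_n * (v_s - q[s][a])
    return q, [softmax(row) for row in theta]


q_values, policy = two_timescale_q()
```

A variant closer in spirit to the second algorithm in the abstract would, under the same assumptions, sample a single action per state from `pi[s]` at each instant and update only that Q-value.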

Format

application/pdf

Identifier

http://eprints.iisc.ernet.in/26525/1/0.pdf

Bhatnagar, Shalabh and Babu, K Mohan (2008) New algorithms of the Q-learning type. In: Automatica, 44 (4). pp. 1111-1119.

Publisher

Elsevier Science

Relation

http://www.sciencedirect.com/science?_ob=ArticleURL&_udi=B6V21-4RDBF7S-5&_user=512776&_coverDate=04%2F30%2F2008&_rdoc=1&_fmt=high&_orig=search&_sort=d&_docanchor=&view=c&_acct=C000025298&_version=1&_urlVersion=0&_userid=512776&md5=10fed5ae90cf13b79ab5e319

http://eprints.iisc.ernet.in/26525/

Keywords

Computer Science & Automation (Formerly, School of Automation)

Type

Journal Article

PeerReviewed