New algorithms of the Q-learning type
Date(s) |
01/04/2008 |
Abstract |
We propose two Q-learning algorithms based on the two-timescale stochastic approximation methodology. The first updates the Q-values of all feasible state–action pairs at each instant, while the second updates the Q-values of states with actions chosen according to the ‘current’ randomized policy updates. A proof of convergence of both algorithms is presented. Finally, numerical experiments applying the proposed algorithms to a routing problem in communication networks are reported for a few different settings. |
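The first scheme mentioned in the abstract — updating the Q-values of all feasible state–action pairs at every instant — can be sketched as follows. This is a minimal, deterministic, single-timescale illustration of a synchronous Q-value sweep, not the paper's two-timescale algorithm; the toy MDP (transition tensor `P`, reward matrix `R`), the constant step size, and all numerical values are illustrative assumptions.

```python
import numpy as np

# Hypothetical toy MDP (illustrative only, not from the paper):
# 3 states, 2 actions; P[s, a, s'] are transition probabilities,
# R[s, a] are one-step rewards.
rng = np.random.default_rng(0)
n_states, n_actions = 3, 2
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))
gamma = 0.9  # discount factor

Q = np.zeros((n_states, n_actions))
alpha = 0.5  # constant step size, a simplification of the paper's
             # diminishing two-timescale step-size schedules
for _ in range(500):
    # Synchronous sweep: every feasible (s, a) pair is updated at once
    # toward its Bellman target R(s, a) + gamma * E[max_a' Q(s', a')].
    target = R + gamma * P @ Q.max(axis=1)
    Q += alpha * (target - Q)
```

With an expected (noise-free) update and a constant step size, each sweep is a contraction, so `Q` approaches the fixed point of the Bellman optimality operator; the stochastic, sample-based versions studied in the paper replace `target` with noisy estimates and use diminishing step sizes on two timescales.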
Format |
application/pdf |
Identifier |
http://eprints.iisc.ernet.in/26525/1/0.pdf Bhatnagar, Shalabh and Babu, K Mohan (2008) New algorithms of the Q-learning type. In: Automatica, 44 (4). pp. 1111-1119. |
Publisher |
Elsevier Science |
Relation |
http://www.sciencedirect.com/science?_ob=ArticleURL&_udi=B6V21-4RDBF7S-5&_user=512776&_coverDate=04%2F30%2F2008&_rdoc=1&_fmt=high&_orig=search&_sort=d&_docanchor=&view=c&_acct=C000025298&_version=1&_urlVersion=0&_userid=512776&md5=10fed5ae90cf13b79ab5e319 http://eprints.iisc.ernet.in/26525/ |
Keywords | Computer Science & Automation (Formerly, School of Automation) |
Type |
Journal Article (Peer Reviewed) |