Two-timescale Q-learning Algorithms with an Application to Routing in Networks


Author(s): Mohan Babu, K; Bhatnagar, Shalabh
Date(s)

01/02/2007

Abstract

We propose two variants of the Q-learning algorithm, both of which use two timescales. One updates the Q-values of all feasible state-action pairs at each instant, while the other updates the Q-values of states with actions chosen according to the ‘current’ randomized policy updates. A sketch of the convergence of the algorithms is given. Finally, numerical experiments using the proposed algorithms for routing on different network topologies are presented, and their performance is compared with that of the regular Q-learning algorithm.
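For orientation, the "regular Q-learning algorithm" used as the comparison baseline can be sketched as a single tabular update. This is only a minimal illustration of the standard textbook rule, not the two-timescale variants proposed in the paper, whose details are not given in this record. The function name `q_learning_update`, the dictionary-based table, and the values of `alpha` and `gamma` are assumptions made for the example; a routing application might minimize a cost such as delay rather than maximize a reward.

```python
def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9):
    """Apply one standard tabular Q-learning step and return the new Q-value.

    q is a dict mapping (state, action) pairs to Q-values; missing entries
    are treated as 0.0. alpha is the step size and gamma the discount factor
    (both hypothetical defaults chosen for illustration).
    """
    # Greedy estimate of the value of the next state.
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    # Move the old estimate toward the one-step bootstrapped target.
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]
```

In a routing setting, a state might stand for the node currently holding a packet and an action for the choice of next hop, but that mapping is an assumption here rather than something stated in the abstract.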

Format

application/pdf

Identifier

http://eprints.iisc.ernet.in/41467/1/10.1.1.130.7691.pdf

Mohan Babu, K and Bhatnagar, Shalabh (2007) Two-timescale Q-learning Algorithms with an Application to Routing in Networks. In: International Conference on Advances in Control and Optimization of Dynamical Systems, ACODS- Bangalore, Feb. 2007, Bangalore.

Relation

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.130.7691

http://eprints.iisc.ernet.in/41467/

Keywords #Computer Science & Automation (Formerly, School of Automation)
Type

Conference Paper

PeerReviewed