On the Convergence of Stochastic Iterative Dynamic Programming Algorithms
Date(s) | 20/10/2004; 01/08/1993
Abstract | Recent developments in the area of reinforcement learning have yielded a number of new algorithms for the prediction and control of Markovian environments. These algorithms, including the TD(lambda) algorithm of Sutton (1988) and the Q-learning algorithm of Watkins (1989), can be motivated heuristically as approximations to dynamic programming (DP). In this paper we provide a rigorous proof of convergence of these DP-based learning algorithms by relating them to the powerful techniques of stochastic approximation theory via a new convergence theorem. The theorem establishes a general class of convergent algorithms to which both TD(lambda) and Q-learning belong.
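The abstract names the convergence theorem but does not state it. As a minimal sketch, theorems of this kind in stochastic approximation theory typically cover iterative processes of the form below; the notation (Delta_t, alpha_t, F_t, P_t, gamma) is illustrative and not necessarily the paper's own.

```latex
% Minimal sketch of the class of processes handled by stochastic-approximation
% convergence theorems of this kind (illustrative notation, not the paper's):
\[
  \Delta_{t+1}(x) = \bigl(1 - \alpha_t(x)\bigr)\,\Delta_t(x) + \alpha_t(x)\,F_t(x),
\]
% with \Delta_t converging to zero w.p.1 under conditions of the familiar form
\[
  \sum_t \alpha_t(x) = \infty, \qquad \sum_t \alpha_t^2(x) < \infty, \qquad
  \bigl\|\mathbb{E}[F_t(x) \mid P_t]\bigr\| \le \gamma\,\|\Delta_t\|, \quad 0 \le \gamma < 1,
\]
% together with a bounded-variance condition on F_t. Q-learning fits this
% template with \Delta_t = Q_t - Q^* and F_t built from the sampled Bellman backup.
```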
Format | 15 p.; 77605 bytes (application/octet-stream); 356324 bytes (application/pdf)
Identifier | AIM-1441; CBCL-084
Language(s) | en_US
Relation | AIM-1441; CBCL-084
Keywords | #reinforcement learning #stochastic approximation #convergence #dynamic programming