On the Convergence of Stochastic Iterative Dynamic Programming Algorithms


Author(s): Jaakkola, Tommi; Jordan, Michael I.; Singh, Satinder P.
Date(s)

20/10/2004

01/08/1993

Abstract

Recent developments in the area of reinforcement learning have yielded a number of new algorithms for the prediction and control of Markovian environments. These algorithms, including the TD(lambda) algorithm of Sutton (1988) and the Q-learning algorithm of Watkins (1989), can be motivated heuristically as approximations to dynamic programming (DP). In this paper we provide a rigorous proof of convergence of these DP-based learning algorithms by relating them to the powerful techniques of stochastic approximation theory via a new convergence theorem. The theorem establishes a general class of convergent algorithms to which both TD(lambda) and Q-learning belong.
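The abstract names Watkins' Q-learning as one member of the general class of algorithms covered by the convergence theorem. As a minimal illustrative sketch (not the paper's notation or code), the following tabular Q-learning loop in Python uses a 1/n(s,a) learning-rate schedule, which satisfies the standard stochastic-approximation conditions (the rates sum to infinity, their squares sum to a finite value) under which such convergence results apply. The two-state chain environment and all function names here are assumptions made for the example.

    import numpy as np

    rng = np.random.default_rng(0)

    def q_learning(step, reset, n_states, n_actions,
                   gamma=0.9, epsilon=0.1, n_episodes=2000):
        """Tabular Q-learning with a 1/n(s,a) learning-rate schedule.

        The per-pair rate alpha_n = 1/n satisfies sum(alpha) = inf and
        sum(alpha^2) < inf, the stochastic-approximation conditions
        invoked by convergence theorems of this kind; every state-action
        pair must also be visited infinitely often, which the
        epsilon-greedy policy ensures in this small example.
        """
        Q = np.zeros((n_states, n_actions))
        visits = np.zeros((n_states, n_actions))
        for _ in range(n_episodes):
            s = reset()
            done = False
            while not done:
                # epsilon-greedy action selection keeps exploring all pairs
                if rng.random() < epsilon:
                    a = int(rng.integers(n_actions))
                else:
                    a = int(np.argmax(Q[s]))
                s_next, r, done = step(s, a)
                visits[s, a] += 1
                alpha = 1.0 / visits[s, a]
                # standard Q-learning target: r + gamma * max_a' Q(s', a')
                target = r + (0.0 if done else gamma * np.max(Q[s_next]))
                Q[s, a] += alpha * (target - Q[s, a])
                s = s_next
        return Q

    # Hypothetical two-state chain MDP, purely for illustration:
    # action 1 advances along the chain; reaching the end yields reward 1.
    def reset():
        return 0

    def step(s, a):
        if a == 0:
            return s, 0.0, False   # stay put, no reward
        if s == 0:
            return 1, 0.0, False   # advance to the second state
        return 1, 1.0, True        # goal reached, episode ends

    Q = q_learning(step, reset, n_states=2, n_actions=2)
    print(Q)  # Q[0,1] approaches gamma * 1 = 0.9, Q[1,1] approaches 1

Running the sketch shows the estimated action values settling near the true discounted returns, the kind of almost-sure convergence the paper's theorem establishes for this class of algorithms.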

Format

15 p.

77605 bytes

356324 bytes

application/octet-stream

application/pdf

Identifier

AIM-1441

CBCL-084

http://hdl.handle.net/1721.1/7205

Language(s)

en_US

Relation

AIM-1441

CBCL-084

Keywords #reinforcement learning #stochastic approximation #convergence #dynamic programming