2 results for Order of Convergence

in Massachusetts Institute of Technology


Relevance:

100.00%

Abstract:

Recent developments in the area of reinforcement learning have yielded a number of new algorithms for the prediction and control of Markovian environments. These algorithms, including the TD(lambda) algorithm of Sutton (1988) and the Q-learning algorithm of Watkins (1989), can be motivated heuristically as approximations to dynamic programming (DP). In this paper we provide a rigorous proof of convergence of these DP-based learning algorithms by relating them to the powerful techniques of stochastic approximation theory via a new convergence theorem. The theorem establishes a general class of convergent algorithms to which both TD(lambda) and Q-learning belong.
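As a rough illustration of the stochastic-approximation view described in this abstract, the sketch below runs tabular Q-learning on a small hypothetical MDP. The transition table `P`, the rewards `R`, the discount `GAMMA = 0.5`, and the helper names `value_iteration` and `q_learning` are all invented for illustration. The step sizes 1/k satisfy the usual Robbins-Monro conditions (the sum of the step sizes diverges while the sum of their squares converges), and the learned values are compared against Q* obtained by value iteration. This is a synchronous variant that updates every state-action pair once per iteration; it is a sketch under these assumptions, not the paper's theorem or proof.

```python
import random

# Hypothetical 2-state, 2-action MDP used only for illustration.
# P[s][a][t] is the probability of moving from state s to state t under action a;
# R[s][a] is the (deterministic) reward for taking action a in state s.
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.5, 0.5], [0.0, 1.0]]]
R = [[0.0, 1.0],
     [0.5, 0.2]]
GAMMA = 0.5


def value_iteration(P, R, gamma, iters=500):
    """Approximate Q* by repeatedly applying the exact Bellman optimality operator."""
    n_s, n_a = len(R), len(R[0])
    Q = [[0.0] * n_a for _ in range(n_s)]
    for _ in range(iters):
        V = [max(row) for row in Q]
        Q = [[R[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(n_s))
              for a in range(n_a)]
             for s in range(n_s)]
    return Q


def q_learning(P, R, gamma, steps=20000, seed=0):
    """Synchronous Q-learning from sampled transitions with step size 1/k.

    The steps 1/k satisfy sum(alpha_k) = inf and sum(alpha_k**2) < inf.
    """
    rng = random.Random(seed)
    n_s, n_a = len(R), len(R[0])
    Q = [[0.0] * n_a for _ in range(n_s)]
    for k in range(1, steps + 1):
        alpha = 1.0 / k
        V = [max(row) for row in Q]  # freeze Q_{k-1} so every pair sees the same iterate
        for s in range(n_s):
            for a in range(n_a):
                t = rng.choices(range(n_s), weights=P[s][a])[0]  # sample next state
                target = R[s][a] + gamma * V[t]                  # noisy Bellman target
                Q[s][a] += alpha * (target - Q[s][a])
    return Q


Q_star = value_iteration(P, R, GAMMA)
Q_hat = q_learning(P, R, GAMMA)
err = max(abs(Q_hat[s][a] - Q_star[s][a]) for s in range(2) for a in range(2))
print(f"max |Q_hat - Q*| = {err:.4f}")
```

The discount of 0.5 is chosen so that the example converges quickly; with 1/k step sizes, discounts close to 1 slow Q-learning down considerably.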

Relevance:

100.00%

Abstract:

In this paper we consider the problem of approximating a function belonging to some function space Φ by a linear combination of n translates of a given function G. Using a lemma by Jones (1990) and Barron (1991) we show that it is possible to define function spaces and functions G for which the rate of convergence to zero of the error is O(1/n) in any number of dimensions. The apparent avoidance of the "curse of dimensionality" is due to the fact that these function spaces are more and more constrained as the dimension increases. Examples include spaces of the Sobolev type, in which the number of weak derivatives is required to be larger than the number of dimensions. We give results both for approximation in the L2 norm and in the L∞ norm. The interesting feature of these results is that, thanks to the constructive nature of Jones' and Barron's lemma, an iterative procedure is defined that can achieve this rate.
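The iterative procedure mentioned at the end of this abstract can be illustrated, under simplifying assumptions, with a relaxed greedy scheme in the spirit of the constructive proof behind the Jones/Barron lemma. At step n the approximation becomes f_n = (1 - 1/n) f_{n-1} + (1/n) g_n, where g_n is the translate G(x - t) that minimizes the discretized L2 error. Everything here is hypothetical: G is a Gaussian, functions are sampled on a 1-D grid, the target is built as a convex combination of three dictionary translates, the fixed 1/n step stands in for whatever step rule the paper uses, and the helper names are invented. It is a sketch, not the authors' construction.

```python
import math


def translate_dictionary(xs, shifts, width=1.0):
    """Hypothetical dictionary: translates G(x - t) of a Gaussian G, sampled on grid xs."""
    return [[math.exp(-((x - t) / width) ** 2) for x in xs] for t in shifts]


def sq_norm(u, h):
    """Discretized squared L2 norm: h * sum(u_i^2)."""
    return h * sum(v * v for v in u)


def greedy_translates(f, dictionary, h, n_terms):
    """Relaxed greedy approximation with fixed step 1/n.

    f_n = (1 - 1/n) f_{n-1} + (1/n) g_n, where g_n is the dictionary element
    minimizing ||f - f_n||. Returns the final approximation and the squared
    error after each step.
    """
    approx = [0.0] * len(f)
    errors = []
    for n in range(1, n_terms + 1):
        alpha = 1.0 / n
        best_err, best = None, None
        for g in dictionary:
            cand = [(1 - alpha) * a + alpha * gi for a, gi in zip(approx, g)]
            err = sq_norm([fi - ci for fi, ci in zip(f, cand)], h)
            if best_err is None or err < best_err:
                best_err, best = err, cand
        approx = best
        errors.append(best_err)
    return approx, errors


h = 0.05
xs = [-5 + h * i for i in range(201)]        # grid on [-5, 5]
shifts = [-4 + 0.2 * i for i in range(41)]   # candidate translations t in [-4, 4]
D = translate_dictionary(xs, shifts)
# Target: convex combination (weights sum to 1) of the translates at t = -1, 0, 2.
f = [0.5 * a + 0.3 * b + 0.2 * c for a, b, c in zip(D[15], D[20], D[30])]
approx, errors = greedy_translates(f, D, h, n_terms=50)
print(f"squared error after 1 term: {errors[0]:.4f}, after 50 terms: {errors[-1]:.4f}")
```

Because the target lies in the convex hull of the dictionary, each greedy step can do at least as well as the average over the three translates that generate f. This averaging argument is what makes such procedures constructive.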