The Learning Component of Dynamic Allocation Indices


Author(s): Gittins, J.; Wang, Y-G.
Date(s)

1992

Abstract

For a multiarmed bandit problem with exponential discounting, the optimal allocation rule is defined by a dynamic allocation index, defined for each arm on its state space. The index for an arm equals the expected immediate reward from the arm, plus an upward adjustment reflecting any uncertainty about the prospects of obtaining rewards from the arm and the possibility of resolving those uncertainties by selecting that arm. The learning component of the index is therefore defined to be the difference between the index and the expected immediate reward. For two arms with the same expected immediate reward, the learning component should be larger for the arm whose reward rate is more uncertain. This is shown to be true for arms based on independent samples from a fixed distribution with an unknown parameter in the Bernoulli and normal cases, and similar results are obtained in other cases.
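The relationship described in the abstract can be illustrated numerically. Below is a minimal sketch, not taken from the paper: it approximates the Gittins index of a Bernoulli arm with a Beta posterior via a truncated dynamic programme in Whittle's retirement formulation, then reads off the learning component as the index minus the posterior mean. The discount factor, truncation horizon, and all function names are illustrative choices of this sketch.

```python
from functools import lru_cache

BETA = 0.9     # discount factor (illustrative choice)
HORIZON = 60   # truncation depth; BETA**HORIZON is negligible

@lru_cache(maxsize=None)
def excess_value(a, b, lam, depth=0):
    """Truncated value of continuing to pull a Bernoulli arm with a
    Beta(a, b) posterior, measured relative to retiring on a fixed
    payoff lam per period (Whittle's retirement formulation)."""
    if depth >= HORIZON:
        return 0.0
    p = a / (a + b)  # posterior mean success probability
    continue_value = (p - lam
                      + BETA * (p * excess_value(a + 1, b, lam, depth + 1)
                                + (1 - p) * excess_value(a, b + 1, lam, depth + 1)))
    return max(0.0, continue_value)  # retiring immediately is always available

def gittins_index(a, b, tol=1e-4):
    """Bisect for the retirement rate at which pulling and retiring
    are indifferent; that rate approximates the Gittins index."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess_value(a, b, mid) > 0.0:
            lo = mid   # continuing still beats retiring: index is higher
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Learning component = index minus expected immediate reward.
# Both arms below have posterior mean 0.5, but the Beta(1, 1) arm is far
# more uncertain, so its learning component should be larger.
wide = gittins_index(1, 1) - 0.5
narrow = gittins_index(10, 10) - 0.5
```

Running this with the parameters above gives a clearly positive learning component for both arms, with the wider (more uncertain) posterior receiving the larger adjustment, in line with the result the abstract states for the Bernoulli case.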

Identifier

http://eprints.qut.edu.au/90633/

Publisher

Institute of Mathematical Statistics

Relation

DOI:10.1214/aos/1176348788

Gittins, J. & Wang, Y-G. (1992) The Learning Component of Dynamic Allocation Indices. The Annals of Statistics, 20(3), pp. 1625-1636.

Rights

Copyright Institute of Mathematical Statistics

Source

Science & Engineering Faculty

Keywords #dynamic allocation index #gittins index #multiarmed bandit #target processes
Type

Journal Article