High-probability regret bounds for bandit online linear optimization


Author(s): Bartlett, Peter L.; Dani, Varsha; Hayes, Thomas; Kakade, Sham; Rakhlin, Alexander; Tewari, Ambuj
Date(s)

01/07/2008

Abstract

We present a modification of the algorithm of Dani et al. [8] for the online linear optimization problem in the bandit setting which, with high probability, has regret at most O*(√T) against an adaptive adversary. This improves on the previous algorithm [8], whose regret is bounded only in expectation and only against an oblivious adversary. We obtain the same dependence on the dimension (n^{3/2}) as that exhibited by Dani et al. The results of this paper rest firmly on those of [8] and the remarkable technique of Auer et al. [2] for obtaining high-probability bounds via optimistic estimates. This paper answers an open question: it eliminates the gap between the high-probability bounds obtained in the full-information and bandit settings.
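For readers of this record, the bound claimed in the abstract can be made precise using the standard definitions of the setting. The decision set D ⊂ R^n, the loss vectors f_t, and the confidence parameter δ below are standard notation assumed for this sketch, not quantities defined in this record. In bandit online linear optimization the learner plays x_t ∈ D, observes only the scalar loss ⟨f_t, x_t⟩, and a high-probability guarantee of the kind claimed has roughly the following form:

% Regret of the plays x_1, ..., x_T against the best fixed decision in D
% (standard definition; D and f_t are assumed notation, not from this record):
R_T = \sum_{t=1}^{T} \langle f_t, x_t \rangle - \min_{x \in D} \sum_{t=1}^{T} \langle f_t, x \rangle

% Form of the abstract's claim: for any \delta \in (0, 1), with probability
% at least 1 - \delta over the algorithm's randomness and the adaptive
% adversary's responses,
\Pr\left[ R_T \le O^{*}\!\left( n^{3/2} \sqrt{T} \right) \right] \ge 1 - \delta

% where O^{*} hides polylogarithmic factors (including the dependence on 1/\delta).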

Format

application/pdf

Identifier

http://eprints.qut.edu.au/45706/

Relation

http://eprints.qut.edu.au/45706/1/30-Bartlett.pdf

http://colt2008.cs.helsinki.fi/

Bartlett, Peter L., Dani, Varsha, Hayes, Thomas, Kakade, Sham, Rakhlin, Alexander, & Tewari, Ambuj (2008) High-probability regret bounds for bandit online linear optimization. In 21st Annual Conference on Learning Theory (COLT 2008), 9-12 July 2008, Helsinki, Finland.

Rights

Copyright 2008 [please consult the authors]

Source

Faculty of Science and Technology; Mathematical Sciences

Keywords #080600 INFORMATION SYSTEMS #algorithm #linear optimization #high probability
Type

Conference Paper