High-probability regret bounds for bandit online linear optimization
Date(s) | 01/07/2008
Abstract | We present a modification of the algorithm of Dani et al. [8] for the online linear optimization problem in the bandit setting which, with high probability, has regret at most O*(√T) against an adaptive adversary. This improves on the previous algorithm [8], whose regret is bounded only in expectation and only against an oblivious adversary. We obtain the same dependence on the dimension (n^{3/2}) as that exhibited by Dani et al. The results of this paper rest firmly on those of [8] and on the remarkable technique of Auer et al. [2] for obtaining high-probability bounds via optimistic estimates. This paper answers an open question: it eliminates the gap between the high-probability bounds obtained in the full-information and bandit settings.
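The regret quantity bounded in the abstract can be written out explicitly. The following is a minimal sketch of the standard bandit online linear optimization formulation, assuming a decision set K ⊂ R^n and adversarially chosen loss vectors L_t; the notation is illustrative and not copied from the paper.

% Standard formulation of regret in bandit online linear optimization.
% At each round t = 1, ..., T the learner chooses a point x_t from a
% fixed decision set K ⊂ R^n, the adversary chooses a loss vector L_t,
% and the learner observes only the scalar loss L_t^T x_t (the bandit
% feedback), never the vector L_t itself.
\[
  \mathrm{Regret}_T \;=\; \sum_{t=1}^{T} L_t^{\top} x_t
    \;-\; \min_{x \in K} \sum_{t=1}^{T} L_t^{\top} x .
\]
% Combining the two dependencies stated in the abstract, the paper's
% guarantee reads: with high probability, against an adaptive adversary,
\[
  \mathrm{Regret}_T \;=\; O^{*}\!\bigl( n^{3/2} \sqrt{T} \bigr),
\]
% where O^* hides logarithmic factors and n is the dimension.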
Format | application/pdf
Identifier |
Relation | http://eprints.qut.edu.au/45706/1/30-Bartlett.pdf http://colt2008.cs.helsinki.fi/ Bartlett, Peter L., Dani, Varsha, Hayes, Thomas, Kakade, Sham, Rakhlin, Alexander, & Tewari, Ambuj (2008) High-probability regret bounds for bandit online linear optimization. In 21st Annual Conference on Learning Theory (COLT 2008), 9-12 July 2008, Helsinki, Finland.
Rights | Copyright 2008 [please consult the authors]
Source | Faculty of Science and Technology; Mathematical Sciences
Keywords | #080600 INFORMATION SYSTEMS #algorithm #linear optimization #high probability
Type | Conference Paper