High-probability regret bounds for bandit online linear optimization
Date(s) | 01/07/2008
Abstract | We present a modification of the algorithm of Dani et al. [8] for the online linear optimization problem in the bandit setting which, with high probability, has regret at most O*(√T) against an adaptive adversary. This improves on the previous algorithm [8], whose regret is bounded only in expectation and only against an oblivious adversary. We obtain the same dependence on the dimension (n^{3/2}) as that exhibited by Dani et al. The results of this paper rest firmly on those of [8] and on the remarkable technique of Auer et al. [2] for obtaining high-probability bounds via optimistic estimates. This paper answers an open question: it eliminates the gap between the high-probability bounds obtained in the full-information and bandit settings.
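The regret quantity bounded in the abstract can be written out explicitly. The following is a minimal sketch of the standard bandit online linear optimization formulation, assuming a decision set K ⊂ R^n and adversarially chosen loss vectors L_t; the notation is illustrative and not copied from the paper.

% Standard formulation of regret in bandit online linear optimization.
% At each round t = 1, ..., T the learner chooses a point x_t from a
% fixed decision set K ⊂ R^n, the adversary chooses a loss vector L_t,
% and the learner observes only the scalar loss L_t^T x_t (the bandit
% feedback), never the vector L_t itself.
\[
  \mathrm{Regret}_T \;=\; \sum_{t=1}^{T} L_t^{\top} x_t
    \;-\; \min_{x \in K} \sum_{t=1}^{T} L_t^{\top} x .
\]
% Combining the two dependencies stated in the abstract, the paper's
% guarantee reads: with high probability, against an adaptive adversary,
\[
  \mathrm{Regret}_T \;=\; O^{*}\!\bigl( n^{3/2} \sqrt{T} \bigr),
\]
% where O^* hides logarithmic factors and n is the dimension.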
Format | application/pdf
Identifier |
Relation | http://eprints.qut.edu.au/45706/1/30-Bartlett.pdf http://colt2008.cs.helsinki.fi/ Bartlett, Peter L., Dani, Varsha, Hayes, Thomas, Kakade, Sham, Rakhlin, Alexander, & Tewari, Ambuj (2008) High-probability regret bounds for bandit online linear optimization. In 21st Annual Conference on Learning Theory (COLT 2008), 9-12 July 2008, Helsinki, Finland.
Rights | Copyright 2008 [please consult the authors]
Source | Faculty of Science and Technology; Mathematical Sciences
Keywords | #080600 INFORMATION SYSTEMS #algorithm #linear optimization #high probability
Type | Conference Paper