Prediction with limited advice and multiarmed bandits with paid observations
Date(s) | 2014
---|---
Abstract | We study two problems of online learning under restricted information access. In the first problem, prediction with limited advice, we consider a game of prediction with expert advice, where on each round of the game we query the advice of a subset of M out of N experts. We present an algorithm that achieves O(√((N/M)T ln N)) regret on T rounds of this game. The second problem, the multiarmed bandit with paid observations, is a variant of the adversarial N-armed bandit game, where on round t of the game we can observe the reward of any number of arms, but each observation has a cost c. We present an algorithm that achieves O((cN ln N)^(1/3) T^(2/3) + √(T ln N)) regret on T rounds of this game in the worst case. Furthermore, we present a number of refinements that treat arm- and time-dependent observation costs and achieve lower regret under benign conditions. We present lower bounds that show that, apart from the logarithmic factors, the worst-case regret bounds cannot be improved.
Format | application/pdf
Identifier |
Relation | http://eprints.qut.edu.au/70843/1/70843.pdf http://jmlr.org/proceedings/papers/v32/seldin14-supp.pdf Seldin, Yevgeny, Bartlett, Peter L., Crammer, Koby, & Abbasi-Yadkori, Yasin (2014) Prediction with limited advice and multiarmed bandits with paid observations. In International Conference on Machine Learning, June 21–26, 2014, Beijing, China. http://purl.org/au-research/grants/ARC/FL110100281
Rights | Copyright 2014 [please consult the author]
Source | Science & Engineering Faculty
Type | Conference Paper
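
For readers unfamiliar with the second setting in the abstract, the following is a minimal sketch of the interaction protocol for the multiarmed bandit with paid observations, assuming [0, 1]-valued rewards. The function names, the placeholder policy, and the parameter values are hypothetical illustrations; this is not the authors' algorithm, only the game loop and cost accounting the abstract describes.

```python
import numpy as np

# Illustrative sketch (not the paper's algorithm) of the "multiarmed bandit with
# paid observations" protocol: on each of T rounds the learner plays one of N
# arms and may additionally pay a cost c for every arm whose reward it observes.

def run_protocol(rewards, c, policy, rng):
    """rewards: (T, N) array of adversarially chosen rewards in [0, 1].
    c: cost per observed arm.  policy: callable (t, N, rng) -> (arm, observe_set)."""
    T, N = rewards.shape
    total_payoff = 0.0
    for t in range(T):
        arm, observe = policy(t, N, rng)         # learner's choices for round t
        total_payoff += rewards[t, arm]          # collect the played arm's reward
        total_payoff -= c * len(observe)         # pay c for each observed arm
        # Only {rewards[t, i] for i in observe} would be revealed to the learner;
        # a real algorithm would update its internal state from them here.
    best_single_arm = rewards.sum(axis=0).max()  # best fixed arm in hindsight
    return best_single_arm - total_payoff        # regret, net of observation costs

def uniform_policy(t, N, rng):
    # Hypothetical baseline: play a uniformly random arm and observe only it.
    arm = int(rng.integers(N))
    return arm, {arm}

rng = np.random.default_rng(0)
regret = run_protocol(rewards=rng.random((1000, 10)), c=0.05,
                      policy=uniform_policy, rng=rng)
print(f"Regret of the placeholder policy over 1000 rounds: {regret:.1f}")
```

Here the regret is measured against the best fixed arm in hindsight, with observation costs charged to the learner's payoff, which is one natural reading of the cost model in the abstract.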