444 resultados para multi-armed bandit

em Queensland University of Technology - ePrints Archive


Relevância:

90.00% 90.00%

Publicador:

Resumo:

For a wide class of semi-Markov decision processes the optimal policies are expressible in terms of the Gittins indices, which have been found useful in sequential clinical trials and pharmaceutical research planning. In general, the indices can be approximated via calibration based on dynamic programming of finite horizon. This paper provides some results on the accuracy of such approximations, and, in particular, gives the error bounds for some well known processes (Bernoulli reward processes, normal reward processes and exponential target processes).

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Suppose two treatments with binary responses are available for patients with some disease and that each patient will receive one of the two treatments. In this paper we consider the interests of patients both within and outside a trial using a Bayesian bandit approach and conclude that equal allocation is not appropriate for either group of patients. It is suggested that Gittins indices should be used (using an approach called dynamic discounting by choosing the discount rate based on the number of future patients in the trial) if the disease is rare, and the least failures rule if the disease is common. Some analytical and simulation results are provided.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

We study two problems of online learning under restricted information access. In the first problem, prediction with limited advice, we consider a game of prediction with expert advice, where on each round of the game we query the advice of a subset of M out of N experts. We present an algorithm that achieves O(√(N/M)TlnN ) regret on T rounds of this game. The second problem, the multiarmed bandit with paid observations, is a variant of the adversarial N-armed bandit game, where on round t of the game we can observe the reward of any number of arms, but each observation has a cost c. We present an algorithm that achieves O((cNlnN) 1/3 T2/3+√TlnN ) regret on T rounds of this game in the worst case. Furthermore, we present a number of refinements that treat arm- and time-dependent observation costs and achieve lower regret under benign conditions. We present lower bounds that show that, apart from the logarithmic factors, the worst-case regret bounds cannot be improved.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

We explore the use of Gittins indices to search for near optimality in sequential clinical trials. Some adaptive allocation rules are proposed to achieve the following two objectives as far as possible: (i) to reduce the expected successes lost, (ii) to minimize the error probability at the end. Simulation results indicate the merits of the rules based on Gittins indices for small trial sizes. The rules are generalized to the case when neither of the response densities is known. Asymptotic optimality is derived for the constrained rules. A simple allocation rule is recommended for one-stage models. The simulation results indicate that it works better than both equal allocation and Bather's randomized allocation. We conclude with a discussion of possible further developments.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Metaphor is a multi-stage programming language extension to an imperative, object-oriented language in the style of C# or Java. This paper discusses some issues we faced when applying multi-stage language design concepts to an imperative base language and run-time environment. The issues range from dealing with pervasive references and open code to garbage collection and implementing cross-stage persistence.