998 results for multi-armed bandit
Abstract:
In pay-per-click sponsored search auctions, which are currently extensively used by search engines, the auction for a keyword involves a certain number of advertisers (say k) competing for available slots (say m) to display their ads. This auction is typically conducted for a number of rounds (say T). There are click probabilities mu_ij associated with agent-slot pairs. The search engine's goal is to maximize social welfare, that is, the sum of values of the advertisers. The search engine does not know the true value of an advertiser for a click to her ad and also does not know the click probabilities mu_ij. A key problem for the search engine therefore is to learn these during the T rounds of the auction and also to ensure that the auction mechanism is truthful. Mechanisms for addressing such learning and incentives issues have recently been introduced and are referred to as multi-armed bandit (MAB) mechanisms. When m = 1, characterizations for truthful MAB mechanisms are available in the literature and it has been shown that the regret for such mechanisms will be O(T^{2/3}). In this paper, we seek to derive a characterization in the realistic but nontrivial general case when m > 1 and obtain several interesting results.
Abstract:
In pay-per-click sponsored search auctions, which are currently extensively used by search engines, the auction for a keyword involves a certain number of advertisers (say k) competing for available slots (say m) to display their advertisements (ads for short). A sponsored search auction for a keyword is typically conducted for a number of rounds (say T). There are click probabilities mu_ij associated with each agent-slot pair (agent i and slot j). The search engine would like to maximize the social welfare of the advertisers, that is, the sum of values of the advertisers for the keyword. However, the search engine does not know the true values advertisers have for a click to their respective advertisements and also does not know the click probabilities. A key problem for the search engine therefore is to learn these click probabilities during the initial rounds of the auction and also to ensure that the auction mechanism is truthful. Mechanisms for addressing such learning and incentives issues have recently been introduced. These mechanisms, due to their connection to the multi-armed bandit problem, are aptly referred to as multi-armed bandit (MAB) mechanisms. When m = 1, exact characterizations for truthful MAB mechanisms are available in the literature. Recent work has focused on the more realistic but non-trivial general case when m > 1, and a few promising results have started appearing. In this article, we consider this general case when m > 1 and prove several interesting results. Our contributions include: (1) When the mu_ij values are unconstrained, we prove that any truthful mechanism must satisfy strong pointwise monotonicity and show that the regret will be Theta(T^{2/3}) for such mechanisms. (2) When the clicks on the ads follow a certain click precedence property, we show that weak pointwise monotonicity is necessary for MAB mechanisms to be truthful. (3) If the search engine has a certain coarse pre-estimate of the mu_ij values and wishes to update them during the course of the T rounds, we show that weak pointwise monotonicity and type-I separatedness are necessary while weak pointwise monotonicity and type-II separatedness are sufficient conditions for the MAB mechanisms to be truthful. (4) If the click probabilities are separable into agent-specific and slot-specific terms, we provide a characterization of MAB mechanisms that are truthful in expectation.
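To make the learning problem concrete, here is a minimal sketch of the explore-then-exploit pattern that underlies the O(T^{2/3})-regret single-slot (m = 1) MAB mechanisms mentioned above. The function names, the round-robin exploration schedule and the reward model are assumptions for illustration; the truthful payment rule of the actual mechanisms is omitted.

```python
import numpy as np

# Sketch only: explore-then-exploit allocation for a single slot (m = 1).
# Exploring for about T^(2/3) rounds and then exploiting the estimated
# click-through rates yields the T^(2/3) regret scaling discussed above.
def run_auction(bids, true_ctr, T, seed=0):
    rng = np.random.default_rng(seed)
    bids = np.asarray(bids, dtype=float)
    k = len(bids)
    explore_rounds = int(T ** (2 / 3))
    clicks = np.zeros(k)
    impressions = np.zeros(k)
    welfare = 0.0
    for t in range(T):
        if t < explore_rounds:
            winner = t % k                              # round-robin exploration
        else:
            ctr_hat = clicks / np.maximum(impressions, 1)
            winner = int(np.argmax(bids * ctr_hat))     # exploit: bid x estimated CTR
        clicked = rng.random() < true_ctr[winner]
        impressions[winner] += 1
        clicks[winner] += clicked
        welfare += bids[winner] * clicked
    return welfare
```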
Abstract:
Background: The application of polyethylenimine (PEI) in gene delivery has been severely limited by significant cytotoxicity that results from a nondegradable methylene backbone and high cationic charge density. It is therefore necessary to develop novel biodegradable PEI derivatives for low-toxicity, highly efficient gene delivery. Methods: A series of novel cationic copolymers with various charge densities were designed and synthesized by grafting different kinds of oligoethylenimine (OEI) onto a determinate multi-armed poly(L-glutamic acid) backbone. The molecular structures of the multi-armed poly(L-glutamic acid)-graft-OEI (MP-g-OEI) copolymers were characterized using nuclear magnetic resonance, viscosimetry and gel permeation chromatography. Moreover, the MP-g-OEI/DNA complexes were measured by a gel retardation assay, dynamic light scattering and atomic force microscopy to determine DNA binding ability, particle size, zeta potential, and complex formation and shape, respectively. MP-g-OEI copolymers were also evaluated in Chinese hamster ovary and human embryonic kidney-293 cells for their cytotoxicity and transfection efficiency.
Abstract:
A simple, productive and low-cost route has been developed to synthesize multi-armed CdTe nanorods using myristic acid (MA) as a complexing agent. The yield of this approach can reach 75%. The dimensions of the multi-armed nanorods can be controlled by tuning the molar ratios of Cd/Te and Cd/MA; the diameter can be varied from 2 to 7 nm and the length from 15 to 60 nm. The hexagonal structure was confirmed by X-ray diffraction analysis. However, it is assumed that each crystal is composed of the dominant hexagonal structure along with a cubic structure in the core.
Abstract:
We explore the problem of budgeted machine learning, in which the learning algorithm has free access to the training examples' labels but has to pay for each attribute that is specified. This learning model is appropriate in many areas, including medical applications. We present new algorithms, based on algorithms for the multi-armed bandit problem, for choosing which attributes of which examples to purchase in the budgeted learning model. All of our approaches outperformed the current state of the art. Furthermore, we present a new means for selecting an example to purchase after the attribute is selected, instead of selecting an example uniformly at random, as is typically done. Our new example selection method improved the performance of all the algorithms we tested, both ours and those in the literature.
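As an illustration of the bandit view of attribute purchase, the sketch below runs UCB1 over attributes, with a hypothetical estimate_gain stub standing in for whatever quality signal (e.g., improvement of the induced classifier) serves as the reward; it is not the paper's algorithm.

```python
import math
import random

def estimate_gain(attribute, example, purchased):
    # Hypothetical stub: replace with a real quality signal, such as the change
    # in cross-validated accuracy after adding this (example, attribute) value.
    return random.random()

def budgeted_purchase(n_attributes, examples, budget, seed=0):
    random.seed(seed)
    counts = [0] * n_attributes
    mean_gain = [0.0] * n_attributes
    purchased = []                                 # (example, attribute) pairs bought
    for t in range(budget):
        def ucb(a):
            # UCB1 score: each attribute is an arm, its reward is the observed gain.
            if counts[a] == 0:
                return float("inf")
            return mean_gain[a] + math.sqrt(2 * math.log(t + 1) / counts[a])
        attr = max(range(n_attributes), key=ucb)
        example = random.choice(examples)          # uniform example selection (baseline)
        gain = estimate_gain(attr, example, purchased)
        purchased.append((example, attr))
        counts[attr] += 1
        mean_gain[attr] += (gain - mean_gain[attr]) / counts[attr]
    return purchased
```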
Abstract:
This work deals with the parallel optimization of expensive objective functions which are modelled as sample realizations of Gaussian processes. The study is formalized as a Bayesian optimization problem, or continuous multi-armed bandit problem, where a batch of q > 0 arms is pulled in parallel at each iteration. Several algorithms have been developed for choosing batches by trading off exploitation and exploration. As of today, the maximum Expected Improvement (EI) and Upper Confidence Bound (UCB) selection rules appear to be the most prominent approaches for batch selection. Here, we build upon recent work on the multipoint Expected Improvement criterion, for which an analytic expansion relying on Tallis' formula was recently established. Since the computational burden of this selection rule is still an issue in application, we derive a closed-form expression for the gradient of the multipoint Expected Improvement, which aims at facilitating its maximization using gradient-based ascent algorithms. Substantial computational savings are shown in application. In addition, our algorithms are tested numerically and compared to state-of-the-art UCB-based batch-sequential algorithms. Combining starting designs relying on UCB with gradient-based EI local optimization finally appears to be a sound option for batch design in distributed Gaussian process optimization.
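For context, the single-point Expected Improvement (for minimization) has the well-known closed form EI(x) = (f_best - mu(x)) Phi(z) + sigma(x) phi(z) with z = (f_best - mu(x)) / sigma(x). The sketch below maximizes it with a gradient-based local search over a toy posterior; the posterior mean and standard deviation are stand-ins for a fitted Gaussian process, and the article's analytic gradient of the multipoint EI is not reproduced here.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def posterior_mean(x):
    return np.sin(3.0 * x) + 0.5 * x               # toy stand-in for a GP posterior mean

def posterior_std(x):
    return 0.3 + 0.2 * np.abs(np.cos(x))           # toy stand-in for a GP posterior std

def expected_improvement(x, f_best):
    mu, sigma = posterior_mean(x), posterior_std(x)
    z = (f_best - mu) / sigma
    return (f_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

def maximize_ei(f_best, bounds=(-2.0, 2.0), n_starts=10, seed=0):
    # Multi-start L-BFGS-B ascent on EI (finite-difference gradients).
    rng = np.random.default_rng(seed)
    best_x, best_ei = None, -np.inf
    for x0 in rng.uniform(*bounds, size=n_starts):
        res = minimize(lambda x: -expected_improvement(x[0], f_best),
                       x0=[x0], bounds=[bounds], method="L-BFGS-B")
        if -res.fun > best_ei:
            best_x, best_ei = res.x[0], -res.fun
    return best_x, best_ei
```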
Abstract:
Machine and statistical learning techniques are used in almost all online advertisement systems. The problem of discovering which content is more in demand (e.g., receives more clicks) can be modeled as a multi-armed bandit problem. Contextual bandits (i.e., bandits with covariates, side information or associative reinforcement learning) associate with each specific content several features that define the "context" in which it appears (e.g., user, web page, time, region). This problem can be studied in the stochastic/statistical setting by means of the conditional probability paradigm using Bayes' theorem. However, for very large contextual information and/or real-time constraints, the exact calculation of Bayes' rule is computationally infeasible. In this article, we present a method that is able to handle large contextual information for learning in contextual-bandit problems. This method was tested in the challenge on the Yahoo! dataset at the ICML 2012 workshop "New Challenges for Exploration & Exploitation 3", obtaining second place. Its basic exploration policy is deterministic in the sense that for the same input data (as a time series) the same results are obtained. We address the deterministic exploration vs. exploitation issue, explaining the way in which the proposed method deterministically finds an effective dynamic trade-off based solely on the input data, in contrast to other methods that use a random number generator.
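As a minimal sketch of the contextual-bandit setting, the code below keeps a smoothed click-probability estimate per (article, context-bucket) pair and selects greedily with an optimism bonus; the coarse bucketing function and the bonus constant are assumptions for illustration, not the article's method.

```python
import math
from collections import defaultdict

class BucketedUCB:
    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.clicks = defaultdict(float)      # (arm, bucket) -> clicks
        self.shows = defaultdict(float)       # (arm, bucket) -> impressions
        self.t = 0

    def bucket(self, context):
        # Coarse discretization of the context features (assumed representation).
        return hash(tuple(round(v, 1) for v in context)) % 1024

    def select(self, arms, context):
        self.t += 1
        b = self.bucket(context)
        def score(a):
            shows = self.shows[(a, b)]
            ctr = (self.clicks[(a, b)] + 1.0) / (shows + 2.0)   # Laplace-smoothed CTR
            bonus = self.alpha * math.sqrt(math.log(self.t) / (shows + 1.0))
            return ctr + bonus
        return max(arms, key=score)

    def update(self, arm, context, clicked):
        b = self.bucket(context)
        self.shows[(arm, b)] += 1.0
        self.clicks[(arm, b)] += float(clicked)
```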
Abstract:
A new approach to optimisation is introduced based on a precise probabilistic statement of what is ideally required of an optimisation method. It is convenient to express the formalism in terms of the control of a stationary environment. This leads to an objective function for the controller which unifies the objectives of exploration and exploitation, thereby providing a quantitative principle for managing this trade-off. This is demonstrated using a variant of the multi-armed bandit problem. This approach opens new possibilities for optimisation algorithms, particularly by using neural networks or other adaptive methods for the adaptive controller. It also opens possibilities for deepening the understanding of existing methods. The realisation of these possibilities requires research into practical approximations of the exact formalism.
Abstract:
Report published in the Proceedings of the National Conference on "Education and Research in the Information Society", Plovdiv, May 2015.
Abstract:
For a wide class of semi-Markov decision processes, the optimal policies are expressible in terms of the Gittins indices, which have been found useful in sequential clinical trials and pharmaceutical research planning. In general, the indices can be approximated via calibration based on finite-horizon dynamic programming. This paper provides some results on the accuracy of such approximations and, in particular, gives the error bounds for some well-known processes (Bernoulli reward processes, normal reward processes and exponential target processes).
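The calibration idea can be sketched for a Bernoulli reward process: binary-search for the standard-arm rate lambda at which a finite-horizon dynamic program is indifferent between retiring on lambda forever and pulling the risky arm once more. The Beta prior pseudo-counts, discount factor and truncation horizon below are assumptions for illustration.

```python
from functools import lru_cache

def gittins_index(successes, failures, beta=0.9, horizon=150, tol=1e-4):
    # `successes`/`failures` are Beta posterior pseudo-counts, e.g. (1, 1) for a
    # uniform prior on an unplayed Bernoulli arm.
    def value_of_risky(lam):
        @lru_cache(maxsize=None)
        def V(s, f, steps_left):
            retire = lam / (1.0 - beta)          # retire: earn lam forever, discounted
            if steps_left == 0:
                return retire                    # finite-horizon truncation
            p = s / (s + f)                      # posterior mean success probability
            cont = p * (1.0 + beta * V(s + 1, f, steps_left - 1)) \
                 + (1.0 - p) * beta * V(s, f + 1, steps_left - 1)
            return max(retire, cont)
        return V(successes, failures, horizon)

    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        lam = 0.5 * (lo + hi)
        if value_of_risky(lam) > lam / (1.0 - beta) + 1e-12:
            lo = lam       # risky arm still strictly better, so the index exceeds lam
        else:
            hi = lam
    return 0.5 * (lo + hi)
```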
Abstract:
Suppose two treatments with binary responses are available for patients with some disease and that each patient will receive one of the two treatments. In this paper we consider the interests of patients both within and outside a trial using a Bayesian bandit approach and conclude that equal allocation is not appropriate for either group of patients. It is suggested that Gittins indices should be used if the disease is rare (via an approach called dynamic discounting, in which the discount rate is chosen based on the number of future patients in the trial), and the least-failures rule if the disease is common. Some analytical and simulation results are provided.
Abstract:
We study two problems of online learning under restricted information access. In the first problem, prediction with limited advice, we consider a game of prediction with expert advice, where on each round of the game we query the advice of a subset of M out of N experts. We present an algorithm that achieves O(√((N/M) T ln N)) regret on T rounds of this game. The second problem, the multiarmed bandit with paid observations, is a variant of the adversarial N-armed bandit game, where on round t of the game we can observe the reward of any number of arms, but each observation has a cost c. We present an algorithm that achieves O((c N ln N)^{1/3} T^{2/3} + √(T ln N)) regret on T rounds of this game in the worst case. Furthermore, we present a number of refinements that treat arm- and time-dependent observation costs and achieve lower regret under benign conditions. We present lower bounds showing that, apart from the logarithmic factors, the worst-case regret bounds cannot be improved.
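A simplified EXP3-flavoured sketch of the paid-observation setting appears below: each round the learner plays an arm and flips a coin with bias gamma_obs to decide whether to pay c for observing the realized reward, and observed losses are importance-weighted so that the loss estimates remain unbiased. The parameter values are assumptions, and this is a sketch rather than the paper's tuned algorithm.

```python
import numpy as np

def exp3_paid_observations(reward_fn, n_arms, T, c=0.1, eta=0.05, gamma_obs=0.3, seed=0):
    rng = np.random.default_rng(seed)
    cum_loss_est = np.zeros(n_arms)
    total_payoff = 0.0
    for t in range(T):
        # Exponential weights over estimated cumulative losses (shifted for stability).
        w = np.exp(-eta * (cum_loss_est - cum_loss_est.min()))
        p = w / w.sum()
        arm = int(rng.choice(n_arms, p=p))
        r = reward_fn(arm, t)                     # realized reward in [0, 1]
        total_payoff += r
        if rng.random() < gamma_obs:              # pay c to observe the realized reward
            total_payoff -= c
            loss = 1.0 - r
            # Importance weight by P(play arm) * P(observe) to keep the estimate unbiased.
            cum_loss_est[arm] += loss / (p[arm] * gamma_obs)
    return total_payoff
```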
Abstract:
We explore the use of Gittins indices to search for near optimality in sequential clinical trials. Some adaptive allocation rules are proposed to achieve the following two objectives as far as possible: (i) to reduce the expected number of successes lost, and (ii) to minimize the error probability at the end. Simulation results indicate the merits of the rules based on Gittins indices for small trial sizes. The rules are generalized to the case when neither of the response densities is known. Asymptotic optimality is derived for the constrained rules. A simple allocation rule is recommended for one-stage models. The simulation results indicate that it works better than both equal allocation and Bather's randomized allocation. We conclude with a discussion of possible further developments.
Abstract:
When subjects must choose repeatedly between two or more alternatives, each of which dispenses reward on a probabilistic basis (two-armed bandit), their behavior is guided by the two possible outcomes, reward and nonreward. The simplest stochastic choice rule is that the probability of choosing an alternative increases following a reward and decreases following a nonreward (reward following). We show experimentally and theoretically that animal subjects behave as if the absolute magnitudes of the changes in choice probability caused by reward and nonreward do not depend on the response which produced the reward or nonreward (source independence), and that the effects of reward and nonreward are in constant ratio under fixed conditions (effect-ratio invariance) -- properties that fit the definition of satisficing. Our experimental results are either not predicted by, or are inconsistent with, other theories of free-operant choice such as Bush-Mosteller, molar maximization, momentary maximizing, and melioration (matching).
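The reward-following rule described above is straightforward to simulate with a linear-operator update; the step sizes a and b below are illustrative, and their ratio a/b is held fixed in line with effect-ratio invariance.

```python
import random

def simulate(p_reward=(0.3, 0.7), a=0.05, b=0.05, trials=5000, seed=1):
    random.seed(seed)
    p_choose_A = 0.5
    choices_A = 0
    for _ in range(trials):
        choose_A = random.random() < p_choose_A
        choices_A += choose_A
        arm = 0 if choose_A else 1
        rewarded = random.random() < p_reward[arm]
        # Source independence: the update size does not depend on which response
        # produced the reward or nonreward, only on whether reward occurred.
        if rewarded:
            # Reward shifts choice probability toward the rewarded alternative.
            p_choose_A += a * (1 - p_choose_A) if choose_A else -a * p_choose_A
        else:
            # Nonreward shifts it away from the unrewarded alternative.
            p_choose_A += -b * p_choose_A if choose_A else b * (1 - p_choose_A)
    return choices_A / trials                     # fraction of choices of alternative A
```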
Abstract:
Yeast centromeric DNA (CEN DNA) binding factor 3 (CBF3) is a multisubunit protein complex that binds to the essential CDEIII element in CEN DNA. The four CBF3 proteins are required for accurate chromosome segregation and are considered to be core components of the yeast kinetochore. We have examined the structure of the CBF3–CEN DNA complex by atomic force microscopy. Assembly of CBF3–CEN DNA complexes was performed by combining purified CBF3 proteins with a DNA fragment that includes the CEN region from yeast chromosome III. Atomic force microscopy images showed DNA molecules with attached globular bodies. The contour length of the DNA containing the complex is ≈9% shorter than the DNA alone, suggesting some winding of DNA within the complex. The measured location of the single binding site indicates that the complex is located asymmetrically to the right of CDEIII extending away from CDEI and CDEII, which is consistent with previous data. The CEN DNA is bent ≈55° at the site of complex formation. A significant fraction of the complexes are linked in pairs, showing three to four DNA arms, with molecular volumes approximately three times the mean volumes of two-armed complexes. These multi-armed complexes indicate that CBF3 can bind two DNA molecules together in vitro and, thus, may be involved in holding together chromatid pairs during mitosis.