745 resultados para bandit problem


Relevância:

100.00% 100.00%

Publicador:

Resumo:

We obtain minimax lower bounds on the regret for the classicaltwo--armed bandit problem. We provide a finite--sample minimax version of the well--known log $n$ asymptotic lower bound of Lai and Robbins. Also, in contrast to the log $n$ asymptotic results on the regret, we show that the minimax regret is achieved by mere random guessing under fairly mild conditions on the set of allowable configurations of the two arms. That is, we show that for {\sl every} allocation rule and for {\sl every} $n$, there is a configuration such that the regret at time $n$ is at least 1 -- $\epsilon$ times the regret of random guessing, where $\epsilon$ is any small positive constant.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

We explore the problem of budgeted machine learning, in which the learning algorithm has free access to the training examples’ labels but has to pay for each attribute that is specified. This learning model is appropriate in many areas, including medical applications. We present new algorithms for choosing which attributes to purchase of which examples in the budgeted learning model based on algorithms for the multi-armed bandit problem. All of our approaches outperformed the current state of the art. Furthermore, we present a new means for selecting an example to purchase after the attribute is selected, instead of selecting an example uniformly at random, which is typically done. Our new example selection method improved performance of all the algorithms we tested, both ours and those in the literature.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

We develop a mathematical programming approach for the classicalPSPACE - hard restless bandit problem in stochastic optimization.We introduce a hierarchy of n (where n is the number of bandits)increasingly stronger linear programming relaxations, the lastof which is exact and corresponds to the (exponential size)formulation of the problem as a Markov decision chain, while theother relaxations provide bounds and are efficiently computed. Wealso propose a priority-index heuristic scheduling policy fromthe solution to the first-order relaxation, where the indices aredefined in terms of optimal dual variables. In this way wepropose a policy and a suboptimality guarantee. We report resultsof computational experiments that suggest that the proposedheuristic policy is nearly optimal. Moreover, the second-orderrelaxation is found to provide strong bounds on the optimalvalue.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This work deals with parallel optimization of expensive objective functions which are modelled as sample realizations of Gaussian processes. The study is formalized as a Bayesian optimization problem, or continuous multi-armed bandit problem, where a batch of q > 0 arms is pulled in parallel at each iteration. Several algorithms have been developed for choosing batches by trading off exploitation and exploration. As of today, the maximum Expected Improvement (EI) and Upper Confidence Bound (UCB) selection rules appear as the most prominent approaches for batch selection. Here, we build upon recent work on the multipoint Expected Improvement criterion, for which an analytic expansion relying on Tallis’ formula was recently established. The computational burden of this selection rule being still an issue in application, we derive a closed-form expression for the gradient of the multipoint Expected Improvement, which aims at facilitating its maximization using gradient-based ascent algorithms. Substantial computational savings are shown in application. In addition, our algorithms are tested numerically and compared to state-of-the-art UCB-based batchsequential algorithms. Combining starting designs relying on UCB with gradient-based EI local optimization finally appears as a sound option for batch design in distributed Gaussian Process optimization.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Machine and Statistical Learning techniques are used in almost all online advertisement systems. The problem of discovering which content is more demanded (e.g. receive more clicks) can be modeled as a multi-armed bandit problem. Contextual bandits (i.e., bandits with covariates, side information or associative reinforcement learning) associate, to each specific content, several features that define the “context” in which it appears (e.g. user, web page, time, region). This problem can be studied in the stochastic/statistical setting by means of the conditional probability paradigm using the Bayes’ theorem. However, for very large contextual information and/or real-time constraints, the exact calculation of the Bayes’ rule is computationally infeasible. In this article, we present a method that is able to handle large contextual information for learning in contextual-bandits problems. This method was tested in the Challenge on Yahoo! dataset at ICML2012’s Workshop “new Challenges for Exploration & Exploitation 3”, obtaining the second place. Its basic exploration policy is deterministic in the sense that for the same input data (as a time-series) the same results are obtained. We address the deterministic exploration vs. exploitation issue, explaining the way in which the proposed method deterministically finds an effective dynamic trade-off based solely in the input-data, in contrast to other methods that use a random number generator.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

A new approach to optimisation is introduced based on a precise probabilistic statement of what is ideally required of an optimisation method. It is convenient to express the formalism in terms of the control of a stationary environment. This leads to an objective function for the controller which unifies the objectives of exploration and exploitation, thereby providing a quantitative principle for managing this trade-off. This is demonstrated using a variant of the multi-armed bandit problem. This approach opens new possibilities for optimisation algorithms, particularly by using neural network or other adaptive methods for the adaptive controller. It also opens possibilities for deepening understanding of existing methods. The realisation of these possibilities requires research into practical approximations of the exact formalism.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We present a polyhedral framework for establishing general structural properties on optimal solutions of stochastic scheduling problems, where multiple job classes vie for service resources: the existence of an optimal priority policy in a given family, characterized by a greedoid (whose feasible class subsets may receive higher priority), where optimal priorities are determined by class-ranking indices, under restricted linear performance objectives (partial indexability). This framework extends that of Bertsimas and Niño-Mora (1996), which explained the optimality of priority-index policies under all linear objectives (general indexability). We show that, if performance measures satisfy partial conservation laws (with respect to the greedoid), which extend previous generalized conservation laws, then the problem admits a strong LP relaxation over a so-called extended greedoid polytope, which has strong structural and algorithmic properties. We present an adaptive-greedy algorithm (which extends Klimov's) taking as input the linear objective coefficients, which (1) determines whether the optimal LP solution is achievable by a policy in the given family; and (2) if so, computes a set of class-ranking indices that characterize optimal priority policies in the family. In the special case of project scheduling, we show that, under additional conditions, the optimal indices can be computed separately for each project (index decomposition). We further apply the framework to the important restless bandit model (two-action Markov decision chains), obtaining new index policies, that extend Whittle's (1988), and simple sufficient conditions for their validity. These results highlight the power of polyhedral methods (the so-called achievable region approach) in dynamic and stochastic optimization.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Abstract In this paper, we address the problem of picking a subset of bids in a general combinatorial auction so as to maximize the overall profit using the first-price model. This winner determination problem assumes that a single bidding round is held to determine both the winners and prices to be paid. We introduce six variants of biased random-key genetic algorithms for this problem. Three of them use a novel initialization technique that makes use of solutions of intermediate linear programming relaxations of an exact mixed integer-linear programming model as initial chromosomes of the population. An experimental evaluation compares the effectiveness of the proposed algorithms with the standard mixed linear integer programming formulation, a specialized exact algorithm, and the best-performing heuristics proposed for this problem. The proposed algorithms are competitive and offer strong results, mainly for large-scale auctions.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Ecological science contributes to solving a broad range of environmental problems. However, lack of ecological literacy in practice often limits application of this knowledge. In this paper, we highlight a critical but often overlooked demand on ecological literacy: to enable professionals of various careers to apply scientific knowledge when faced with environmental problems. Current university courses on ecology often fail to persuade students that ecological science provides important tools for environmental problem solving. We propose problem-based learning to improve the understanding of ecological science and its usefulness for real-world environmental issues that professionals in careers as diverse as engineering, public health, architecture, social sciences, or management will address. Courses should set clear learning objectives for cognitive skills they expect students to acquire. Thus, professionals in different fields will be enabled to improve environmental decision-making processes and to participate effectively in multidisciplinary work groups charged with tackling environmental issues.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper addresses the capacitated lot sizing problem (CLSP) with a single stage composed of multiple plants, items and periods with setup carry-over among the periods. The CLSP is well studied and many heuristics have been proposed to solve it. Nevertheless, few researches explored the multi-plant capacitated lot sizing problem (MPCLSP), which means that few solution methods were proposed to solve it. Furthermore, to our knowledge, no study of the MPCLSP with setup carry-over was found in the literature. This paper presents a mathematical model and a GRASP (Greedy Randomized Adaptive Search Procedure) with path relinking to the MPCLSP with setup carry-over. This solution method is an extension and adaptation of a previously adopted methodology without the setup carry-over. Computational tests showed that the improvement of the setup carry-over is significant in terms of the solution value with a low increase in computational time.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Introduction: Work disability is a major consequence of rheumatoid arthritis (RA), associated not only with traditional disease activity variables, but also more significantly with demographic, functional, occupational, and societal variables. Recent reports suggest that the use of biologic agents offers potential for reduced work disability rates, but the conclusions are based on surrogate disease activity measures derived from studies primarily from Western countries. Methods: The Quantitative Standard Monitoring of Patients with RA (QUEST-RA) multinational database of 8,039 patients in 86 sites in 32 countries, 16 with high gross domestic product (GDP) (>24K US dollars (USD) per capita) and 16 low-GDP countries (<11K USD), was analyzed for work and disability status at onset and over the course of RA and clinical status of patients who continued working or had stopped working in high-GDP versus low-GDP countries according to all RA Core Data Set measures. Associations of work disability status with RA Core Data Set variables and indices were analyzed using descriptive statistics and regression analyses. Results: At the time of first symptoms, 86% of men (range 57%-100% among countries) and 64% (19%-87%) of women <65 years were working. More than one third (37%) of these patients reported subsequent work disability because of RA. Among 1,756 patients whose symptoms had begun during the 2000s, the probabilities of continuing to work were 80% (95% confidence interval (CI) 78%-82%) at 2 years and 68% (95% CI 65%-71%) at 5 years, with similar patterns in high-GDP and low-GDP countries. Patients who continued working versus stopped working had significantly better clinical status for all clinical status measures and patient self-report scores, with similar patterns in high-GDP and low-GDP countries. However, patients who had stopped working in high-GDP countries had better clinical status than patients who continued working in low-GDP countries. The most significant identifier of work disability in all subgroups was Health Assessment Questionnaire (HAQ) functional disability score. Conclusions: Work disability rates remain high among people with RA during this millennium. In low-GDP countries, people remain working with high levels of disability and disease activity. Cultural and economic differences between societies affect work disability as an outcome measure for RA.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Aims. An analytical solution for the discrepancy between observed core-like profiles and predicted cusp profiles in dark matter halos is studied. Methods. We calculate the distribution function for Navarro-Frenk-White halos and extract energy from the distribution, taking into account the effects of baryonic physics processes. Results. We show with a simple argument that we can reproduce the evolution of a cusp to a flat density profile by a decrease of the initial potential energy.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The energy spectrum of an electron confined in a quantum dot (QD) with a three-dimensional anisotropic parabolic potential in a tilted magnetic field was found analytically. The theory describes exactly the mixing of in-plane and out-of-plane motions of an electron caused by a tilted magnetic field, which could be seen, for example, in the level anticrossing. For charged QDs in a tilted magnetic field we predict three strong resonant lines in the far-infrared-absorption spectra.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Efficient automatic protein classification is of central importance in genomic annotation. As an independent way to check the reliability of the classification, we propose a statistical approach to test if two sets of protein domain sequences coming from two families of the Pfam database are significantly different. We model protein sequences as realizations of Variable Length Markov Chains (VLMC) and we use the context trees as a signature of each protein family. Our approach is based on a Kolmogorov-Smirnov-type goodness-of-fit test proposed by Balding et at. [Limit theorems for sequences of random trees (2008), DOI: 10.1007/s11749-008-0092-z]. The test statistic is a supremum over the space of trees of a function of the two samples; its computation grows, in principle, exponentially fast with the maximal number of nodes of the potential trees. We show how to transform this problem into a max-flow over a related graph which can be solved using a Ford-Fulkerson algorithm in polynomial time on that number. We apply the test to 10 randomly chosen protein domain families from the seed of Pfam-A database (high quality, manually curated families). The test shows that the distributions of context trees coming from different families are significantly different. We emphasize that this is a novel mathematical approach to validate the automatic clustering of sequences in any context. We also study the performance of the test via simulations on Galton-Watson related processes.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The width of a closed convex subset of n-dimensional Euclidean space is the distance between two parallel supporting hyperplanes. The Blaschke-Lebesgue problem consists of minimizing the volume in the class of convex sets of fixed constant width and is still open in dimension n >= 3. In this paper we describe a necessary condition that the minimizer of the Blaschke-Lebesgue must satisfy in dimension n = 3: we prove that the smooth components of the boundary of the minimizer have their smaller principal curvature constant and therefore are either spherical caps or pieces of tubes (canal surfaces).