245 results for Prove
Abstract:
Gradient-based approaches to direct policy search in reinforcement learning have received much recent attention as a means to solve problems of partial observability and to avoid some of the problems associated with policy degradation in value-function methods. In this paper we introduce GPOMDP, a simulation-based algorithm for generating a biased estimate of the gradient of the average reward in Partially Observable Markov Decision Processes (POMDPs) controlled by parameterized stochastic policies. A similar algorithm was proposed by Kimura, Yamamura, and Kobayashi (1995). The algorithm's chief advantages are that it requires storage of only twice the number of policy parameters, uses one free parameter β ∈ [0,1) (which has a natural interpretation in terms of bias-variance trade-off), and requires no knowledge of the underlying state. We prove convergence of GPOMDP, and show how the correct choice of the parameter β is related to the mixing time of the controlled POMDP. We briefly describe extensions of GPOMDP to controlled Markov chains, continuous state, observation and control spaces, multiple agents, higher-order derivatives, and a version for training stochastic policies with internal states. In a companion paper (Baxter, Bartlett, & Weaver, 2001) we show how the gradient estimates generated by GPOMDP can be used in both a traditional stochastic gradient algorithm and a conjugate-gradient procedure to find local optima of the average reward. ©2001 AI Access Foundation and Morgan Kaufmann Publishers. All rights reserved.
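The estimator described above can be sketched in a few lines. This is a minimal sketch, assuming the usual form of the GPOMDP recursion: an eligibility trace z_{t+1} = β z_t + ∇μ_{u_t}(θ, y_t) / μ_{u_t}(θ, y_t) and a running average Δ_{t+1} = Δ_t + (r_{t+1} z_{t+1} - Δ_t) / (t + 1). Those two arrays are the "twice the number of policy parameters" of storage. The tabular softmax policy, the two-state toy chain, and the names `gpomdp`, `toy_step` and `grad_log_policy` are hypothetical illustrations, not the paper's code.

```python
import math
import random


def softmax_policy(theta, obs):
    """Action probabilities mu(. | theta, obs) for a tabular softmax policy."""
    prefs = theta[obs]
    top = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - top) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def grad_log_policy(theta, obs, action):
    """grad(mu_action) / mu_action, i.e. grad log mu_action, shaped like theta."""
    probs = softmax_policy(theta, obs)
    grad = [[0.0] * len(row) for row in theta]
    for a, p in enumerate(probs):
        grad[obs][a] = (1.0 if a == action else 0.0) - p
    return grad


def toy_step(rng, state, action):
    """Hypothetical 2-state chain: the next state equals the action with
    probability 0.9, and reward 1 is received in state 1."""
    next_state = action if rng.random() < 0.9 else 1 - action
    return next_state, (1.0 if next_state == 1 else 0.0)


def gpomdp(theta, beta, num_steps, seed=0):
    rng = random.Random(seed)
    z = [[0.0] * len(row) for row in theta]      # eligibility trace z_t
    delta = [[0.0] * len(row) for row in theta]  # running gradient estimate Delta_t
    state = 0
    for t in range(num_steps):
        obs = state  # assumption: this toy emits its state as the observation
        probs = softmax_policy(theta, obs)
        action = rng.choices(range(len(probs)), weights=probs)[0]
        g = grad_log_policy(theta, obs, action)
        state, reward = toy_step(rng, state, action)  # the learner only sees obs and reward
        for i, row in enumerate(z):
            for j in range(len(row)):
                z[i][j] = beta * z[i][j] + g[i][j]
                delta[i][j] += (reward * z[i][j] - delta[i][j]) / (t + 1)
    return delta


estimate = gpomdp([[0.0, 0.0], [0.0, 0.0]], beta=0.5, num_steps=20000)
```

In this toy, choosing action 1 raises the next reward from either observation, so the estimated entries for action 1 should come out positive. A larger β lengthens the trace: less bias but more variance, which is the trade-off the abstract ties to the mixing time.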
Abstract:
We investigate the use of certain data-dependent estimates of the complexity of a function class, called Rademacher and Gaussian complexities. In a decision theoretic setting, we prove general risk bounds in terms of these complexities. We consider function classes that can be expressed as combinations of functions from basis classes and show how the Rademacher and Gaussian complexities of such a function class can be bounded in terms of the complexity of the basis classes. We give examples of the application of these techniques in finding data-dependent risk bounds for decision trees, neural networks and support vector machines.
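Because these complexities depend only on the observed sample, they can be estimated directly from data. The sketch below is a minimal Monte Carlo estimate for a finite class, assuming the normalization E sup_f |(2/n) Σ σ_i f(x_i)| (some texts drop the factor 2 or the absolute value). The Gaussian variant replaces the random signs σ_i with standard normal variables. The function name `empirical_complexity` and the finite-class representation are illustrative assumptions, not taken from the paper.

```python
import itertools
import random


def empirical_complexity(values, num_draws=2000, gaussian=False, seed=0):
    """Monte Carlo estimate of the empirical Rademacher (or Gaussian) complexity.

    values[k][i] = f_k(x_i): each row is one function of a finite class
    evaluated on a fixed sample x_1..x_n.
    """
    rng = random.Random(seed)
    n = len(values[0])
    total = 0.0
    for _ in range(num_draws):
        if gaussian:
            noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
        else:
            noise = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        # supremum over the class of |(2/n) * sum_i noise_i * f(x_i)|
        best = max(abs(sum(s * v for s, v in zip(noise, row))) for row in values)
        total += 2.0 * best / n
    return total / num_draws


# A class that realizes every sign pattern on 4 points is as rich as possible:
all_signs = [list(p) for p in itertools.product((-1.0, 1.0), repeat=4)]
print(empirical_complexity(all_signs))  # 2.0: every random sign vector is matched exactly
```

Since the noise draws do not depend on the class, a subclass evaluated with the same seed can never score higher than its superclass, which mirrors the monotonicity used when bounding combinations of basis classes.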
Abstract:
Bounded parameter Markov Decision Processes (BMDPs) address the issue of dealing with uncertainty in the parameters of a Markov Decision Process (MDP). Unlike the case of an MDP, the notion of an optimal policy for a BMDP is not entirely straightforward. We consider two notions of optimality based on optimistic and pessimistic criteria. These have been analyzed for discounted BMDPs. Here we provide results for average reward BMDPs. We establish a fundamental relationship between the discounted and the average reward problems, prove the existence of Blackwell optimal policies and, for both notions of optimality, derive algorithms that converge to the optimal value function.
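To make the optimistic and pessimistic criteria concrete, here is a sketch of interval value iteration for the discounted case, which the abstract notes had already been analyzed. It does not reproduce the paper's average-reward algorithms. For each state-action pair, it picks the transition distribution inside the given intervals that minimizes (pessimistic) or maximizes (optimistic) the expected next value, by starting from the lower bounds and handing the leftover probability mass to successor states in order of value. The function names and the toy two-state example are hypothetical.

```python
def extreme_expectation(lower, upper, values, optimistic):
    """Min (pessimistic) or max (optimistic) of sum_s p[s] * values[s]
    over distributions with lower[s] <= p[s] <= upper[s].
    Assumes sum(lower) <= 1 <= sum(upper)."""
    p = list(lower)
    remaining = 1.0 - sum(lower)
    # Give leftover mass to the lowest-valued (pessimistic) or highest-valued
    # (optimistic) successor states first, up to their upper bounds.
    order = sorted(range(len(values)), key=lambda s: values[s], reverse=optimistic)
    for s in order:
        extra = min(upper[s] - lower[s], remaining)
        p[s] += extra
        remaining -= extra
    return sum(ps * v for ps, v in zip(p, values))


def interval_value_iteration(rewards, lower, upper, gamma=0.9, iters=500, optimistic=False):
    """rewards[s][a]; lower[s][a][s2] and upper[s][a][s2] bound P(s2 | s, a).
    The agent maximizes over actions under either criterion."""
    n = len(rewards)
    values = [0.0] * n
    for _ in range(iters):
        values = [
            max(
                rewards[s][a]
                + gamma * extreme_expectation(lower[s][a], upper[s][a], values, optimistic)
                for a in range(len(rewards[s]))
            )
            for s in range(n)
        ]
    return values


# Hypothetical 2-state, 1-action BMDP: reward 1 in state 1 only.
rewards = [[0.0], [1.0]]
lower = [[[0.2, 0.4]], [[0.2, 0.4]]]
upper = [[[0.6, 0.8]], [[0.6, 0.8]]]
pess = interval_value_iteration(rewards, lower, upper)                   # about [3.6, 4.6]
opt = interval_value_iteration(rewards, lower, upper, optimistic=True)   # about [7.2, 8.2]
```

The gap between the two value functions shows why an "optimal policy" is not a single notion once the transition probabilities are only known to lie in intervals.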
Abstract:
We consider the problem of prediction with expert advice in the setting where a forecaster is presented with several online prediction tasks. Instead of competing against the best expert separately on each task, we assume the tasks are related, and thus we expect that a few experts will perform well on the entire set of tasks. That is, our forecaster would like, on each task, to compete against the best expert chosen from a small set of experts. While we describe the “ideal” algorithm and its performance bound, we show that the computation required for this algorithm is as hard as computation of a matrix permanent. We present an efficient algorithm based on mixing priors, and prove a bound that is nearly as good for the sequential task presentation case. We also consider a harder case where the task may change arbitrarily from round to round, and we develop an efficient approximate randomized algorithm based on Markov chain Monte Carlo techniques.
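The baseline this abstract contrasts against, competing with the best expert separately on each task, can be sketched with one exponentially weighted (Hedge) forecaster per task. This is not the paper's mixing-priors algorithm; it only shows the per-task regret that the multitask approach aims to improve on when a few experts serve all tasks. The names `hedge` and `per_task_regret` and the toy losses are hypothetical.

```python
import math


def hedge(expert_losses, eta):
    """Exponentially weighted forecaster on one task.

    expert_losses[t][i] in [0, 1] is expert i's loss at round t.
    Returns (forecaster's cumulative expected loss, best expert's cumulative loss).
    """
    num_experts = len(expert_losses[0])
    cumulative = [0.0] * num_experts
    forecaster_loss = 0.0
    for losses in expert_losses:
        shift = min(cumulative)  # a common shift leaves the normalized weights unchanged
        weights = [math.exp(-eta * (c - shift)) for c in cumulative]
        total = sum(weights)
        forecaster_loss += sum(w / total * l for w, l in zip(weights, losses))
        cumulative = [c + l for c, l in zip(cumulative, losses)]
    return forecaster_loss, min(cumulative)


def per_task_regret(tasks, eta):
    """Baseline: an independent Hedge instance per task, each compared with
    the best expert for that task."""
    regret = 0.0
    for task in tasks:
        loss, best = hedge(task, eta)
        regret += loss - best
    return regret


# Toy example: 3 tasks, 4 experts; experts 0 and 2 together cover every task.
T, N = 100, 4
tasks = [[[0.0 if i == k else 1.0 for i in range(N)] for _ in range(T)] for k in (0, 2, 2)]
eta = math.sqrt(8 * math.log(N) / T)
regret = per_task_regret(tasks, eta)
```

Each independent instance pays roughly ln(N)/η + ηT/8 in regret and relearns from scratch that expert 2 is good on the last two tasks; exploiting that shared structure is exactly what the multitask setting is after.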
Abstract:
Online learning algorithms have recently risen to prominence due to their strong theoretical guarantees and an increasing number of practical applications for large-scale data analysis problems. In this paper, we analyze a class of online learning algorithms based on fixed potentials and nonlinearized losses, which yields algorithms with implicit update rules. We show how to efficiently compute these updates, and we prove regret bounds for the algorithms. We apply our formulation to several special cases where our approach has benefits over existing online learning methods. In particular, we provide improved algorithms and bounds for the online metric learning problem, and show improved robustness for online linear prediction problems. Results over a variety of data sets demonstrate the advantages of our framework.
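An implicit update evaluates the loss at the new iterate instead of linearizing it at the old one. The simplest special case, assumed here for illustration (a squared Euclidean potential with squared loss, not the paper's general framework), has a closed form: w_{t+1} = w_t - η (w_t·x - y) x / (1 + η‖x‖²). The sketch compares it with the ordinary explicit gradient step; `implicit_update` and `explicit_update` are hypothetical names.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def explicit_update(w, x, y, eta):
    """Ordinary online gradient step on the squared loss 0.5 * (w.x - y)^2."""
    residual = dot(w, x) - y
    return [wi - eta * residual * xi for wi, xi in zip(w, x)]


def implicit_update(w, x, y, eta):
    """Closed-form solution of argmin_v 0.5*||v - w||^2 + eta * 0.5*(v.x - y)^2."""
    residual = dot(w, x) - y
    scale = eta * residual / (1.0 + eta * dot(x, x))
    return [wi - scale * xi for wi, xi in zip(w, x)]


w, x, y, eta = [0.0, 0.0], [1.0, 2.0], 5.0, 10.0
print(dot(explicit_update(w, x, y, eta), x))  # 250.0: overshoots the target 5.0
print(dot(implicit_update(w, x, y, eta), x))  # about 4.90: lands between 0.0 and 5.0
```

With a large step size the explicit step overshoots badly, while the implicit step always moves the prediction toward the target without passing it, one simple illustration of the robustness the abstract mentions for online linear prediction.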
Abstract:
A number of learning problems can be cast as an Online Convex Game: on each round, a learner makes a prediction x from a convex set, the environment plays a loss function f, and the learner’s long-term goal is to minimize regret. Algorithms have been proposed by Zinkevich, when f is assumed to be convex, and Hazan et al., when f is assumed to be strongly convex, that have provably low regret. We consider these two settings and analyze such games from a minimax perspective, proving minimax strategies and lower bounds in each case. These results prove that the existing algorithms are essentially optimal.
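The two algorithms the abstract refers to are both projected online gradient descent; they differ in step size. The sketch below assumes a toy game on K = [-1, 1] with losses f_t(x) = (x - z_t)², which are 2-strongly convex, and runs both a 1/√t schedule (as in Zinkevich's analysis for convex losses) and a 1/(Ht) schedule with H = 2 (as in Hazan et al.'s analysis for H-strongly convex losses). The toy losses and the name `ogd_regret` are illustrative assumptions; the minimax analysis itself is not reproduced.

```python
import math
import random


def project(x, lo=-1.0, hi=1.0):
    """Euclidean projection onto the convex set K = [lo, hi]."""
    return min(hi, max(lo, x))


def ogd_regret(targets, step):
    """Projected online gradient descent on f_t(x) = (x - z_t)^2 over K = [-1, 1].

    step(t) returns the step size for round t = 1, 2, ...
    Returns regret against the best fixed point in K chosen in hindsight.
    """
    x = 0.0
    learner_loss = 0.0
    for t, z in enumerate(targets, start=1):
        learner_loss += (x - z) ** 2  # the learner commits to x before f_t is revealed
        x = project(x - step(t) * 2.0 * (x - z))
    best_x = project(sum(targets) / len(targets))  # minimizer of the summed quadratics
    best_loss = sum((best_x - z) ** 2 for z in targets)
    return learner_loss - best_loss


rng = random.Random(1)
targets = [rng.uniform(-1.0, 1.0) for _ in range(1000)]
convex = ogd_regret(targets, lambda t: 1.0 / math.sqrt(t))  # 1/sqrt(t) schedule
strong = ogd_regret(targets, lambda t: 1.0 / (2.0 * t))     # 1/(H t) schedule, H = 2
```

Here the diameter of K is D = 2 and gradients satisfy |f'| ≤ G = 4, so the familiar guarantees are on the order of √T for the first schedule and (G²/2H)(1 + ln T) for the second; the abstract's lower bounds say these rates cannot be substantially improved.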
Abstract:
Alcohol use disorders (AUDs) are complex, and developing effective treatments will require combining novel medications with cognitive behavioral therapy approaches. Epidemiological studies have shown a high correlation between alcohol consumption and tobacco use: the prevalence of smoking among alcoholics is as high as 80%, compared with about 30% in the general population. Both preclinical and clinical data provide evidence that nicotine administration increases alcohol intake and that nonspecific nicotinic receptor antagonists reduce alcohol-mediated behaviors. Because nicotine interacts specifically with the neuronal nicotinic acetylcholine receptor (nAChR) system, these findings suggest that nAChRs play an important role in the behavioral effects of alcohol. In this review, we discuss the importance of nAChRs for the treatment of AUDs and argue that FDA-approved nAChR ligands such as varenicline and mecamylamine, approved as smoking cessation aids, may prove to be valuable treatments for AUDs. We also address the importance of combining effective medications with behavioral therapy for the treatment of alcohol-dependent individuals.
Abstract:
Background: Huntingtin, the HD gene-encoded protein that is mutated by polyglutamine expansion in Huntington's disease, is required in extraembryonic tissues for proper gastrulation, implicating its activities in nutrition or patterning of the developing embryo. To test these possibilities, we have used whole-mount in situ hybridization to examine embryonic patterning and morphogenesis in homozygous Hdh^ex4/5 huntingtin-deficient embryos.
Results: In the absence of huntingtin, expression of nutritive genes appears normal, but E7.0–7.5 embryos exhibit a unique combination of patterning defects. Notable are a shortened primitive streak, the absence of a proper node and diminished production of anterior streak derivatives. Reduced Wnt3a, Tbx6 and Dll1 expression signifies decreased paraxial mesoderm, and reduced Otx2 expression and the lack of headfolds denote a failure of head development. In addition, genes initially broadly expressed are not properly restricted to the posterior, as evidenced by the ectopic expression of Nodal, Fgf8 and Gsc in the epiblast and of T (Brachyury) and Evx1 in proximal mesoderm derivatives. Despite impaired posterior restriction and anterior streak deficits, overall anterior/posterior polarity is established. A single primitive streak forms, and marker expression shows that the anterior epiblast and anterior visceral endoderm (AVE) are specified.
Conclusion: Huntingtin is essential in the early patterning of the embryo for formation of the anterior region of the primitive streak, and for down-regulation of a subset of dynamic growth and transcription factor genes. These findings provide fundamental starting points for identifying the novel cellular and molecular activities of huntingtin in the extraembryonic tissues that govern normal anterior streak development. This knowledge may prove to be important for understanding the mechanism by which the dominant polyglutamine expansion in huntingtin determines the loss of neurons in Huntington's disease.
Abstract:
Alcohol dependence is a disease that affects millions of individuals worldwide. There has been some progress with pharmacotherapy for alcohol-dependent individuals; however, there remains a critical need for the development of novel and additional therapeutic approaches. Alcohol and nicotine are commonly abused together, and there is evidence that neuronal nicotinic acetylcholine receptors (nAChRs) play a role in both alcohol and nicotine dependence. Varenicline, a partial agonist at alpha4beta2 nAChRs, reduces nicotine intake and was recently approved as a smoking cessation aid. We have investigated the role of varenicline in the modulation of ethanol consumption and seeking, using three different animal models of drinking. We show that acute administration of varenicline, at doses reported to reduce nicotine reward, selectively reduced ethanol but not sucrose seeking in an operant self-administration drinking paradigm, and also decreased voluntary ethanol but not water consumption in animals chronically exposed to ethanol for 2 months before varenicline treatment. Furthermore, chronic varenicline administration decreased ethanol consumption, and ethanol intake did not rebound once varenicline was no longer administered. The data suggest that alpha4beta2 nAChRs may play a role in ethanol-seeking behaviors in animals chronically exposed to ethanol. The selectivity of varenicline in decreasing ethanol consumption, combined with its reported safety profile and mild side effects in humans, suggests that varenicline may prove to be a treatment for alcohol dependence.
Abstract:
A classical condition for fast learning rates is the margin condition, first introduced by Mammen and Tsybakov. In this paper we tackle the problem of adaptivity to this condition in the context of model selection, in a general learning framework. More precisely, we consider a weaker version of this condition that takes into account that learning within a small model can be much easier than within a large one. Requiring this “strong margin adaptivity” makes the model selection problem more challenging. We first prove, in a general framework, that some penalization procedures (including local Rademacher complexities) exhibit this adaptivity when the models are nested. Unlike previous results, this holds with penalties that depend only on the data. Our second main result is that strong margin adaptivity is not always possible when the models are not nested: for every model selection procedure (even a randomized one), there is a problem for which it does not demonstrate strong margin adaptivity.
Abstract:
Real-time sales assistance is a problematic component of delivering sales support to customers remotely. Solutions involving web pages, telephony and video support prove problematic when seeking to remotely guide customers through their sales processes, especially in transactions revolving around physically complex artefacts. This process involves a number of services that are often complex in nature, ranging from physical compatibility and configuration factors to availability and credit services. We propose the application of a combination of virtual worlds and augmented reality to create synthetic environments suitable for remote sales of physical artefacts, right in the home of the purchaser. A high-level description of the service structure involved is shown, along with a use case involving the sale of electronic goods and services within an example augmented reality application. We expect this work to have application in many sales domains involving physical objects that need to be sold over the Internet.
Abstract:
The number of doctorates being awarded around the world has almost doubled over the last ten years, propelling doctoral education from a small elite enterprise into a large and ever-growing international market. Within the context of increasing numbers of doctoral students, this book examines the new doctoral environment and the challenges it is starting to face. Drawing on research from around the world, the individual authors contribute to a previously under-represented focus: theorising the emerging practices of doctoral education and the shape of change in this arena. Key aspects, expertly discussed by contributors from the UK, USA, Australia, New Zealand, China, South Africa, Sweden and Denmark, include:
- the changing nature of doctoral education
- the need for systematic and principled accounts of doctoral pedagogies
- the importance of disciplinary specificity
- the relationship between pedagogy and knowledge generation
- issues of transdisciplinarity.
Reshaping Doctoral Education provides rich accounts of traditional and more innovative pedagogical practices within a range of doctoral systems in different disciplines, professional fields and geographical locations, providing the reader with a trustworthy and scholarly platform from which to design the doctoral experience. It will prove an essential resource for anyone involved in doctoral studies, whether as students, supervisors, researchers, administrators, teachers or mentors.
Abstract:
This article by Ben McEniery discusses the matters a court will consider when leave to commence or proceed against a company in liquidation is sought not by a creditor seeking to prove a debt, but by the corporate regulator pursuing declaratory or injunctive relief.
Abstract:
In recent years, unsustainable subdivision development has caused significant problems in the Bangkok Metropolitan Region (BMR), Thailand. A number of government departments and agencies have tried to address these problems by introducing rating tools that encourage more sustainable subdivision development in the BMR, such as the Environmental Impact Assessment Monitoring Award (EIA-MA) and the Thai’s Rating for Energy and Environmental Sustainability of New construction and major renovation (TREES-NC). However, although the EIA-MA includes neighbourhood design in its assessment criteria, this requirement applies only to large projects. Meanwhile, TREES-NC focuses only on large-scale buildings such as condominiums and office buildings, and is not specific to subdivision neighbourhood design. A new rating tool, the “Rating for Subdivision Neighbourhood Sustainability Design (RSNSD)”, has recently been developed, but it still needs to be validated. This paper aims to validate RSNSD for subdivision neighbourhood design in the BMR by applying it to eight case-study subdivisions. The RSNSD results, based on data gathered by surveying the subdivisions, will be compared with the existing EIA-MA results. The selected cases include one “Excellent Award”, two “Very Good Award” and five non-rated subdivision developments. This paper seeks to prove the credibility of RSNSD before it is introduced into real subdivision development practice. RSNSD could help encourage more sustainable subdivision design and thereby prevent these problems from recurring in future subdivision development in the BMR.