182 results for pressure gradient


Relevance:

20.00%

Abstract:

Gradient-based approaches to direct policy search in reinforcement learning have received much recent attention as a means to solve problems of partial observability and to avoid some of the problems associated with policy degradation in value-function methods. In this paper we introduce GPOMDP, a simulation-based algorithm for generating a biased estimate of the gradient of the average reward in Partially Observable Markov Decision Processes (POMDPs) controlled by parameterized stochastic policies. A similar algorithm was proposed by Kimura, Yamamura, and Kobayashi (1995). The algorithm's chief advantages are that it requires storage of only twice the number of policy parameters, uses one free parameter β ∈ [0,1) (which has a natural interpretation in terms of bias-variance trade-off), and requires no knowledge of the underlying state. We prove convergence of GPOMDP, and show how the correct choice of the parameter β is related to the mixing time of the controlled POMDP. We briefly describe extensions of GPOMDP to controlled Markov chains, continuous state, observation and control spaces, multiple agents, higher-order derivatives, and a version for training stochastic policies with internal states. In a companion paper (Baxter, Bartlett, & Weaver, 2001) we show how the gradient estimates generated by GPOMDP can be used in both a traditional stochastic gradient algorithm and a conjugate-gradient procedure to find local optima of the average reward. © 2001 AI Access Foundation and Morgan Kaufmann Publishers. All rights reserved.
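As a rough illustration of the kind of estimator this abstract describes, the Python sketch below keeps two vectors the size of the policy parameters (an eligibility trace `z` and a running estimate `delta`), which is consistent with the stated storage requirement. The update rule is the commonly cited form of GPOMDP and is an assumption here, since the abstract does not spell it out. The softmax policy, the `step` callback, and the toy two-action reward are hypothetical and chosen only to make the example runnable; there is a single observation, so the policy depends on the parameters alone.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def gpomdp_estimate(theta, step, beta=0.9, T=10000, seed=0):
    """Running estimate of the average-reward gradient (assumed GPOMDP form).

    theta : policy parameters, one logit per action (hypothetical policy)
    step  : hypothetical environment callback, step(action, rng) -> reward
    beta  : discount in [0, 1) trading bias against variance
    """
    rng = np.random.default_rng(seed)
    z = np.zeros_like(theta, dtype=float)      # eligibility trace
    delta = np.zeros_like(theta, dtype=float)  # running gradient estimate
    for t in range(T):
        probs = softmax(theta)
        u = rng.choice(len(theta), p=probs)    # sample an action
        grad_log = -probs                      # gradient of log softmax ...
        grad_log[u] += 1.0                     # ... with respect to theta
        r = step(u, rng)                       # reward after acting
        z = beta * z + grad_log                # discounted trace update
        delta += (r * z - delta) / (t + 1)     # running average of r * z
    return delta

# Toy problem (assumption): action 1 pays reward 1, action 0 pays nothing.
delta = gpomdp_estimate(np.zeros(2), lambda u, rng: float(u == 1))
print(delta)  # the component for action 1 should come out positive
```

In this toy setting the exact gradient of the average reward with respect to the two logits is (-0.25, 0.25), so the sign and rough size of the estimate can be checked by eye; larger `beta` values increase the variance of the estimate.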

Relevance:

20.00%

Abstract:

Log-linear and maximum-margin models are two commonly used methods in supervised machine learning, and are frequently used in structured prediction problems. Efficient learning of parameters in these models is therefore an important problem, and becomes a key factor when learning from very large data sets. This paper describes exponentiated gradient (EG) algorithms for training such models, where EG updates are applied to the convex dual of either the log-linear or max-margin objective function; the dual in both the log-linear and max-margin cases corresponds to minimizing a convex function with simplex constraints. We study both batch and online variants of the algorithm, and provide rates of convergence for both cases. In the max-margin case, O(1/ε) EG updates are required to reach a given accuracy ε in the dual; in contrast, for log-linear models only O(log(1/ε)) updates are required. For both the max-margin and log-linear cases, our bounds suggest that the online EG algorithm requires a factor of n less computation to reach a desired accuracy than the batch EG algorithm, where n is the number of training examples. Our experiments confirm that the online algorithms are much faster than the batch algorithms in practice. We describe how the EG updates factor in a convenient way for structured prediction problems, allowing the algorithms to be efficiently applied to problems such as sequence learning or natural language parsing. We perform extensive evaluation of the algorithms, comparing them to L-BFGS and stochastic gradient descent for log-linear models, and to SVM-Struct for max-margin models. The algorithms are applied to a multi-class problem as well as to a more complex large-scale parsing task. In all these settings, the EG algorithms presented here outperform the other methods.
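To make the phrase "minimizing a convex function with simplex constraints" concrete, here is a minimal Python sketch of a generic exponentiated gradient update: each weight is multiplied by the exponential of its negative scaled gradient, and the result is renormalized so it stays on the simplex. The objective used here (squared distance to a fixed point `c` of the simplex) is a toy stand-in chosen for illustration, not the log-linear or max-margin dual from the paper; the function name `eg_minimize` and the step size `eta` are likewise assumptions.

```python
import numpy as np

def eg_minimize(grad, w0, eta=0.5, iters=500):
    """Batch exponentiated gradient descent over the probability simplex.

    grad : callable returning the gradient of the objective at w
    w0   : starting point, strictly positive and summing to 1
    """
    w = np.asarray(w0, dtype=float)
    for _ in range(iters):
        w = w * np.exp(-eta * grad(w))  # multiplicative update
        w /= w.sum()                    # renormalize onto the simplex
    return w

# Toy objective (assumption): f(w) = 0.5 * ||w - c||^2, minimized at w = c.
c = np.array([0.2, 0.3, 0.5])
w = eg_minimize(lambda w: w - c, np.ones(3) / 3)
print(w)  # should approach c
```

Because the update is multiplicative, weights that start positive stay positive, so no explicit projection step is needed.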

Relevance:

20.00%

Abstract:

We consider the problem of structured classification, where the task is to predict a label y from an input x, and y has meaningful internal structure. Our framework includes supervised training of Markov random fields and weighted context-free grammars as special cases. We describe an algorithm that solves the large-margin optimization problem defined in [12], using an exponential-family (Gibbs distribution) representation of structured objects. The algorithm is efficient—even in cases where the number of labels y is exponential in size—provided that certain expectations under Gibbs distributions can be calculated efficiently. The method for structured labels relies on a more general result, specifically the application of exponentiated gradient updates [7, 8] to quadratic programs.

Relevance:

20.00%

Abstract:

We study the rates of growth of the regret in online convex optimization. First, we show that a simple extension of the algorithm of Hazan et al. eliminates the need for a priori knowledge of the lower bound on the second derivatives of the observed functions. We then provide an algorithm, Adaptive Online Gradient Descent, which interpolates between the results of Zinkevich for linear functions and of Hazan et al. for strongly convex functions, achieving intermediate rates between √T and log T. Furthermore, we show strong optimality of the algorithm. Finally, we provide an extension of our results to general norms.
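The two regimes that this abstract interpolates between can be sketched with plain projected online gradient descent under two step-size schedules: steps proportional to 1/√t (the schedule associated with Zinkevich's √T regret bound) and steps of 1/(Ht) for H-strongly convex losses (the schedule associated with the log T bound of Hazan et al.). The Python sketch below does not implement Adaptive Online Gradient Descent itself, whose step-size rule the abstract does not give. The quadratic losses, the interval domain, and the function name are assumptions made for illustration.

```python
import numpy as np

def online_gradient_descent(targets, step_size, lo=-1.0, hi=1.0):
    """Projected OGD on losses f_t(x) = 0.5 * (x - a_t)^2 over [lo, hi].

    Returns the regret against the best fixed point in hindsight.
    """
    x = 0.0
    total_loss = 0.0
    for t, a in enumerate(targets, start=1):
        total_loss += 0.5 * (x - a) ** 2          # suffer the loss, then update
        grad = x - a
        x = float(np.clip(x - step_size(t) * grad, lo, hi))  # project back
    best = float(np.clip(np.mean(targets), lo, hi))          # hindsight optimum
    best_loss = float(np.sum(0.5 * (best - targets) ** 2))
    return total_loss - best_loss

rng = np.random.default_rng(0)
targets = rng.uniform(-1.0, 1.0, size=1000)  # hypothetical loss sequence

# These losses are 1-strongly convex (H = 1), so 1/(H t) is simply 1/t.
regret_sqrt = online_gradient_descent(targets, lambda t: 1.0 / np.sqrt(t))
regret_log = online_gradient_descent(targets, lambda t: 1.0 / t)
print(regret_sqrt, regret_log)
```

With the 1/t schedule and these particular losses, the iterate is simply the running mean of the targets seen so far, which gives some intuition for why regret grows only logarithmically in the strongly convex case.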

Relevance:

20.00%

Abstract:

This paper contributes to the recent debate about the role of referees in the home advantage phenomenon. Specifically, it aims to provide a convincing answer to the newly posed question of the existence of individual differences among referees in terms of the home advantage (Boyko, Boyko, & Boyko, 2007; Johnston, 2008). Using multilevel modelling on a large and representative dataset we find that (1) the home advantage effect differs significantly among referees, and (2) this relationship is moderated by the size of the crowd. These new results suggest that a part of the home advantage is due to the effect of the crowd on the referees, and that some referees are more prone to be influenced by the crowd than others. This provides strong evidence to indicate that referees are a significant contributing factor to the home advantage. The implications of these findings are discussed both in terms of the relevant social psychological research, and with respect to the selection, assessment, and training of referees.

Relevance:

20.00%

Abstract:

Background: To compare the intraocular pressure (IOP) readings obtained with the iCare rebound tonometer and the 7CR non-contact tonometer with those measured by Goldmann applanation tonometry in treated glaucoma patients. Design: A prospective, cross-sectional study was conducted in a private tertiary glaucoma clinic. Participants: 109 patients (54 male, 55 female), including only eyes under medical treatment for glaucoma. Methods: Measurement by Goldmann applanation tonometry, iCare rebound tonometry and 7CR non-contact tonometry. Main Outcome Measures: Intraocular pressure. Results: There were strong correlations between the intraocular pressure measurements obtained with Goldmann and both the rebound and non-contact tonometers (Spearman r values ≥ 0.79, p < 0.001). However, there were small, statistically significant differences between the average readings for each tonometer. For the rebound tonometer, the mean intraocular pressure was slightly higher compared to the Goldmann applanation tonometer in the right eyes (p = 0.02), and similar in the left eyes (p = 0.93); however, these differences did not reach statistical significance. The Goldmann-correlated measurements from the non-contact tonometer were lower than the average Goldmann reading for both right (p < 0.001) and left (p > 0.01) eyes. The corneal-compensated measurements from the non-contact tonometer were significantly higher compared to the other tonometers (p ≤ 0.001). Conclusions: The iCare rebound tonometer and the 7CR non-contact tonometer measure IOP in fundamentally different ways to the Goldmann applanation tonometer. The resulting IOP values vary between the instruments and will need to be considered when comparing clinical versus home-acquired measurements.

Relevance:

20.00%

Abstract:

Objective: Adherence to Continuous Positive Airway Pressure (CPAP) therapy for Obstructive Sleep Apnoea (OSA) is poor. We assessed the effectiveness of a motivational interviewing intervention (MINT) in addition to best practice standard care to improve acceptance and adherence to CPAP therapy in people with a new diagnosis of OSA. Method: 106 Australian adults (69% male) with a new diagnosis of obstructive sleep apnoea and clinical recommendation for CPAP treatment were recruited from a tertiary sleep disorders centre. Participants were randomly assigned to receive either three sessions of a motivational interviewing intervention ‘MINT’ (n=53; mean age=55.4 years), or no intervention ‘Control’ (n=53; mean age=57.74 years). The primary outcome was the difference between the groups in objective CPAP adherence at 1 month, 2 months, 3 months and 12 months follow-up. Results: Fifty (94%) participants in the MINT group and 50 (94%) participants in the control group met all inclusion and exclusion criteria and were included in the primary analysis. At 3 months, CPAP use per night was 4.63 hours in the MINT group, compared with 3.16 hours in the control group (p=0.005). This represents almost 50% better adherence in the MINT group relative to the control group. Patients in the MINT group were substantially more likely to accept CPAP treatment. Conclusions: MINT is a brief, manualized, effective intervention which improves CPAP acceptance and objective adherence rates as compared to standard care alone.
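The "almost 50%" figure can be checked directly from the two nightly usage values reported at 3 months. The short Python calculation below uses only those two numbers; the variable names are illustrative.

```python
# Mean nightly CPAP use at 3 months, as reported in the abstract (hours).
mint_hours = 4.63
control_hours = 3.16

# Relative improvement of the MINT group over the control group.
relative_gain = (mint_hours - control_hours) / control_hours
print(f"{relative_gain:.1%}")
```

The difference is 1.47 hours per night, or roughly 47% of the control group's usage.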