966 results for Strain-Gradient Plasticity
Abstract:
Gradient-based approaches to direct policy search in reinforcement learning have received much recent attention as a means to solve problems of partial observability and to avoid some of the problems associated with policy degradation in value-function methods. In this paper we introduce GPOMDP, a simulation-based algorithm for generating a biased estimate of the gradient of the average reward in Partially Observable Markov Decision Processes (POMDPs) controlled by parameterized stochastic policies. A similar algorithm was proposed by Kimura, Yamamura, and Kobayashi (1995). The algorithm's chief advantages are that it requires storage of only twice the number of policy parameters, uses one free parameter β ∈ [0,1) (which has a natural interpretation in terms of bias-variance trade-off), and requires no knowledge of the underlying state. We prove convergence of GPOMDP, and show how the correct choice of the parameter β is related to the mixing time of the controlled POMDP. We briefly describe extensions of GPOMDP to controlled Markov chains, continuous state, observation and control spaces, multiple-agents, higher-order derivatives, and a version for training stochastic policies with internal states. In a companion paper (Baxter, Bartlett, & Weaver, 2001) we show how the gradient estimates generated by GPOMDP can be used in both a traditional stochastic gradient algorithm and a conjugate-gradient procedure to find local optima of the average reward. ©2001 AI Access Foundation and Morgan Kaufmann Publishers. All rights reserved.
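As a rough illustration of the estimator this abstract describes, the sketch below keeps two vectors the size of the parameter vector (an eligibility trace `z` and a running average `delta`), which matches the stated storage cost of twice the number of policy parameters. The update rule follows the commonly cited form of GPOMDP rather than anything spelled out in the abstract, and the function name, argument layout, and the assumption that per-step score vectors are precomputed are all hypothetical choices made for illustration.

```python
import numpy as np

def gpomdp_estimate(grad_log_policy, rewards, beta):
    """Sketch of a GPOMDP-style gradient estimate from one trajectory.

    grad_log_policy: shape (T, n_params); row t is the gradient of
        log mu(u_t | y_t) with respect to the policy parameters (assumed given).
    rewards: shape (T,); rewards[t] is the reward observed after step t.
    beta: discount in [0, 1) trading bias against variance.
    """
    grads = np.asarray(grad_log_policy, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    if not 0.0 <= beta < 1.0:
        raise ValueError("beta must lie in [0, 1)")

    z = np.zeros(grads.shape[1])      # eligibility trace
    delta = np.zeros(grads.shape[1])  # running average of reward * trace
    for t in range(len(rewards)):
        z = beta * z + grads[t]                      # discount old credit, add new score
        delta += (rewards[t] * z - delta) / (t + 1)  # incremental mean
    return delta
```

Values of `beta` closer to 1 let the trace carry credit further back in time, which lowers bias and raises variance; no state information is used, only the scores and rewards.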
Abstract:
Log-linear and maximum-margin models are two commonly-used methods in supervised machine learning, and are frequently used in structured prediction problems. Efficient learning of parameters in these models is therefore an important problem, and becomes a key factor when learning from very large data sets. This paper describes exponentiated gradient (EG) algorithms for training such models, where EG updates are applied to the convex dual of either the log-linear or max-margin objective function; the dual in both the log-linear and max-margin cases corresponds to minimizing a convex function with simplex constraints. We study both batch and online variants of the algorithm, and provide rates of convergence for both cases. In the max-margin case, O(1/ε) EG updates are required to reach a given accuracy ε in the dual; in contrast, for log-linear models only O(log(1/ε)) updates are required. For both the max-margin and log-linear cases, our bounds suggest that the online EG algorithm requires a factor of n less computation to reach a desired accuracy than the batch EG algorithm, where n is the number of training examples. Our experiments confirm that the online algorithms are much faster than the batch algorithms in practice. We describe how the EG updates factor in a convenient way for structured prediction problems, allowing the algorithms to be efficiently applied to problems such as sequence learning or natural language parsing. We perform extensive evaluation of the algorithms, comparing them to L-BFGS and stochastic gradient descent for log-linear models, and to SVM-Struct for max-margin models. The algorithms are applied to a multi-class problem as well as to a more complex large-scale parsing task. In all these settings, the EG algorithms presented here outperform the other methods.
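The core EG step for minimizing a convex function over the simplex can be sketched as a multiplicative update followed by renormalization. This is a generic batch illustration on a toy quadratic objective, not the dual objectives or the structured factorization from the paper; the function name, learning rate, and step count are assumptions.

```python
import numpy as np

def eg_minimize(grad, dim, eta=0.5, steps=500):
    """Exponentiated-gradient descent over the probability simplex (sketch).

    grad: callable returning the gradient of the objective at w.
    dim: number of simplex coordinates.
    """
    w = np.full(dim, 1.0 / dim)  # start at the uniform distribution
    for _ in range(steps):
        g = grad(w)
        # Multiplicative update; subtracting g.min() only rescales all weights
        # by a common factor, which the normalization below cancels.
        w = w * np.exp(-eta * (g - g.min()))
        w /= w.sum()
    return w

# Toy objective (hypothetical): f(w) = 0.5 * ||w - c||^2 with c on the simplex.
c = np.array([0.7, 0.2, 0.1])
w_star = eg_minimize(lambda w: w - c, dim=3)
```

Because every update multiplies by a positive factor and renormalizes, the iterates stay on the simplex without an explicit projection step.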
Abstract:
We consider the problem of structured classification, where the task is to predict a label y from an input x, and y has meaningful internal structure. Our framework includes supervised training of Markov random fields and weighted context-free grammars as special cases. We describe an algorithm that solves the large-margin optimization problem defined in [12], using an exponential-family (Gibbs distribution) representation of structured objects. The algorithm is efficient—even in cases where the number of labels y is exponential in size—provided that certain expectations under Gibbs distributions can be calculated efficiently. The method for structured labels relies on a more general result, specifically the application of exponentiated gradient updates [7, 8] to quadratic programs.
Abstract:
We study the rates of growth of the regret in online convex optimization. First, we show that a simple extension of the algorithm of Hazan et al. eliminates the need for a priori knowledge of the lower bound on the second derivatives of the observed functions. We then provide an algorithm, Adaptive Online Gradient Descent, which interpolates between the results of Zinkevich for linear functions and of Hazan et al. for strongly convex functions, achieving intermediate rates between √T and log T. Furthermore, we show strong optimality of the algorithm. Finally, we provide an extension of our results to general norms.
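The two regimes mentioned in this abstract differ mainly in the step-size schedule fed to plain online gradient descent. The sketch below shows that baseline only; it is not the Adaptive Online Gradient Descent algorithm from the paper, and the function name, the Euclidean-ball feasible set, and the toy losses are assumptions for illustration.

```python
import numpy as np

def online_gradient_descent(grads, x0, step_size, radius=1.0):
    """Projected online gradient descent on a Euclidean ball (sketch).

    grads: sequence of callables; grads[t-1](x) is the gradient of the
        round-t loss at x.
    step_size: callable mapping the round index t (from 1) to a step size.
    Returns the list of points played, one per round.
    """
    x = np.asarray(x0, dtype=float)
    played = []
    for t, grad in enumerate(grads, start=1):
        played.append(x.copy())            # commit to x before seeing the loss
        x = x - step_size(t) * grad(x)     # gradient step on the revealed loss
        norm = np.linalg.norm(x)
        if norm > radius:                  # project back onto the ball
            x = x * (radius / norm)
    return played

# Toy strongly convex losses (hypothetical): f_t(x) = 0.5 * (x - 0.5)^2, so H = 1.
losses = [lambda x: x - 0.5] * 5
points = online_gradient_descent(losses, x0=[0.0], step_size=lambda t: 1.0 / t)
```

A schedule proportional to 1/√t is the usual choice for general convex losses and gives regret of order √T, while 1/(Ht) for H-strongly convex losses gives regret of order log T.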
Abstract:
The aim of the research program was to evaluate the heat strain, hydration status, and heat illness symptoms experienced by surface mine workers. An initial investigation involved 91 surface miners completing a heat stress questionnaire assessing the work environment, hydration practices, and heat illness symptom experience. The key findings included 1) more than 80 % of workers experienced at least one symptom of heat illness over a 12-month period; and 2) the risk of moderate symptoms of heat illness increased with the severity of dehydration. These findings highlight a health and safety concern for surface miners, as experiencing symptoms of heat illness is an indication that the physiological systems of the body may be struggling to meet the demands of thermoregulation. To illuminate these findings, a field investigation to monitor the heat strain and hydration status of surface miners was proposed. Two preliminary studies were conducted to ensure accurate and reliable data collection techniques. Firstly, a study was undertaken to determine a calibration procedure to ensure the accuracy of core body temperature measurement via an ingestible sensor. A water bath was heated to several temperatures between 23 and 51 °C, allowing for comparison of the temperature recorded by the sensors and a traceable thermometer. A positive systematic bias was observed and indicated a need for calibration. It was concluded that a linear regression should be developed for each sensor prior to ingestion, allowing for a correction to be applied to the raw data. Secondly, hydration status was to be assessed through urine specific gravity measurement. It was foreseeable that practical limitations on mine sites would introduce a delay between urine collection and analysis. A study was undertaken to assess the reliability of urine analysis over time.
Measurement of urine specific gravity was found to be reliable up to 24 hours post urine collection and was suitable to be used in the field study. Twenty-nine surface miners (14 drillers [winter] and 15 blast crew [summer]) were monitored during a normal work shift. Core body temperature was recorded continuously. Average mean core body temperature was 37.5 and 37.4 °C for blast crew and drillers, with average maximum body temperatures of 38.0 and 37.9 °C respectively. The highest body temperature recorded was 38.4 °C. Urine samples were collected at each void for specific gravity measurement. The average mean urine specific gravity was 1.024 and 1.021 for blast crew and drillers respectively. The Heat Illness Symptoms Index was used to evaluate the experience of heat illness symptoms on shift. Over 70 % of drillers and over 80 % of blast crew reported at least one symptom. It was concluded that 1) heat strain remained within the recommended limits for acclimatised workers; and 2) the majority of workers were dehydrated before commencing their shift, and tended to remain dehydrated for the duration. Dehydration was identified as the primary issue for surface miners working in the heat. Therefore, continued study focused on investigating a novel approach to monitoring hydration status. The final aim of this research program was to investigate the influence dehydration has on intraocular pressure (IOP) and, subsequently, whether IOP could provide a novel indicator of hydration status. Seven males completed 90 minutes of walking in both a cool and hot climate with fluid restriction. Hydration variables and intraocular pressure were measured at baseline and at 30-minute intervals. Participants became dehydrated during the trial in the heat but maintained hydration status in the cool. Intraocular pressure progressively declined in the trial in the heat but remained relatively stable when hydration was maintained.
A significant relationship was observed between intraocular pressure and both body mass loss and plasma osmolality. This evidence suggests that intraocular pressure is influenced by changes in hydration status. Further research is required to determine if intraocular pressure could be utilised as an indirect indicator of hydration status.
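The per-sensor calibration step described in this abstract (fit a linear regression of reference temperature on sensor reading, then correct the raw data) can be sketched as follows. The function names and the water-bath readings are hypothetical; the numbers are synthetic values chosen only to show a constant positive bias and are not data from the study.

```python
import numpy as np

def fit_sensor_calibration(sensor_readings, reference_temps):
    """Fit reference = slope * sensor + intercept for one sensor."""
    slope, intercept = np.polyfit(sensor_readings, reference_temps, deg=1)
    return slope, intercept

def apply_calibration(raw_readings, slope, intercept):
    """Correct raw sensor readings using the fitted line."""
    return slope * np.asarray(raw_readings, dtype=float) + intercept

# Synthetic water-bath check (illustrative only): sensor reads 0.2 °C high.
reference = np.array([23.0, 30.0, 37.0, 44.0, 51.0])
sensor = reference + 0.2
slope, intercept = fit_sensor_calibration(sensor, reference)
corrected = apply_calibration([37.7], slope, intercept)
```

Fitting one line per sensor, rather than a single pooled correction, follows the abstract's conclusion that each sensor needs its own regression before ingestion.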
Abstract:
Insulated rail joints (IRJs) possess lower bending stiffness across the gap containing the insulating endpost and hence are subjected to wheel impact. IRJs are cut either square or inclined to the longitudinal axis of the rails in a vertical plane. It is generally claimed that inclined cut IRJs outperform square cut IRJs; however, there is a paucity of literature with regard to the relative structural merits of these two designs. This article presents comparative studies of the structural response of these two IRJs to the passage of wheels based on continuously acquired field data from joints strain-gauged closer to the source of impact. Strain signatures are presented in time, frequency, and wavelet domains, and the peak vertical and shear strains are systematically employed to examine the relative structural merits of the two IRJs subjected to similar real-life loading. It is shown that the inclined IRJs resist the wheel load with higher peak shear strains and lower peak vertical strains than those of the square IRJs.