Digital Library

987 results for Policy parameters

Bounds on policy relevant parameters with discrete policy variation

Relevance:

80.00%

Abstract:

When estimating policy parameters, also known as treatment effects, the mechanism that assigns individuals to treatment almost always causes endogeneity and thus biases many estimates of these policy parameters. Additionally, heterogeneity in program impacts is more likely to be the norm than the exception for most social programs. In situations where these issues are present, estimation of the Marginal Treatment Effect (MTE) parameter uses an instrument to avoid assignment bias while simultaneously accounting for heterogeneous effects across individuals. Although this parameter is point identified in the literature, the assumptions required for identification may be strong. We therefore use weaker assumptions to partially identify the MTE, i.e. to establish a methodology for estimating MTE bounds, implement it computationally, and present results from Monte Carlo simulations. The partial identification we perform requires the MTE to be a monotone function of the propensity score, which is a reasonable assumption in several economic examples, and the simulation results show that it is possible to obtain informative bounds even in restricted cases where point identification is lost. Additionally, in situations where the estimated bounds are not informative and traditional point identification is lost, we suggest a more generic method to point estimate the MTE using the Moore-Penrose pseudo-inverse matrix, achieving better results than traditional methods.
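A rough illustration of why a pseudo-inverse helps once point identification is lost: if the MTE is treated as constant within bins of the propensity score u, each local average treatment effect identified between two observed propensity-score values is a linear average of those bins. With fewer identified averages than bins the system A m = b is underdetermined, and the Moore-Penrose pseudo-inverse returns its minimum-norm solution. The binning scheme, the function names and the numbers below are hypothetical assumptions made for illustration; this is a sketch, not the authors' estimator.

```python
import numpy as np


def late_weight_matrix(p_support, n_bins):
    """Row j: weights that average a binned MTE over u in [p_j, p_{j+1}].

    Hypothetical discretization: the MTE is assumed constant within each of
    n_bins equal-width bins of the propensity score on [0, 1].
    """
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    rows = []
    for lo, hi in zip(p_support[:-1], p_support[1:]):
        # Length of overlap between each bin and [lo, hi], turned into an average
        overlap = np.minimum(edges[1:], hi) - np.maximum(edges[:-1], lo)
        rows.append(np.clip(overlap, 0.0, None) / (hi - lo))
    return np.array(rows)


def mte_min_norm(p_support, lates, n_bins):
    """Minimum-norm binned MTE consistent with the identified averages."""
    A = late_weight_matrix(p_support, n_bins)
    b = np.asarray(lates, dtype=float)
    return np.linalg.pinv(A) @ b


if __name__ == "__main__":
    # Propensity support {0, 0.5, 1}, four bins: only two averages are identified
    print(mte_min_norm([0.0, 0.5, 1.0], [1.0, 3.0], 4))
```

With four bins but only two identified averages there are infinitely many binned MTE curves that fit; the pseudo-inverse picks the one that is flat within each identified interval, here [1, 1, 3, 3].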

Infinite-horizon policy-gradient estimation

Relevance:

70.00%

Abstract:

Gradient-based approaches to direct policy search in reinforcement learning have received much recent attention as a means to solve problems of partial observability and to avoid some of the problems associated with policy degradation in value-function methods. In this paper we introduce GPOMDP, a simulation-based algorithm for generating a biased estimate of the gradient of the average reward in Partially Observable Markov Decision Processes (POMDPs) controlled by parameterized stochastic policies. A similar algorithm was proposed by Kimura, Yamamura, and Kobayashi (1995). The algorithm's chief advantages are that it requires storage of only twice the number of policy parameters, uses one free parameter β ∈ [0,1) (which has a natural interpretation in terms of bias-variance trade-off), and requires no knowledge of the underlying state. We prove convergence of GPOMDP, and show how the correct choice of the parameter β is related to the mixing time of the controlled POMDP. We briefly describe extensions of GPOMDP to controlled Markov chains, continuous state, observation and control spaces, multiple-agents, higher-order derivatives, and a version for training stochastic policies with internal states. In a companion paper (Baxter, Bartlett, & Weaver, 2001) we show how the gradient estimates generated by GPOMDP can be used in both a traditional stochastic gradient algorithm and a conjugate-gradient procedure to find local optima of the average reward. ©2001 AI Access Foundation and Morgan Kaufmann Publishers. All rights reserved.
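The GPOMDP recursion can be sketched for the simplest possible case: a single-state problem (so observations play no role) with a softmax policy over two actions. The eligibility trace z and the running estimate delta are the two parameter-sized vectors that account for the "twice the number of policy parameters" storage, and beta discounts older score terms. The toy reward, the softmax parameterization and all names are assumptions for illustration, not the paper's experimental setup.

```python
import math
import random


def softmax(theta):
    """Numerically stable softmax over action preferences."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]


def gpomdp(theta, reward, beta, steps, seed=0):
    """GPOMDP-style gradient estimate for a hypothetical single-state problem.

    Assumption: one state and no observations, so the policy depends only on
    theta; reward(a) is the reward received after taking action a.
    """
    rng = random.Random(seed)
    n = len(theta)
    z = [0.0] * n       # eligibility trace of score vectors
    delta = [0.0] * n   # running average of r * z (the gradient estimate)
    for t in range(steps):
        probs = softmax(theta)
        a = rng.choices(range(n), weights=probs)[0]
        # Score of the softmax policy: d/dtheta log pi(a) = one_hot(a) - probs
        score = [(1.0 if k == a else 0.0) - probs[k] for k in range(n)]
        z = [beta * zk + sk for zk, sk in zip(z, score)]
        r = reward(a)
        delta = [dk + (r * zk - dk) / (t + 1) for dk, zk in zip(delta, z)]
    return delta


if __name__ == "__main__":
    estimate = gpomdp([0.0, 0.0], lambda a: 1.0 if a == 1 else 0.0, beta=0.5, steps=20000)
    print(estimate)
```

With theta = [0, 0] and reward 1 for action 1, the exact gradient of the expected reward is [-0.25, 0.25], so the estimate should land near that. In this stateless toy, beta only adds variance; in a POMDP it trades that variance against bias, as the abstract describes.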

Natural actor and belief critic: Reinforcement algorithm for learning parameters of dialogue systems modelled as POMDPs

Relevance:

70.00%

Abstract:

This article presents a novel algorithm for learning parameters in statistical dialogue systems which are modeled as Partially Observable Markov Decision Processes (POMDPs). The three main components of a POMDP dialogue manager are a dialogue model representing dialogue state information; a policy that selects the system's responses based on the inferred state; and a reward function that specifies the desired behavior of the system. Ideally both the model parameters and the policy would be designed to maximize the cumulative reward. However, while there are many techniques available for learning the optimal policy, no good ways of learning the optimal model parameters that scale to real-world dialogue systems have been found yet. The presented algorithm, called the Natural Actor and Belief Critic (NABC), is a policy gradient method that offers a solution to this problem. Based on observed rewards, the algorithm estimates the natural gradient of the expected cumulative reward. The resulting gradient is then used to adapt both the prior distribution of the dialogue model parameters and the policy parameters. In addition, the article presents a variant of the NABC algorithm, called the Natural Belief Critic (NBC), which assumes that the policy is fixed and only the model parameters need to be estimated. The algorithms are evaluated on a spoken dialogue system in the tourist information domain. The experiments show that model parameters estimated to maximize the expected cumulative reward result in significantly improved performance compared to the baseline hand-crafted model parameters. The algorithms are also compared to optimization techniques using plain gradients and state-of-the-art random search algorithms. In all cases, the algorithms based on the natural gradient work significantly better. © 2011 ACM.

Natural actor-critic algorithms

Relevance:

60.00%

Abstract:

We present four new reinforcement learning algorithms based on actor-critic, natural-gradient and function approximation ideas, and we provide their convergence proofs. Actor-critic reinforcement learning methods are online approximations to policy iteration in which the value-function parameters are estimated using temporal difference learning and the policy parameters are updated by stochastic gradient descent. Methods based on policy gradients in this way are of special interest because of their compatibility with function-approximation methods, which are needed to handle large or infinite state spaces. The use of temporal difference learning in this way is of special interest because in many applications it dramatically reduces the variance of the gradient estimates. The use of the natural gradient is of interest because it can produce better conditioned parameterizations and has been shown to further reduce variance in some cases. Our results extend prior two-timescale convergence results for actor-critic methods by Konda and Tsitsiklis by using temporal difference learning in the actor and by incorporating natural gradients. Our results extend prior empirical studies of natural actor-critic methods by Peters, Vijayakumar and Schaal by providing the first convergence proofs and the first fully incremental algorithms.
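The link between the critic and the natural gradient can be checked on a one-parameter, two-action policy, pi(a=1) = sigmoid(theta). With compatible features psi(a) = d log pi(a) / d theta, the least-squares weight w of the advantage fit A(a) ≈ w psi(a) equals F^{-1} times the vanilla gradient, where F is the Fisher information; that weight is the direction a natural actor-critic follows. For brevity this sketch uses exact advantages in place of a TD-learned critic, and the bandit setup and function names are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def compatible_critic_weight(theta, r0, r1, n_samples=1000, seed=0):
    """Least-squares fit of the advantage A(a) ~ w * psi(a).

    Hypothetical bandit: pi(a=1) = sigmoid(theta), reward r1 for a=1 and r0
    for a=0. Exact advantages stand in for a TD-learned critic.
    """
    rng = random.Random(seed)
    p = sigmoid(theta)
    value = p * r1 + (1.0 - p) * r0       # V = expected reward
    num = den = 0.0
    for _ in range(n_samples):
        a = 1 if rng.random() < p else 0
        psi = a - p                       # compatible feature: d/dtheta log pi(a)
        advantage = (r1 if a == 1 else r0) - value
        num += advantage * psi            # sample estimate of the vanilla gradient
        den += psi * psi                  # sample estimate of the Fisher information
    return num / den                      # F^{-1} * gradient, i.e. the natural gradient


def vanilla_gradient(theta, r0, r1):
    """Exact dJ/dtheta for J = pi(1) * r1 + pi(0) * r0."""
    p = sigmoid(theta)
    return p * (1.0 - p) * (r1 - r0)
```

In this toy the advantage is exactly (r1 - r0) psi(a), so w recovers r1 - r0 for any theta, while the vanilla gradient sigmoid(theta)(1 - sigmoid(theta))(r1 - r0) shrinks as the policy saturates. That rescaling by F^{-1} is what natural-gradient methods contribute.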

An actor-critic algorithm with function approximation for discounted cost constrained Markov decision processes

Relevance:

60.00%

Abstract:

We develop in this article the first actor-critic reinforcement learning algorithm with function approximation for a problem of control under multiple inequality constraints. We consider the infinite horizon discounted cost framework in which both the objective and the constraint functions are suitable expected policy-dependent discounted sums of certain sample path functions. We apply the Lagrange multiplier method to handle the inequality constraints. Our algorithm makes use of multi-timescale stochastic approximation and incorporates a temporal difference (TD) critic and an actor that makes a gradient search in the space of policy parameters using efficient simultaneous perturbation stochastic approximation (SPSA) gradient estimates. We prove the asymptotic almost sure convergence of our algorithm to a locally optimal policy. (C) 2010 Elsevier B.V. All rights reserved.
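The Lagrangian and SPSA ingredients can be sketched on a deterministic toy problem: minimize a quadratic cost subject to one inequality constraint, moving the parameters with SPSA gradient estimates on a fast timescale and the Lagrange multiplier by projected ascent on a slower one. The toy functions, step sizes and names are assumptions; the article estimates costs from simulated discounted sample paths with a TD critic, which this sketch omits.

```python
import random


def spsa_gradient(fn, theta, c, rng):
    """Simultaneous perturbation estimate: two evaluations for any dimension."""
    delta = [rng.choice((-1.0, 1.0)) for _ in theta]   # Rademacher perturbation
    f_plus = fn([t + c * d for t, d in zip(theta, delta)])
    f_minus = fn([t - c * d for t, d in zip(theta, delta)])
    slope = (f_plus - f_minus) / (2.0 * c)
    return [slope / d for d in delta]


def lagrangian_spsa(cost, constraint, bound, theta0, iters=5000,
                    a=0.05, b=0.005, c=0.1, seed=0):
    """Primal-dual search for min cost(theta) s.t. constraint(theta) <= bound.

    theta moves on the faster timescale (step a) and the multiplier lam on the
    slower one (step b), mirroring a two-timescale scheme.
    """
    rng = random.Random(seed)
    theta, lam = list(theta0), 0.0

    def lagrangian(th):
        return cost(th) + lam * (constraint(th) - bound)

    for _ in range(iters):
        grad = spsa_gradient(lagrangian, theta, c, rng)
        theta = [t - a * g for t, g in zip(theta, grad)]          # descend in theta
        lam = max(0.0, lam + b * (constraint(theta) - bound))    # ascend in lam, keep lam >= 0
    return theta, lam
```

For cost (theta0 - 2)^2 + (theta1 - 2)^2 and constraint theta0 + theta1 <= 2, the constrained optimum is theta = (1, 1) with multiplier 2, which the iteration approaches. The appeal of SPSA here is that each gradient estimate needs two cost evaluations regardless of the number of policy parameters.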

Incremental natural-gradient actor-critic algorithms

Relevance:

60.00%

Abstract:

We present four new reinforcement learning algorithms based on actor-critic and natural-gradient ideas, and provide their convergence proofs. Actor-critic reinforcement learning methods are online approximations to policy iteration in which the value-function parameters are estimated using temporal difference learning and the policy parameters are updated by stochastic gradient descent. Methods based on policy gradients in this way are of special interest because of their compatibility with function approximation methods, which are needed to handle large or infinite state spaces. The use of temporal difference learning in this way is of interest because in many applications it dramatically reduces the variance of the gradient estimates. The use of the natural gradient is of interest because it can produce better conditioned parameterizations and has been shown to further reduce variance in some cases. Our results extend prior two-timescale convergence results for actor-critic methods by Konda and Tsitsiklis by using temporal difference learning in the actor and by incorporating natural gradients, and they extend prior empirical studies of natural actor-critic methods by Peters, Vijayakumar and Schaal by providing the first convergence proofs and the first fully incremental algorithms.

Reinforcement learning for parameter estimation in statistical spoken dialogue systems

Relevance:

60.00%

Abstract:

Reinforcement techniques have been successfully used to maximise the expected cumulative reward of statistical dialogue systems. Typically, reinforcement learning is used to estimate the parameters of a dialogue policy which selects the system's responses based on the inferred dialogue state. However, the inference of the dialogue state itself depends on a dialogue model which describes the expected behaviour of a user when interacting with the system. Ideally the parameters of this dialogue model should be also optimised to maximise the expected cumulative reward. This article presents two novel reinforcement algorithms for learning the parameters of a dialogue model. First, the Natural Belief Critic algorithm is designed to optimise the model parameters while the policy is kept fixed. This algorithm is suitable, for example, in systems using a handcrafted policy, perhaps prescribed by other design considerations. Second, the Natural Actor and Belief Critic algorithm jointly optimises both the model and the policy parameters. The algorithms are evaluated on a statistical dialogue system modelled as a Partially Observable Markov Decision Process in a tourist information domain. The evaluation is performed with a user simulator and with real users. The experiments indicate that model parameters estimated to maximise the expected reward function provide improved performance compared to the baseline handcrafted parameters. © 2011 Elsevier Ltd. All rights reserved.

Canadian human security and its application to peacebuilding in Sierra Leone

Relevance:

60.00%

Abstract:

Canada took part in post-conflict reconstruction in Sierra Leone in response to the parameters of its foreign policy, drawing specifically on the principles that derive from human security. In this context, Canada acted as a peacebuilder and participated actively throughout all stages of the post-conflict period in Sierra Leone. Its participation was somewhat tentative and indirect in the disarmament and demobilization processes, whereas in the reinsertion and reintegration processes it played a decisive role.

The absence of public policies on biofuels in Colombia and its effects on the environment

Relevance:

60.00%

Abstract:

Climate change has been one of the greatest concerns of states in recent decades, as a result of more than a century of exploiting and using a non-renewable natural resource that was the engine of the world in the last century and still keeps countries' economies afloat: oil. Today, oil reserves have begun to run short. Strategies have therefore been developed to avoid a global energy crisis. The most important of these is the research and deployment of renewable, environmentally friendly energy sources such as biofuels. Biofuels are perceived as an energy solution for the country. However, the lack of coordination among the government actors responsible for implementing them, together with fragmentary regulatory parameters on the matter, results in an absence of public biofuel policies in Colombia; as a consequence, the environmental commitments and the reduction of carbon dioxide emissions remain, today, a goal yet to be achieved.

European Union: Shadow WTO agricultural domestic support notifications

Relevance:

60.00%

Abstract:

The notification of the level of domestic support to the World Trade Organization (WTO) is intended to reflect compliance with obligations entered into at the time of the Uruguay Round. WTO members have often been slow to notify their domestic support levels. This makes notification less useful as an indicator of the degree to which changes in policy have or have not benefited the trading system as a whole and exporting countries in particular. The notification of domestic support in the E.U. illustrates the value of a measure that reflects current policies and can therefore serve as a basis for negotiating further disciplines where these are necessary. The E.U. has made major changes to its Common Agricultural Policy (CAP) since 1992, when the MacSharry reforms were implemented. Payments originally notified in the blue box (related to supply control) have been changed over time until, in their present form, they are unrelated to current production or price levels and can hence satisfy the criteria for the green box. The E.U. therefore has much more latitude in trade talks to agree to reductions in allowable trade-distorting support. This paper reproduces the E.U. notifications relating to 2003/04 and extends them with official statistics to 2006/07. It then projects the components of domestic support forward to 2013/14, based on forecasts of future production and estimates of policy parameters. The impact of a successful Doha Round is simulated, showing that the constraints envisaged in the WTO draft modalities document of May 19, 2008, would become binding by 2013, at about the time the next E.U. budget cycle starts. Without the Doha Round constraints, further reform might still happen for domestic reasons, but the framework provided by the WTO for domestic policy spending would be less relevant.
In that case, much could hinge on the legitimacy of the Single Farm Payment system under the current rules governing the green box.

The state of the youth: prisons, drugs and car crashes

Relevance:

60.00%

Abstract:

By virtue of the volume and nature of their responsibilities, which include secondary schooling as well as problem areas such as security and traffic, the Brazilian states bear ultimate responsibility for young people. This study argues in favour of granting the states greater freedom to define their own public policy parameters, so as to deal with local features and to increase learning about such actions at the national level. In empirical terms, the study assesses the impacts of new laws, such as the new traffic code (joint work with Leandro Kume, a doctoral student at EPGE/FGV), and traces the statistics for specific issues such as drugs, violence and car accidents. The findings show that these issues produce different results for young men and women. The main characters in these dramas are young single males, suggesting the need for public policies differentiated not only by age but also by gender. The study also reveals that the magnitude of these problems changes according to the youth's social class. Prisons concern poorer men (except for the functionally illiterate), while fatal car accidents and admitted drug use concern upper-class boys.

Measuring Unemployment Insurance Generosity

Relevance:

60.00%

Abstract:

In this paper, we develop a methodology to summarize the various policy parameters of an unemployment insurance scheme into a single generosity parameter. Unemployment insurance policies are multidimensional objects. They are typically defined by waiting periods, eligibility duration, benefit levels and asset tests when eligible, which makes intertemporal or international comparisons difficult. To make things worse, labor market conditions, such as the likelihood and duration of unemployment, matter when assessing the generosity of different policies. We first build a model with such complex characteristics. Our model features heterogeneous agents that are liquidity constrained but can self-insure. We then build a second, similar model in which the unemployment insurance is simpler: it has no waiting periods, and agents are eligible forever with constant benefits. Finally, we determine which level of benefits in this second model makes agents indifferent between the two unemployment insurance policies. We apply this strategy to the unemployment insurance program of the United Kingdom and study how its generosity has evolved over time.
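The indifference step can be illustrated with a deliberately stripped-down agent: hand-to-mouth (no savings, unlike the paper's self-insuring agents), a known unemployment spell, log utility and a consumption floor. Bisection then finds the constant benefit b in the simple scheme whose welfare matches the complex scheme with a waiting period and a limited benefit duration. Every function name and parameter value here is hypothetical.

```python
import math


def welfare(incomes):
    """Hypothetical welfare: sum of log consumption over the unemployment spell."""
    return sum(math.log(c) for c in incomes)


def complex_scheme_incomes(spell, floor, benefit, waiting, duration):
    """Benefit paid only after `waiting` periods and for at most `duration` periods."""
    return [floor + (benefit if waiting <= t < waiting + duration else 0.0)
            for t in range(spell)]


def equivalent_constant_benefit(spell, floor, benefit, waiting, duration, tol=1e-10):
    """Constant benefit, paid from day one, that yields the same welfare."""
    target = welfare(complex_scheme_incomes(spell, floor, benefit, waiting, duration))
    # b = 0 gives welfare <= target; b = benefit paid every period gives welfare >= target
    lo, hi = 0.0, benefit
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if welfare([floor + mid] * spell) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With a 4-period spell, a floor of 0.1, and a benefit of 0.6 paid after 1 waiting period for 2 periods, the equivalent constant benefit solves 4 log(0.1 + b) = 2 log(0.1) + 2 log(0.7), i.e. b = √0.07 − 0.1 ≈ 0.165. In the paper's richer setting, welfare would come from solving the agents' savings problem under each scheme, but the search over b plays the same role.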

Nonparametric Identification and Structural Estimation of Auction Models

Relevance:

60.00%

Abstract:

Thesis (Ph.D.)--University of Washington, 2016-06

A review of the economic consequences of a policy of universal leucodepletion as compared to existing practices

Relevance:

30.00%

Abstract:

Leucodepletion, the removal of leucocytes from blood products, improves the safety of blood transfusion by reducing adverse events associated with the incidental non-therapeutic transfusion of leucocytes. Leucodepletion has been shown to have clinical benefit for immunosuppressed patients who require transfusion. Selective leucodepletion of blood products by bedside filtration for these patients has been widely practised. This study investigated the economic consequences in Queensland of moving from a policy of selective leucodepletion to one of universal leucodepletion, that is, providing all transfused patients with blood products leucodepleted during the manufacturing process. A cost-effectiveness analysis was conducted using an analytic decision model. An incremental cost-effectiveness ratio (ICER) of $16.3M per life-year gained was derived. Sensitivity analysis found this result to be robust to uncertainty in the parameters used in the model. This result argues against moving to a policy of universal leucodepletion. However, during the course of the study the policy decision for universal leucodepletion was made and implemented in Queensland in October 2008. The study concludes that cost-effectiveness is not an influential factor in policy decisions regarding quality and safety initiatives in the Australian blood sector.
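An ICER is the incremental cost of one policy over another divided by its incremental health effect (here, life-years gained). A minimal sketch, with hypothetical inputs rather than the study's data:

```python
def icer(cost_new, cost_old, effect_new, effect_old):
    """Incremental cost-effectiveness ratio: extra cost per extra unit of effect."""
    delta_effect = effect_new - effect_old
    if delta_effect == 0:
        raise ValueError("ICER is undefined when both policies have equal effects")
    return (cost_new - cost_old) / delta_effect


if __name__ == "__main__":
    # Hypothetical: $2M extra spending buys 0.5 extra life-years
    print(icer(3_000_000, 1_000_000, 2.5, 2.0))
```

The ratio is then compared with a willingness-to-pay threshold per life-year; a ratio far above any plausible threshold, as with the $16.3M figure, argues against adoption on cost-effectiveness grounds.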

Setting hospital infection control policy: a decision-making framework incorporating health economics and healthcare epidemiology

Relevance:

30.00%

Abstract:

Background: Reducing rates of healthcare-acquired infection has been identified by the Australian Commission on Safety and Quality in Health Care as a national priority. One of the goals is the prevention of central venous catheter-related bloodstream infection (CR-BSI). At least 3,500 cases of CR-BSI occur annually in Australian hospitals, resulting in unnecessary deaths and costs to the healthcare system of between $25.7 million and $95.3 million. Two approaches to preventing these infections have been proposed: use of antimicrobial catheters (A-CVCs), or a catheter care and management ‘bundle’. Given finite healthcare budgets, decisions about the optimal infection control policy require consideration of the effectiveness and value for money of each approach. Objectives: The aim of this research is to use a rational economic framework to inform efficient infection control policy relating to the prevention of CR-BSI in the intensive care unit. It addresses three questions relating to decision-making in this area: 1. Is additional investment in activities aimed at preventing CR-BSI an efficient use of healthcare resources? 2. What is the optimal infection control strategy from amongst the two major approaches that have been proposed to prevent CR-BSI? 3. What uncertainty is there in this decision, and can a research agenda to improve decision-making in this area be identified? Methods: A decision analytic model-based economic evaluation was undertaken to identify an efficient approach to preventing CR-BSI in Queensland Health intensive care units. A Markov model describing the epidemiology and prognosis of CR-BSI was developed in conjunction with a panel of clinical experts. The model was parameterised using data systematically identified from the published literature and extracted from routine databases. The quality of the data used in the model, the model's validity to clinical experts, and its sensitivity to modelling assumptions were assessed.
Two separate economic evaluations were conducted. The first evaluation compared all commercially available A-CVCs alongside uncoated catheters to identify which was cost-effective for routine use. The uncertainty in this decision was estimated along with the value of collecting further information to inform the decision. The second evaluation compared the use of A-CVCs to a catheter care bundle. We were unable to estimate the cost of the bundle because it is unclear what the full resource requirements are for its implementation, and what the value of these would be in an Australian context. As such, we undertook a threshold analysis to identify the cost and effectiveness thresholds at which a hypothetical bundle would dominate the use of A-CVCs under various clinical scenarios. Results: In the first evaluation of A-CVCs, the findings from the baseline analysis, in which uncertainty is not considered, show that the use of any of the four A-CVCs will result in health gains accompanied by cost savings. The MR catheters dominate the baseline analysis, generating 1.64 QALYs and cost savings of $130,289 per 1,000 catheters. With uncertainty, and based on current information, the MR catheters remain the optimal decision and return the highest average net monetary benefit ($948 per catheter) relative to all other catheter types. This conclusion was robust to all scenarios tested; however, the probability of error in this conclusion is high, 62% in the baseline scenario. Using a value of $40,000 per QALY, the expected value of perfect information associated with this decision is $7.3 million. An analysis of the expected value of perfect information for individual parameters suggests that it may be worthwhile for future research to focus on providing better estimates of the mortality attributable to CR-BSI and the effectiveness of both SPC and CH/SSD (int/ext) catheters.
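The net monetary benefit, probability of error and EVPI figures above follow standard definitions that can be computed from probabilistic sensitivity analysis samples: NMB = λ·QALYs − cost at a willingness-to-pay λ (here $40,000 per QALY); the probability of error is the share of samples in which the option with the highest mean NMB is not actually the best; and per-decision EVPI is E[max_j NMB_j] − max_j E[NMB_j]. The sketch below assumes a hypothetical sample matrix (rows are samples, columns are options); scaling EVPI up to a population of future decisions is left out.

```python
def net_monetary_benefit(qalys, cost, wtp):
    """NMB at a willingness-to-pay of `wtp` per QALY."""
    return wtp * qalys - cost


def evpi_and_error_probability(nb_samples):
    """Per-decision EVPI and probability of error from sampled NMB values.

    nb_samples: list of rows, one per sample, each holding the NMB of every option.
    """
    n = len(nb_samples)
    k = len(nb_samples[0])
    mean_nb = [sum(row[j] for row in nb_samples) / n for j in range(k)]
    best = max(range(k), key=lambda j: mean_nb[j])            # decision under current information
    expected_with_perfect_info = sum(max(row) for row in nb_samples) / n
    evpi = expected_with_perfect_info - mean_nb[best]
    p_error = sum(1 for row in nb_samples if max(row) > row[best]) / n
    return best, evpi, p_error


if __name__ == "__main__":
    # Hypothetical NMB samples for two options
    samples = [[10, 12], [14, 9], [11, 13], [15, 10]]
    print(evpi_and_error_probability(samples))
```

In this hypothetical matrix option 0 has the higher mean NMB, yet it is beaten in half of the samples, which is the kind of situation the 62% probability of error describes.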
In the second evaluation of the catheter care bundle relative to A-CVCs, the results that do not consider uncertainty indicate that a bundle must achieve a relative risk of CR-BSI no higher than 0.45 to be cost-effective relative to MR catheters. If the bundle can reduce rates of infection from 2.5% to effectively zero, it is cost-effective relative to MR catheters if national implementation costs are less than $2.6 million ($56,610 per ICU). If the bundle can achieve a relative risk of 0.34 (comparable to that reported in the literature), it is cost-effective relative to MR catheters if costs over an 18-month period are below $613,795 nationally ($13,343 per ICU). Once uncertainty in the decision is considered, the cost threshold for the bundle increases to $2.2 million. Therefore, if each of the 46 Level III ICUs could implement an 18-month catheter care bundle for less than $47,826, this approach would be cost-effective relative to A-CVCs. However, the uncertainty is substantial, and the probability of error in concluding that the bundle is the cost-effective approach at a cost of $2.2 million is 89%. Conclusions: This work highlights that infection control to prevent CR-BSI is an efficient use of healthcare resources in the Australian context. If there is no further investment in infection control, an opportunity cost is incurred, namely the forgone potential for a more efficient healthcare system. Minocycline/rifampicin (MR) catheters are the optimal choice of antimicrobial catheter for routine use in Australian Level III ICUs; however, if a catheter care bundle implemented in Australia were as effective as those used in the large studies in the United States, it would be preferred over the catheters provided it could be implemented for less than $47,826 per Level III ICU. Uncertainty is very high in this decision and arises from multiple sources.
There are likely greater costs to this uncertainty for A-CVCs, which may carry hidden costs, than there are for a catheter care bundle, which is more likely to provide indirect benefits to clinical practice and patient safety. Research into the mortality attributable to CR-BSI, the effectiveness of SPC and CH/SSD (int/ext) catheters and the cost and effectiveness of a catheter care bundle in Australia should be prioritised to reduce uncertainty in this decision. This thesis provides the economic evidence to inform one area of infection control, but there are many other infection control decisions for which information about the cost-effectiveness of competing interventions does not exist. This work highlights some of the challenges and benefits to generating and using economic evidence for infection control decision-making and provides support for commissioning more research into the cost-effectiveness of infection control.
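The threshold analysis for the bundle rests on one inequality: the bundle is preferred to MR catheters when its NMB is at least as large, i.e. when its implementation cost C satisfies C ≤ λ·ΔQALYs − ΔC_other, where ΔQALYs and ΔC_other are the bundle's incremental QALYs and non-implementation costs relative to MR catheters. The sketch codes that inequality with hypothetical inputs (the function names are illustrative) and reproduces the per-ICU division quoted in the abstract ($2.2 million and $613,795 spread across 46 Level III ICUs).

```python
def max_implementation_cost(wtp, delta_qalys, delta_other_costs):
    """Largest implementation cost at which the bundle's NMB still matches the comparator's.

    Derived from NMB_bundle - NMB_MR = wtp * delta_qalys - delta_other_costs - C >= 0.
    A negative delta_other_costs means the bundle saves downstream costs.
    """
    return wtp * delta_qalys - delta_other_costs


def per_icu(total_cost, n_icus):
    """Spread a national cost threshold evenly across ICUs."""
    return total_cost / n_icus


if __name__ == "__main__":
    # Hypothetical: 10 extra QALYs and $500,000 of avoided treatment costs at $40,000/QALY
    print(max_implementation_cost(40_000, 10.0, -500_000))
    print(round(per_icu(2_200_000, 46)))
```

The same division gives the other per-ICU figures in the abstract, e.g. $613,795 / 46 ≈ $13,343.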
