976 results for Value function
Abstract:
The paper develops a method to solve higher-dimensional stochastic control problems in continuous time. A finite difference type approximation scheme is used on a coarse grid of low discrepancy points, while the value function at intermediate points is obtained by regression. The stability properties of the method are discussed, and applications are given to test problems of up to 10 dimensions. Accurate solutions to these problems can be obtained on a personal computer.
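For intuition only, here is a minimal sketch of the coarse-grid-plus-regression idea summarized above, not the paper's actual scheme: values computed at Halton low-discrepancy points are fitted with a simple polynomial regression, which is then evaluated at intermediate points. The value function and feature set below are illustrative assumptions.

```python
import numpy as np

def halton(n, dim):
    """First n points of a Halton low-discrepancy sequence in [0, 1]^dim."""
    primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29][:dim]
    pts = np.empty((n, dim))
    for d, base in enumerate(primes):
        for i in range(n):
            f, x, k = 1.0, 0.0, i + 1
            while k > 0:
                f /= base
                x += f * (k % base)
                k //= base
            pts[i, d] = x
    return pts

# Stand-in for the value function obtained on the coarse grid (in the paper
# this would come from the finite-difference scheme).
def value_on_grid(x):
    return np.sum(x**2, axis=1)

dim, n = 4, 256
grid = halton(n, dim)
v = value_on_grid(grid)

# Quadratic regression surrogate used to read off the value at intermediate points.
features = np.hstack([np.ones((n, 1)), grid, grid**2])
coef, *_ = np.linalg.lstsq(features, v, rcond=None)

x_new = np.full((1, dim), 0.3)
f_new = np.hstack([np.ones((1, 1)), x_new, x_new**2])
print(f_new @ coef)   # regressed value at an intermediate point (about 0.36 here)
```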
Abstract:
Researchers often rely on the t-statistic to make inference on parameters in statistical models. It is common practice to obtain critical values by simulation techniques. This paper proposes a novel numerical method to obtain an approximately similar test. This test rejects the null hypothesis when the test statistic is larger than a critical value function (CVF) of the data. We illustrate this procedure when regressors are highly persistent, a case in which commonly-used simulation methods encounter difficulties controlling size uniformly. Our approach works satisfactorily, controls size, and yields a test which outperforms the two other known similar tests.
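For context, simulating a single constant critical value in the simplest i.i.d. setting looks roughly like the sketch below; the paper's point is to replace that constant with a critical value function of the data so that size is controlled uniformly. Everything in the sketch is an illustrative assumption, not the paper's procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

def t_stat(x):
    """t-statistic for H0: mean = 0."""
    n = len(x)
    return np.sqrt(n) * x.mean() / x.std(ddof=1)

# Simulate the null distribution of the t-statistic and take a quantile as a
# single, constant critical value; the paper instead uses a critical value
# function of the data to obtain an approximately similar test.
n_obs, n_sim = 50, 100_000
stats = np.array([t_stat(rng.standard_normal(n_obs)) for _ in range(n_sim)])
crit = np.quantile(np.abs(stats), 0.95)
print(crit)   # close to the two-sided 5% critical value of a t(49) distribution
```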
Abstract:
In this paper we consider nonautonomous optimal control problems of infinite horizon type, whose control actions are given by L¹-functions. We verify that the value function is locally Lipschitz. The equivalence between dynamic programming inequalities and Hamilton-Jacobi-Bellman (HJB) inequalities for proximal sub (super) gradients is proven. Using this result we show that the value function is a Dini solution of the HJB equation. We obtain a verification result for the class of Dini sub-solutions of the HJB equation and also prove a minimax property of the value function with respect to the sets of Dini semi-solutions of the HJB equation. We introduce the concept of viscosity solutions of the HJB equation in infinite horizon and prove the equivalence between this and the concept of Dini solutions. In the Appendix we provide an existence theorem. (c) 2006 Elsevier B.V. All rights reserved.
Abstract:
This paper contributes with a unified formulation that merges previous analysis on the prediction of the performance (value function) of a certain sequence of actions (policy) when an agent operates a Markov decision process with a large state space. When the states are represented by features and the value function is linearly approximated, our analysis reveals a new relationship between two common cost functions used to obtain the optimal approximation. In addition, this analysis allows us to propose an efficient adaptive algorithm that provides an unbiased linear estimate. The performance of the proposed algorithm is illustrated by simulation, showing competitive results when compared with the state-of-the-art solutions.
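As a rough illustration of linear value-function approximation with features, here is generic TD(0) on a small Markov reward process; this is not the adaptive unbiased estimator the paper proposes, and the chain, rewards, and features are made up.

```python
import numpy as np

rng = np.random.default_rng(1)

# Small Markov reward process: random transition matrix and fixed rewards.
n_states, gamma = 5, 0.9
P = rng.random((n_states, n_states))
P /= P.sum(axis=1, keepdims=True)
r = rng.random(n_states)

# Feature representation of the states (three random features per state).
phi = rng.random((n_states, 3))

# TD(0) with linear approximation: V(s) is approximated by phi(s) @ w.
w = np.zeros(3)
alpha, s = 0.05, 0
for _ in range(50_000):
    s_next = rng.choice(n_states, p=P[s])
    td_error = r[s] + gamma * phi[s_next] @ w - phi[s] @ w
    w += alpha * td_error * phi[s]
    s = s_next

# Compare the fitted approximation with the exact solution of the Bellman equation.
v_exact = np.linalg.solve(np.eye(n_states) - gamma * P, r)
print(phi @ w)
print(v_exact)
```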
Abstract:
The paper develops a stability theory for the optimal value and the optimal set mapping of optimization problems posed in a Banach space. The problems considered in this paper have an arbitrary number of inequality constraints involving lower semicontinuous (not necessarily convex) functions and one closed abstract constraint set. The considered perturbations lead to problems of the same type as the nominal one (with the same space of variables and the same number of constraints), where the abstract constraint set can also be perturbed. The spaces of functions involved in the problems (objective and constraints) are equipped with the metric of uniform convergence on bounded sets, while in the space of closed sets we consider, coherently, the Attouch-Wets topology. The paper examines, in a unified way, the lower and upper semicontinuity of the optimal value function, and the closedness, lower and upper semicontinuity (in the sense of Berge) of the optimal set mapping. This paper can be seen as a second part of the stability theory presented in [17], where we studied the stability of the feasible set mapping (completed here with the analysis of the Lipschitz-like property).
Abstract:
This paper deals with the expected discounted continuous control of piecewise deterministic Markov processes (PDMPs) using a singular perturbation approach for dealing with rapidly oscillating parameters. The state space of the PDMP is written as the product of a finite set and a subset of the Euclidean space R^n. The discrete part of the state, called the regime, characterizes the mode of operation of the physical system under consideration, and is supposed to have a fast (associated to a small parameter epsilon > 0) and a slow behavior. Using an approach similar to that developed in Yin and Zhang (Continuous-Time Markov Chains and Applications: A Singular Perturbation Approach, Applications of Mathematics, vol. 37, Springer, New York, 1998, Chaps. 1 and 3), the idea in this paper is to reduce the number of regimes by considering an averaged model in which the regimes within the same class are aggregated through the quasi-stationary distribution, so that the different states in this class are replaced by a single one. The main goal is to show that the value function of the control problem for the system driven by the perturbed Markov chain converges to the value function of this limit control problem as epsilon goes to zero. This convergence is obtained by, roughly speaking, showing that the infimum and supremum limits of the value functions satisfy two optimality inequalities as epsilon goes to zero. This enables us to show the result by invoking a uniqueness argument, without needing any kind of Lipschitz continuity condition.
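The aggregation step can be pictured as follows: compute the stationary distribution of the fast chain restricted to one class of regimes and average regime-dependent data under it, so the whole class collapses to a single regime. The generator and costs below are placeholders, not data from the paper.

```python
import numpy as np

# Generator of the fast chain restricted to one class of regimes (illustrative).
Q = np.array([[-2.0,  2.0,  0.0],
              [ 1.0, -3.0,  2.0],
              [ 0.5,  0.5, -1.0]])

# Stationary distribution: pi @ Q = 0 with the entries of pi summing to one.
A = np.vstack([Q.T, np.ones(3)])
b = np.zeros(4)
b[-1] = 1.0
pi, *_ = np.linalg.lstsq(A, b, rcond=None)

# Regime-dependent running costs are replaced by their average under pi, so the
# whole class is represented by a single aggregated regime in the limit problem.
costs = np.array([4.0, 1.0, 2.5])
print(pi, pi @ costs)
```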
Abstract:
This paper presents a programmable perturbation and observation control implementation for a wind generation system and its power electronic converter. The objective of the method in this particular application is to adjust the power delivered to charge a battery to its maximum allowable value, as a function of the real values of several parameters and their continuous variation, the most important being the wind velocity and the turbine efficiency. Also, to improve the power throughput and to exploit the marginal zones of operation of the turbine and generator, an unusual power converter is used, allowing a wide range of input voltage values. The implemented control continuously measures the actual power and searches for a new, more powerful operating point. © 2014 IEEE.
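For reference, a perturb-and-observe loop in its textbook form; the measurement, reference-setting, and step-size details below are placeholders rather than the implementation described in the abstract.

```python
def perturb_and_observe(measure_power, set_reference, v_init, step=0.5, iters=100):
    """Textbook P&O loop: keep perturbing the operating point in the direction
    that increased the delivered power on the previous step."""
    v = v_init
    set_reference(v)
    p_prev = measure_power()
    direction = 1
    for _ in range(iters):
        v += direction * step
        set_reference(v)
        p = measure_power()
        if p < p_prev:            # power dropped: reverse the perturbation
            direction = -direction
        p_prev = p
    return v

# Dummy power curve with a single maximum, standing in for the real plant.
state = {"v": 10.0}
set_ref = lambda v: state.update(v=v)
power = lambda: -(state["v"] - 17.0) ** 2 + 120.0
print(perturb_and_observe(power, set_ref, v_init=10.0))   # settles near 17
```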
Abstract:
Outsourcing has grown exponentially in recent years, driven by the increasing specialization of organizations. As a consequence, the responsibility of the purchasing department also increases. In other words, organizations become ever more dependent on their suppliers and therefore need methodologies that allow them to monitor and evaluate suppliers capable of jointly creating value and reducing costs. The purpose of this work is to analyse and develop a Supplier Monitoring and Evaluation procedure for assessing supplier performance, identifying those suppliers able to keep pace with the organization's development and sustainability.
Abstract:
We study the existence theory for parabolic variational inequalities in weighted L² spaces with respect to excessive measures associated with a transition semigroup. We characterize the value function of optimal stopping problems for finite and infinite dimensional diffusions as a generalized solution of such a variational inequality. The weighted L² setting allows us to cover some singular cases, such as optimal stopping for stochastic equations with degenerate diffusion coefficient. As an application of the theory, we consider the pricing of American-style contingent claims. Among others, we treat the cases of assets with stochastic volatility and with path-dependent payoffs.
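As a point of reference for the optimal stopping structure behind American-style claims, here is the simplest discrete-time treatment, a CRR binomial tree taking max(exercise, continuation) at each node; this is only a generic illustration, not the variational-inequality formulation developed in the paper.

```python
import numpy as np

def american_put_binomial(s0, k, r, sigma, T, n):
    """Value of an American put on a CRR binomial tree: at each node take the
    maximum of immediate exercise and the discounted continuation value."""
    dt = T / n
    u = np.exp(sigma * np.sqrt(dt))
    d = 1.0 / u
    p = (np.exp(r * dt) - d) / (u - d)
    disc = np.exp(-r * dt)

    # Stock prices and payoffs at maturity (highest price first).
    s = s0 * u ** np.arange(n, -1, -1) * d ** np.arange(0, n + 1)
    v = np.maximum(k - s, 0.0)

    # Backward induction: optimal stopping = max(exercise, continuation).
    for i in range(n - 1, -1, -1):
        s = s0 * u ** np.arange(i, -1, -1) * d ** np.arange(0, i + 1)
        v = np.maximum(k - s, disc * (p * v[:-1] + (1 - p) * v[1:]))
    return v[0]

print(american_put_binomial(s0=100, k=100, r=0.05, sigma=0.2, T=1.0, n=500))
```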
Abstract:
This paper suggests a simple method based on Chebyshev approximation at Chebyshev nodes to approximate partial differential equations. The methodology simply consists of determining the value function by using a set of nodes and basis functions. We provide two examples: pricing a European option and determining the best policy for shutting down a machine. The suggested method is flexible, easy to program and efficient. It is also applicable in other fields, providing efficient solutions to complex systems of partial differential equations.
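A minimal sketch of Chebyshev approximation at Chebyshev nodes, using numpy's chebyshev module; the target function stands in for the value function that would come out of the PDE and is purely illustrative.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

# Chebyshev nodes of the first kind on [-1, 1], mapped to the domain [a, b].
a, b, degree = 0.0, 2.0, 12
k = np.arange(degree + 1)
nodes = np.cos((2 * k + 1) * np.pi / (2 * (degree + 1)))
x = 0.5 * (b - a) * (nodes + 1.0) + a

# Placeholder for the value function evaluated at the nodes (in the paper this
# would come from the PDE / optimality conditions).
f = np.exp(-x) * np.sin(3 * x)

# Fit the Chebyshev basis coefficients and evaluate anywhere in the domain.
coefs = C.chebfit(2 * (x - a) / (b - a) - 1.0, f, degree)
x_eval = np.linspace(a, b, 5)
approx = C.chebval(2 * (x_eval - a) / (b - a) - 1.0, coefs)
print(np.max(np.abs(approx - np.exp(-x_eval) * np.sin(3 * x_eval))))  # small error
```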
Abstract:
Time-inconsistency is an essential feature of many policy problems (Kydland and Prescott, 1977). This paper presents and compares three methods for computing Markov-perfect optimal policies in stochastic nonlinear business cycle models. The methods considered include value function iteration, generalized Euler equations, and parameterized shadow prices. In the context of a business cycle model in which a fiscal authority chooses government spending and income taxation optimally, while lacking the ability to commit, we show that the solutions obtained using value function iteration and generalized Euler equations are somewhat more accurate than that obtained using parameterized shadow prices. Among these three methods, we show that value function iteration can be applied easily, even to environments that include a risk-sensitive fiscal authority and/or inequality constraints on government spending. We show that the risk-sensitive fiscal authority lowers government spending and income taxation, reducing the disincentive households face to accumulate wealth.
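To fix ideas, here is generic value function iteration on a textbook deterministic growth model; it is only an illustration of the method named above, not the Markov-perfect fiscal policy problem studied in the paper.

```python
import numpy as np

# Deterministic growth model: maximize the sum of beta**t * log(c_t)
# subject to k' = k**alpha - c, solved by value function iteration on a grid.
alpha, beta = 0.3, 0.95
k_grid = np.linspace(0.05, 0.5, 200)
V = np.zeros(len(k_grid))

for _ in range(1000):
    # Consumption implied by every (k, k') pair on the grid.
    c = k_grid[:, None] ** alpha - k_grid[None, :]
    utility = np.where(c > 0, np.log(np.maximum(c, 1e-12)), -np.inf)
    V_new = np.max(utility + beta * V[None, :], axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new

policy = k_grid[np.argmax(utility + beta * V[None, :], axis=1)]
print(policy[:5])   # optimal next-period capital for the first grid points
```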
Abstract:
This paper proposes a high-level reinforcement learning (RL) control system for solving the action selection problem of an autonomous robot. Although the dominant approach when using RL has been to apply value-function-based algorithms, the system detailed here is characterized by the use of direct policy search methods. Rather than approximating a value function, these methodologies approximate a policy using an independent function approximator with its own parameters, trying to maximize the future expected reward. The policy-based algorithm presented in this paper is used for learning the internal state/action mapping of a behavior. In this preliminary work, we demonstrate its feasibility with simulated experiments using the underwater robot GARBI in a target-reaching task.
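Direct policy search in its simplest form is sketched below: a REINFORCE-style update for a Gaussian policy on a toy one-dimensional target-reaching task. The robot, states, and rewards of the paper are replaced by placeholders; this is not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy one-step "target reaching": the state is the signed distance to the
# target, the action is a displacement, and the reward penalises the distance
# remaining after the move.
theta, sigma, alpha = 0.0, 0.3, 0.02   # policy parameter, exploration noise, step size

for _ in range(20_000):
    s = rng.uniform(-1.0, 1.0)
    mean = theta * s                       # linear Gaussian policy
    a = rng.normal(mean, sigma)
    reward = -abs(s + a)
    grad_logp = (a - mean) * s / sigma**2  # d/d theta of log pi(a | s)
    theta += alpha * reward * grad_logp    # REINFORCE update

print(theta)   # drifts toward -1: the policy learns to move straight to the target
```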
Abstract:
Plan recognition is the problem of inferring the goals and plans of an agent from partial observations of her behavior. Recently, it has been shown that the problem can be formulated and solved using planners, reducing plan recognition to plan generation. In this work, we extend this model-based approach to plan recognition to the POMDP setting, where actions are stochastic and states are partially observable. The task is to infer a probability distribution over the possible goals of an agent whose behavior results from a POMDP model. The POMDP model is shared between agent and observer except for the true goal of the agent that is hidden to the observer. The observations are action sequences O that may contain gaps as some or even most of the actions done by the agent may not be observed. We show that the posterior goal distribution P(G|O) can be computed from the value function V_G(b) over beliefs b generated by the POMDP planner for each possible goal G. Some extensions of the basic framework are discussed, and a number of experiments are reported.
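The Bayesian step in schematic form: given, for each candidate goal G, a score for how well the observed actions O fit (near-)optimal behaviour for G, the posterior follows by normalization. In the paper these scores come from the value function V_G(b) over beliefs; in the sketch below they are placeholder numbers.

```python
import numpy as np

# Hypothetical scores: for each candidate goal G, some measure of how well the
# observed action sequence O fits (near-)optimal behaviour for G. In the paper
# these come from the value function V_G(b) over beliefs; here they are
# placeholder numbers.
goals = ["goal_A", "goal_B", "goal_C"]
log_likelihood = np.array([-2.0, -5.5, -4.0])              # log P(O | G), illustrative
log_prior = np.log(np.full(len(goals), 1.0 / len(goals)))  # uniform prior P(G)

# Bayes' rule: P(G | O) is proportional to P(O | G) * P(G), computed in logs.
log_post = log_likelihood + log_prior
post = np.exp(log_post - log_post.max())
post /= post.sum()

for goal, p in zip(goals, post):
    print(f"P({goal} | O) = {p:.3f}")
```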
Abstract:
A model of directed search with a finite number of buyers and sellers is considered, where sellers compete in direct mechanisms. Buyer heterogeneity and Nash equilibrium result in perfect sorting. The restriction to complementary inputs, that the match value function Q is supermodular, in addition coordinates the sellers' strategies. In that case, equilibrium implements positive assortative matching, which is efficient and consistent with the stable (cooperative equilibrium) outcome. This provides a non-cooperative and decentralized solution for the Assignment Game. Conversely, if buyers are identical, no such coordination is possible, and there is a continuum of equilibria, one of which exhibits price posting, while another yields competition in auctions.
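As a purely illustrative restatement of the supermodularity condition mentioned above: with two buyer types {H, L} and two seller types {h, l}, supermodularity of the match value function means Q(H, h) + Q(L, l) >= Q(H, l) + Q(L, h), so breaking up the sorted pairs destroys surplus, which is what pushes the equilibrium toward positive assortative matching.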