971 results for infinite horizon


Relevance:

100.00% 100.00%

Publisher:

Abstract:

Gradient-based approaches to direct policy search in reinforcement learning have received much recent attention as a means to solve problems of partial observability and to avoid some of the problems associated with policy degradation in value-function methods. In this paper we introduce GPOMDP, a simulation-based algorithm for generating a biased estimate of the gradient of the average reward in Partially Observable Markov Decision Processes (POMDPs) controlled by parameterized stochastic policies. A similar algorithm was proposed by Kimura, Yamamura, and Kobayashi (1995). The algorithm's chief advantages are that it requires storage of only twice the number of policy parameters, uses one free parameter β ∈ [0,1) (which has a natural interpretation in terms of bias-variance trade-off), and requires no knowledge of the underlying state. We prove convergence of GPOMDP, and show how the correct choice of the parameter β is related to the mixing time of the controlled POMDP. We briefly describe extensions of GPOMDP to controlled Markov chains, continuous state, observation and control spaces, multiple agents, higher-order derivatives, and a version for training stochastic policies with internal states. In a companion paper (Baxter, Bartlett, & Weaver, 2001) we show how the gradient estimates generated by GPOMDP can be used in both a traditional stochastic gradient algorithm and a conjugate-gradient procedure to find local optima of the average reward. ©2001 AI Access Foundation and Morgan Kaufmann Publishers. All rights reserved.
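The storage claim in the abstract above (twice the number of policy parameters) can be illustrated with a minimal sketch of a GPOMDP-style estimator: one eligibility trace `z` and one running average `delta`, each the size of the parameter vector. The two-state environment, the sigmoid policy, and all names below are assumptions made for illustration only; they are not taken from the paper.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gpomdp_gradient(theta, beta, steps, seed=0):
    """Estimate the average-reward gradient on a toy problem (assumed).

    Toy setup (hypothetical): the observation is 0 or 1, the policy picks
    action 1 with probability sigmoid(theta[obs]), the next observation
    equals the action, and the reward is 1 whenever the new observation is 1.
    """
    rng = random.Random(seed)
    z = [0.0] * len(theta)      # eligibility trace, one entry per parameter
    delta = [0.0] * len(theta)  # running gradient estimate, same size
    obs = 0
    for t in range(steps):
        p = sigmoid(theta[obs])
        action = 1 if rng.random() < p else 0
        # Discount the trace by beta, then add the score d/dtheta log pi(action | obs).
        z = [beta * zi for zi in z]
        z[obs] += action - p
        # Toy dynamics and reward (assumed).
        obs = action
        reward = 1.0 if obs == 1 else 0.0
        # Incremental average of reward * trace.
        delta = [d + (reward * zi - d) / (t + 1) for d, zi in zip(delta, z)]
    return delta


if __name__ == "__main__":
    print(gpomdp_gradient([0.0, 0.0], beta=0.8, steps=20000, seed=1))
```

In this toy problem, raising either parameter makes the rewarded action more likely, so both components of the estimate should come out positive. Values of `beta` closer to 1 reduce the bias of the estimate at the price of higher variance, which is the trade-off the abstract refers to.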

Relevance:

100.00% 100.00%

Publisher:

Abstract:

We consider a robust filtering problem for uncertain discrete-time, homogeneous, first-order, finite-state hidden Markov models (HMMs). The class of uncertain HMMs considered is described by a conditional relative entropy constraint on measures perturbed from a nominal regular conditional probability distribution given the previous posterior state distribution and the latest measurement. Under this class of perturbations, a robust infinite horizon filtering problem is first formulated as a constrained optimization problem before being transformed via variational results into an unconstrained optimization problem; the latter can be elegantly solved using risk-sensitive information-state based filtering.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Stability results are given for a class of feedback systems arising from the regulation of time-varying discrete-time systems using optimal infinite-horizon and moving-horizon feedback laws. The class is characterized by joint constraints on the state and the control, a general nonlinear cost function and nonlinear equations of motion possessing two special properties. It is shown that weak conditions on the cost function and the constraints are sufficient to guarantee uniform asymptotic stability of both the optimal infinite-horizon and moving-horizon feedback systems. The infinite-horizon cost associated with the moving-horizon feedback law approaches the optimal infinite-horizon cost as the moving horizon is extended.
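The last claim of the abstract above can be checked numerically in a much simpler setting than the one the paper treats. The sketch below assumes an unconstrained scalar linear system x⁺ = a·x + b·u with stage cost q·x² + r·u² and zero terminal cost; the parameter values and function names are hypothetical and chosen only for illustration.

```python
def riccati_step(P, a, b, q, r):
    """One backward step of the scalar discrete-time Riccati recursion."""
    return q + a * a * P - (a * b * P) ** 2 / (r + b * b * P)


def moving_horizon_gain(N, a, b, q, r):
    """First-step feedback gain of the N-step problem (zero terminal cost)."""
    P = 0.0
    for _ in range(N - 1):
        P = riccati_step(P, a, b, q, r)
    return a * b * P / (r + b * b * P)


def infinite_horizon_cost(K, a, b, q, r):
    """Cost per unit x0**2 of applying u = -K*x forever; inf if unstable."""
    closed_loop = a - b * K
    if abs(closed_loop) >= 1.0:
        return float("inf")
    return (q + r * K * K) / (1.0 - closed_loop ** 2)


def optimal_infinite_horizon_cost(a, b, q, r, iterations=1000):
    """Approximate the Riccati fixed point by plain iteration."""
    P = 0.0
    for _ in range(iterations):
        P = riccati_step(P, a, b, q, r)
    return P


if __name__ == "__main__":
    a, b, q, r = 1.2, 1.0, 1.0, 1.0  # assumed example values
    P_inf = optimal_infinite_horizon_cost(a, b, q, r)
    for N in (2, 3, 5, 10, 30):
        K = moving_horizon_gain(N, a, b, q, r)
        print(N, infinite_horizon_cost(K, a, b, q, r), P_inf)
```

With these example values, a horizon of N = 1 gives a zero gain and an unstable closed loop, while longer horizons give stabilizing gains whose infinite-horizon cost decreases toward the optimal value.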

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This paper studies a dynamic pricing problem faced by a retailer with limited inventory who is uncertain about the demand rate model and aims to maximize expected discounted revenue over an infinite time horizon. The retailer doubts the demand model, which is generated from historical data, and views it as an approximation. Uncertainty in the demand rate model is represented by a notion of generalized relative entropy process, and the robust pricing problem is formulated as a two-player zero-sum stochastic differential game. The pricing policy is obtained through the Hamilton-Jacobi-Isaacs (HJI) equation. The existence and uniqueness of the solution of the HJI equation are shown, and a verification theorem is proved to show that the solution of the HJI equation is indeed the value function of the pricing problem. The results are illustrated by an example with exponential nominal demand rate.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

We analyze infinite-horizon choice functions within the setting of a simple linear technology. Time consistency and efficiency are characterized by stationary consumption and inheritance functions, as well as a transversality condition. In addition, we consider the equity axioms Suppes-Sen, Pigou-Dalton, and resource monotonicity. We show that Suppes-Sen and Pigou-Dalton imply that the consumption and inheritance functions are monotone with respect to time—thus justifying sustainability—while resource monotonicity implies that the consumption and inheritance functions are monotone with respect to the resource. Examples illustrate the characterization results.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

We analyze an infinite horizon, single product, periodic review model in which pricing and production/inventory decisions are made simultaneously. Demands in different periods are identically distributed random variables that are independent of each other, and their distributions depend on the product price. Pricing and ordering decisions are made at the beginning of each period and all shortages are backlogged. Ordering cost includes both a fixed cost and a variable cost proportional to the amount ordered. The objective is to maximize expected discounted or expected average profit over the infinite planning horizon. We show that a stationary (s,S,p) policy is optimal for both the discounted and average profit models with general demand functions. In such a policy, the period inventory is managed based on the classical (s,S) policy and price is determined based on the inventory position at the beginning of each period.
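The decision rule described in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: the reorder test `inventory <= s`, the use of the pre-order inventory as the argument of the price function, and the names `ssp_decision`, `simulate`, `price_of`, and `demand_of` are all hypothetical and not taken from the paper.

```python
def ssp_decision(inventory, s, S, price_of):
    """Return (order quantity, price) for one period under an (s, S, p) rule.

    Assumptions: an order up to S is placed when inventory is at or below s,
    and the price is a function of the inventory seen at the start of the period.
    """
    order = S - inventory if inventory <= s else 0
    price = price_of(inventory)
    return order, price


def simulate(inventory, s, S, price_of, demand_of, periods):
    """Run the rule for several periods; negative inventory means backlog."""
    history = []
    for _ in range(periods):
        order, price = ssp_decision(inventory, s, S, price_of)
        inventory = inventory + order - demand_of(price)
        history.append((order, price, inventory))
    return history


if __name__ == "__main__":
    # Constant price and deterministic demand, chosen only to keep the trace readable.
    for row in simulate(20, s=5, S=20, price_of=lambda x: 10.0,
                        demand_of=lambda p: 6, periods=4):
        print(row)
```

In the model from the abstract, demand is random with a price-dependent distribution; replacing `demand_of` with a sampler would bring the sketch closer to that setting.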

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Model Predictive Control (MPC) is a control method that solves, in real time, an optimal control problem over a finite horizon. The finiteness of the horizon is both the reason for MPC's success and its main limitation. In operational water resources management, MPC has been successfully employed for controlling systems with a relatively short memory, such as canals, where the horizon length is not an issue. For reservoirs, which generally have a longer memory, MPC applications are presently limited to short-term management only. Short-term reservoir management can deal effectively with fast processes, such as floods, but it cannot look sufficiently far ahead to handle long-term issues, such as drought. To overcome this limitation, we propose an Infinite Horizon MPC (IH-MPC) solution that is particularly suitable for reservoir management. We propose to structure the input signal by use of orthogonal basis functions, thereby reducing the optimization argument to a finite number of variables and making the control problem solvable in a reasonable time. We applied this solution to the management of the Manantali Reservoir. Manantali is a yearly reservoir located in Mali, on the Senegal river, affecting water systems of Mali, Senegal, and Mauritania. The long-term horizon offered by IH-MPC is necessary to deal with the strongly seasonal climate of the region.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Araújo, Páscoa and Torres-Martinez (2002) have shown that, without imposing either debt constraints or transversality conditions, Ponzi schemes are ruled out in infinite horizon economies with default when collateral is the only mechanism that partially secures loans. Páscoa and Seghir (2008) subsequently show that Ponzi schemes may reappear if, in addition to the seizure of the collateral, there are sufficiently harsh default penalties assessed (directly in terms of utility) against the defaulters. They also claim that if default penalties are moderate then Ponzi schemes are ruled out and existence of a competitive equilibrium is ensured. The objective of this paper is twofold. First, contrary to what is claimed by Páscoa and Seghir (2008), we show that moderate default penalties do not always prevent agents from running a Ponzi scheme. Second, we provide an alternative condition on default penalties that is sufficient to rule out Ponzi schemes and ensure the existence of a competitive equilibrium.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Araujo, Páscoa and Torres-Martínez (2002) showed that, without imposing any debt constraint, Ponzi schemes are ruled out in infinite horizon economies with limited commitment when collateral is the only mechanism that partially secures loans. Páscoa and Seghir (2009) presented two examples in which they argued that Ponzi schemes may reappear if, in addition to the seizure of the collateral, there are sufficiently harsh default penalties assessed (directly in terms of utility) against the defaulters. Moreover, they claimed that if default penalties are moderate then Ponzi schemes are ruled out and existence of a competitive equilibrium is restored. This paper questions the validity of the claims made in Páscoa and Seghir (2009). First, we show that it is not true that harsh default penalties lead to Ponzi schemes in the examples they have proposed. A competitive equilibrium with no trade can be supported due to unduly pessimistic expectations on asset deliveries. We subsequently refine the equilibrium concept in the spirit of Dubey, Geanakoplos and Shubik (2005) in order to rule out spurious inactivity on asset markets due to irrational expectations. Our second contribution is to provide a specific example of an economy with moderate default penalties in which Ponzi schemes reappear when overpessimistic beliefs on asset deliveries are ruled out. Our finding shows that, contrary to what is claimed by Páscoa and Seghir (2009), moderate default penalties do not always prevent agents from running a Ponzi scheme.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)

Relevance:

100.00% 100.00%

Publisher:

Abstract:

In this paper we consider nonautonomous optimal control problems of infinite horizon type, whose control actions are given by L¹ functions. We verify that the value function is locally Lipschitz. The equivalence between dynamic programming inequalities and Hamilton-Jacobi-Bellman (HJB) inequalities for proximal sub(super)gradients is proven. Using this result we show that the value function is a Dini solution of the HJB equation. We obtain a verification result for the class of Dini sub-solutions of the HJB equation and also prove a minimax property of the value function with respect to the sets of Dini semi-solutions of the HJB equation. We introduce the concept of viscosity solutions of the HJB equation in infinite horizon and prove the equivalence between this and the concept of Dini solutions. In the Appendix we provide an existence theorem. © 2006 Elsevier B.V. All rights reserved.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This article presents and discusses necessary conditions of optimality for infinite horizon dynamic optimization problems with inequality state constraints and set inclusion constraints at both endpoints of the trajectory. The cost functional depends on the state variable at the final time, and the dynamics are given by a differential inclusion. Moreover, the optimization is carried out over asymptotically convergent state trajectories. The novelty of the proposed optimality conditions for this class of problems is that the boundary condition of the adjoint variable is given as a weak directional inclusion at infinity. This improves on the currently available necessary conditions of optimality for infinite horizon problems. © 2011 IEEE.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This article presents and discusses a maximum principle for infinite horizon constrained optimal control problems with a cost functional depending on the state at the final time. The main feature of these optimality conditions is that, under reasonably weak assumptions, the multiplier is shown to satisfy a novel transversality condition at infinite time. It is also shown that these conditions can be obtained for impulsive control problems whose dynamics are given by measure driven differential equations. © 2011 IFAC.