212 results for Infinite horizon problems
in Indian Institute of Science - Bangalore - India
Abstract:
Stability results are given for a class of feedback systems arising from the regulation of time-varying discrete-time systems using optimal infinite-horizon and moving-horizon feedback laws. The class is characterized by joint constraints on the state and the control, a general nonlinear cost function and nonlinear equations of motion possessing two special properties. It is shown that weak conditions on the cost function and the constraints are sufficient to guarantee uniform asymptotic stability of both the optimal infinite-horizon and moving-horizon feedback systems. The infinite-horizon cost associated with the moving-horizon feedback law approaches the optimal infinite-horizon cost as the moving horizon is extended.
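The last claim of this abstract can be illustrated on a much simpler case than the one the paper treats. The sketch below assumes a scalar, time-invariant, unconstrained linear system x+ = a*x + b*u with stage cost q*x^2 + r*u^2 and zero terminal cost; the function names and the numerical values are hypothetical and are not taken from the paper.

```python
# Illustrative special case only: scalar linear-quadratic regulation,
# not the time-varying nonlinear constrained setting of the abstract.

def riccati_step(p, a, b, q, r):
    """One backward step of the scalar discrete-time Riccati recursion."""
    return q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)


def moving_horizon_gain(horizon, a, b, q, r):
    """Feedback gain K (u = -K*x) of the moving-horizon law with the
    given horizon length and zero terminal cost."""
    p = 0.0  # cost-to-go at the end of the horizon
    for _ in range(horizon - 1):
        p = riccati_step(p, a, b, q, r)
    return a * b * p / (r + b * b * p)


def closed_loop_cost(gain, a, b, q, r):
    """Infinite-horizon cost (per unit x0^2) of applying u = -gain*x forever."""
    a_cl = a - b * gain
    if abs(a_cl) >= 1.0:
        return float("inf")  # closed loop is not asymptotically stable
    return (q + r * gain * gain) / (1.0 - a_cl * a_cl)


def optimal_cost(a, b, q, r, n_iter=1000):
    """Optimal infinite-horizon cost, obtained by iterating the Riccati map."""
    p = 0.0
    for _ in range(n_iter):
        p = riccati_step(p, a, b, q, r)
    return p


if __name__ == "__main__":
    a, b, q, r = 1.2, 1.0, 1.0, 1.0  # hypothetical open-loop unstable plant
    print("optimal:", optimal_cost(a, b, q, r))
    for n in (1, 2, 3, 5, 10, 30):
        k = moving_horizon_gain(n, a, b, q, r)
        print(n, closed_loop_cost(k, a, b, q, r))
```

In this example a horizon of 1 yields no control action and an unstable closed loop, while longer horizons give costs that approach the optimal value from above.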
Abstract:
We study risk-sensitive control of continuous-time Markov chains taking values in a discrete state space. We study both finite and infinite horizon problems. In the finite horizon problem we characterize the value function via the Hamilton-Jacobi-Bellman equation and obtain an optimal Markov control. We do the same for the infinite horizon discounted cost case. In the infinite horizon average cost case we establish the existence of an optimal stationary control under a certain Lyapunov condition. We also develop a policy iteration algorithm for finding an optimal control.
Abstract:
Due to their non-stationarity, finite-horizon Markov decision processes (FH-MDPs) have one probability transition matrix per stage. Thus the curse of dimensionality affects FH-MDPs more severely than infinite-horizon MDPs. We propose two parametrized 'actor-critic' algorithms to compute optimal policies for FH-MDPs. Both algorithms use the two-timescale stochastic approximation technique, thus simultaneously performing gradient search in the parametrized policy space (the 'actor') on a slower timescale and learning the policy gradient (the 'critic') via a faster recursion. This is in contrast to methods where critic recursions learn the cost-to-go proper. We show w.p. 1 convergence to a set with the necessary condition for constrained optima. The proposed parameterization is for FH-MDPs with compact action sets, although certain exceptions can be handled. Further, a third algorithm for stochastic control of stopping time processes is presented. We explain why current policy evaluation methods do not work as a critic for the proposed actor recursion. Simulation results from flow control in communication networks attest to the performance advantages of all three algorithms.
Abstract:
We study optimal control of Markov processes with age-dependent transition rates. The control policy is chosen continuously over time based on the state of the process and its age. We study infinite horizon discounted cost and infinite horizon average cost problems. Our approach is via the construction of an equivalent semi-Markov decision process. We characterise the value function and optimal controls for both discounted and average cost cases.
Abstract:
In public utilities, under supply constraints, fairness considerations lead to a market failure. This paper characterizes a two-period principal-agent contract for demand management that mitigates this market failure in urban water systems. The contract is designed as an extensive form mechanism using subgame perfect Nash equilibrium (SPNE) as the solution concept. The contract is fair and is shown to be economically efficient if, in case of deviation by the agent, the gain to the agent and the loss to the principal are small. It is shown that this assumption can be avoided in an infinite horizon contract.
Abstract:
We develop extensions of the Simulated Annealing with Multiplicative Weights (SAMW) algorithm, which was proposed for solving Finite-Horizon Markov Decision Processes (FH-MDPs). The extensions developed are in three directions: a) use of the dynamic programming principle in the policy update step of SAMW, b) a two-timescale actor-critic algorithm that uses simulated transitions alone, and c) extension of the algorithm to the infinite-horizon discounted-reward scenario. In particular, a) reduces the storage required from exponential to linear in the number of actions per stage-state pair. On the faster timescale, a 'critic' recursion performs policy evaluation while on the slower timescale an 'actor' recursion performs policy improvement using SAMW. We give a proof outlining convergence w.p. 1 and show experimental results on two settings: semiconductor fabrication and flow control in communication networks.
Abstract:
This article proposes a three-timescale simulation based algorithm for solution of infinite horizon Markov Decision Processes (MDPs). We assume a finite state space and discounted cost criterion and adopt the value iteration approach. An approximation of the Dynamic Programming operator T is applied to the value function iterates. This 'approximate' operator is implemented using three timescales, the slowest of which updates the value function iterates. On the middle timescale we perform a gradient search over the feasible action set of each state using Simultaneous Perturbation Stochastic Approximation (SPSA) gradient estimates, thus finding the minimizing action in T. On the fastest timescale, the 'critic' estimates, over which the gradient search is performed, are obtained. A sketch of convergence explaining the dynamics of the algorithm using associated ODEs is also presented. Numerical experiments on rate based flow control on a bottleneck node using a continuous-time queueing model are performed using the proposed algorithm. The results obtained are verified against classical value iteration where the feasible set is suitably discretized. Over such a discretized setting, a variant of the algorithm of [12] is compared and the proposed algorithm is found to converge faster.
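The abstract verifies its results against classical value iteration. As a point of reference, a minimal sketch of classical value iteration for a finite-state, finite-action, discounted-cost MDP is given below; it is not the three-timescale algorithm of the paper, and the data layout (`P[a][s][s2]`, `c[s][a]`) and the two-state example are hypothetical.

```python
# Classical value iteration for a discounted-cost MDP (baseline sketch only).

def q_values(V, P, c, gamma):
    """Q[s][a] = c[s][a] + gamma * sum_s2 P[a][s][s2] * V[s2]."""
    n_states, n_actions = len(c), len(c[0])
    return [
        [
            c[s][a] + gamma * sum(P[a][s][s2] * V[s2] for s2 in range(n_states))
            for a in range(n_actions)
        ]
        for s in range(n_states)
    ]


def value_iteration(P, c, gamma, tol=1e-10, max_iter=10_000):
    """Repeatedly apply the dynamic programming operator T until the
    sup-norm change falls below tol; return (V, greedy policy)."""
    V = [0.0] * len(c)
    for _ in range(max_iter):
        V_new = [min(row) for row in q_values(V, P, c, gamma)]
        diff = max(abs(x - y) for x, y in zip(V_new, V))
        V = V_new
        if diff < tol:
            break
    Q = q_values(V, P, c, gamma)
    policy = [min(range(len(row)), key=row.__getitem__) for row in Q]
    return V, policy


if __name__ == "__main__":
    # Hypothetical example: state 1 is absorbing and free; in state 0,
    # action 0 stays (cost 1) and action 1 jumps to state 1 (cost 2).
    P = [
        [[1.0, 0.0], [0.0, 1.0]],  # action 0
        [[0.0, 1.0], [0.0, 1.0]],  # action 1
    ]
    c = [[1.0, 2.0], [0.0, 0.0]]
    print(value_iteration(P, c, gamma=0.9))
```

With a discount factor of 0.9, staying in state 0 forever would cost 10, so the greedy policy pays 2 once to leave.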
Abstract:
We develop in this article the first actor-critic reinforcement learning algorithm with function approximation for a problem of control under multiple inequality constraints. We consider the infinite horizon discounted cost framework in which both the objective and the constraint functions are suitable expected policy-dependent discounted sums of certain sample path functions. We apply the Lagrange multiplier method to handle the inequality constraints. Our algorithm makes use of multi-timescale stochastic approximation and incorporates a temporal difference (TD) critic and an actor that makes a gradient search in the space of policy parameters using efficient simultaneous perturbation stochastic approximation (SPSA) gradient estimates. We prove the asymptotic almost sure convergence of our algorithm to a locally optimal policy. (C) 2010 Elsevier B.V. All rights reserved.
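The SPSA gradient estimate mentioned in this abstract can be sketched in isolation. The example below assumes a plain two-sided SPSA estimate with symmetric Bernoulli (+1/-1) perturbations, constant gains, and a noise-free quadratic test objective; these choices, the function names, and all parameter values are illustrative assumptions, not the actor update of the paper.

```python
import random

# Generic SPSA sketch (illustrative assumptions throughout; see lead-in).


def spsa_gradient(f, theta, c, rng):
    """Two-sided SPSA estimate: perturb all coordinates at once with a
    random +/-1 vector and use only two evaluations of f."""
    delta = [rng.choice((-1.0, 1.0)) for _ in theta]
    plus = [t + c * d for t, d in zip(theta, delta)]
    minus = [t - c * d for t, d in zip(theta, delta)]
    diff = f(plus) - f(minus)
    return [diff / (2.0 * c * d) for d in delta]


def spsa_minimize(f, theta0, a=0.1, c=0.1, n_iter=500, seed=0):
    """Gradient descent driven by SPSA estimates. Constant gains a and c
    are a simplification; decaying gain sequences are the usual choice."""
    rng = random.Random(seed)
    theta = list(theta0)
    for _ in range(n_iter):
        g = spsa_gradient(f, theta, c, rng)
        theta = [t - a * gi for t, gi in zip(theta, g)]
    return theta


def quadratic(theta):
    """Hypothetical test objective with its minimum at the origin."""
    return sum(t * t for t in theta)


if __name__ == "__main__":
    print(spsa_minimize(quadratic, [1.0, -2.0]))
```

Each estimate needs two function evaluations regardless of the parameter dimension, which is the main appeal of the technique in simulation-based settings.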
Abstract:
Consider a single-server multiclass queueing system with K classes where the individual queues are fed by K-correlated interrupted Poisson streams generated in the states of a K-state stationary modulating Markov chain. The service times for all the classes are drawn independently from the same distribution. There is a setup time (and/or a setup cost) incurred whenever the server switches from one queue to another. It is required to minimize the sum of discounted inventory and setup costs over an infinite horizon. We provide sufficient conditions under which exhaustive service policies are optimal. We then present some simulation results for a two-class queueing system to show that exhaustive, threshold policies outperform non-exhaustive policies.
Abstract:
We consider a stochastic differential equation (SDE) model of slotted Aloha with the retransmission probability as the associated parameter. We formulate the problem in both (a) the finite horizon and (b) the infinite horizon average cost settings. We apply the algorithm of [3] for the first setting, while for the second, we adapt a related algorithm from [2] that was originally developed in the simulation optimization framework. In the first setting, we obtain an optimal parameter trajectory that prescribes the parameter to use at any given instant while in the second setting, we obtain an optimal time-invariant parameter. Our algorithms are seen to exhibit good performance.
Abstract:
In this paper, we address a key problem faced by advertisers in sponsored search auctions on the web: how much to bid, given the bids of the other advertisers, so as to maximize individual payoffs? Assuming the generalized second price auction as the auction mechanism, we formulate this problem in the framework of an infinite horizon alternative-move game of advertiser bidding behavior. For a sponsored search auction involving two advertisers, we characterize all the pure strategy and mixed strategy Nash equilibria. We also prove that the bid prices will lead to a Nash equilibrium, if the advertisers follow a myopic best response bidding strategy. Following this, we investigate the bidding behavior of the advertisers if they use Q-learning. We discover empirically an interesting trend that the Q-values converge even if both the advertisers learn simultaneously.
Abstract:
We develop a simulation-based, two-timescale actor-critic algorithm for infinite horizon Markov decision processes with finite state and action spaces, with a discounted reward criterion. The algorithm is of the gradient ascent type and performs a search in the space of stationary randomized policies. The algorithm uses certain simultaneous deterministic perturbation stochastic approximation (SDPSA) gradient estimates for enhanced performance. We show an application of our algorithm on a problem of mortgage refinancing. Our algorithm obtains the optimal refinancing strategies in a computationally efficient manner.
Abstract:
We study zero-sum risk-sensitive stochastic differential games on the infinite horizon with discounted and ergodic payoff criteria. Under certain assumptions, we establish the existence of values and saddle-point equilibria. We obtain our results by studying the corresponding Hamilton-Jacobi-Isaacs equations. Finally, we show that the value of the ergodic payoff criterion is a constant multiple of the maximal eigenvalue of the generators of the associated nonlinear semigroups.
Abstract:
For necessary goods like water, under supply constraints, fairness considerations lead to negative externalities. The objective of this paper is to design an infinite horizon contract or relational contract (a type of long-term contract) that ensures self-enforcing (instead of court-enforced) behaviour by the agents to mitigate the externality due to fairness issues. In this contract, the consumer is induced to consume at the firm-supply level using the threat of a higher fair price for future time periods. The pricing mechanism computed in this paper internalizes the externality, is shown to be economically efficient, and provides revenue sufficiency.
Abstract:
Infinite horizon discounted-cost and ergodic-cost risk-sensitive zero-sum stochastic games for controlled Markov chains with countably many states are analyzed. Upper and lower values for these games are established. The existence of value and saddle-point equilibria in the class of Markov strategies is proved for the discounted-cost game. The existence of value and saddle-point equilibria in the class of stationary strategies is proved under the uniform ergodicity condition for the ergodic-cost game. The value of the ergodic-cost game happens to be the product of the inverse of the risk-sensitivity factor and the logarithm of the common Perron-Frobenius eigenvalue of the associated controlled nonlinear kernels. (C) 2013 Elsevier B.V. All rights reserved.
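The relationship stated in the last sentence of this abstract can be written compactly. The symbols are notational assumptions introduced here for illustration: \theta for the risk-sensitivity factor, \rho for the common Perron-Frobenius eigenvalue, and \lambda^{*} for the value of the ergodic-cost game.

```latex
\[
  \lambda^{*} \;=\; \frac{1}{\theta}\,\log \rho
\]
```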