3 resultados para Optimal Stochastic Control

em Massachusetts Institute of Technology


Relevância:

40.00% 40.00%

Publicador:

Resumo:

We analyze a finite horizon, single product, periodic review model in which pricing and production/inventory decisions are made simultaneously. Demands in different periods are random variables that are independent of each other and their distributions depend on the product price. Pricing and ordering decisions are made at the beginning of each period and all shortages are backlogged. Ordering cost includes both a fixed cost and a variable cost proportional to the amount ordered. The objective is to find an inventory policy and a pricing strategy maximizing expected profit over the finite horizon. We show that when the demand model is additive, the profit-to-go functions are k-concave and hence an (s,S,p) policy is optimal. In such a policy, the period inventory is managed based on the classical (s,S) policy and price is determined based on the inventory position at the beginning of each period. For more general demand functions, i.e., multiplicative plus additive functions, we demonstrate that the profit-to-go function is not necessarily k-concave and an (s,S,p) policy is not necessarily optimal. We introduce a new concept, the symmetric k-concave functions and apply it to provide a characterization of the optimal policy.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

We analyze an infinite horizon, single product, periodic review model in which pricing and production/inventory decisions are made simultaneously. Demands in different periods are identically distributed random variables that are independent of each other and their distributions depend on the product price. Pricing and ordering decisions are made at the beginning of each period and all shortages are backlogged. Ordering cost includes both a fixed cost and a variable cost proportional to the amount ordered. The objective is to maximize expected discounted, or expected average profit over the infinite planning horizon. We show that a stationary (s,S,p) policy is optimal for both the discounted and average profit models with general demand functions. In such a policy, the period inventory is managed based on the classical (s,S) policy and price is determined based on the inventory position at the beginning of each period.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Recent developments in the area of reinforcement learning have yielded a number of new algorithms for the prediction and control of Markovian environments. These algorithms, including the TD(lambda) algorithm of Sutton (1988) and the Q-learning algorithm of Watkins (1989), can be motivated heuristically as approximations to dynamic programming (DP). In this paper we provide a rigorous proof of convergence of these DP-based learning algorithms by relating them to the powerful techniques of stochastic approximation theory via a new convergence theorem. The theorem establishes a general class of convergent algorithms to which both TD(lambda) and Q-learning belong.