61 resultados para Markov Decision Process


Relevância:

80.00% 80.00%

Publicador:

Resumo:

We develop extensions of the Simulated Annealing with Multiplicative Weights (SAMW) algorithm that proposed a method of solution of Finite-Horizon Markov Decision Processes (FH-MDPs). The extensions developed are in three directions: a) Use of the dynamic programming principle in the policy update step of SAMW b) A two-timescale actor-critic algorithm that uses simulated transitions alone, and c) Extending the algorithm to the infinite-horizon discounted-reward scenario. In particular, a) reduces the storage required from exponential to linear in the number of actions per stage-state pair. On the faster timescale, a 'critic' recursion performs policy evaluation while on the slower timescale an 'actor' recursion performs policy improvement using SAMW. We give a proof outlining convergence w.p. 1 and show experimental results on two settings: semiconductor fabrication and flow control in communication networks.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This article proposes a three-timescale simulation based algorithm for solution of infinite horizon Markov Decision Processes (MDPs). We assume a finite state space and discounted cost criterion and adopt the value iteration approach. An approximation of the Dynamic Programming operator T is applied to the value function iterates. This 'approximate' operator is implemented using three timescales, the slowest of which updates the value function iterates. On the middle timescale we perform a gradient search over the feasible action set of each state using Simultaneous Perturbation Stochastic Approximation (SPSA) gradient estimates, thus finding the minimizing action in T. On the fastest timescale, the 'critic' estimates, over which the gradient search is performed, are obtained. A sketch of convergence explaining the dynamics of the algorithm using associated ODEs is also presented. Numerical experiments on rate based flow control on a bottleneck node using a continuous-time queueing model are performed using the proposed algorithm. The results obtained are verified against classical value iteration where the feasible set is suitably discretized. Over such a discretized setting, a variant of the algorithm of [12] is compared and the proposed algorithm is found to converge faster.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

In this paper we develop and numerically explore the modeling heuristic of using saturation attempt probabilities as state dependent attempt probabilities in an IEEE 802.11e infrastructure network carrying packet telephone calls and TCP controlled file downloads, using Enhanced Distributed Channel Access (EDCA). We build upon the fixed point analysis and performance insights in [1]. When there are a certain number of nodes of each class contending for the channel (i.e., have nonempty queues), then their attempt probabilities are taken to be those obtained from saturation analysis for that number of nodes. Then we model the system queue dynamics at the network nodes. With the proposed heuristic, the system evolution at channel slot boundaries becomes a Markov renewal process, and regenerative analysis yields the desired performance measures.The results obtained from this approach match well with ns2 simulations. We find that, with the default IEEE 802.11e EDCA parameters for AC 1 and AC 3, the voice call capacity decreases if even one file download is initiated by some station. Subsequently, reducing the voice calls increases the file download capacity almost linearly (by 1/3 Mbps per voice call for the 11 Mbps PHY).

Relevância:

80.00% 80.00%

Publicador:

Resumo:

We consider the problem of optimally scheduling a processor executing a multilayer protocol in an intelligent Network Interface Controller (NIC). In particular, we assume a typical LAN environment with class 4 transport service, a connectionless network service, and a class 1 link level protocol. We develop a queuing model for the problem. In the most general case this becomes a cyclic queuing network in which some queues have dedicated servers, and the others have a common schedulable server. We use sample path arguments and Markov decision theory to determine optimal service schedules. The optimal throughputs are compared with those obtained with simple policies. The optimal policy yields upto 25% improvement in some cases. In some other cases, the optimal policy does only slightly better than much simpler policies.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The actor-critic algorithm of Barto and others for simulation-based optimization of Markov decision processes is cast as a two time Scale stochastic approximation. Convergence analysis, approximation issues and an example are studied.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

We develop a simulation-based, two-timescale actor-critic algorithm for infinite horizon Markov decision processes with finite state and action spaces, with a discounted reward criterion. The algorithm is of the gradient ascent type and performs a search in the space of stationary randomized policies. The algorithm uses certain simultaneous deterministic perturbation stochastic approximation (SDPSA) gradient estimates for enhanced performance. We show an application of our algorithm on a problem of mortgage refinancing. Our algorithm obtains the optimal refinancing strategies in a computationally efficient manner

Relevância:

80.00% 80.00%

Publicador:

Resumo:

In this paper we develop and numerically explore the modeling heuristic of using saturation attempt probabilities as state dependent attempt probabilities in an IEEE 802.11e infrastructure network carrying packet telephone calls and TCP controlled file downloads, using enhanced distributed channel access (EDCA). We build upon the fixed point analysis and performance insights. When there are a certain number of nodes of each class contending for the channel (i.e., have nonempty queues), then their attempt probabilities are taken to be those obtained from saturation analysis for that number of nodes. Then we model the system queue dynamics at the network nodes. With the proposed heuristic, the system evolution at channel slot boundaries becomes a Markov renewal process, and regenerative analysis yields the desired performance measures. The results obtained from this approach match well with ns2 simulations. We find that, with the default IEEE 802.11e EDCA parameters for AC 1 and AC 3, the voice call capacity decreases if even one file download is initiated by some station. Subsequently, reducing the voice calls increases the file download capacity almost linearly (by 1/3 Mbps per voice call for the 11 Mbps PHY)

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Optimal maintenance policies for a machine with degradation in performance with age and subject to failure are derived using optimal control theory. The optimal policies are shown to be, normally, of bang-coast nature, except in the case when probability of machine failure is a function of maintenance. It is also shown, in the deterministic case that a higher depreciation rate tends to reverse this policy to coast-bang. When the probability of failure is a function of maintenance, considerable computational effort is needed to obtain an optimal policy and the resulting policy is not easily implementable. For this case also, an optimal policy in the class of bang-coast policies is derived, using a semi-Markov decision model. A simple procedure for modifying the probability of machine failure with maintenance is employed. The results obtained extend and unify the recent results for this problem along both theoretical and practical lines. Numerical examples are presented to illustrate the results obtained.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

We study the tradeoff between the average error probability and the average queueing delay of messages which randomly arrive to the transmitter of a point-to-point discrete memoryless channel that uses variable rate fixed codeword length random coding. Bounds to the exponential decay rate of the average error probability with average queueing delay in the regime of large average delay are obtained. Upper and lower bounds to the optimal average delay for a given average error probability constraint are presented. We then formulate a constrained Markov decision problem for characterizing the rate of transmission as a function of queue size given an average error probability constraint. Using a Lagrange multiplier the constrained Markov decision problem is then converted to a problem of minimizing the average cost for a Markov decision problem. A simple heuristic policy is proposed which approximately achieves the optimal average cost.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

We consider a power optimization problem with average delay constraint on the downlink of a Green Base-station. A Green Base-station is powered by both renewable energy such as solar or wind energy as well as conventional sources like diesel generators or the power grid. We try to minimize the energy drawn from conventional energy sources and utilize the harvested energy to the maximum extent. Each user also has an average delay constraint for its data. The optimal action consists of scheduling the users and allocating the optimal transmission rate for the chosen user. In this paper, we formulate the problem as a Markov Decision Problem and show the existence of a stationary average-cost optimal policy. We also derive some structural results for the optimal policy.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

We consider two variants of the classical gossip algorithm. The first variant is a version of asynchronous stochastic approximation. We highlight a fundamental difficulty associated with the classical asynchronous gossip scheme, viz., that it may not converge to a desired average, and suggest an alternative scheme based on reinforcement learning that has guaranteed convergence to the desired average. We then discuss a potential application to a wireless network setting with simultaneous link activation constraints. The second variant is a gossip algorithm for distributed computation of the Perron-Frobenius eigenvector of a nonnegative matrix. While the first variant draws upon a reinforcement learning algorithm for an average cost controlled Markov decision problem, the second variant draws upon a reinforcement learning algorithm for risk-sensitive control. We then discuss potential applications of the second variant to ranking schemes, reputation networks, and principal component analysis.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

We consider the problem of finding optimal energy sharing policies that maximize the network performance of a system comprising of multiple sensor nodes and a single energy harvesting (EH) source. Sensor nodes periodically sense the random field and generate data, which is stored in the corresponding data queues. The EH source harnesses energy from ambient energy sources and the generated energy is stored in an energy buffer. Sensor nodes receive energy for data transmission from the EH source. The EH source has to efficiently share the stored energy among the nodes to minimize the long-run average delay in data transmission. We formulate the problem of energy sharing between the nodes in the framework of average cost infinite-horizon Markov decision processes (MDPs). We develop efficient energy sharing algorithms, namely Q-learning algorithm with exploration mechanisms based on the epsilon-greedy method as well as upper confidence bound (UCB). We extend these algorithms by incorporating state and action space aggregation to tackle state-action space explosion in the MDP. We also develop a cross entropy based method that incorporates policy parameterization to find near optimal energy sharing policies. Through simulations, we show that our algorithms yield energy sharing policies that outperform the heuristic greedy method.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

We consider the problem of optimizing the workforce of a service system. Adapting the staffing levels in such systems is non-trivial due to large variations in workload and the large number of system parameters do not allow for a brute force search. Further, because these parameters change on a weekly basis, the optimization should not take longer than a few hours. Our aim is to find the optimum staffing levels from a discrete high-dimensional parameter set, that minimizes the long run average of the single-stage cost function, while adhering to the constraints relating to queue stability and service-level agreement (SLA) compliance. The single-stage cost function balances the conflicting objectives of utilizing workers better and attaining the target SLAs. We formulate this problem as a constrained parameterized Markov cost process parameterized by the (discrete) staffing levels. We propose novel simultaneous perturbation stochastic approximation (SPSA)-based algorithms for solving the above problem. The algorithms include both first-order as well as second-order methods and incorporate SPSA-based gradient/Hessian estimates for primal descent, while performing dual ascent for the Lagrange multipliers. Both algorithms are online and update the staffing levels in an incremental fashion. Further, they involve a certain generalized smooth projection operator, which is essential to project the continuous-valued worker parameter tuned by our algorithms onto the discrete set. The smoothness is necessary to ensure that the underlying transition dynamics of the constrained Markov cost process is itself smooth (as a function of the continuous-valued parameter): a critical requirement to prove the convergence of both algorithms. We validate our algorithms via performance simulations based on data from five real-life service systems. For the sake of comparison, we also implement a scatter search based algorithm using state-of-the-art optimization tool-kit OptQuest. From the experiments, we observe that both our algorithms converge empirically and consistently outperform OptQuest in most of the settings considered. This finding coupled with the computational advantage of our algorithms make them amenable for adaptive labor staffing in real-life service systems.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Process control rules may be specified using decision tables. Such a specification is superior when logical decisions to be taken in control dominate. In this paper we give a method of detecting redundancies, incompleteness, and contradictions in such specifications. Using such a technique thus ensures the validity of the specifications.