932 resultados para Markov Decision Process


Relevância:

80.00% 80.00%

Publicador:

Resumo:

We provide analytical models for capacity evaluation of an infrastructure IEEE 802.11 based network carrying TCP controlled file downloads or full-duplex packet telephone calls. In each case the analytical models utilize the attempt probabilities from a well known fixed-point based saturation analysis. For TCP controlled file downloads, following Bruno et al. (In Networking '04, LNCS 2042, pp. 626-637), we model the number of wireless stations (STAs) with ACKs as a Markov renewal process embedded at packet success instants. In our work, analysis of the evolution between the embedded instants is done by using saturation analysis to provide state dependent attempt probabilities. We show that in spite of its simplicity, our model works well, by comparing various simulated quantities, such as collision probability, with values predicted from our model. Next we consider N constant bit rate VoIP calls terminating at N STAs. We model the number of STAs that have an up-link voice packet as a Markov renewal process embedded at so called channel slot boundaries. Analysis of the evolution over a channel slot is done using saturation analysis as before. We find that again the AP is the bottleneck, and the system can support (in the sense of a bound on the probability of delay exceeding a given value) a number of calls less than that at which the arrival rate into the AP exceeds the average service rate applied to the AP. Finally, we extend the analytical model for VoIP calls to determine the call capacity of an 802.11b WLAN in a situation where VoIP calls originate from two different types of coders. We consider N-1 calls originating from Type 1 codecs and N-2 calls originating from Type 2 codecs. For G711 and G729 voice coders, we show that the analytical model again provides accurate results in comparison with simulations.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Due to their non-stationarity, finite-horizon Markov decision processes (FH-MDPs) have one probability transition matrix per stage. Thus the curse of dimensionality affects FH-MDPs more severely than infinite-horizon MDPs. We propose two parametrized 'actor-critic' algorithms to compute optimal policies for FH-MDPs. Both algorithms use the two-timescale stochastic approximation technique, thus simultaneously performing gradient search in the parametrized policy space (the 'actor') on a slower timescale and learning the policy gradient (the 'critic') via a faster recursion. This is in contrast to methods where critic recursions learn the cost-to-go proper. We show w.p 1 convergence to a set with the necessary condition for constrained optima. The proposed parameterization is for FHMDPs with compact action sets, although certain exceptions can be handled. Further, a third algorithm for stochastic control of stopping time processes is presented. We explain why current policy evaluation methods do not work as critic to the proposed actor recursion. Simulation results from flow-control in communication networks attest to the performance advantages of all three algorithms.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

We develop extensions of the Simulated Annealing with Multiplicative Weights (SAMW) algorithm that proposed a method of solution of Finite-Horizon Markov Decision Processes (FH-MDPs). The extensions developed are in three directions: a) Use of the dynamic programming principle in the policy update step of SAMW b) A two-timescale actor-critic algorithm that uses simulated transitions alone, and c) Extending the algorithm to the infinite-horizon discounted-reward scenario. In particular, a) reduces the storage required from exponential to linear in the number of actions per stage-state pair. On the faster timescale, a 'critic' recursion performs policy evaluation while on the slower timescale an 'actor' recursion performs policy improvement using SAMW. We give a proof outlining convergence w.p. 1 and show experimental results on two settings: semiconductor fabrication and flow control in communication networks.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This article proposes a three-timescale simulation based algorithm for solution of infinite horizon Markov Decision Processes (MDPs). We assume a finite state space and discounted cost criterion and adopt the value iteration approach. An approximation of the Dynamic Programming operator T is applied to the value function iterates. This 'approximate' operator is implemented using three timescales, the slowest of which updates the value function iterates. On the middle timescale we perform a gradient search over the feasible action set of each state using Simultaneous Perturbation Stochastic Approximation (SPSA) gradient estimates, thus finding the minimizing action in T. On the fastest timescale, the 'critic' estimates, over which the gradient search is performed, are obtained. A sketch of convergence explaining the dynamics of the algorithm using associated ODEs is also presented. Numerical experiments on rate based flow control on a bottleneck node using a continuous-time queueing model are performed using the proposed algorithm. The results obtained are verified against classical value iteration where the feasible set is suitably discretized. Over such a discretized setting, a variant of the algorithm of [12] is compared and the proposed algorithm is found to converge faster.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Mikael Juselius’ doctoral dissertation covers a range of significant issues in modern macroeconomics by empirically testing a number of important theoretical hypotheses. The first essay presents indirect evidence within the framework of the cointegrated VAR model on the elasticity of substitution between capital and labor by using Finnish manufacturing data. Instead of estimating the elasticity of substitution by using the first order conditions, he develops a new approach that utilizes a CES production function in a model with a 3-stage decision process: investment in the long run, wage bargaining in the medium run and price and employment decisions in the short run. He estimates the elasticity of substitution to be below one. The second essay tests the restrictions implied by the core equations of the New Keynesian Model (NKM) in a vector autoregressive model (VAR) by using both Euro area and U.S. data. Both the new Keynesian Phillips curve and the aggregate demand curve are estimated and tested. The restrictions implied by the core equations of the NKM are rejected on both U.S. and Euro area data. These results are important for further research. The third essay is methodologically similar to essay 2, but it concentrates on Finnish macro data by adopting a theoretical framework of an open economy. Juselius’ results suggests that the open economy NKM framework is too stylized to provide an adequate explanation for Finnish inflation. The final essay provides a macroeconometric model of Finnish inflation and associated explanatory variables and it estimates the relative importance of different inflation theories. His main finding is that Finnish inflation is primarily determined by excess demand in the product market and by changes in the long-term interest rate. This study is part of the research agenda carried out by the Research Unit of Economic Structure and Growth (RUESG). The aim of RUESG it to conduct theoretical and empirical research with respect to important issues in industrial economics, real option theory, game theory, organization theory, theory of financial systems as well as to study problems in labor markets, macroeconomics, natural resources, taxation and time series econometrics. RUESG was established at the beginning of 1995 and is one of the National Centers of Excellence in research selected by the Academy of Finland. It is financed jointly by the Academy of Finland, the University of Helsinki, the Yrjö Jahnsson Foundation, Bank of Finland and the Nokia Group. This support is gratefully acknowledged.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

In this paper we develop and numerically explore the modeling heuristic of using saturation attempt probabilities as state dependent attempt probabilities in an IEEE 802.11e infrastructure network carrying packet telephone calls and TCP controlled file downloads, using Enhanced Distributed Channel Access (EDCA). We build upon the fixed point analysis and performance insights in [1]. When there are a certain number of nodes of each class contending for the channel (i.e., have nonempty queues), then their attempt probabilities are taken to be those obtained from saturation analysis for that number of nodes. Then we model the system queue dynamics at the network nodes. With the proposed heuristic, the system evolution at channel slot boundaries becomes a Markov renewal process, and regenerative analysis yields the desired performance measures.The results obtained from this approach match well with ns2 simulations. We find that, with the default IEEE 802.11e EDCA parameters for AC 1 and AC 3, the voice call capacity decreases if even one file download is initiated by some station. Subsequently, reducing the voice calls increases the file download capacity almost linearly (by 1/3 Mbps per voice call for the 11 Mbps PHY).

Relevância:

80.00% 80.00%

Publicador:

Resumo:

We consider the problem of optimally scheduling a processor executing a multilayer protocol in an intelligent Network Interface Controller (NIC). In particular, we assume a typical LAN environment with class 4 transport service, a connectionless network service, and a class 1 link level protocol. We develop a queuing model for the problem. In the most general case this becomes a cyclic queuing network in which some queues have dedicated servers, and the others have a common schedulable server. We use sample path arguments and Markov decision theory to determine optimal service schedules. The optimal throughputs are compared with those obtained with simple policies. The optimal policy yields upto 25% improvement in some cases. In some other cases, the optimal policy does only slightly better than much simpler policies.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The actor-critic algorithm of Barto and others for simulation-based optimization of Markov decision processes is cast as a two time Scale stochastic approximation. Convergence analysis, approximation issues and an example are studied.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

We develop a simulation-based, two-timescale actor-critic algorithm for infinite horizon Markov decision processes with finite state and action spaces, with a discounted reward criterion. The algorithm is of the gradient ascent type and performs a search in the space of stationary randomized policies. The algorithm uses certain simultaneous deterministic perturbation stochastic approximation (SDPSA) gradient estimates for enhanced performance. We show an application of our algorithm on a problem of mortgage refinancing. Our algorithm obtains the optimal refinancing strategies in a computationally efficient manner

Relevância:

80.00% 80.00%

Publicador:

Resumo:

In this paper we develop and numerically explore the modeling heuristic of using saturation attempt probabilities as state dependent attempt probabilities in an IEEE 802.11e infrastructure network carrying packet telephone calls and TCP controlled file downloads, using enhanced distributed channel access (EDCA). We build upon the fixed point analysis and performance insights. When there are a certain number of nodes of each class contending for the channel (i.e., have nonempty queues), then their attempt probabilities are taken to be those obtained from saturation analysis for that number of nodes. Then we model the system queue dynamics at the network nodes. With the proposed heuristic, the system evolution at channel slot boundaries becomes a Markov renewal process, and regenerative analysis yields the desired performance measures. The results obtained from this approach match well with ns2 simulations. We find that, with the default IEEE 802.11e EDCA parameters for AC 1 and AC 3, the voice call capacity decreases if even one file download is initiated by some station. Subsequently, reducing the voice calls increases the file download capacity almost linearly (by 1/3 Mbps per voice call for the 11 Mbps PHY)

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Optimal maintenance policies for a machine with degradation in performance with age and subject to failure are derived using optimal control theory. The optimal policies are shown to be, normally, of bang-coast nature, except in the case when probability of machine failure is a function of maintenance. It is also shown, in the deterministic case that a higher depreciation rate tends to reverse this policy to coast-bang. When the probability of failure is a function of maintenance, considerable computational effort is needed to obtain an optimal policy and the resulting policy is not easily implementable. For this case also, an optimal policy in the class of bang-coast policies is derived, using a semi-Markov decision model. A simple procedure for modifying the probability of machine failure with maintenance is employed. The results obtained extend and unify the recent results for this problem along both theoretical and practical lines. Numerical examples are presented to illustrate the results obtained.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

We study the tradeoff between the average error probability and the average queueing delay of messages which randomly arrive to the transmitter of a point-to-point discrete memoryless channel that uses variable rate fixed codeword length random coding. Bounds to the exponential decay rate of the average error probability with average queueing delay in the regime of large average delay are obtained. Upper and lower bounds to the optimal average delay for a given average error probability constraint are presented. We then formulate a constrained Markov decision problem for characterizing the rate of transmission as a function of queue size given an average error probability constraint. Using a Lagrange multiplier the constrained Markov decision problem is then converted to a problem of minimizing the average cost for a Markov decision problem. A simple heuristic policy is proposed which approximately achieves the optimal average cost.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

We consider a power optimization problem with average delay constraint on the downlink of a Green Base-station. A Green Base-station is powered by both renewable energy such as solar or wind energy as well as conventional sources like diesel generators or the power grid. We try to minimize the energy drawn from conventional energy sources and utilize the harvested energy to the maximum extent. Each user also has an average delay constraint for its data. The optimal action consists of scheduling the users and allocating the optimal transmission rate for the chosen user. In this paper, we formulate the problem as a Markov Decision Problem and show the existence of a stationary average-cost optimal policy. We also derive some structural results for the optimal policy.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

We consider two variants of the classical gossip algorithm. The first variant is a version of asynchronous stochastic approximation. We highlight a fundamental difficulty associated with the classical asynchronous gossip scheme, viz., that it may not converge to a desired average, and suggest an alternative scheme based on reinforcement learning that has guaranteed convergence to the desired average. We then discuss a potential application to a wireless network setting with simultaneous link activation constraints. The second variant is a gossip algorithm for distributed computation of the Perron-Frobenius eigenvector of a nonnegative matrix. While the first variant draws upon a reinforcement learning algorithm for an average cost controlled Markov decision problem, the second variant draws upon a reinforcement learning algorithm for risk-sensitive control. We then discuss potential applications of the second variant to ranking schemes, reputation networks, and principal component analysis.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

We consider the problem of finding optimal energy sharing policies that maximize the network performance of a system comprising of multiple sensor nodes and a single energy harvesting (EH) source. Sensor nodes periodically sense the random field and generate data, which is stored in the corresponding data queues. The EH source harnesses energy from ambient energy sources and the generated energy is stored in an energy buffer. Sensor nodes receive energy for data transmission from the EH source. The EH source has to efficiently share the stored energy among the nodes to minimize the long-run average delay in data transmission. We formulate the problem of energy sharing between the nodes in the framework of average cost infinite-horizon Markov decision processes (MDPs). We develop efficient energy sharing algorithms, namely Q-learning algorithm with exploration mechanisms based on the epsilon-greedy method as well as upper confidence bound (UCB). We extend these algorithms by incorporating state and action space aggregation to tackle state-action space explosion in the MDP. We also develop a cross entropy based method that incorporates policy parameterization to find near optimal energy sharing policies. Through simulations, we show that our algorithms yield energy sharing policies that outperform the heuristic greedy method.