40 resultados para Markov Branching Process
em Indian Institute of Science - Bangalore - Índia
Resumo:
We develop an online actor-critic reinforcement learning algorithm with function approximation for a problem of control under inequality constraints. We consider the long-run average cost Markov decision process (MDP) framework in which both the objective and the constraint functions are suitable policy-dependent long-run averages of certain sample path functions. The Lagrange multiplier method is used to handle the inequality constraints. We prove the asymptotic almost sure convergence of our algorithm to a locally optimal solution. We also provide the results of numerical experiments on a problem of routing in a multi-stage queueing network with constraints on long-run average queue lengths. We observe that our algorithm exhibits good performance on this setting and converges to a feasible point.
Resumo:
We study optimal control of Markov processes with age-dependent transition rates. The control policy is chosen continuously over time based on the state of the process and its age. We study infinite horizon discounted cost and infinite horizon average cost problems. Our approach is via the construction of an equivalent semi-Markov decision process. We characterise the value function and optimal controls for both discounted and average cost cases.
Resumo:
We introduce and study a class of non-stationary semi-Markov decision processes on a finite horizon. By constructing an equivalent Markov decision process, we establish the existence of a piecewise open loop relaxed control which is optimal for the finite horizon problem.
Resumo:
We present a novel multi-timescale Q-learning algorithm for average cost control in a Markov decision process subject to multiple inequality constraints. We formulate a relaxed version of this problem through the Lagrange multiplier method. Our algorithm is different from Q-learning in that it updates two parameters - a Q-value parameter and a policy parameter. The Q-value parameter is updated on a slower time scale as compared to the policy parameter. Whereas Q-learning with function approximation can diverge in some cases, our algorithm is seen to be convergent as a result of the aforementioned timescale separation. We show the results of experiments on a problem of constrained routing in a multistage queueing network. Our algorithm is seen to exhibit good performance and the various inequality constraints are seen to be satisfied upon convergence of the algorithm.
Resumo:
This paper considers antenna selection (AS) at a receiver equipped with multiple antenna elements but only a single radio frequency chain for packet reception. As information about the channel state is acquired using training symbols (pilots), the receiver makes its AS decisions based on noisy channel estimates. Additional information that can be exploited for AS includes the time-correlation of the wireless channel and the results of the link-layer error checks upon receiving the data packets. In this scenario, the task of the receiver is to sequentially select (a) the pilot symbol allocation, i.e., how to distribute the available pilot symbols among the antenna elements, for channel estimation on each of the receive antennas; and (b) the antenna to be used for data packet reception. The goal is to maximize the expected throughput, based on the past history of allocation and selection decisions, and the corresponding noisy channel estimates and error check results. Since the channel state is only partially observed through the noisy pilots and the error checks, the joint problem of pilot allocation and AS is modeled as a partially observed Markov decision process (POMDP). The solution to the POMDP yields the policy that maximizes the long-term expected throughput. Using the Finite State Markov Chain (FSMC) model for the wireless channel, the performance of the POMDP solution is compared with that of other existing schemes, and it is illustrated through numerical evaluation that the POMDP solution significantly outperforms them.
Resumo:
We extend the modeling heuristic of (Harsha et al. 2006. In IEEE IWQoS 06, pp 178 - 187) to evaluate the performance of an IEEE 802.11e infrastructure network carrying packet telephone calls, streaming video sessions and TCP controlled file downloads, using Enhanced Distributed Channel Access (EDCA). We identify the time boundaries of activities on the channel (called channel slot boundaries) and derive a Markov Renewal Process of the contending nodes on these epochs. This is achieved by the use of attempt probabilities of the contending nodes as those obtained from the saturation fixed point analysis of (Ramaiyan et al. 2005. In Proceedings ACM Sigmetrics, `05. Journal version accepted for publication in IEEE TON). Regenerative analysis on this MRP yields the desired steady state performance measures. We then use the MRP model to develop an effective bandwidth approach for obtaining a bound on the size of the buffer required at the video queue of the AP, such that the streaming video packet loss probability is kept to less than 1%. The results obtained match well with simulations using the network simulator, ns-2. We find that, with the default IEEE 802.11e EDCA parameters for access categories AC 1, AC 2 and AC 3, the voice call capacity decreases if even one streaming video session and one TCP file download are initiated by some wireless station. Subsequently, reducing the voice calls increases the video downlink stream throughput by 0.38 Mbps and file download capacity by 0.14 Mbps, for every voice call (for the 11 Mbps PHY). We find that a buffer size of 75KB is sufficient to ensure that the video packet loss probability at the QAP is within 1%.
Resumo:
We provide analytical models for capacity evaluation of an infrastructure IEEE 802.11 based network carrying TCP controlled file downloads or full-duplex packet telephone calls. In each case the analytical models utilize the attempt probabilities from a well known fixed-point based saturation analysis. For TCP controlled file downloads, following Bruno et al. (In Networking '04, LNCS 2042, pp. 626-637), we model the number of wireless stations (STAs) with ACKs as a Markov renewal process embedded at packet success instants. In our work, analysis of the evolution between the embedded instants is done by using saturation analysis to provide state dependent attempt probabilities. We show that in spite of its simplicity, our model works well, by comparing various simulated quantities, such as collision probability, with values predicted from our model. Next we consider N constant bit rate VoIP calls terminating at N STAs. We model the number of STAs that have an up-link voice packet as a Markov renewal process embedded at so called channel slot boundaries. Analysis of the evolution over a channel slot is done using saturation analysis as before. We find that again the AP is the bottleneck, and the system can support (in the sense of a bound on the probability of delay exceeding a given value) a number of calls less than that at which the arrival rate into the AP exceeds the average service rate applied to the AP. Finally, we extend the analytical model for VoIP calls to determine the call capacity of an 802.11b WLAN in a situation where VoIP calls originate from two different types of coders. We consider N-1 calls originating from Type 1 codecs and N-2 calls originating from Type 2 codecs. For G711 and G729 voice coders, we show that the analytical model again provides accurate results in comparison with simulations.
Resumo:
We consider the problem of quickest detection of an intrusion using a sensor network, keeping only a minimal number of sensors active. By using a minimal number of sensor devices, we ensure that the energy expenditure for sensing, computation and communication is minimized (and the lifetime of the network is maximized). We model the intrusion detection (or change detection) problem as a Markov decision process (MDP). Based on the theory of MDP, we develop the following closed loop sleep/wake scheduling algorithms: (1) optimal control of Mk+1, the number of sensors in the wake state in time slot k + 1, (2) optimal control of qk+1, the probability of a sensor in the wake state in time slot k + 1, and an open loop sleep/wake scheduling algorithm which (3) computes q, the optimal probability of a sensor in the wake state (which does not vary with time), based on the sensor observations obtained until time slot k. Our results show that an optimum closed loop control on Mk+1 significantly decreases the cost compared to keeping any number of sensors active all the time. Also, among the three algorithms described, we observe that the total cost is minimum for the optimum control on Mk+1 and is maximum for the optimum open loop control on q.
Resumo:
In this paper we develop and numerically explore the modeling heuristic of using saturation attempt probabilities as state dependent attempt probabilities in an IEEE 802.11e infrastructure network carrying packet telephone calls and TCP controlled file downloads, using Enhanced Distributed Channel Access (EDCA). We build upon the fixed point analysis and performance insights in [1]. When there are a certain number of nodes of each class contending for the channel (i.e., have nonempty queues), then their attempt probabilities are taken to be those obtained from saturation analysis for that number of nodes. Then we model the system queue dynamics at the network nodes. With the proposed heuristic, the system evolution at channel slot boundaries becomes a Markov renewal process, and regenerative analysis yields the desired performance measures.The results obtained from this approach match well with ns2 simulations. We find that, with the default IEEE 802.11e EDCA parameters for AC 1 and AC 3, the voice call capacity decreases if even one file download is initiated by some station. Subsequently, reducing the voice calls increases the file download capacity almost linearly (by 1/3 Mbps per voice call for the 11 Mbps PHY).
Resumo:
We consider a wireless sensor network whose main function is to detect certain infrequent alarm events, and to forward alarm packets to a base station, using geographical forwarding. The nodes know their locations, and they sleep-wake cycle, waking up periodically but not synchronously. In this situation, when a node has a packet to forward to the sink, there is a trade-off between how long this node waits for a suitable neighbor to wake up and the progress the packet makes towards the sink once it is forwarded to this neighbor. Hence, in choosing a relay node, we consider the problem of minimizing average delay subject to a constraint on the average progress. By constraint relaxation, we formulate this next hop relay selection problem as a Markov decision process (MDP). The exact optimal solution (BF (Best Forward)) can be found, but is computationally intensive. Next, we consider a mathematically simplified model for which the optimal policy (SF (Simplified Forward)) turns out to be a simple one-step-look-ahead rule. Simulations show that SF is very close in performance to BF, even for reasonably small node density. We then study the end-to-end performance of SF in comparison with two extremal policies: Max Forward (MF) and First Forward (FF), and an end-to-end delay minimising policy proposed by Kim et al. 1]. We find that, with appropriate choice of one hop average progress constraint, SF can be tuned to provide a favorable trade-off between end-to-end packet delay and the number of hops in the forwarding path.
Resumo:
We consider the problem of quickest detection of an intrusion using a sensor network, keeping only a minimal number of sensors active. By using a minimal number of sensor devices,we ensure that the energy expenditure for sensing, computation and communication is minimized (and the lifetime of the network is maximized). We model the intrusion detection (or change detection) problem as a Markov decision process (MDP). Based on the theory of MDP, we develop the following closed loop sleep/wake scheduling algorithms: 1) optimal control of Mk+1, the number of sensors in the wake state in time slot k + 1, 2) optimal control of qk+1, the probability of a sensor in the wake state in time slot k + 1, and an open loop sleep/wake scheduling algorithm which 3) computes q, the optimal probability of a sensor in the wake state (which does not vary with time),based on the sensor observations obtained until time slot k.Our results show that an optimum closed loop control onMk+1 significantly decreases the cost compared to keeping any number of sensors active all the time. Also, among the three algorithms described, we observe that the total cost is minimum for the optimum control on Mk+1 and is maximum for the optimum open loop control on q.
Resumo:
In this paper we develop and numerically explore the modeling heuristic of using saturation attempt probabilities as state dependent attempt probabilities in an IEEE 802.11e infrastructure network carrying packet telephone calls and TCP controlled file downloads, using enhanced distributed channel access (EDCA). We build upon the fixed point analysis and performance insights. When there are a certain number of nodes of each class contending for the channel (i.e., have nonempty queues), then their attempt probabilities are taken to be those obtained from saturation analysis for that number of nodes. Then we model the system queue dynamics at the network nodes. With the proposed heuristic, the system evolution at channel slot boundaries becomes a Markov renewal process, and regenerative analysis yields the desired performance measures. The results obtained from this approach match well with ns2 simulations. We find that, with the default IEEE 802.11e EDCA parameters for AC 1 and AC 3, the voice call capacity decreases if even one file download is initiated by some station. Subsequently, reducing the voice calls increases the file download capacity almost linearly (by 1/3 Mbps per voice call for the 11 Mbps PHY)
Resumo:
We consider a small extent sensor network for event detection, in which nodes periodically take samples and then contend over a random access network to transmit their measurement packets to the fusion center. We consider two procedures at the fusion center for processing the measurements. The Bayesian setting, is assumed, that is, the fusion center has a prior distribution on the change time. In the first procedure, the decision algorithm at the fusion center is network-oblivious and makes a decision only when a complete vector of measurements taken at a sampling instant is available. In the second procedure, the decision algorithm at the fusion center is network-aware and processes measurements as they arrive, but in a time-causal order. In this case, the decision statistic depends on the network delays, whereas in the network-oblivious case, the decision statistic does not. This yields a Bayesian change-detection problem with a trade-off between the random network delay and the decision delay that is, a higher sampling rate reduces the decision delay but increases the random access delay. Under periodic sampling, in the network-oblivious case, the structure of the optimal stopping rule is the same as that without the network, and the optimal change detection delay decouples into the network delay and the optimal decision delay without the network. In the network-aware case, the optimal stopping problem is analyzed as a partially observable Markov decision process, in which the states of the queues and delays in the network need to be maintained. A sufficient decision statistic is the network state and the posterior probability of change having occurred, given the measurements received and the state of the network. The optimal regimes are studied using simulation.
Resumo:
Our work is motivated by geographical forwarding of sporadic alarm packets to a base station in a wireless sensor network (WSN), where the nodes are sleep-wake cycling periodically and asynchronously. We seek to develop local forwarding algorithms that can be tuned so as to tradeoff the end-to-end delay against a total cost, such as the hop count or total energy. Our approach is to solve, at each forwarding node enroute to the sink, the local forwarding problem of minimizing one-hop waiting delay subject to a lower bound constraint on a suitable reward offered by the next-hop relay; the constraint serves to tune the tradeoff. The reward metric used for the local problem is based on the end-to-end total cost objective (for instance, when the total cost is hop count, we choose to use the progress toward sink made by a relay as the reward). The forwarding node, to begin with, is uncertain about the number of relays, their wake-up times, and the reward values, but knows the probability distributions of these quantities. At each relay wake-up instant, when a relay reveals its reward value, the forwarding node's problem is to forward the packet or to wait for further relays to wake-up. In terms of the operations research literature, our work can be considered as a variant of the asset selling problem. We formulate our local forwarding problem as a partially observable Markov decision process (POMDP) and obtain inner and outer bounds for the optimal policy. Motivated by the computational complexity involved in the policies derived out of these bounds, we formulate an alternate simplified model, the optimal policy for which is a simple threshold rule. We provide simulation results to compare the performance of the inner and outer bound policies against the simple policy, and also against the optimal policy when the source knows the exact number of relays. Observing the good performance and the ease of implementation of the simple policy, we apply it to our motivating problem, i.e., local geographical routing of sporadic alarm packets in a large WSN. We compare the end-to-end performance (i.e., average total delay and average total cost) obtained by the simple policy, when used for local geographical forwarding, against that obtained by the globally optimal forwarding algorithm proposed by Kim et al. 1].
Resumo:
The assignment of tasks to multiple resources becomes an interesting game theoretic problem, when both the task owner and the resources are strategic. In the classical, nonstrategic setting, where the states of the tasks and resources are observable by the controller, this problem is that of finding an optimal policy for a Markov decision process (MDP). When the states are held by strategic agents, the problem of an efficient task allocation extends beyond that of solving an MDP and becomes that of designing a mechanism. Motivated by this fact, we propose a general mechanism which decides on an allocation rule for the tasks and resources and a payment rule to incentivize agents' participation and truthful reports. In contrast to related dynamic strategic control problems studied in recent literature, the problem studied here has interdependent values: the benefit of an allocation to the task owner is not simply a function of the characteristics of the task itself and the allocation, but also of the state of the resources. We introduce a dynamic extension of Mezzetti's two phase mechanism for interdependent valuations. In this changed setting, the proposed dynamic mechanism is efficient, within period ex-post incentive compatible, and within period ex-post individually rational.