48 results for MARKOV DECISION PROCESSES
Abstract:
We study optimal control of Markov processes with age-dependent transition rates. The control policy is chosen continuously over time based on the state of the process and its age. We study infinite horizon discounted cost and infinite horizon average cost problems. Our approach is via the construction of an equivalent semi-Markov decision process. We characterise the value function and optimal controls for both discounted and average cost cases.
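For orientation only (this is the generic form of such a fixed point, not the paper's specific characterization), the discounted-cost value function of a semi-Markov decision process with discount rate α > 0 typically satisfies

$$ V(s) \;=\; \min_{a \in A}\ \mathbb{E}\!\left[\, C(s,a) + e^{-\alpha \tau}\, V(S') \;\middle|\; s, a \right], $$

where C(s,a) denotes the expected discounted cost accumulated over the random sojourn time τ spent in state s under action a, and S' is the state entered at the next transition.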
Abstract:
Optimal maintenance policies for a machine whose performance degrades with age and which is subject to failure are derived using optimal control theory. The optimal policies are shown to be, normally, of a bang-coast nature, except when the probability of machine failure is a function of maintenance. It is also shown, in the deterministic case, that a higher depreciation rate tends to reverse this policy to coast-bang. When the probability of failure is a function of maintenance, considerable computational effort is needed to obtain an optimal policy, and the resulting policy is not easily implementable. For this case also, an optimal policy within the class of bang-coast policies is derived, using a semi-Markov decision model. A simple procedure for modifying the probability of machine failure with maintenance is employed. The results obtained extend and unify recent results for this problem along both theoretical and practical lines. Numerical examples are presented to illustrate the results.
Abstract:
We consider a small-extent sensor network for event detection, in which nodes periodically take samples and then contend over a random access network to transmit their measurement packets to the fusion center. We consider two procedures at the fusion center for processing the measurements. The Bayesian setting is assumed; that is, the fusion center has a prior distribution on the change time. In the first procedure, the decision algorithm at the fusion center is network-oblivious and makes a decision only when a complete vector of measurements taken at a sampling instant is available. In the second procedure, the decision algorithm at the fusion center is network-aware and processes measurements as they arrive, but in a time-causal order. In this case, the decision statistic depends on the network delays, whereas in the network-oblivious case, the decision statistic does not. This yields a Bayesian change-detection problem with a trade-off between the random network delay and the decision delay; that is, a higher sampling rate reduces the decision delay but increases the random access delay. Under periodic sampling, in the network-oblivious case, the structure of the optimal stopping rule is the same as that without the network, and the optimal change detection delay decouples into the network delay and the optimal decision delay without the network. In the network-aware case, the optimal stopping problem is analyzed as a partially observable Markov decision process, in which the states of the queues and delays in the network need to be maintained. A sufficient decision statistic is the network state and the posterior probability of change having occurred, given the measurements received and the state of the network. The optimal regimes are studied using simulation.
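As a minimal sketch of the kind of posterior statistic used in such Bayesian change detection (assuming a geometric prior on the change time with parameter rho and known pre- and post-change observation densities f0 and f1; the network-state component tracked in the network-aware case is not modeled here):

```python
def posterior_update(p, y, rho, f0, f1):
    """One Bayesian update of the posterior probability that the change has occurred.

    p   : current posterior probability of change, given past observations
    y   : new observation (e.g., the latest measurement)
    rho : geometric prior parameter, P(change occurs in the next slot | no change yet)
    f0  : pre-change observation density (a callable)
    f1  : post-change observation density (a callable)
    """
    p_pred = p + (1.0 - p) * rho          # the change may occur before this sample
    num = p_pred * f1(y)                  # likelihood if the change has occurred
    den = num + (1.0 - p_pred) * f0(y)    # total likelihood of the observation
    return num / den
```

With unit-variance Gaussian observations whose mean shifts from 0 to 1 at the change point, for example, f0 and f1 would be the corresponding normal densities, and the fusion center would stop and declare a change once the posterior crosses a threshold.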
Abstract:
This paper addresses the problem of finding optimal power control policies for wireless energy harvesting sensor (EHS) nodes with automatic repeat request (ARQ)-based packet transmissions. The EHS harvests energy from the environment according to a Bernoulli process, and it is required to operate within the constraint of energy neutrality. The EHS obtains partial channel state information (CSI) at the transmitter through the link-layer ARQ protocol, via the ACK/NACK feedback messages, and uses it to adapt the transmission power for the packet (re)transmission attempts. The underlying wireless fading channel is modeled as a finite state Markov chain with known transition probabilities. Thus, the goal of the power management policy is to determine the best power setting for the current packet transmission attempt, so as to optimize a long-run objective such as the expected outage probability. The problem is addressed in a decision-theoretic framework by casting it as a partially observable Markov decision process (POMDP). Due to the large size of the state space, the exact solution to the POMDP is computationally expensive. Hence, two popular approximate solutions are considered, which yield good power management policies for the transmission attempts. Monte Carlo simulation results illustrate the efficacy of the approach and show that the approximate solutions significantly outperform conventional approaches.
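A rough sketch of the belief propagation that underlies such a POMDP formulation is shown below; the success-probability table p_succ and the ordering of the conditioning and prediction steps are illustrative assumptions, not the paper's model.

```python
import numpy as np

def belief_update(b, P, p_succ, power, ack):
    """Update the transmitter's belief over channel states after one ARQ attempt.

    b      : belief vector over the finite channel states (sums to 1)
    P      : channel transition matrix, P[i, j] = P(next state j | current state i)
    p_succ : assumed table, p_succ[s, a] = probability of decoding success when the
             channel is in state s and power level a is used
    power  : index of the power level used in this attempt
    ack    : True if an ACK was fed back, False if a NACK
    """
    like = p_succ[:, power] if ack else 1.0 - p_succ[:, power]
    b_post = b * like        # condition on the observed ACK/NACK outcome
    b_post /= b_post.sum()
    return b_post @ P        # predict the channel state for the next attempt
```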
Abstract:
Average-delay optimal scheduling of messages arriving at the transmitter of a point-to-point channel is considered in this paper. We consider a discrete time batch-arrival batch-service queueing model for the communication scheme, with a service time that may be a function of the batch size. The question of delay optimality is addressed within the semi-Markov decision-theoretic framework. Approximations to the average-delay optimal policy are obtained.
Abstract:
We consider the problem of quickest detection of an intrusion using a sensor network, keeping only a minimal number of sensors active. By using a minimal number of sensor devices, we ensure that the energy expenditure for sensing, computation and communication is minimized (and the lifetime of the network is maximized). We model the intrusion detection (or change detection) problem as a Markov decision process (MDP). Based on the theory of MDP, we develop the following closed loop sleep/wake scheduling algorithms: (1) optimal control of M_{k+1}, the number of sensors in the wake state in time slot k+1, (2) optimal control of q_{k+1}, the probability of a sensor in the wake state in time slot k+1, and an open loop sleep/wake scheduling algorithm which (3) computes q, the optimal probability of a sensor in the wake state (which does not vary with time), based on the sensor observations obtained until time slot k. Our results show that an optimum closed loop control on M_{k+1} significantly decreases the cost compared to keeping any number of sensors active all the time. Also, among the three algorithms described, we observe that the total cost is minimum for the optimum control on M_{k+1} and is maximum for the optimum open loop control on q.
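Purely to indicate how such closed-loop policies can be computed offline, a generic value-iteration sketch for a finite MDP follows; the states, actions, transition kernel and costs of the actual sleep/wake scheduling model are not reproduced here.

```python
import numpy as np

def value_iteration(P, c, gamma=0.95, tol=1e-8):
    """Value iteration for a generic finite MDP (illustrative sketch).

    P     : array of shape (A, S, S), P[a, s, s'] = transition probability
    c     : array of shape (S, A), c[s, a] = one-step cost
    gamma : discount factor
    Returns the optimal value function and a greedy (optimal) policy.
    """
    S, A = c.shape
    V = np.zeros(S)
    while True:
        # Q[s, a] = c[s, a] + gamma * sum_{s'} P[a, s, s'] * V[s']
        Q = c + gamma * np.einsum('ast,t->sa', P, V)
        V_new = Q.min(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmin(axis=1)
        V = V_new
```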
Abstract:
We consider the problem of optimally scheduling a processor executing a multilayer protocol in an intelligent Network Interface Controller (NIC). In particular, we assume a typical LAN environment with class 4 transport service, a connectionless network service, and a class 1 link level protocol. We develop a queuing model for the problem. In the most general case this becomes a cyclic queuing network in which some queues have dedicated servers, and the others have a common schedulable server. We use sample path arguments and Markov decision theory to determine optimal service schedules. The optimal throughputs are compared with those obtained with simple policies. The optimal policy yields up to 25% improvement in some cases. In some other cases, the optimal policy does only slightly better than much simpler policies.
Abstract:
We consider a wireless sensor network whose main function is to detect certain infrequent alarm events, and to forward alarm packets to a base station, using geographical forwarding. The nodes know their locations, and they sleep-wake cycle, waking up periodically but not synchronously. In this situation, when a node has a packet to forward to the sink, there is a trade-off between how long this node waits for a suitable neighbor to wake up and the progress the packet makes towards the sink once it is forwarded to this neighbor. Hence, in choosing a relay node, we consider the problem of minimizing average delay subject to a constraint on the average progress. By constraint relaxation, we formulate this next-hop relay selection problem as a Markov decision process (MDP). The exact optimal solution (BF (Best Forward)) can be found, but is computationally intensive. Next, we consider a mathematically simplified model for which the optimal policy (SF (Simplified Forward)) turns out to be a simple one-step-look-ahead rule. Simulations show that SF is very close in performance to BF, even for reasonably small node density. We then study the end-to-end performance of SF in comparison with two extremal policies: Max Forward (MF) and First Forward (FF), and an end-to-end delay minimising policy proposed by Kim et al. [1]. We find that, with an appropriate choice of the one-hop average progress constraint, SF can be tuned to provide a favorable trade-off between end-to-end packet delay and the number of hops in the forwarding path.
Abstract:
We conducted surveys of fire and fuels managers at local, regional, and national levels to gain insights into decision processes and information flows in wildfire management. Survey results in the form of fire managers’ decision calendars show how climate information needs vary seasonally, over space, and through the organizational network, and help determine optimal points for introducing climate information and forecasts into decision processes. We identified opportunities to use climate information in fire management, including seasonal to interannual climate forecasts at all organizational levels, to improve the targeting of fuels treatments and prescribed burns, the positioning and movement of initial attack resources, and staffing and budgeting decisions. Longer-term (5–10 years) outlooks also could be useful at the national level in setting budget and research priorities. We discuss these opportunities and examine the kinds of organizational changes that could facilitate effective use of existing climate information and climate forecast capabilities.
Abstract:
Our work is motivated by geographical forwarding of sporadic alarm packets to a base station in a wireless sensor network (WSN), where the nodes are sleep-wake cycling periodically and asynchronously. We seek to develop local forwarding algorithms that can be tuned so as to trade off the end-to-end delay against a total cost, such as the hop count or total energy. Our approach is to solve, at each forwarding node en route to the sink, the local forwarding problem of minimizing one-hop waiting delay subject to a lower bound constraint on a suitable reward offered by the next-hop relay; the constraint serves to tune the tradeoff. The reward metric used for the local problem is based on the end-to-end total cost objective (for instance, when the total cost is hop count, we choose to use the progress toward the sink made by a relay as the reward). The forwarding node, to begin with, is uncertain about the number of relays, their wake-up times, and the reward values, but knows the probability distributions of these quantities. At each relay wake-up instant, when a relay reveals its reward value, the forwarding node's problem is to forward the packet or to wait for further relays to wake up. In terms of the operations research literature, our work can be considered as a variant of the asset selling problem. We formulate our local forwarding problem as a partially observable Markov decision process (POMDP) and obtain inner and outer bounds for the optimal policy. Motivated by the computational complexity involved in the policies derived out of these bounds, we formulate an alternate simplified model, the optimal policy for which is a simple threshold rule. We provide simulation results to compare the performance of the inner and outer bound policies against the simple policy, and also against the optimal policy when the source knows the exact number of relays. Observing the good performance and the ease of implementation of the simple policy, we apply it to our motivating problem, i.e., local geographical routing of sporadic alarm packets in a large WSN. We compare the end-to-end performance (i.e., average total delay and average total cost) obtained by the simple policy, when used for local geographical forwarding, against that obtained by the globally optimal forwarding algorithm proposed by Kim et al. [1].
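To make the flavor of such a threshold rule concrete (an illustrative sketch with assumed inputs, not the rule derived in the paper), the forwarding node can forward to the first relay whose revealed reward meets a waiting-time-dependent threshold:

```python
def choose_relay(wakeups, threshold):
    """Apply a simple threshold rule to successive relay wake-ups.

    wakeups   : iterable of (wake_time, reward) pairs in order of wake-up time,
                where the reward could be, e.g., the relay's progress toward the sink
    threshold : callable mapping the elapsed waiting time to a reward threshold
    Returns the (wake_time, reward) of the relay chosen for forwarding, or None if
    no relay meets the threshold (the node would then fall back to some default,
    such as forwarding to the best relay seen).
    """
    for wake_time, reward in wakeups:
        if reward >= threshold(wake_time):
            return wake_time, reward   # forward the packet to this relay now
    return None
```

A threshold that decreases with the elapsed waiting time captures the tradeoff being tuned: the longer the packet has waited, the less demanding the node becomes about the reward.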
Abstract:
The assignment of tasks to multiple resources becomes an interesting game theoretic problem when both the task owner and the resources are strategic. In the classical, nonstrategic setting, where the states of the tasks and resources are observable by the controller, this problem is that of finding an optimal policy for a Markov decision process (MDP). When the states are held by strategic agents, the problem of efficient task allocation extends beyond that of solving an MDP and becomes that of designing a mechanism. Motivated by this fact, we propose a general mechanism which decides on an allocation rule for the tasks and resources and a payment rule to incentivize agents' participation and truthful reports. In contrast to related dynamic strategic control problems studied in recent literature, the problem studied here has interdependent values: the benefit of an allocation to the task owner depends not only on the characteristics of the task itself and the allocation, but also on the state of the resources. We introduce a dynamic extension of Mezzetti's two-phase mechanism for interdependent valuations. In this changed setting, the proposed dynamic mechanism is efficient, within-period ex-post incentive compatible, and within-period ex-post individually rational.
Abstract:
We study the tradeoff between the average error probability and the average queueing delay of messages that randomly arrive at the transmitter of a point-to-point discrete memoryless channel that uses variable-rate, fixed-codeword-length random coding. Bounds on the exponential decay rate of the average error probability with average queueing delay, in the regime of large average delay, are obtained. Upper and lower bounds on the optimal average delay for a given average error probability constraint are presented. We then formulate a constrained Markov decision problem for characterizing the rate of transmission as a function of the queue size, given an average error probability constraint. Using a Lagrange multiplier, the constrained Markov decision problem is converted into a problem of minimizing the average cost for a Markov decision problem. A simple heuristic policy is proposed which approximately achieves the optimal average cost.
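In generic terms (with symbols assumed for illustration, not taken from the paper), such a Lagrangian relaxation replaces the constrained average-cost problem by an unconstrained one with a combined per-stage cost: for a multiplier λ ≥ 0, one solves

$$ \min_{\pi}\ \limsup_{N \to \infty}\ \frac{1}{N}\, \mathbb{E}_{\pi}\!\left[ \sum_{k=0}^{N-1} \big( d(Q_k, R_k) + \lambda\, e(Q_k, R_k) \big) \right], $$

where Q_k is the queue size, R_k the transmission rate chosen in slot k, d a per-stage delay-related cost, and e a per-stage error-probability-related cost; λ is then adjusted so that the average error probability constraint is met.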
Abstract:
We consider a power optimization problem with an average delay constraint on the downlink of a Green Base-station. A Green Base-station is powered both by renewable energy, such as solar or wind energy, and by conventional sources, such as diesel generators or the power grid. We try to minimize the energy drawn from conventional energy sources and to utilize the harvested energy to the maximum extent. Each user also has an average delay constraint for its data. The optimal action consists of scheduling the users and allocating the optimal transmission rate for the chosen user. In this paper, we formulate the problem as a Markov decision problem and show the existence of a stationary average-cost optimal policy. We also derive some structural results for the optimal policy.
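For reference (a generic sketch of a standard algorithm, not the model or structural results from the paper), a stationary average-cost optimal policy for a finite unichain MDP can be computed by relative value iteration:

```python
import numpy as np

def relative_value_iteration(P, c, iters=100000, tol=1e-9):
    """Relative value iteration for an average-cost finite MDP (illustrative sketch).

    P : array of shape (A, S, S), P[a, s, s'] = transition probability
    c : array of shape (S, A), c[s, a] = one-step cost
    Returns an estimate of the optimal average cost and a greedy policy.
    """
    S, A = c.shape
    h = np.zeros(S)              # relative value (bias) function
    g = 0.0                      # estimate of the optimal average cost
    for _ in range(iters):
        Q = c + np.einsum('ast,t->sa', P, h)
        h_new = Q.min(axis=1)
        g = h_new[0]             # normalize against a fixed reference state
        h_new = h_new - g
        if np.max(np.abs(h_new - h)) < tol:
            break
        h = h_new
    return g, Q.argmin(axis=1)
```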
Abstract:
This paper addresses the problem of finding outage-optimal power control policies for wireless energy harvesting sensor (EHS) nodes with automatic repeat request (ARQ)-based packet transmissions. The power control policy of the EHS specifies the transmission power for each packet transmission attempt, based on all the information available at the EHS. In particular, the acknowledgement (ACK) or negative acknowledgement (NACK) messages received provide the EHS with partial information about the channel state. We solve the problem of finding an optimal power control policy by casting it as a partially observable Markov decision process (POMDP). We study the structure of the optimal power policy in two ways. First, for the special case of binary power levels at the EHS, we show that the optimal policy for the underlying Markov decision process (MDP) when the channel state is observable is a threshold policy in the battery state. Second, we benchmark the performance of the EHS by rigorously analyzing the outage probability of a general fixed-power transmission scheme, where the EHS uses a predetermined power level at each slot within the frame. Monte Carlo simulation results illustrate the performance of the POMDP approach and verify the accuracy of the analysis. They also show that the POMDP solutions can significantly outperform conventional ad hoc approaches.