40 resultados para Markov additive process
Resumo:
The assignment of tasks to multiple resources becomes an interesting game theoretic problem, when both the task owner and the resources are strategic. In the classical, nonstrategic setting, where the states of the tasks and resources are observable by the controller, this problem is that of finding an optimal policy for a Markov decision process (MDP). When the states are held by strategic agents, the problem of an efficient task allocation extends beyond that of solving an MDP and becomes that of designing a mechanism. Motivated by this fact, we propose a general mechanism which decides on an allocation rule for the tasks and resources and a payment rule to incentivize agents' participation and truthful reports. In contrast to related dynamic strategic control problems studied in recent literature, the problem studied here has interdependent values: the benefit of an allocation to the task owner is not simply a function of the characteristics of the task itself and the allocation, but also of the state of the resources. We introduce a dynamic extension of Mezzetti's two phase mechanism for interdependent valuations. In this changed setting, the proposed dynamic mechanism is efficient, within period ex-post incentive compatible, and within period ex-post individually rational.
Resumo:
This paper addresses the problem of finding outage-optimal power control policies for wireless energy harvesting sensor (EHS) nodes with automatic repeat request (ARQ)-based packet transmissions. The power control policy of the EHS specifies the transmission power for each packet transmission attempt, based on all the information available at the EHS. In particular, the acknowledgement (ACK) or negative acknowledgement (NACK) messages received provide the EHS with partial information about the channel state. We solve the problem of finding an optimal power control policy by casting it as a partially observable Markov decision process (POMDP). We study the structure of the optimal power policy in two ways. First, for the special case of binary power levels at the EHS, we show that the optimal policy for the underlying Markov decision process (MDP) when the channel state is observable is a threshold policy in the battery state. Second, we benchmark the performance of the EHS by rigorously analyzing the outage probability of a general fixed-power transmission scheme, where the EHS uses a predetermined power level at each slot within the frame. Monte Carlo simulation results illustrate the performance of the POMDP approach and verify the accuracy of the analysis. They also show that the POMDP solutions can significantly outperform conventional ad hoc approaches.
Resumo:
This paper addresses the problem of finding optimal power control policies for wireless energy harvesting sensor (EHS) nodes with automatic repeat request (ARQ)-based packet transmissions. The EHS harvests energy from the environment according to a Bernoulli process; and it is required to operate within the constraint of energy neutrality. The EHS obtains partial channel state information (CSI) at the transmitter through the link-layer ARQ protocol, via the ACK/NACK feedback messages, and uses it to adapt the transmission power for the packet (re)transmission attempts. The underlying wireless fading channel is modeled as a finite state Markov chain with known transition probabilities. Thus, the goal of the power management policy is to determine the best power setting for the current packet transmission attempt, so as to maximize a long-run expected reward such as the expected outage probability. The problem is addressed in a decision-theoretic framework by casting it as a partially observable Markov decision process (POMDP). Due to the large size of the state-space, the exact solution to the POMDP is computationally expensive. Hence, two popular approximate solutions are considered, which yield good power management policies for the transmission attempts. Monte Carlo simulation results illustrate the efficacy of the approach and show that the approximate solutions significantly outperform conventional approaches.
Resumo:
We study the problem of optimal sequential (''as-you-go'') deployment of wireless relay nodes, as a person walks along a line of random length (with a known distribution). The objective is to create an impromptu multihop wireless network for connecting a packet source to be placed at the end of the line with a sink node located at the starting point, to operate in the light traffic regime. In walking from the sink towards the source, at every step, measurements yield the transmit powers required to establish links to one or more previously placed nodes. Based on these measurements, at every step, a decision is made to place a relay node, the overall system objective being to minimize a linear combination of the expected sum power (or the expected maximum power) required to deliver a packet from the source to the sink node and the expected number of relay nodes deployed. For each of these two objectives, two different relay selection strategies are considered: (i) each relay communicates with the sink via its immediate previous relay, (ii) the communication path can skip some of the deployed relays. With appropriate modeling assumptions, we formulate each of these problems as a Markov decision process (MDP). We provide the optimal policy structures for all these cases, and provide illustrations of the policies and their performance, via numerical results, for some typical parameters.
Resumo:
Our work is motivated by impromptu (or ``as-you-go'') deployment of wireless relay nodes along a path, a need that arises in many situations. In this paper, the path is modeled as starting at the origin (where there is the data sink, e.g., the control center), and evolving randomly over a lattice in the positive quadrant. A person walks along the path deploying relay nodes as he goes. At each step, the path can, randomly, either continue in the same direction or take a turn, or come to an end, at which point a data source (e.g., a sensor) has to be placed, that will send packets to the data sink. A decision has to be made at each step whether or not to place a wireless relay node. Assuming that the packet generation rate by the source is very low, and simple link-by-link scheduling, we consider the problem of sequential relay placement so as to minimize the expectation of an end-to-end cost metric (a linear combination of the sum of convex hop costs and the number of relays placed). This impromptu relay placement problem is formulated as a total cost Markov decision process. First, we derive the optimal policy in terms of an optimal placement set and show that this set is characterized by a boundary (with respect to the position of the last placed relay) beyond which it is optimal to place the next relay. Next, based on a simpler one-step-look-ahead characterization of the optimal policy, we propose an algorithm which is proved to converge to the optimal placement set in a finite number of steps and which is faster than value iteration. We show by simulations that the distance threshold based heuristic, usually assumed in the literature, is close to the optimal, provided that the threshold distance is carefully chosen. (C) 2014 Elsevier B.V. All rights reserved.
Resumo:
In this paper, we consider an intrusion detection application for Wireless Sensor Networks. We study the problem of scheduling the sleep times of the individual sensors, where the objective is to maximize the network lifetime while keeping the tracking error to a minimum. We formulate this problem as a partially-observable Markov decision process (POMDP) with continuous stateaction spaces, in a manner similar to Fuemmeler and Veeravalli (IEEE Trans Signal Process 56(5), 2091-2101, 2008). However, unlike their formulation, we consider infinite horizon discounted and average cost objectives as performance criteria. For each criterion, we propose a convergent on-policy Q-learning algorithm that operates on two timescales, while employing function approximation. Feature-based representations and function approximation is necessary to handle the curse of dimensionality associated with the underlying POMDP. Our proposed algorithm incorporates a policy gradient update using a one-simulation simultaneous perturbation stochastic approximation estimate on the faster timescale, while the Q-value parameter (arising from a linear function approximation architecture for the Q-values) is updated in an on-policy temporal difference algorithm-like fashion on the slower timescale. The feature selection scheme employed in each of our algorithms manages the energy and tracking components in a manner that assists the search for the optimal sleep-scheduling policy. For the sake of comparison, in both discounted and average settings, we also develop a function approximation analogue of the Q-learning algorithm. This algorithm, unlike the two-timescale variant, does not possess theoretical convergence guarantees. Finally, we also adapt our algorithms to include a stochastic iterative estimation scheme for the intruder's mobility model and this is useful in settings where the latter is not known. Our simulation results on a synthetic 2-dimensional network setting suggest that our algorithms result in better tracking accuracy at the cost of only a few additional sensors, in comparison to a recent prior work.
Resumo:
The aim in this paper is to allocate the `sleep time' of the individual sensors in an intrusion detection application so that the energy consumption from the sensors is reduced, while keeping the tracking error to a minimum. We propose two novel reinforcement learning (RL) based algorithms that attempt to minimize a certain long-run average cost objective. Both our algorithms incorporate feature-based representations to handle the curse of dimensionality associated with the underlying partially-observable Markov decision process (POMDP). Further, the feature selection scheme used in our algorithms intelligently manages the energy cost and tracking cost factors, which in turn assists the search for the optimal sleeping policy. We also extend these algorithms to a setting where the intruder's mobility model is not known by incorporating a stochastic iterative scheme for estimating the mobility model. The simulation results on a synthetic 2-d network setting are encouraging.
Resumo:
We consider the problem of optimizing the workforce of a service system. Adapting the staffing levels in such systems is non-trivial due to large variations in workload and the large number of system parameters do not allow for a brute force search. Further, because these parameters change on a weekly basis, the optimization should not take longer than a few hours. Our aim is to find the optimum staffing levels from a discrete high-dimensional parameter set, that minimizes the long run average of the single-stage cost function, while adhering to the constraints relating to queue stability and service-level agreement (SLA) compliance. The single-stage cost function balances the conflicting objectives of utilizing workers better and attaining the target SLAs. We formulate this problem as a constrained parameterized Markov cost process parameterized by the (discrete) staffing levels. We propose novel simultaneous perturbation stochastic approximation (SPSA)-based algorithms for solving the above problem. The algorithms include both first-order as well as second-order methods and incorporate SPSA-based gradient/Hessian estimates for primal descent, while performing dual ascent for the Lagrange multipliers. Both algorithms are online and update the staffing levels in an incremental fashion. Further, they involve a certain generalized smooth projection operator, which is essential to project the continuous-valued worker parameter tuned by our algorithms onto the discrete set. The smoothness is necessary to ensure that the underlying transition dynamics of the constrained Markov cost process is itself smooth (as a function of the continuous-valued parameter): a critical requirement to prove the convergence of both algorithms. We validate our algorithms via performance simulations based on data from five real-life service systems. For the sake of comparison, we also implement a scatter search based algorithm using state-of-the-art optimization tool-kit OptQuest. From the experiments, we observe that both our algorithms converge empirically and consistently outperform OptQuest in most of the settings considered. This finding coupled with the computational advantage of our algorithms make them amenable for adaptive labor staffing in real-life service systems.
Resumo:
In geographical forwarding of packets in a large wireless sensor network (WSN) with sleep-wake cycling nodes, we are interested in the local decision problem faced by a node that has ``custody'' of a packet and has to choose one among a set of next-hop relay nodes to forward the packet toward the sink. Each relay is associated with a ``reward'' that summarizes the benefit of forwarding the packet through that relay. We seek a solution to this local problem, the idea being that such a solution, if adopted by every node, could provide a reasonable heuristic for the end-to-end forwarding problem. Toward this end, we propose a local relay selection problem consisting of a forwarding node and a collection of relay nodes, with the relays waking up sequentially at random times. At each relay wake-up instant, the forwarder can choose to probe a relay to learn its reward value, based on which the forwarder can then decide whether to stop (and forward its packet to the chosen relay) or to continue to wait for further relays to wake up. The forwarder's objective is to select a relay so as to minimize a combination of waiting delay, reward, and probing cost. The local decision problem can be considered as a variant of the asset selling problem studied in the operations research literature. We formulate the local problem as a Markov decision process (MDP) and characterize the solution in terms of stopping sets and probing sets. We provide results illustrating the structure of the stopping sets, namely, the (lower bound) threshold and the stage independence properties. Regarding the probing sets, we make an interesting conjecture that these sets are characterized by upper bounds. Through simulation experiments, we provide valuable insights into the performance of the optimal local forwarding and its use as an end-to-end forwarding heuristic.
Resumo:
Optimal control of traffic lights at junctions or traffic signal control (TSC) is essential for reducing the average delay experienced by the road users amidst the rapid increase in the usage of vehicles. In this paper, we formulate the TSC problem as a discounted cost Markov decision process (MDP) and apply multi-agent reinforcement learning (MARL) algorithms to obtain dynamic TSC policies. We model each traffic signal junction as an independent agent. An agent decides the signal duration of its phases in a round-robin (RR) manner using multi-agent Q-learning with either is an element of-greedy or UCB 3] based exploration strategies. It updates its Q-factors based on the cost feedback signal received from its neighbouring agents. This feedback signal can be easily constructed and is shown to be effective in minimizing the average delay of the vehicles in the network. We show through simulations over VISSIM that our algorithms perform significantly better than both the standard fixed signal timing (FST) algorithm and the saturation balancing (SAT) algorithm 15] over two real road networks.
Resumo:
This paper considers the problem of receive antenna selection (AS) in a multiple-antenna communication system having a single radio-frequency (RF) chain. The AS decisions are based on noisy channel estimates obtained using known pilot symbols embedded in the data packets. The goal here is to minimize the average packet error rate (PER) by exploiting the known temporal correlation of the channel. As the underlying channels are only partially observed using the pilot symbols, the problem of AS for PER minimization is cast into a partially observable Markov decision process (POMDP) framework. Under mild assumptions, the optimality of a myopic policy is established for the two-state channel case. Moreover, two heuristic AS schemes are proposed based on a weighted combination of the estimated channel states on the different antennas. These schemes utilize the continuous valued received pilot symbols to make the AS decisions, and are shown to offer performance comparable to the POMDP approach, which requires one to quantize the channel and observations to a finite set of states. The performance improvement offered by the POMDP solution and the proposed heuristic solutions relative to existing AS training-based approaches is illustrated using Monte Carlo simulations.
Resumo:
We prepared thin films composed of pure TiO2 or TiO2 with an Fe additive (at concentrations of 0.2-0.8 wt%) via a simple and cost effective sol gel process, and tested their antifungal properties (against Candida albicans (MTCC-1637), Candida tropicalis (MTCC-184), Candida parapsilosis (MTCC-2509), and Candida glabrata (MTCC-3019) and antibacterial properties (against Staphylococcus faecalis (NCIM-2604) Staphylococcus epidermidis (NCIM-2493), Staphylococcus aureus (NCIL-2122), and Bacillus subtilis (NCIM-2549)). The films were deposited on glass and Si substrates and subjected to annealing at 400 degrees C for 3 h in ambient air. The film structural and morphological properties were investigated by X-ray photoelectron spectroscopy profilometry and scanning electron microscopy, respectively. Antifungal and antibacterial tests were conducted using the drop test method. Among the species examined, Candida albicans (MTCC-1637), and Staphylococcus aureus (NCIL-2122) showed complete colony formation inhibition after exposure for 4 h for the TiO2 loaded with 0.8 wt% Fe thin films. These results indicate that increasing the Fe concentration increased the antimicrobial activity, with complete inhibition of colony formation after 4 h exposure.
Resumo:
This splitting techniques for MARKOV chains developed by NUMMELIN (1978a) and ATHREYA and NEY (1978b) are used to derive an imbedded renewal process in WOLD's point process with MARKOV-correlated intervals. This leads to a simple proof of renewal theorems for such processes. In particular, a key renewal theorem is proved, from which analogues to both BLACKWELL's and BREIMAN's forms of the renewal theorem can be deduced.
Resumo:
We address risk minimizing option pricing in a semi-Markov modulated market where the floating interest rate depends on a finite state semi-Markov process. The growth rate and the volatility of the stock also depend on the semi-Markov process. Using the Föllmer–Schweizer decomposition we find the locally risk minimizing price for European options and the corresponding hedging strategy. We develop suitable numerical methods for computing option prices.