27 results for buying decision process

at Indian Institute of Science - Bangalore - India


Relevance:

90.00%

Abstract:

We develop an online actor-critic reinforcement learning algorithm with function approximation for a problem of control under inequality constraints. We consider the long-run average cost Markov decision process (MDP) framework in which both the objective and the constraint functions are suitable policy-dependent long-run averages of certain sample path functions. The Lagrange multiplier method is used to handle the inequality constraints. We prove the asymptotic almost sure convergence of our algorithm to a locally optimal solution. We also provide the results of numerical experiments on a problem of routing in a multi-stage queueing network with constraints on long-run average queue lengths. We observe that our algorithm exhibits good performance on this setting and converges to a feasible point.
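As a rough illustration of the Lagrange multiplier idea mentioned in this abstract, the sketch below solves a toy constrained problem (minimise (x - 3)^2 subject to x <= 1) with a fast primal update and a slow multiplier update. The toy objective, the step sizes and the function name `primal_dual` are assumptions made for illustration only; this is not the actor-critic algorithm of the paper and it involves no MDP.

```python
def primal_dual(steps=5000, fast_step=0.1, slow_step=0.01):
    """Minimise (x - 3)^2 subject to x <= 1 via a Lagrangian relaxation.

    Hypothetical toy problem: L(x, lam) = (x - 3)^2 + lam * (x - 1).
    """
    x, lam = 0.0, 0.0
    for _ in range(steps):
        # Fast timescale: gradient descent on the Lagrangian in x.
        x -= fast_step * (2.0 * (x - 3.0) + lam)
        # Slow timescale: gradient ascent in lam, projected onto lam >= 0.
        lam = max(0.0, lam + slow_step * (x - 1.0))
    return x, lam


if __name__ == "__main__":
    x, lam = primal_dual()
    print(f"x = {x:.4f}, multiplier = {lam:.4f}")
```

The iterate settles on the constraint boundary x = 1 with multiplier 4, which is the point where the gradient of the relaxed objective vanishes.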

Relevance:

90.00%

Abstract:

We introduce and study a class of non-stationary semi-Markov decision processes on a finite horizon. By constructing an equivalent Markov decision process, we establish the existence of a piecewise open loop relaxed control which is optimal for the finite horizon problem.

Relevance:

90.00%

Abstract:

We present a novel multi-timescale Q-learning algorithm for average cost control in a Markov decision process subject to multiple inequality constraints. We formulate a relaxed version of this problem through the Lagrange multiplier method. Our algorithm is different from Q-learning in that it updates two parameters - a Q-value parameter and a policy parameter. The Q-value parameter is updated on a slower time scale as compared to the policy parameter. Whereas Q-learning with function approximation can diverge in some cases, our algorithm is seen to be convergent as a result of the aforementioned timescale separation. We show the results of experiments on a problem of constrained routing in a multistage queueing network. Our algorithm is seen to exhibit good performance and the various inequality constraints are seen to be satisfied upon convergence of the algorithm.

Relevance:

90.00%

Abstract:

This paper considers antenna selection (AS) at a receiver equipped with multiple antenna elements but only a single radio frequency chain for packet reception. As information about the channel state is acquired using training symbols (pilots), the receiver makes its AS decisions based on noisy channel estimates. Additional information that can be exploited for AS includes the time-correlation of the wireless channel and the results of the link-layer error checks upon receiving the data packets. In this scenario, the task of the receiver is to sequentially select (a) the pilot symbol allocation, i.e., how to distribute the available pilot symbols among the antenna elements, for channel estimation on each of the receive antennas; and (b) the antenna to be used for data packet reception. The goal is to maximize the expected throughput, based on the past history of allocation and selection decisions, and the corresponding noisy channel estimates and error check results. Since the channel state is only partially observed through the noisy pilots and the error checks, the joint problem of pilot allocation and AS is modeled as a partially observed Markov decision process (POMDP). The solution to the POMDP yields the policy that maximizes the long-term expected throughput. Using the Finite State Markov Chain (FSMC) model for the wireless channel, the performance of the POMDP solution is compared with that of other existing schemes, and it is illustrated through numerical evaluation that the POMDP solution significantly outperforms them.

Relevance:

90.00%

Abstract:

This paper addresses the problem of finding optimal power control policies for wireless energy harvesting sensor (EHS) nodes with automatic repeat request (ARQ)-based packet transmissions. The EHS harvests energy from the environment according to a Bernoulli process; and it is required to operate within the constraint of energy neutrality. The EHS obtains partial channel state information (CSI) at the transmitter through the link-layer ARQ protocol, via the ACK/NACK feedback messages, and uses it to adapt the transmission power for the packet (re)transmission attempts. The underlying wireless fading channel is modeled as a finite state Markov chain with known transition probabilities. Thus, the goal of the power management policy is to determine the best power setting for the current packet transmission attempt, so as to maximize a long-run expected reward such as the expected outage probability. The problem is addressed in a decision-theoretic framework by casting it as a partially observable Markov decision process (POMDP). Due to the large size of the state-space, the exact solution to the POMDP is computationally expensive. Hence, two popular approximate solutions are considered, which yield good power management policies for the transmission attempts. Monte Carlo simulation results illustrate the efficacy of the approach and show that the approximate solutions significantly outperform conventional approaches.
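The partial channel information obtained from ACK/NACK feedback can be pictured as a Bayesian belief update over the states of the finite state Markov chain. The sketch below is a minimal illustration under assumed numbers: the two-state channel, the ACK probabilities (taken here for one fixed transmit power) and the function name `belief_update` are hypothetical and are not taken from the paper.

```python
def belief_update(belief, ack_prob, transition, ack):
    """One POMDP belief step: condition on ACK/NACK, then predict the next slot.

    belief[i]        : probability that the channel is currently in state i
    ack_prob[i]      : assumed probability of an ACK in state i (fixed power)
    transition[i][j] : channel transition probability from state i to state j
    ack              : True for ACK, False for NACK
    """
    # Bayes correction with the likelihood of the observed feedback.
    likelihood = [p if ack else 1.0 - p for p in ack_prob]
    weighted = [b * l for b, l in zip(belief, likelihood)]
    total = sum(weighted)
    posterior = [w / total for w in weighted]
    # Prediction through the Markov chain to the next transmission slot.
    n = len(posterior)
    return [sum(posterior[i] * transition[i][j] for i in range(n)) for j in range(n)]


if __name__ == "__main__":
    P = [[0.8, 0.2], [0.3, 0.7]]  # state 0 = good channel, state 1 = bad channel
    print(belief_update([0.5, 0.5], [0.9, 0.2], P, ack=True))
```

A power control policy would then map this belief, together with the battery state, to a transmit power for the next attempt.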

Relevance:

80.00%

Abstract:

We consider the problem of quickest detection of an intrusion using a sensor network, keeping only a minimal number of sensors active. By using a minimal number of sensor devices, we ensure that the energy expenditure for sensing, computation and communication is minimized (and the lifetime of the network is maximized). We model the intrusion detection (or change detection) problem as a Markov decision process (MDP). Based on the theory of MDP, we develop the following closed loop sleep/wake scheduling algorithms: (1) optimal control of M_{k+1}, the number of sensors in the wake state in time slot k + 1, (2) optimal control of q_{k+1}, the probability of a sensor in the wake state in time slot k + 1, and an open loop sleep/wake scheduling algorithm which (3) computes q, the optimal probability of a sensor in the wake state (which does not vary with time), based on the sensor observations obtained until time slot k. Our results show that an optimum closed loop control on M_{k+1} significantly decreases the cost compared to keeping any number of sensors active all the time. Also, among the three algorithms described, we observe that the total cost is minimum for the optimum control on M_{k+1} and is maximum for the optimum open loop control on q.

Relevance:

80.00%

Abstract:

We consider a wireless sensor network whose main function is to detect certain infrequent alarm events, and to forward alarm packets to a base station, using geographical forwarding. The nodes know their locations, and they sleep-wake cycle, waking up periodically but not synchronously. In this situation, when a node has a packet to forward to the sink, there is a trade-off between how long this node waits for a suitable neighbor to wake up and the progress the packet makes towards the sink once it is forwarded to this neighbor. Hence, in choosing a relay node, we consider the problem of minimizing average delay subject to a constraint on the average progress. By constraint relaxation, we formulate this next hop relay selection problem as a Markov decision process (MDP). The exact optimal solution, Best Forward (BF), can be found, but is computationally intensive. Next, we consider a mathematically simplified model for which the optimal policy, Simplified Forward (SF), turns out to be a simple one-step-look-ahead rule. Simulations show that SF is very close in performance to BF, even for reasonably small node density. We then study the end-to-end performance of SF in comparison with two extremal policies, Max Forward (MF) and First Forward (FF), and an end-to-end delay minimising policy proposed by Kim et al. [1]. We find that, with appropriate choice of one hop average progress constraint, SF can be tuned to provide a favorable trade-off between end-to-end packet delay and the number of hops in the forwarding path.

Relevance:

80.00%

Abstract:

We consider the problem of quickest detection of an intrusion using a sensor network, keeping only a minimal number of sensors active. By using a minimal number of sensor devices, we ensure that the energy expenditure for sensing, computation and communication is minimized (and the lifetime of the network is maximized). We model the intrusion detection (or change detection) problem as a Markov decision process (MDP). Based on the theory of MDP, we develop the following closed loop sleep/wake scheduling algorithms: 1) optimal control of M_{k+1}, the number of sensors in the wake state in time slot k + 1, 2) optimal control of q_{k+1}, the probability of a sensor in the wake state in time slot k + 1, and an open loop sleep/wake scheduling algorithm which 3) computes q, the optimal probability of a sensor in the wake state (which does not vary with time), based on the sensor observations obtained until time slot k. Our results show that an optimum closed loop control on M_{k+1} significantly decreases the cost compared to keeping any number of sensors active all the time. Also, among the three algorithms described, we observe that the total cost is minimum for the optimum control on M_{k+1} and is maximum for the optimum open loop control on q.

Relevance:

80.00%

Abstract:

We consider a small extent sensor network for event detection, in which nodes periodically take samples and then contend over a random access network to transmit their measurement packets to the fusion center. We consider two procedures at the fusion center for processing the measurements. The Bayesian setting is assumed; that is, the fusion center has a prior distribution on the change time. In the first procedure, the decision algorithm at the fusion center is network-oblivious and makes a decision only when a complete vector of measurements taken at a sampling instant is available. In the second procedure, the decision algorithm at the fusion center is network-aware and processes measurements as they arrive, but in a time-causal order. In this case, the decision statistic depends on the network delays, whereas in the network-oblivious case, the decision statistic does not. This yields a Bayesian change-detection problem with a trade-off between the random network delay and the decision delay; that is, a higher sampling rate reduces the decision delay but increases the random access delay. Under periodic sampling, in the network-oblivious case, the structure of the optimal stopping rule is the same as that without the network, and the optimal change detection delay decouples into the network delay and the optimal decision delay without the network. In the network-aware case, the optimal stopping problem is analyzed as a partially observable Markov decision process, in which the states of the queues and delays in the network need to be maintained. A sufficient decision statistic is the network state and the posterior probability of change having occurred, given the measurements received and the state of the network. The optimal regimes are studied using simulation.
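The posterior probability of change mentioned in this abstract can be illustrated, in the simplest setting without any network delay, by the classical one-step Bayesian recursion. The geometric prior on the change time (parameter `rho`), the likelihood arguments and the function name `change_posterior` are assumptions for this sketch; the network-aware statistic of the paper additionally tracks queue and delay states, which are not modelled here.

```python
def change_posterior(prior, rho, lik_after, lik_before):
    """One-step update of P(change has occurred | observations so far).

    prior      : posterior probability of change after the previous sample
    rho        : assumed per-slot probability that the change happens (geometric prior)
    lik_after  : likelihood of the new sample under the post-change distribution
    lik_before : likelihood of the new sample under the pre-change distribution
    """
    # Prediction: the change may have occurred since the previous sample.
    predicted = prior + (1.0 - prior) * rho
    # Correction: Bayes' rule with the new sample.
    numerator = predicted * lik_after
    return numerator / (numerator + (1.0 - predicted) * lik_before)


if __name__ == "__main__":
    pi = 0.0
    for lik_after, lik_before in [(2.0, 1.0), (2.0, 1.0), (0.5, 1.0)]:
        pi = change_posterior(pi, 0.1, lik_after, lik_before)
        print(round(pi, 4))
```

A stopping rule would typically declare a change once this posterior crosses a threshold.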

Relevance:

80.00%

Abstract:

We study optimal control of Markov processes with age-dependent transition rates. The control policy is chosen continuously over time based on the state of the process and its age. We study infinite horizon discounted cost and infinite horizon average cost problems. Our approach is via the construction of an equivalent semi-Markov decision process. We characterise the value function and optimal controls for both discounted and average cost cases.

Relevance:

80.00%

Abstract:

Our work is motivated by geographical forwarding of sporadic alarm packets to a base station in a wireless sensor network (WSN), where the nodes are sleep-wake cycling periodically and asynchronously. We seek to develop local forwarding algorithms that can be tuned so as to trade off the end-to-end delay against a total cost, such as the hop count or total energy. Our approach is to solve, at each forwarding node en route to the sink, the local forwarding problem of minimizing one-hop waiting delay subject to a lower bound constraint on a suitable reward offered by the next-hop relay; the constraint serves to tune the trade-off. The reward metric used for the local problem is based on the end-to-end total cost objective (for instance, when the total cost is hop count, we choose to use the progress toward sink made by a relay as the reward). The forwarding node, to begin with, is uncertain about the number of relays, their wake-up times, and the reward values, but knows the probability distributions of these quantities. At each relay wake-up instant, when a relay reveals its reward value, the forwarding node's problem is to forward the packet or to wait for further relays to wake up. In terms of the operations research literature, our work can be considered as a variant of the asset selling problem. We formulate our local forwarding problem as a partially observable Markov decision process (POMDP) and obtain inner and outer bounds for the optimal policy. Motivated by the computational complexity involved in the policies derived out of these bounds, we formulate an alternate simplified model, the optimal policy for which is a simple threshold rule. We provide simulation results to compare the performance of the inner and outer bound policies against the simple policy, and also against the optimal policy when the source knows the exact number of relays.
Observing the good performance and the ease of implementation of the simple policy, we apply it to our motivating problem, i.e., local geographical routing of sporadic alarm packets in a large WSN. We compare the end-to-end performance (i.e., average total delay and average total cost) obtained by the simple policy, when used for local geographical forwarding, against that obtained by the globally optimal forwarding algorithm proposed by Kim et al. [1].
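The threshold rule referred to in this abstract has the same flavour as the textbook asset selling problem: accept the first offer whose value is at least a threshold a, where a solves E[(X - a)^+] = c for a per-step waiting cost c. The sketch below computes that threshold under assumptions chosen purely for illustration: offers (relay rewards) are i.i.d. Uniform(0, 1), the waiting cost is constant, and the names `expected_gain` and `threshold` are hypothetical. It is not the model or the policy derived in the paper.

```python
def expected_gain(a):
    """E[(X - a)^+] for X ~ Uniform(0, 1), valid for 0 <= a <= 1."""
    return (1.0 - a) ** 2 / 2.0


def threshold(wait_cost, tol=1e-10):
    """Bisection for the a in [0, 1] with expected_gain(a) == wait_cost.

    Assumes 0 < wait_cost < 0.5 so that a solution exists inside the interval.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if expected_gain(mid) > wait_cost:
            lo = mid  # one more wait still pays off at this level: raise the threshold
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    # For uniform offers the closed form is 1 - sqrt(2 * wait_cost).
    print(round(threshold(0.02), 6))
```

A larger waiting cost lowers the threshold, so the node forwards sooner and accepts less progress per hop, which is the kind of tuning knob the abstract describes.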

Relevance:

80.00%

Abstract:

The assignment of tasks to multiple resources becomes an interesting game theoretic problem, when both the task owner and the resources are strategic. In the classical, nonstrategic setting, where the states of the tasks and resources are observable by the controller, this problem is that of finding an optimal policy for a Markov decision process (MDP). When the states are held by strategic agents, the problem of an efficient task allocation extends beyond that of solving an MDP and becomes that of designing a mechanism. Motivated by this fact, we propose a general mechanism which decides on an allocation rule for the tasks and resources and a payment rule to incentivize agents' participation and truthful reports. In contrast to related dynamic strategic control problems studied in recent literature, the problem studied here has interdependent values: the benefit of an allocation to the task owner is not simply a function of the characteristics of the task itself and the allocation, but also of the state of the resources. We introduce a dynamic extension of Mezzetti's two phase mechanism for interdependent valuations. In this changed setting, the proposed dynamic mechanism is efficient, within period ex-post incentive compatible, and within period ex-post individually rational.

Relevance:

80.00%

Abstract:

This paper addresses the problem of finding outage-optimal power control policies for wireless energy harvesting sensor (EHS) nodes with automatic repeat request (ARQ)-based packet transmissions. The power control policy of the EHS specifies the transmission power for each packet transmission attempt, based on all the information available at the EHS. In particular, the acknowledgement (ACK) or negative acknowledgement (NACK) messages received provide the EHS with partial information about the channel state. We solve the problem of finding an optimal power control policy by casting it as a partially observable Markov decision process (POMDP). We study the structure of the optimal power policy in two ways. First, for the special case of binary power levels at the EHS, we show that the optimal policy for the underlying Markov decision process (MDP) when the channel state is observable is a threshold policy in the battery state. Second, we benchmark the performance of the EHS by rigorously analyzing the outage probability of a general fixed-power transmission scheme, where the EHS uses a predetermined power level at each slot within the frame. Monte Carlo simulation results illustrate the performance of the POMDP approach and verify the accuracy of the analysis. They also show that the POMDP solutions can significantly outperform conventional ad hoc approaches.

Relevance:

80.00%

Abstract:

We study the problem of optimal sequential ("as-you-go") deployment of wireless relay nodes, as a person walks along a line of random length (with a known distribution). The objective is to create an impromptu multihop wireless network for connecting a packet source to be placed at the end of the line with a sink node located at the starting point, to operate in the light traffic regime. In walking from the sink towards the source, at every step, measurements yield the transmit powers required to establish links to one or more previously placed nodes. Based on these measurements, at every step, a decision is made to place a relay node, the overall system objective being to minimize a linear combination of the expected sum power (or the expected maximum power) required to deliver a packet from the source to the sink node and the expected number of relay nodes deployed. For each of these two objectives, two different relay selection strategies are considered: (i) each relay communicates with the sink via its immediate previous relay, (ii) the communication path can skip some of the deployed relays. With appropriate modeling assumptions, we formulate each of these problems as a Markov decision process (MDP). We provide the optimal policy structures for all these cases, and provide illustrations of the policies and their performance, via numerical results, for some typical parameters.

Relevance:

80.00%

Abstract:

Our work is motivated by impromptu (or "as-you-go") deployment of wireless relay nodes along a path, a need that arises in many situations. In this paper, the path is modeled as starting at the origin (where there is the data sink, e.g., the control center), and evolving randomly over a lattice in the positive quadrant. A person walks along the path deploying relay nodes as he goes. At each step, the path can, randomly, either continue in the same direction or take a turn, or come to an end, at which point a data source (e.g., a sensor) has to be placed, that will send packets to the data sink. A decision has to be made at each step whether or not to place a wireless relay node. Assuming that the packet generation rate by the source is very low, and simple link-by-link scheduling, we consider the problem of sequential relay placement so as to minimize the expectation of an end-to-end cost metric (a linear combination of the sum of convex hop costs and the number of relays placed). This impromptu relay placement problem is formulated as a total cost Markov decision process. First, we derive the optimal policy in terms of an optimal placement set and show that this set is characterized by a boundary (with respect to the position of the last placed relay) beyond which it is optimal to place the next relay. Next, based on a simpler one-step-look-ahead characterization of the optimal policy, we propose an algorithm which is proved to converge to the optimal placement set in a finite number of steps and which is faster than value iteration. We show by simulations that the distance threshold based heuristic, usually assumed in the literature, is close to the optimal, provided that the threshold distance is carefully chosen. (C) 2014 Elsevier B.V. All rights reserved.
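To make the total cost formulation and the boundary structure more concrete, here is a sketch of plain value iteration on a much simplified one-dimensional version of the placement problem. Everything specific in it is an assumption for illustration: the path is a straight line that ends at each step with probability `p_end`, the hop cost is the square of the hop length, each relay costs `relay_cost`, and a relay is forced at distance `r_max`. It is the baseline value iteration, not the faster algorithm proposed in the paper, and it ignores the lattice geometry.

```python
def relay_value_iteration(p_end=0.1, relay_cost=10.0, r_max=30, sweeps=500):
    """Value iteration for a toy as-you-go relay placement on a line.

    State r = distance (in steps) from the last placed node.
    Returns (J, threshold): the cost-to-go list and the smallest r at which
    placing a relay is optimal.
    """
    def hop(r):
        return float(r * r)  # assumed convex hop cost

    J = [0.0] * (r_max + 1)  # J[r] for r = 1..r_max; J[0] is unused
    for _ in range(sweeps):
        new_J = J[:]
        for r in range(1, r_max + 1):
            place = relay_cost + hop(r) + J[1]          # place, then restart at r = 1
            skip = J[r + 1] if r < r_max else float("inf")  # walk on without placing
            # With probability p_end the path ends and the source closes the hop.
            new_J[r] = p_end * hop(r) + (1.0 - p_end) * min(place, skip)
        J = new_J

    def should_place(r):
        skip = J[r + 1] if r < r_max else float("inf")
        return relay_cost + hop(r) + J[1] <= skip

    threshold = next(r for r in range(1, r_max + 1) if should_place(r))
    return J, threshold


if __name__ == "__main__":
    J, threshold = relay_value_iteration()
    print("place a relay once the distance reaches", threshold)
```

With these toy numbers the computed policy places a relay as soon as the distance from the last node reaches 3 and never earlier, a one-dimensional analogue of the placement boundary described in the abstract.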