50 resultados para decision authority
Resumo:
We present a novel multi-timescale Q-learning algorithm for average cost control in a Markov decision process subject to multiple inequality constraints. We formulate a relaxed version of this problem through the Lagrange multiplier method. Our algorithm is different from Q-learning in that it updates two parameters - a Q-value parameter and a policy parameter. The Q-value parameter is updated on a slower time scale as compared to the policy parameter. Whereas Q-learning with function approximation can diverge in some cases, our algorithm is seen to be convergent as a result of the aforementioned timescale separation. We show the results of experiments on a problem of constrained routing in a multistage queueing network. Our algorithm is seen to exhibit good performance and the various inequality constraints are seen to be satisfied upon convergence of the algorithm.
Resumo:
This paper considers antenna selection (AS) at a receiver equipped with multiple antenna elements but only a single radio frequency chain for packet reception. As information about the channel state is acquired using training symbols (pilots), the receiver makes its AS decisions based on noisy channel estimates. Additional information that can be exploited for AS includes the time-correlation of the wireless channel and the results of the link-layer error checks upon receiving the data packets. In this scenario, the task of the receiver is to sequentially select (a) the pilot symbol allocation, i.e., how to distribute the available pilot symbols among the antenna elements, for channel estimation on each of the receive antennas; and (b) the antenna to be used for data packet reception. The goal is to maximize the expected throughput, based on the past history of allocation and selection decisions, and the corresponding noisy channel estimates and error check results. Since the channel state is only partially observed through the noisy pilots and the error checks, the joint problem of pilot allocation and AS is modeled as a partially observed Markov decision process (POMDP). The solution to the POMDP yields the policy that maximizes the long-term expected throughput. Using the Finite State Markov Chain (FSMC) model for the wireless channel, the performance of the POMDP solution is compared with that of other existing schemes, and it is illustrated through numerical evaluation that the POMDP solution significantly outperforms them.
Resumo:
This paper addresses the problem of finding optimal power control policies for wireless energy harvesting sensor (EHS) nodes with automatic repeat request (ARQ)-based packet transmissions. The EHS harvests energy from the environment according to a Bernoulli process; and it is required to operate within the constraint of energy neutrality. The EHS obtains partial channel state information (CSI) at the transmitter through the link-layer ARQ protocol, via the ACK/NACK feedback messages, and uses it to adapt the transmission power for the packet (re)transmission attempts. The underlying wireless fading channel is modeled as a finite state Markov chain with known transition probabilities. Thus, the goal of the power management policy is to determine the best power setting for the current packet transmission attempt, so as to maximize a long-run expected reward such as the expected outage probability. The problem is addressed in a decision-theoretic framework by casting it as a partially observable Markov decision process (POMDP). Due to the large size of the state-space, the exact solution to the POMDP is computationally expensive. Hence, two popular approximate solutions are considered, which yield good power management policies for the transmission attempts. Monte Carlo simulation results illustrate the efficacy of the approach and show that the approximate solutions significantly outperform conventional approaches.
Resumo:
H. 264/advanced video coding surveillance video encoders use the Skip mode specified by the standard to reduce bandwidth. They also use multiple frames as reference for motion-compensated prediction. In this paper, we propose two techniques to reduce the bandwidth and computational cost of static camera surveillance video encoders without affecting detection and recognition performance. A spatial sampler is proposed to sample pixels that are segmented using a Gaussian mixture model. Modified weight updates are derived for the parameters of the mixture model to reduce floating point computations. A storage pattern of the parameters in memory is also modified to improve cache performance. Skip selection is performed using the segmentation results of the sampled pixels. The second contribution is a low computational cost algorithm to choose the reference frames. The proposed reference frame selection algorithm reduces the cost of coding uncovered background regions. We also study the number of reference frames required to achieve good coding efficiency. Distortion over foreground pixels is measured to quantify the performance of the proposed techniques. Experimental results show bit rate savings of up to 94.5% over methods proposed in literature on video surveillance data sets. The proposed techniques also provide up to 74.5% reduction in compression complexity without increasing the distortion over the foreground regions in the video sequence.
Resumo:
Wildlife conservation in human-dominated landscapes requires that we understand how animals, when making habitat-use decisions, obtain diverse and dynamically occurring resources while avoiding risks, induced by both natural predators and anthropogenic threats. Little is known about the underlying processes that enable wild animals to persist in densely populated human-dominated landscapes, particularly in developing countries. In a complex, semi-arid, fragmented, human-dominated agricultural landscape, we analyzed the habitat-use of blackbuck, a large herbivore endemic to the Indian sub-continent. We hypothesized that blackbuck would show flexible habitat-use behaviour and be risk averse when resource quality in the landscape is high, and less sensitive to risk otherwise. Overall, blackbuck appeared to be strongly influenced by human activity and they offset risks by using small protected patches (similar to 3 km(2)) when they could afford to do so. Blackbuck habitat use varied dynamically corresponding with seasonally-changing levels of resources and risks, with protected habitats registering maximum use. The findings show that human activities can strongly influence and perhaps limit ungulate habitat-use and behaviour, but spatial heterogeneity in risk, particularly the presence of refuges, can allow ungulates to persist in landscapes with high human and livestock densities.