328 resultados para Stochastic Approximation Algorithms
Resumo:
We develop in this article the first actor-critic reinforcement learning algorithm with function approximation for a problem of control under multiple inequality constraints. We consider the infinite horizon discounted cost framework in which both the objective and the constraint functions are suitable expected policy-dependent discounted sums of certain sample path functions. We apply the Lagrange multiplier method to handle the inequality constraints. Our algorithm makes use of multi-timescale stochastic approximation and incorporates a temporal difference (TD) critic and an actor that makes a gradient search in the space of policy parameters using efficient simultaneous perturbation stochastic approximation (SPSA) gradient estimates. We prove the asymptotic almost sure convergence of our algorithm to a locally optimal policy. (C) 2010 Elsevier B.V. All rights reserved.
Resumo:
We propose for the first time two reinforcement learning algorithms with function approximation for average cost adaptive control of traffic lights. One of these algorithms is a version of Q-learning with function approximation while the other is a policy gradient actor-critic algorithm that incorporates multi-timescale stochastic approximation. We show performance comparisons on various network settings of these algorithms with a range of fixed timing algorithms, as well as a Q-learning algorithm with full state representation that we also implement. We observe that whereas (as expected) on a two-junction corridor, the full state representation algorithm shows the best results, this algorithm is not implementable on larger road networks. The algorithm PG-AC-TLC that we propose is seen to show the best overall performance.
Resumo:
Service systems are labor intensive. Further, the workload tends to vary greatly with time. Adapting the staffing levels to the workloads in such systems is nontrivial due to a large number of parameters and operational variations, but crucial for business objectives such as minimal labor inventory. One of the central challenges is to optimize the staffing while maintaining system steady-state and compliance to aggregate SLA constraints. We formulate this problem as a parametrized constrained Markov process and propose a novel stochastic optimization algorithm for solving it. Our algorithm is a multi-timescale stochastic approximation scheme that incorporates a SPSA based algorithm for ‘primal descent' and couples it with a ‘dual ascent' scheme for the Lagrange multipliers. We validate this optimization scheme on five real-life service systems and compare it with a state-of-the-art optimization tool-kit OptQuest. Being two orders of magnitude faster than OptQuest, our scheme is particularly suitable for adaptive labor staffing. Also, we observe that it guarantees convergence and finds better solutions than OptQuest in many cases.
Resumo:
We consider the problem of wireless channel allocation (whenever the channels are free) to multiple cognitive radio users in a Cognitive Radio Network (CRN) so as to satisfy their Quality of Service (QoS) requirements efficiently. The CRN base station may not know the channel states of all the users. The multiple channels are available at random times. In this setup Opportunistic Splitting can be an attractive solution. A disadvantage of this algorithm is that it requires the metrics of all users to be an independent, identically distributed sequence. However we use a recently generalized version of this algorithm in which the optimal parameters are learnt on-line through stochastic approximation and metrics can be Markov. We provide scheduling algorithms which maximize weighted-sum system throughput or are throughput or delay optimal. We also consider the scenario when some traffic streams are delay sensitive.
Resumo:
In this paper, we consider an intrusion detection application for Wireless Sensor Networks. We study the problem of scheduling the sleep times of the individual sensors, where the objective is to maximize the network lifetime while keeping the tracking error to a minimum. We formulate this problem as a partially-observable Markov decision process (POMDP) with continuous stateaction spaces, in a manner similar to Fuemmeler and Veeravalli (IEEE Trans Signal Process 56(5), 2091-2101, 2008). However, unlike their formulation, we consider infinite horizon discounted and average cost objectives as performance criteria. For each criterion, we propose a convergent on-policy Q-learning algorithm that operates on two timescales, while employing function approximation. Feature-based representations and function approximation is necessary to handle the curse of dimensionality associated with the underlying POMDP. Our proposed algorithm incorporates a policy gradient update using a one-simulation simultaneous perturbation stochastic approximation estimate on the faster timescale, while the Q-value parameter (arising from a linear function approximation architecture for the Q-values) is updated in an on-policy temporal difference algorithm-like fashion on the slower timescale. The feature selection scheme employed in each of our algorithms manages the energy and tracking components in a manner that assists the search for the optimal sleep-scheduling policy. For the sake of comparison, in both discounted and average settings, we also develop a function approximation analogue of the Q-learning algorithm. This algorithm, unlike the two-timescale variant, does not possess theoretical convergence guarantees. Finally, we also adapt our algorithms to include a stochastic iterative estimation scheme for the intruder's mobility model and this is useful in settings where the latter is not known. Our simulation results on a synthetic 2-dimensional network setting suggest that our algorithms result in better tracking accuracy at the cost of only a few additional sensors, in comparison to a recent prior work.
Resumo:
We consider the problem of optimizing the workforce of a service system. Adapting the staffing levels in such systems is non-trivial due to large variations in workload and the large number of system parameters do not allow for a brute force search. Further, because these parameters change on a weekly basis, the optimization should not take longer than a few hours. Our aim is to find the optimum staffing levels from a discrete high-dimensional parameter set, that minimizes the long run average of the single-stage cost function, while adhering to the constraints relating to queue stability and service-level agreement (SLA) compliance. The single-stage cost function balances the conflicting objectives of utilizing workers better and attaining the target SLAs. We formulate this problem as a constrained parameterized Markov cost process parameterized by the (discrete) staffing levels. We propose novel simultaneous perturbation stochastic approximation (SPSA)-based algorithms for solving the above problem. The algorithms include both first-order as well as second-order methods and incorporate SPSA-based gradient/Hessian estimates for primal descent, while performing dual ascent for the Lagrange multipliers. Both algorithms are online and update the staffing levels in an incremental fashion. Further, they involve a certain generalized smooth projection operator, which is essential to project the continuous-valued worker parameter tuned by our algorithms onto the discrete set. The smoothness is necessary to ensure that the underlying transition dynamics of the constrained Markov cost process is itself smooth (as a function of the continuous-valued parameter): a critical requirement to prove the convergence of both algorithms. We validate our algorithms via performance simulations based on data from five real-life service systems. For the sake of comparison, we also implement a scatter search based algorithm using state-of-the-art optimization tool-kit OptQuest. From the experiments, we observe that both our algorithms converge empirically and consistently outperform OptQuest in most of the settings considered. This finding coupled with the computational advantage of our algorithms make them amenable for adaptive labor staffing in real-life service systems.
Resumo:
With the availability of a huge amount of video data on various sources, efficient video retrieval tools are increasingly in demand. Video being a multi-modal data, the perceptions of ``relevance'' between the user provided query video (in case of Query-By-Example type of video search) and retrieved video clips are subjective in nature. We present an efficient video retrieval method that takes user's feedback on the relevance of retrieved videos and iteratively reformulates the input query feature vectors (QFV) for improved video retrieval. The QFV reformulation is done by a simple, but powerful feature weight optimization method based on Simultaneous Perturbation Stochastic Approximation (SPSA) technique. A video retrieval system with video indexing, searching and relevance feedback (RF) phases is built for demonstrating the performance of the proposed method. The query and database videos are indexed using the conventional video features like color, texture, etc. However, we use the comprehensive and novel methods of feature representations, and a spatio-temporal distance measure to retrieve the top M videos that are similar to the query. In feedback phase, the user activated iterative on the previously retrieved videos is used to reformulate the QFV weights (measure of importance) that reflect the user's preference, automatically. It is our observation that a few iterations of such feedback are generally sufficient for retrieving the desired video clips. The novel application of SPSA based RF for user-oriented feature weights optimization makes the proposed method to be distinct from the existing ones. The experimental results show that the proposed RF based video retrieval exhibit good performance.
Resumo:
We consider the problem of estimating the optimal parameter trajectory over a finite time interval in a parameterized stochastic differential equation (SDE), and propose a simulation-based algorithm for this purpose. Towards this end, we consider a discretization of the SDE over finite time instants and reformulate the problem as one of finding an optimal parameter at each of these instants. A stochastic approximation algorithm based on the smoothed functional technique is adapted to this setting for finding the optimal parameter trajectory. A proof of convergence of the algorithm is presented and results of numerical experiments over two different settings are shown. The algorithm is seen to exhibit good performance. We also present extensions of our framework to the case of finding optimal parameterized feedback policies for controlled SDE and present numerical results in this scenario as well.
Resumo:
A unit cube in k dimensions (k-cube) is defined as the Cartesian product R-1 x R-2 x ... x R-k where R-i (for 1 <= i <= k) is a closed interval of the form [a(i), a(i) + 1] on the real line. A graph G on n nodes is said to be representable as the intersection of k-cubes (cube representation in k dimensions) if each vertex of C can be mapped to a k-cube such that two vertices are adjacent in G if and only if their corresponding k-cubes have a non-empty intersection. The cubicity of G denoted as cub(G) is the minimum k for which G can be represented as the intersection of k-cubes. An interesting aspect about cubicity is that many problems known to be NP-complete for general graphs have polynomial time deterministic algorithms or have good approximation ratios in graphs of low cubicity. In most of these algorithms, computing a low dimensional cube representation of the given graph is usually the first step. We give an O(bw . n) algorithm to compute the cube representation of a general graph G in bw + 1 dimensions given a bandwidth ordering of the vertices of G, where bw is the bandwidth of G. As a consequence, we get O(Delta) upper bounds on the cubicity of many well-known graph classes such as AT-free graphs, circular-arc graphs and cocomparability graphs which have O(Delta) bandwidth. Thus we have: 1. cub(G) <= 3 Delta - 1, if G is an AT-free graph. 2. cub(G) <= 2 Delta + 1, if G is a circular-arc graph. 3. cub(G) <= 2 Delta, if G is a cocomparability graph. Also for these graph classes, there axe constant factor approximation algorithms for bandwidth computation that generate orderings of vertices with O(Delta) width. We can thus generate the cube representation of such graphs in O(Delta) dimensions in polynomial time.
Resumo:
Pricing is an effective tool to control congestion and achieve quality of service (QoS) provisioning for multiple differentiated levels of service. In this paper, we consider the problem of pricing for congestion control in the case of a network of nodes under multiple service classes. Our work draws upon [1] and [2] in various ways. We use the Tirupati pricing scheme in conjunction with the stochastic approximation based adaptive pricing methodology for queue control (proposed in [1]) for minimizing network congestion. However, unlike the methodology of [1] where pricing for entire routes is directly considered, we consider prices for individual link-service grade tuples. Further, we adapt the methodology proposed in [21 for a single-node scenario to the case of a network of nodes, for evaluating performance in terms of price, revenue rate and disutility. We obtain considerable performance improvements using our approach over that in [1]. In particular, our approach exhibits a throughput improvement in the range of 54 to 80 percent in all cases studied (over all routes) while exhibiting a lower packet delay in the range of 26 to 38 percent over the scheme in [1].
Resumo:
This article proposes a three-timescale simulation based algorithm for solution of infinite horizon Markov Decision Processes (MDPs). We assume a finite state space and discounted cost criterion and adopt the value iteration approach. An approximation of the Dynamic Programming operator T is applied to the value function iterates. This 'approximate' operator is implemented using three timescales, the slowest of which updates the value function iterates. On the middle timescale we perform a gradient search over the feasible action set of each state using Simultaneous Perturbation Stochastic Approximation (SPSA) gradient estimates, thus finding the minimizing action in T. On the fastest timescale, the 'critic' estimates, over which the gradient search is performed, are obtained. A sketch of convergence explaining the dynamics of the algorithm using associated ODEs is also presented. Numerical experiments on rate based flow control on a bottleneck node using a continuous-time queueing model are performed using the proposed algorithm. The results obtained are verified against classical value iteration where the feasible set is suitably discretized. Over such a discretized setting, a variant of the algorithm of [12] is compared and the proposed algorithm is found to converge faster.
Resumo:
Alopex is a correlation-based gradient-free optimization technique useful in many learning problems. However, there are no analytical results on the asymptotic behavior of this algorithm. This article presents a new version of Alopex that can be analyzed using techniques of two timescale stochastic approximation method. It is shown that the algorithm asymptotically behaves like a gradient-descent method, though it does not need (or estimate) any gradient information. It is also shown, through simulations, that the algorithm is quite effective.
Resumo:
In this paper, we consider the problem of selecting, for any given positive integer k, the top-k nodes in a social network, based on a certain measure appropriate for the social network. This problem is relevant in many settings such as analysis of co-authorship networks, diffusion of information, viral marketing, etc. However, in most situations, this problem turns out to be NP-hard. The existing approaches for solving this problem are based on approximation algorithms and assume that the objective function is sub-modular. In this paper, we propose a novel and intuitive algorithm based on the Shapley value, for efficiently computing an approximate solution to this problem. Our proposed algorithm does not use the sub-modularity of the underlying objective function and hence it is a general approach. We demonstrate the efficacy of the algorithm using a co-authorship data set from e-print arXiv (www.arxiv.org), having 8361 authors.
Resumo:
We study the problem of optimal bandwidth allocation in communication networks. We consider a queueing model with two queues to which traffic from different competing flows arrive. The queue length at the buffers is observed every T instants of time, on the basis of which a decision on the amount of bandwidth to be allocated to each buffer for the next T instants is made. We consider a class of closed-loop feedback policies for the system and use a twotimescale simultaneous perturbation stochastic approximation(SPSA) algorithm to find an optimal policy within the prescribed class. We study the performance of the proposed algorithm on a numerical setting. Our algorithm is found to exhibit good performance.
Resumo:
We develop a simulation-based, two-timescale actor-critic algorithm for infinite horizon Markov decision processes with finite state and action spaces, with a discounted reward criterion. The algorithm is of the gradient ascent type and performs a search in the space of stationary randomized policies. The algorithm uses certain simultaneous deterministic perturbation stochastic approximation (SDPSA) gradient estimates for enhanced performance. We show an application of our algorithm on a problem of mortgage refinancing. Our algorithm obtains the optimal refinancing strategies in a computationally efficient manner