328 resultados para Stochastic Approximation Algorithms
Resumo:
The problem of finding optimal parameterized feedback policies for dynamic bandwidth allocation in communication networks is studied. We consider a queueing model with two queues to which traffic from different competing flows arrive. The queue length at the buffers is observed every T instants of time, on the basis of which a decision on the amount of bandwidth to be allocated to each buffer for the next T instants is made. We consider two different classes of multilevel closed-loop feedback policies for the system and use a two-timescale simultaneous perturbation stochastic approximation (SPSA) algorithm to find optimal policies within each prescribed class. We study the performance of the proposed algorithm on a numerical setting and show performance comparisons of the two optimal multilevel closedloop policies with optimal open loop policies. We observe that closed loop policies of Class B that tune parameters for both the queues and do not have the constraint that the entire bandwidth be used at each instant exhibit the best results overall as they offer greater flexibility in parameter tuning. Index Terms — Resource allocation, dynamic bandwidth allocation in communication networks, two-timescale SPSA algorithm, optimal parameterized policies. I.
Resumo:
For a fixed positive integer k, a k-tuple total dominating set of a graph G = (V. E) is a subset T D-k of V such that every vertex in V is adjacent to at least k vertices of T Dk. In minimum k-tuple total dominating set problem (MIN k-TUPLE TOTAL DOM SET), it is required to find a k-tuple total dominating set of minimum cardinality and DECIDE MIN k-TUPLE TOTAL DOM SET is the decision version of MIN k-TUPLE TOTAL DOM SET problem. In this paper, we show that DECIDE MIN k-TUPLE TOTAL DOM SET is NP-complete for split graphs, doubly chordal graphs and bipartite graphs. For chordal bipartite graphs, we show that MIN k-TUPLE TOTAL DOM SET can be solved in polynomial time. We also propose some hardness results and approximation algorithms for MIN k-TUPLE TOTAL DOM SET problem. (c) 2012 Elsevier B.V. All rights reserved.
Resumo:
The Lovasz θ function of a graph, is a fundamental tool in combinatorial optimization and approximation algorithms. Computing θ involves solving a SDP and is extremely expensive even for moderately sized graphs. In this paper we establish that the Lovasz θ function is equivalent to a kernel learning problem related to one class SVM. This interesting connection opens up many opportunities bridging graph theoretic algorithms and machine learning. We show that there exist graphs, which we call SVM−θ graphs, on which the Lovasz θ function can be approximated well by a one-class SVM. This leads to a novel use of SVM techniques to solve algorithmic problems in large graphs e.g. identifying a planted clique of size Θ(n√) in a random graph G(n,12). A classic approach for this problem involves computing the θ function, however it is not scalable due to SDP computation. We show that the random graph with a planted clique is an example of SVM−θ graph, and as a consequence a SVM based approach easily identifies the clique in large graphs and is competitive with the state-of-the-art. Further, we introduce the notion of a ''common orthogonal labeling'' which extends the notion of a ''orthogonal labelling of a single graph (used in defining the θ function) to multiple graphs. The problem of finding the optimal common orthogonal labelling is cast as a Multiple Kernel Learning problem and is used to identify a large common dense region in multiple graphs. The proposed algorithm achieves an order of magnitude scalability compared to the state of the art.
Resumo:
We consider two variants of the classical gossip algorithm. The first variant is a version of asynchronous stochastic approximation. We highlight a fundamental difficulty associated with the classical asynchronous gossip scheme, viz., that it may not converge to a desired average, and suggest an alternative scheme based on reinforcement learning that has guaranteed convergence to the desired average. We then discuss a potential application to a wireless network setting with simultaneous link activation constraints. The second variant is a gossip algorithm for distributed computation of the Perron-Frobenius eigenvector of a nonnegative matrix. While the first variant draws upon a reinforcement learning algorithm for an average cost controlled Markov decision problem, the second variant draws upon a reinforcement learning algorithm for risk-sensitive control. We then discuss potential applications of the second variant to ranking schemes, reputation networks, and principal component analysis.
Resumo:
Rainbow connection number, rc(G), of a connected graph G is the minimum number of colors needed to color its edges so that every pair of vertices is connected by at least one path in which no two edges are colored the same (note that the coloring need not be proper). In this paper we study the rainbow connection number with respect to three important graph product operations (namely the Cartesian product, the lexicographic product and the strong product) and the operation of taking the power of a graph. In this direction, we show that if G is a graph obtained by applying any of the operations mentioned above on non-trivial graphs, then rc(G) a parts per thousand currency sign 2r(G) + c, where r(G) denotes the radius of G and . In general the rainbow connection number of a bridgeless graph can be as high as the square of its radius 1]. This is an attempt to identify some graph classes which have rainbow connection number very close to the obvious lower bound of diameter (and thus the radius). The bounds reported are tight up to additive constants. The proofs are constructive and hence yield polynomial time -factor approximation algorithms.
Resumo:
We present four new reinforcement learning algorithms based on actor-critic, natural-gradient and functi approximation ideas,and we provide their convergence proofs. Actor-critic reinforcement learning methods are online approximations to policy iteration in which the value-function parameters are estimated using temporal difference learning and the policy parameters are updated by stochastic gradient descent. Methods based on policy gradients in this way are of special interest because of their compatibility with function-approximation methods, which are needed to handle large or infinite state spaces. The use of temporal difference learning in this way is of special interest because in many applications it dramatically reduces the variance of the gradient estimates. The use of the natural gradient is of interest because it can produce better conditioned parameterizations and has been shown to further reduce variance in some cases. Our results extend prior two-timescale convergence results for actor-critic methods by Konda and Tsitsiklis by using temporal difference learning in the actor and by incorporating natural gradients. Our results extend prior empirical studies of natural actor-critic methods by Peters, Vijayakumar and Schaal by providing the first convergence proofs and the first fully incremental algorithms.
Resumo:
We present four new reinforcement learning algorithms based on actor-critic and natural-gradient ideas, and provide their convergence proofs. Actor-critic rein- forcement learning methods are online approximations to policy iteration in which the value-function parameters are estimated using temporal difference learning and the policy parameters are updated by stochastic gradient descent. Methods based on policy gradients in this way are of special interest because of their com- patibility with function approximation methods, which are needed to handle large or infinite state spaces. The use of temporal difference learning in this way is of interest because in many applications it dramatically reduces the variance of the gradient estimates. The use of the natural gradient is of interest because it can produce better conditioned parameterizations and has been shown to further re- duce variance in some cases. Our results extend prior two-timescale convergence results for actor-critic methods by Konda and Tsitsiklis by using temporal differ- ence learning in the actor and by incorporating natural gradients, and they extend prior empirical studies of natural actor-critic methods by Peters, Vijayakumar and Schaal by providing the first convergence proofs and the first fully incremental algorithms.
Resumo:
The q-Gaussian distribution results from maximizing certain generalizations of Shannon entropy under some constraints. The importance of q-Gaussian distributions stems from the fact that they exhibit power-law behavior, and also generalize Gaussian distributions. In this paper, we propose a Smoothed Functional (SF) scheme for gradient estimation using q-Gaussian distribution, and also propose an algorithm for optimization based on the above scheme. Convergence results of the algorithm are presented. Performance of the proposed algorithm is shown by simulation results on a queuing model.
Resumo:
We develop iterative diffraction tomography algorithms, which are similar to the distorted Born algorithms, for inverting scattered intensity data. Within the Born approximation, the unknown scattered field is expressed as a multiplicative perturbation to the incident field. With this, the forward equation becomes stable, which helps us compute nearly oscillation-free solutions that have immediate bearing on the accuracy of the Jacobian computed for use in a deterministic Gauss-Newton (GN) reconstruction. However, since the data are inherently noisy and the sensitivity of measurement to refractive index away from the detectors is poor, we report a derivative-free evolutionary stochastic scheme, providing strictly additive updates in order to bridge the measurement-prediction misfit, to arrive at the refractive index distribution from intensity transport data. The superiority of the stochastic algorithm over the GN scheme for similar settings is demonstrated by the reconstruction of the refractive index profile from simulated and experimentally acquired intensity data. (C) 2014 Optical Society of America
Resumo:
We present the first q-Gaussian smoothed functional (SF) estimator of the Hessian and the first Newton-based stochastic optimization algorithm that estimates both the Hessian and the gradient of the objective function using q-Gaussian perturbations. Our algorithm requires only two system simulations (regardless of the parameter dimension) and estimates both the gradient and the Hessian at each update epoch using these. We also present a proof of convergence of the proposed algorithm. In a related recent work (Ghoshdastidar, Dukkipati, & Bhatnagar, 2014), we presented gradient SF algorithms based on the q-Gaussian perturbations. Our work extends prior work on SF algorithms by generalizing the class of perturbation distributions as most distributions reported in the literature for which SF algorithms are known to work turn out to be special cases of the q-Gaussian distribution. Besides studying the convergence properties of our algorithm analytically, we also show the results of numerical simulations on a model of a queuing network, that illustrate the significance of the proposed method. In particular, we observe that our algorithm performs better in most cases, over a wide range of q-values, in comparison to Newton SF algorithms with the Gaussian and Cauchy perturbations, as well as the gradient q-Gaussian SF algorithms. (C) 2014 Elsevier Ltd. All rights reserved.
Resumo:
Doppler weather radars with fast scanning rates must estimate spectral moments based on a small number of echo samples. This paper concerns the estimation of mean Doppler velocity in a coherent radar using a short complex time series. Specific results are presented based on 16 samples. A wide range of signal-to-noise ratios are considered, and attention is given to ease of implementation. It is shown that FFT estimators fare poorly in low SNR and/or high spectrum-width situations. Several variants of a vector pulse-pair processor are postulated and an algorithm is developed for the resolution of phase angle ambiguity. This processor is found to be better than conventional processors at very low SNR values. A feasible approximation to the maximum entropy estimator is derived as well as a technique utilizing the maximization of the periodogram. It is found that a vector pulse-pair processor operating with four lags for clear air observation and a single lag (pulse-pair mode) for storm observation may be a good way to estimate Doppler velocities over the entire gamut of weather phenomena.
Resumo:
Fractional-order derivatives appear in various engineering applications including models for viscoelastic damping. Damping behavior of materials, if modeled using linear, constant coefficient differential equations, cannot include the long memory that fractional-order derivatives require. However, sufficiently great rnicrostructural disorder can lead, statistically, to macroscopic behavior well approximated by fractional order derivatives. The idea has appeared in the physics literature, but may interest an engineering audience. This idea in turn leads to an infinite-dimensional system without memory; a routine Galerkin projection on that infinite-dimensional system leads to a finite dimensional system of ordinary differential equations (ODEs) (integer order) that matches the fractional-order behavior over user-specifiable, but finite, frequency ranges. For extreme frequencies (small or large), the approximation is poor. This is unavoidable, and users interested in such extremes or in the fundamental aspects of true fractional derivatives must take note of it. However, mismatch in extreme frequencies outside the range of interest for a particular model of a real material may have little engineering impact.
Resumo:
We provide a survey of some of our recent results ([9], [13], [4], [6], [7]) on the analytical performance modeling of IEEE 802.11 wireless local area networks (WLANs). We first present extensions of the decoupling approach of Bianchi ([1]) to the saturation analysis of IEEE 802.11e networks with multiple traffic classes. We have found that even when analysing WLANs with unsaturated nodes the following state dependent service model works well: when a certain set of nodes is nonempty, their channel attempt behaviour is obtained from the corresponding fixed point analysis of the saturated system. We will present our experiences in using this approximation to model multimedia traffic over an IEEE 802.11e network using the enhanced DCF channel access (EDCA) mechanism. We have found that we can model TCP controlled file transfers, VoIP packet telephony, and streaming video in the IEEE802.11e setting by this simple approximation.