313 resultados para function approximation

em Indian Institute of Science - Bangalore - Índia


Relevância:

100.00% 100.00%

Publicador:

Resumo:

We develop in this article the first actor-critic reinforcement learning algorithm with function approximation for a problem of control under multiple inequality constraints. We consider the infinite horizon discounted cost framework in which both the objective and the constraint functions are suitable expected policy-dependent discounted sums of certain sample path functions. We apply the Lagrange multiplier method to handle the inequality constraints. Our algorithm makes use of multi-timescale stochastic approximation and incorporates a temporal difference (TD) critic and an actor that makes a gradient search in the space of policy parameters using efficient simultaneous perturbation stochastic approximation (SPSA) gradient estimates. We prove the asymptotic almost sure convergence of our algorithm to a locally optimal policy. (C) 2010 Elsevier B.V. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We propose, for the first time, a reinforcement learning (RL) algorithm with function approximation for traffic signal control. Our algorithm incorporates state-action features and is easily implementable in high-dimensional settings. Prior work, e. g., the work of Abdulhai et al., on the application of RL to traffic signal control requires full-state representations and cannot be implemented, even in moderate-sized road networks, because the computational complexity exponentially grows in the numbers of lanes and junctions. We tackle this problem of the curse of dimensionality by effectively using feature-based state representations that use a broad characterization of the level of congestion as low, medium, or high. One advantage of our algorithm is that, unlike prior work based on RL, it does not require precise information on queue lengths and elapsed times at each lane but instead works with the aforementioned described features. The number of features that our algorithm requires is linear to the number of signaled lanes, thereby leading to several orders of magnitude reduction in the computational complexity. We perform implementations of our algorithm on various settings and show performance comparisons with other algorithms in the literature, including the works of Abdulhai et al. and Cools et al., as well as the fixed-timing and the longest queue algorithms. For comparison, we also develop an RL algorithm that uses full-state representation and incorporates prioritization of traffic, unlike the work of Abdulhai et al. We observe that our algorithm outperforms all the other algorithms on all the road network settings that we consider.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We develop an online actor-critic reinforcement learning algorithm with function approximation for a problem of control under inequality constraints. We consider the long-run average cost Markov decision process (MDP) framework in which both the objective and the constraint functions are suitable policy-dependent long-run averages of certain sample path functions. The Lagrange multiplier method is used to handle the inequality constraints. We prove the asymptotic almost sure convergence of our algorithm to a locally optimal solution. We also provide the results of numerical experiments on a problem of routing in a multi-stage queueing network with constraints on long-run average queue lengths. We observe that our algorithm exhibits good performance on this setting and converges to a feasible point.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We present a novel multi-timescale Q-learning algorithm for average cost control in a Markov decision process subject to multiple inequality constraints. We formulate a relaxed version of this problem through the Lagrange multiplier method. Our algorithm is different from Q-learning in that it updates two parameters - a Q-value parameter and a policy parameter. The Q-value parameter is updated on a slower time scale as compared to the policy parameter. Whereas Q-learning with function approximation can diverge in some cases, our algorithm is seen to be convergent as a result of the aforementioned timescale separation. We show the results of experiments on a problem of constrained routing in a multistage queueing network. Our algorithm is seen to exhibit good performance and the various inequality constraints are seen to be satisfied upon convergence of the algorithm.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In this paper, wave propagation in multi-walled carbon nanotubes (MWNTs) are studied by modeling them as continuum multiple shell coupled through van der Waals force of interaction. The displacements, namely, axial, radial and circumferential displacements vary along the circumferential direction. The wave propagation are simulated using the wavelet based spectral finite element (WSFE) method. This technique involves Daubechies scaling function approximation in time and spectral element approach. The WSFE Method allows the study of wave properties in both time and frequency domains. This is in contrast to the conventional Fourier transform based analysis which are restricted to frequency domain analysis. Here, first, the wavenumbers and wave speeds of carbon nanotubes (CNTs) are Studied to obtain the characteristics of the waves. These group speeds have been compared with those reported in literature. Next, the natural frequencies of a single-walled carbon nanotube (SWNT) are studied for different values of the radius. The frequencies of the first five modes vary linearly with the radius of the SWNT. Finally, the time domain responses are simulated for SWNT and three-walled carbon nanotubes.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Based on the conclusions drawn in the bijective transformation between possibility and probability, a method is proposed to estimate the fuzzy membership function for pattern recognition purposes. A rational function approximation to the probability density function is obtained from the histogram of a finite (and sometimes very small) number of samples. This function is normalized such that the highest ordinate is one. The parameters representing the rational function are used for classifying the pattern samples based on a max-min decision rule. The method is illustrated with examples.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

We present four new reinforcement learning algorithms based on actor-critic, natural-gradient and functi approximation ideas,and we provide their convergence proofs. Actor-critic reinforcement learning methods are online approximations to policy iteration in which the value-function parameters are estimated using temporal difference learning and the policy parameters are updated by stochastic gradient descent. Methods based on policy gradients in this way are of special interest because of their compatibility with function-approximation methods, which are needed to handle large or infinite state spaces. The use of temporal difference learning in this way is of special interest because in many applications it dramatically reduces the variance of the gradient estimates. The use of the natural gradient is of interest because it can produce better conditioned parameterizations and has been shown to further reduce variance in some cases. Our results extend prior two-timescale convergence results for actor-critic methods by Konda and Tsitsiklis by using temporal difference learning in the actor and by incorporating natural gradients. Our results extend prior empirical studies of natural actor-critic methods by Peters, Vijayakumar and Schaal by providing the first convergence proofs and the first fully incremental algorithms.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This paper investigates the use of Genetic Programming (GP) to create an approximate model for the non-linear relationship between flexural stiffness, length, mass per unit length and rotation speed associated with rotating beams and their natural frequencies. GP, a relatively new form of artificial intelligence, is derived from the Darwinian concept of evolution and genetics and it creates computer programs to solve problems by manipulating their tree structures. GP predicts the size and structural complexity of the empirical model by minimizing the mean square error at the specified points of input-output relationship dataset. This dataset is generated using a finite element model. The validity of the GP-generated model is tested by comparing the natural frequencies at training and at additional input data points. It is found that by using a non-dimensional stiffness, it is possible to get simple and accurate function approximation for the natural frequency. This function approximation model is then used to study the relationships between natural frequency and various influencing parameters for uniform and tapered beams. The relations obtained with GP model agree well with FEM results and can be used for preliminary design and structural optimization studies.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In this paper, we present a growing and pruning radial basis function based no-reference (NR) image quality model for JPEG-coded images. The quality of the images are estimated without referring to their original images. The features for predicting the perceived image quality are extracted by considering key human visual sensitivity factors such as edge amplitude, edge length, background activity and background luminance. Image quality estimation involves computation of functional relationship between HVS features and subjective test scores. Here, the problem of quality estimation is transformed to a function approximation problem and solved using GAP-RBF network. GAP-RBF network uses sequential learning algorithm to approximate the functional relationship. The computational complexity and memory requirement are less in GAP-RBF algorithm compared to other batch learning algorithms. Also, the GAP-RBF algorithm finds a compact image quality model and does not require retraining when the new image samples are presented. Experimental results prove that the GAP-RBF image quality model does emulate the mean opinion score (MOS). The subjective test results of the proposed metric are compared with JPEG no-reference image quality index as well as full-reference structural similarity image quality index and it is observed to outperform both.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In this paper, we give a generalization of a result by Borkar and Meyn (2000) 1], on the stability and convergence of synchronous-update stochastic approximation algorithms, to the case of asynchronous stochastic approximations with delays. We then describe an interesting application of the result to asynchronous distributed temporal difference (TD) learning with function approximation and delays. (C) 2011 Elsevier B.V. All rights reserved.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Accurate system planning and performance evaluation requires knowledge of the joint impact of scheduling, interference, and fading. However, current analyses either require costly numerical simulations or make simplifying assumptions that limit the applicability of the results. In this paper, we derive analytical expressions for the spectral efficiency of cellular systems that use either the channel-unaware but fair round robin scheduler or the greedy, channel-aware but unfair maximum signal to interference ratio scheduler. As is the case in real deployments, non-identical co-channel interference at each user, both Rayleigh fading and lognormal shadowing, and limited modulation constellation sizes are accounted for in the analysis. We show that using a simple moment generating function-based lognormal approximation technique and an accurate Gaussian-Q function approximation leads to results that match simulations well. These results are more accurate than erstwhile results that instead used the moment-matching Fenton-Wilkinson approximation method and bounds on the Q function. The spectral efficiency of cellular systems is strongly influenced by the channel scheduler and the small constellation size that is typically used in third generation cellular systems.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

We present four new reinforcement learning algorithms based on actor-critic and natural-gradient ideas, and provide their convergence proofs. Actor-critic rein- forcement learning methods are online approximations to policy iteration in which the value-function parameters are estimated using temporal difference learning and the policy parameters are updated by stochastic gradient descent. Methods based on policy gradients in this way are of special interest because of their com- patibility with function approximation methods, which are needed to handle large or infinite state spaces. The use of temporal difference learning in this way is of interest because in many applications it dramatically reduces the variance of the gradient estimates. The use of the natural gradient is of interest because it can produce better conditioned parameterizations and has been shown to further re- duce variance in some cases. Our results extend prior two-timescale convergence results for actor-critic methods by Konda and Tsitsiklis by using temporal differ- ence learning in the actor and by incorporating natural gradients, and they extend prior empirical studies of natural actor-critic methods by Peters, Vijayakumar and Schaal by providing the first convergence proofs and the first fully incremental algorithms.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Designing and optimizing high performance microprocessors is an increasingly difficult task due to the size and complexity of the processor design space, high cost of detailed simulation and several constraints that a processor design must satisfy. In this paper, we propose the use of empirical non-linear modeling techniques to assist processor architects in making design decisions and resolving complex trade-offs. We propose a procedure for building accurate non-linear models that consists of the following steps: (i) selection of a small set of representative design points spread across processor design space using latin hypercube sampling, (ii) obtaining performance measures at the selected design points using detailed simulation, (iii) building non-linear models for performance using the function approximation capabilities of radial basis function networks, and (iv) validating the models using an independently and randomly generated set of design points. We evaluate our model building procedure by constructing non-linear performance models for programs from the SPEC CPU2000 benchmark suite with a microarchitectural design space that consists of 9 key parameters. Our results show that the models, built using a relatively small number of simulations, achieve high prediction accuracy (only 2.8% error in CPI estimates on average) across a large processor design space. Our models can potentially replace detailed simulation for common tasks such as the analysis of key microarchitectural trends or searches for optimal processor design points.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In this paper, a model for composite beam with embedded de-lamination is developed using the wavelet based spectral finite element (WSFE) method particularly for damage detection using wave propagation analysis. The simulated responses are used as surrogate experimental results for the inverse problem of detection of damage using wavelet filtering. The WSFE technique is very similar to the fast fourier transform (FFT) based spectral finite element (FSFE) except that it uses compactly supported Daubechies scaling function approximation in time. Unlike FSFE formulation with periodicity assumption, the wavelet-based method allows imposition of initial values and thus is free from wrap around problems. This helps in analysis of finite length undamped structures, where the FSFE method fails to simulate accurate response. First, numerical experiments are performed to study the effect of de-lamination on the wave propagation characteristics. The responses are simulated for different de-lamination configurations for both broad-band and narrow-band excitations. Next, simulated responses are used for damage detection using wavelet analysis.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

We propose for the first time two reinforcement learning algorithms with function approximation for average cost adaptive control of traffic lights. One of these algorithms is a version of Q-learning with function approximation while the other is a policy gradient actor-critic algorithm that incorporates multi-timescale stochastic approximation. We show performance comparisons on various network settings of these algorithms with a range of fixed timing algorithms, as well as a Q-learning algorithm with full state representation that we also implement. We observe that whereas (as expected) on a two-junction corridor, the full state representation algorithm shows the best results, this algorithm is not implementable on larger road networks. The algorithm PG-AC-TLC that we propose is seen to show the best overall performance.