982 results for function approximation


Relevance: 100.00%

Abstract:

In this article, we develop the first actor-critic reinforcement learning algorithm with function approximation for a problem of control under multiple inequality constraints. We consider the infinite-horizon discounted-cost framework, in which both the objective and the constraint functions are suitable policy-dependent expected discounted sums of certain sample-path functions. We apply the Lagrange multiplier method to handle the inequality constraints. Our algorithm uses multi-timescale stochastic approximation and incorporates a temporal difference (TD) critic and an actor that performs a gradient search in the space of policy parameters using efficient simultaneous perturbation stochastic approximation (SPSA) gradient estimates. We prove the asymptotic almost sure convergence of our algorithm to a locally optimal policy. (C) 2010 Elsevier B.V. All rights reserved.
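The SPSA actor step and the Lagrangian treatment of the constraints can be illustrated on a toy problem. This is a minimal sketch, not the authors' algorithm: it replaces the MDP and the TD critic with a known deterministic objective J(θ) = θ0² + θ1² and one hypothetical constraint g(θ) = 1 - θ0 - θ1 ≤ 0. It shows the two-sided SPSA estimate (two function evaluations per step, whatever the dimension of θ) and a two-timescale primal-dual loop in which θ descends the Lagrangian on a faster timescale while the multiplier λ ascends on a slower one. All names, step sizes, and the toy problem are assumptions.

```python
import random


def spsa_gradient(f, theta, c, rng):
    """Two-sided SPSA estimate of the gradient of f at theta from one random perturbation."""
    delta = [rng.choice((-1.0, 1.0)) for _ in theta]  # Rademacher (+/-1) perturbation
    plus = [t + c * d for t, d in zip(theta, delta)]
    minus = [t - c * d for t, d in zip(theta, delta)]
    diff = f(plus) - f(minus)  # only two evaluations of f, for any dimension of theta
    return [diff / (2.0 * c * d) for d in delta]


def lagrangian(theta, lam):
    """L(theta, lam) = J(theta) + lam * g(theta) for the hypothetical toy problem."""
    objective = theta[0] ** 2 + theta[1] ** 2
    constraint = 1.0 - theta[0] - theta[1]  # feasible when <= 0
    return objective + lam * constraint


def primal_dual_spsa(steps=20000, a=0.05, b=0.005, c=0.1, seed=0):
    rng = random.Random(seed)
    theta, lam = [0.0, 0.0], 0.0
    for _ in range(steps):
        grad = spsa_gradient(lambda t: lagrangian(t, lam), theta, c, rng)
        # Faster timescale (step a): SPSA gradient descent on L in theta.
        theta = [t - a * g for t, g in zip(theta, grad)]
        # Slower timescale (step b << a): projected ascent keeps lam >= 0.
        lam = max(0.0, lam + b * (1.0 - theta[0] - theta[1]))
    return theta, lam


theta, lam = primal_dual_spsa()
print([round(t, 3) for t in theta], round(lam, 3))
```

For this toy problem the constrained optimum is θ = (0.5, 0.5) with multiplier λ = 1, so the printed values should be close to those.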

Relevance: 100.00%

Abstract:

We propose, for the first time, a reinforcement learning (RL) algorithm with function approximation for traffic signal control. Our algorithm incorporates state-action features and is easily implementable in high-dimensional settings. Prior work on applying RL to traffic signal control, e.g., the work of Abdulhai et al., requires full-state representations and cannot be implemented even in moderate-sized road networks, because the computational complexity grows exponentially with the number of lanes and junctions. We tackle this curse of dimensionality by using feature-based state representations that broadly characterize the level of congestion as low, medium, or high. One advantage of our algorithm is that, unlike prior RL-based work, it does not require precise information on queue lengths and elapsed times at each lane; instead, it works with the features described above. The number of features that our algorithm requires is linear in the number of signaled lanes, which reduces the computational complexity by several orders of magnitude. We implement our algorithm on various settings and compare its performance with other algorithms in the literature, including the works of Abdulhai et al. and Cools et al., as well as the fixed-timing and longest-queue algorithms. For comparison, we also develop an RL algorithm that uses a full-state representation and, unlike the work of Abdulhai et al., incorporates prioritization of traffic. We observe that our algorithm outperforms all the other algorithms on all the road network settings that we consider.
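A minimal sketch of how coarse congestion features keep the representation linear in the number of lanes, paired with a standard Q-learning update under linear function approximation. The thresholds (5 and 15 vehicles), the reward (negative total queue length), the one-hot block layout, and all function names are hypothetical choices for illustration, not the authors' feature design.

```python
def congestion_level(queue_len, low=5, high=15):
    """Coarse congestion level: 0 = low, 1 = medium, 2 = high (hypothetical thresholds)."""
    if queue_len < low:
        return 0
    if queue_len < high:
        return 1
    return 2


def features(queues, action, n_actions):
    """State-action features: one-hot congestion level per lane, in the block of the chosen action.

    The length is 3 * n_lanes * n_actions, i.e. linear in the number of lanes.
    """
    n_lanes = len(queues)
    phi = [0.0] * (3 * n_lanes * n_actions)
    offset = action * 3 * n_lanes
    for lane, q in enumerate(queues):
        phi[offset + 3 * lane + congestion_level(q)] = 1.0
    return phi


def q_value(w, phi):
    return sum(wi * fi for wi, fi in zip(w, phi))


def q_learning_step(w, queues, action, reward, next_queues, n_actions, alpha=0.1, gamma=0.9):
    """One Q-learning update with linear function approximation Q(s, a) = w . phi(s, a)."""
    phi = features(queues, action, n_actions)
    best_next = max(q_value(w, features(next_queues, b, n_actions)) for b in range(n_actions))
    td_error = reward + gamma * best_next - q_value(w, phi)
    return [wi + alpha * td_error * fi for wi, fi in zip(w, phi)]


# Hypothetical junction with 4 lanes and 2 signal phases (actions).
queues, next_queues = [2, 8, 20, 0], [1, 6, 18, 0]
w = [0.0] * (3 * len(queues) * 2)
w = q_learning_step(w, queues, 1, -float(sum(queues)), next_queues, 2)
print(w)
```

Exact queue lengths enter only through the three-level quantization, so the learner never needs precise counts, which matches the abstract's point about not requiring precise queue information.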

Relevance: 100.00%

Abstract:

We develop an online actor-critic reinforcement learning algorithm with function approximation for a problem of control under inequality constraints. We consider the long-run average-cost Markov decision process (MDP) framework, in which both the objective and the constraint functions are suitable policy-dependent long-run averages of certain sample-path functions. The Lagrange multiplier method is used to handle the inequality constraints. We prove the asymptotic almost sure convergence of our algorithm to a locally optimal solution. We also provide the results of numerical experiments on a problem of routing in a multi-stage queueing network with constraints on long-run average queue lengths. We observe that our algorithm exhibits good performance in this setting and converges to a feasible point.
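In the average-cost setting, a TD critic typically tracks the long-run average of the single-stage Lagrangian cost alongside a differential value function. The sketch below shows one such step with linear features, plus a projected multiplier update; it is an assumption-laden illustration (function names, step sizes, and the use of per-step constraint costs as noisy samples are all hypothetical), not the authors' exact recursions.

```python
def lagrangian_cost(cost, constraint_costs, bounds, multipliers):
    """Single-stage cost k = c + sum_i lambda_i * (h_i - alpha_i)."""
    return cost + sum(lam * (h - a) for lam, h, a in zip(multipliers, constraint_costs, bounds))


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def average_cost_td_step(v, rho, phi_s, phi_next, k, a=0.05, c=0.1):
    """One average-cost TD(0) critic step with linear features.

    rho tracks the long-run average of k; v holds the weights of the differential value function.
    """
    delta = k - rho + dot(v, phi_next) - dot(v, phi_s)
    v = [vi + a * delta * f for vi, f in zip(v, phi_s)]
    rho = rho + c * (k - rho)
    return v, rho, delta


def multiplier_step(multipliers, constraint_costs, bounds, b=0.001):
    """Projected ascent on the multipliers (kept >= 0), typically run on the slowest timescale."""
    return [max(0.0, lam + b * (h - a)) for lam, h, a in zip(multipliers, constraint_costs, bounds)]


k = lagrangian_cost(2.0, [3.0], [1.0], [0.5])
v, rho, delta = average_cost_td_step([0.0, 0.0], 0.0, [1.0, 0.0], [0.0, 1.0], k)
print(k, delta, v, rho)
```

Here one constraint with cost 3.0 against a bound of 1.0 and multiplier 0.5 raises the single-stage cost from 2.0 to 3.0, which then drives the critic's TD error.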

Relevance: 100.00%

Abstract:

We present a novel multi-timescale Q-learning algorithm for average-cost control in a Markov decision process subject to multiple inequality constraints. We formulate a relaxed version of this problem through the Lagrange multiplier method. Our algorithm differs from Q-learning in that it updates two parameters: a Q-value parameter and a policy parameter. The Q-value parameter is updated on a slower timescale than the policy parameter. Whereas Q-learning with function approximation can diverge in some cases, our algorithm is observed to converge as a result of this timescale separation. We show the results of experiments on a problem of constrained routing in a multistage queueing network. Our algorithm is seen to exhibit good performance, and the various inequality constraints are seen to be satisfied upon convergence of the algorithm.
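Timescale separation is usually obtained through step-size schedules. A common set of conditions is Σ a_n = Σ b_n = ∞, Σ (a_n² + b_n²) < ∞, and b_n / a_n → 0, so the parameter with step b_n moves on the slower timescale. The schedules below (exponents 0.6 and 1.0) are hypothetical examples that satisfy these conditions, with the faster step assigned to the policy parameter and the slower one to the Q-value parameter, as in the abstract; they are not the step sizes used by the authors.

```python
def step_sizes(n, fast_exp=0.6, slow_exp=1.0):
    """(fast, slow) step sizes at iteration n >= 1.

    fast: policy parameter; slow: Q-value parameter (ordering as stated in the abstract).
    """
    return 1.0 / n ** fast_exp, 1.0 / n ** slow_exp


ratios = []
for n in (1, 10, 100, 1000, 10000):
    fast, slow = step_sizes(n)
    ratios.append(slow / fast)
    print(f"n={n:>5}  fast={fast:.4f}  slow={slow:.6f}  slow/fast={slow / fast:.4f}")
```

The ratio slow/fast equals n^(-0.4), so it tends to zero: seen from the fast recursion, the slow parameter looks almost frozen.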

Relevance: 100.00%

Abstract:

The computation of a piecewise smooth function that approximates a finite set of data points may be decomposed into two decoupled tasks: first, the computation of the locally smooth models and, hence, the segmentation of the data into classes consisting of the sets of points best approximated by each model; and second, the computation of the normalized discriminant functions for each induced class. The approximating function may then be computed as the optimal estimator with respect to the measure field defined by these discriminant functions. We give an efficient procedure for effecting both computations and for determining the optimal number of components.
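Once local models f_k and normalized discriminants p_k(x) (non-negative, summing to 1 at each x) are available, the approximating function combines them pointwise. The abstract does not state which estimator is optimal, so this sketch shows two common choices: the p-weighted mean and the prediction of the most probable model. The softmax normalization, the models, and the score functions are hypothetical stand-ins, not the paper's procedure.

```python
import math


def normalized_discriminants(scores):
    """Softmax: maps K unnormalised scores to weights that are >= 0 and sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def piecewise_estimate(x, models, score_fns, hard=False):
    """Combine local models f_k(x) using normalised discriminants p_k(x).

    hard=False: p-weighted mean of the local predictions.
    hard=True: prediction of the model with the largest p_k(x).
    """
    p = normalized_discriminants([g(x) for g in score_fns])
    if hard:
        best = max(range(len(models)), key=lambda k: p[k])
        return models[best](x)
    return sum(pk * f(x) for pk, f in zip(p, models))


# Hypothetical example: two local linear models approximating max(0, x).
models = [lambda x: 0.0, lambda x: x]
score_fns = [lambda x: -5.0 * x, lambda x: 5.0 * x]
for x in (-2.0, 0.0, 2.0):
    print(x, piecewise_estimate(x, models, score_fns), piecewise_estimate(x, models, score_fns, hard=True))
```

The hard rule yields a genuinely piecewise function with a discontinuity allowed at class boundaries, while the soft rule blends models where the discriminants overlap.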

Relevance: 100.00%

Abstract:

In this paper, we introduce a new Wiener system modeling approach for high-power amplifiers with memory in communication systems, using observational input/output data. Under the assumption that the nonlinearity in the Wiener model depends mainly on the amplitude of the input signal, the complex-valued nonlinear static function is represented by two real-valued B-spline curves, one for the amplitude distortion and the other for the phase shift. The Gauss-Newton algorithm is applied for parameter estimation; it incorporates the De Boor algorithm, including recursions for both the B-spline curve and its first-order derivatives. An illustrative example demonstrates the efficacy of the proposed approach.
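A minimal sketch of the static nonlinearity: two real B-splines of the input amplitude |u|, evaluated with De Boor's algorithm, give the output amplitude A(|u|) and an extra phase Φ(|u|). The knot vector, control points, and function names are hypothetical; the derivative recursion used inside Gauss-Newton and the linear dynamic part of the Wiener model are omitted.

```python
import cmath


def find_span(x, t, p):
    """Index k with t[k] <= x < t[k+1] (k = n-1 at the right end), for n = len(t) - p - 1."""
    n = len(t) - p - 1  # number of control points
    k = p
    while k < n - 1 and x >= t[k + 1]:
        k += 1
    return k


def de_boor(k, x, t, c, p):
    """Evaluate a degree-p B-spline with knots t and control points c at x (De Boor's algorithm)."""
    d = [c[j + k - p] for j in range(p + 1)]
    for r in range(1, p + 1):
        for j in range(p, r - 1, -1):
            alpha = (x - t[j + k - p]) / (t[j + 1 + k - r] - t[j + k - p])
            d[j] = (1.0 - alpha) * d[j - 1] + alpha * d[j]
    return d[p]


def spline(x, t, c, p):
    return de_boor(find_span(x, t, p), x, t, c, p)


def static_nonlinearity(u, t, amp_ctrl, phase_ctrl, p):
    """Complex static nonlinearity built from two real B-splines of the input amplitude |u|.

    Output amplitude A(|u|) (amplitude distortion) and extra phase Phi(|u|) (phase shift).
    """
    r = abs(u)
    amplitude = spline(r, t, amp_ctrl, p)
    shift = spline(r, t, phase_ctrl, p)
    return amplitude * cmath.exp(1j * (cmath.phase(u) + shift))


knots = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]  # clamped quadratic knots on [0, 1] (hypothetical)
y = static_nonlinearity(0.5 + 0.0j, knots, [0.0, 0.6, 0.8], [0.0, 0.0, 0.0], 2)
print(y)
```

Because both curves are linear in their control points, the unknowns of the nonlinearity are just two short vectors, which keeps the parameter estimation problem small.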

Relevance: 100.00%

Abstract:

Neural networks and wavelet transforms have recently been seen as attractive tools for developing efficient solutions to many real-world function approximation problems. Function approximation is a very important task in environments where computation has to be based on extracting information from data samples of real-world processes. Mathematical models are therefore important tools for the development of the neural network field. In this article, we present a series of mathematical proofs that establish the wavelet properties of the PPS functions. As an application, we show the use of PPS-wavelets in handwritten digit recognition through function approximation techniques.

Relevance: 100.00%

Abstract:

A radial basis function network (RBFN) circuit for function approximation is presented. Simulation and experimental results show that the network has good approximation capabilities. The radial basis function used was a squared hyperbolic secant with three adjustable parameters: amplitude, width, and center. To test the network, a sinusoidal and a sine function were approximated.
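A software sketch of the same kind of network: each unit is amplitude · sech²((x - center) / width), and the output is their sum. The circuit adjusts all three parameters, but for simplicity this sketch fixes the centers and a shared width (hypothetical values) and fits only the amplitudes by linear least squares to a sine on [0, 2π]. It illustrates the basis function, not the authors' circuit or training method.

```python
import math


def sech2_unit(x, amplitude, width, center):
    """Squared hyperbolic secant unit: amplitude * sech^2((x - center) / width)."""
    return amplitude / math.cosh((x - center) / width) ** 2


def rbfn(x, params):
    """Network output: sum of units, params = [(amplitude, width, center), ...]."""
    return sum(sech2_unit(x, a, w, c) for a, w, c in params)


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for j in range(col, n + 1):
                M[r][j] -= f * M[col][j]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][j] * z[j] for j in range(r + 1, n))) / M[r][r]
    return z


def fit_amplitudes(xs, ys, centers, width):
    """Least-squares amplitudes for fixed centres and a shared width (normal equations)."""
    Phi = [[sech2_unit(x, 1.0, width, c) for c in centers] for x in xs]
    m = len(centers)
    A = [[sum(row[i] * row[j] for row in Phi) for j in range(m)] for i in range(m)]
    b = [sum(row[i] * y for row, y in zip(Phi, ys)) for i in range(m)]
    return [(a, width, c) for a, c in zip(solve(A, b), centers)]


xs = [2 * math.pi * i / 199 for i in range(200)]
ys = [math.sin(x) for x in xs]
centers = [2 * math.pi * j / 8 for j in range(9)]
params = fit_amplitudes(xs, ys, centers, width=0.8)
mse = sum((rbfn(x, params) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
print(f"MSE on [0, 2*pi]: {mse:.2e}")
```

Fixing widths and centers makes the model linear in the amplitudes, so a direct solve replaces iterative tuning; adapting width and center as well would require a nonlinear method such as gradient descent.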

Relevance: 100.00%

Abstract:

Function approximation is a very important task in environments where computation has to be based on extracting information from data samples of real-world processes. The development of new mathematical models is therefore an important activity for the evolution of the function approximation field. In this sense, we present the Polynomials Powers of Sigmoid (PPS) as a linear neural network. In this paper, we introduce a series of practical results for the Polynomials Powers of Sigmoid, showing some advantages of using powers of sigmoid functions over traditional MLP-Backpropagation and polynomials in function approximation problems.
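One way to read "PPS as a linear neural network" is a model f(x) = Σ_k w_k · σ(x)^k with σ the logistic sigmoid: the hidden features are fixed powers of the sigmoid, and the output is linear in the weights, so they can be fitted by linear least squares instead of backpropagation. This form, the degree, and the function names are assumptions for illustration; the authors' exact PPS construction may differ.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def pps_features(x, degree):
    """[1, s, s^2, ..., s^degree] with s = sigmoid(x)."""
    s = sigmoid(x)
    return [s ** k for k in range(degree + 1)]


def fit_pps(xs, ys, degree):
    """Least-squares output weights; the model is linear in them, so no backpropagation is needed."""
    Phi = [pps_features(x, degree) for x in xs]
    m = degree + 1
    A = [[sum(r[i] * r[j] for r in Phi) for j in range(m)] for i in range(m)]
    b = [sum(r[i] * y for r, y in zip(Phi, ys)) for i in range(m)]
    # Gaussian elimination with partial pivoting on the normal equations.
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    w = [0.0] * m
    for r in range(m - 1, -1, -1):
        w[r] = (b[r] - sum(A[r][c] * w[c] for c in range(r + 1, m))) / A[r][r]
    return w


def pps_predict(x, w):
    return sum(wk * fk for wk, fk in zip(w, pps_features(x, len(w) - 1)))


# Hypothetical target that is exactly representable: 1 + 2*s - 3*s^2.
xs = [-4.0 + 0.2 * i for i in range(41)]
ys = [1.0 + 2.0 * sigmoid(x) - 3.0 * sigmoid(x) ** 2 for x in xs]
w = fit_pps(xs, ys, degree=2)
print([round(wk, 6) for wk in w])
```

Since the target lies in the span of the PPS features, least squares should recover the generating weights up to rounding error.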

Relevance: 100.00%

Abstract:

We introduce a diffusion-based algorithm in which multiple agents cooperate to predict a common, global state-value function by sharing local estimates and local gradient information among neighbors. Our algorithm is a fully distributed implementation of gradient temporal-difference (GTD) learning with linear function approximation, which makes it applicable to multiagent settings. Simulations illustrate the benefit of cooperation in learning enabled by the proposed algorithm.
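A minimal adapt-then-combine sketch: each agent takes a local TDC step (one member of the gradient-TD family) on its own sampled transition, then averages its intermediate estimates with its neighbors on a ring. The two-state chain, the ring topology with uniform 1/3 weights, combining both θ and the auxiliary vector w, and the step sizes are all assumptions, not the authors' exact scheme. With one-hot features the shared fixed point is the true value function, V(0) = 4/3 and V(1) = 2/3 for γ = 0.5.

```python
import random

GAMMA = 0.5
NEXT = {0: 1, 1: 0}        # hypothetical deterministic two-state chain 0 -> 1 -> 0
REWARD = {0: 1.0, 1: 0.0}  # reward received when leaving each state


def phi(s):
    """One-hot features, a (tabular) special case of linear function approximation."""
    return [1.0 if i == s else 0.0 for i in range(2)]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def tdc_adapt(theta, w, s, alpha, beta):
    """Local TDC (gradient-TD) step on the transition s -> NEXT[s]."""
    f, f_next = phi(s), phi(NEXT[s])
    delta = REWARD[s] + GAMMA * dot(theta, f_next) - dot(theta, f)
    wf = dot(w, f)
    theta = [t + alpha * (delta * x - GAMMA * xn * wf) for t, x, xn in zip(theta, f, f_next)]
    w = [wi + beta * (delta - wf) * x for wi, x in zip(w, f)]
    return theta, w


def diffusion_gtd(n_agents=4, steps=20000, alpha=0.05, beta=0.5, seed=1):
    rng = random.Random(seed)
    # Ring network: each agent averages itself and its two neighbours with equal weights.
    hoods = [[(k - 1) % n_agents, k, (k + 1) % n_agents] for k in range(n_agents)]
    thetas = [[0.0, 0.0] for _ in range(n_agents)]
    ws = [[0.0, 0.0] for _ in range(n_agents)]
    for _ in range(steps):
        # Adapt: each agent takes a TDC step on its own sampled state.
        local = [tdc_adapt(thetas[k], ws[k], rng.randrange(2), alpha, beta) for k in range(n_agents)]
        # Combine: each agent averages its neighbours' intermediate estimates.
        thetas = [[sum(local[j][0][i] for j in hood) / len(hood) for i in range(2)] for hood in hoods]
        ws = [[sum(local[j][1][i] for j in hood) / len(hood) for i in range(2)] for hood in hoods]
    return thetas


thetas = diffusion_gtd()
print([[round(t, 4) for t in th] for th in thetas])
```

The combine step only averages, so no agent needs global information, yet every agent benefits from the transitions its neighbors have sampled.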