967 resultados para Function approximation
Resumo:
Based on the conclusions drawn in the bijective transformation between possibility and probability, a method is proposed to estimate the fuzzy membership function for pattern recognition purposes. A rational function approximation to the probability density function is obtained from the histogram of a finite (and sometimes very small) number of samples. This function is normalized such that the highest ordinate is one. The parameters representing the rational function are used for classifying the pattern samples based on a max-min decision rule. The method is illustrated with examples.
Resumo:
We present four new reinforcement learning algorithms based on actor-critic, natural-gradient and functi approximation ideas,and we provide their convergence proofs. Actor-critic reinforcement learning methods are online approximations to policy iteration in which the value-function parameters are estimated using temporal difference learning and the policy parameters are updated by stochastic gradient descent. Methods based on policy gradients in this way are of special interest because of their compatibility with function-approximation methods, which are needed to handle large or infinite state spaces. The use of temporal difference learning in this way is of special interest because in many applications it dramatically reduces the variance of the gradient estimates. The use of the natural gradient is of interest because it can produce better conditioned parameterizations and has been shown to further reduce variance in some cases. Our results extend prior two-timescale convergence results for actor-critic methods by Konda and Tsitsiklis by using temporal difference learning in the actor and by incorporating natural gradients. Our results extend prior empirical studies of natural actor-critic methods by Peters, Vijayakumar and Schaal by providing the first convergence proofs and the first fully incremental algorithms.
Resumo:
This paper investigates the use of Genetic Programming (GP) to create an approximate model for the non-linear relationship between flexural stiffness, length, mass per unit length and rotation speed associated with rotating beams and their natural frequencies. GP, a relatively new form of artificial intelligence, is derived from the Darwinian concept of evolution and genetics and it creates computer programs to solve problems by manipulating their tree structures. GP predicts the size and structural complexity of the empirical model by minimizing the mean square error at the specified points of input-output relationship dataset. This dataset is generated using a finite element model. The validity of the GP-generated model is tested by comparing the natural frequencies at training and at additional input data points. It is found that by using a non-dimensional stiffness, it is possible to get simple and accurate function approximation for the natural frequency. This function approximation model is then used to study the relationships between natural frequency and various influencing parameters for uniform and tapered beams. The relations obtained with GP model agree well with FEM results and can be used for preliminary design and structural optimization studies.
Resumo:
In this paper, we present a growing and pruning radial basis function based no-reference (NR) image quality model for JPEG-coded images. The quality of the images are estimated without referring to their original images. The features for predicting the perceived image quality are extracted by considering key human visual sensitivity factors such as edge amplitude, edge length, background activity and background luminance. Image quality estimation involves computation of functional relationship between HVS features and subjective test scores. Here, the problem of quality estimation is transformed to a function approximation problem and solved using GAP-RBF network. GAP-RBF network uses sequential learning algorithm to approximate the functional relationship. The computational complexity and memory requirement are less in GAP-RBF algorithm compared to other batch learning algorithms. Also, the GAP-RBF algorithm finds a compact image quality model and does not require retraining when the new image samples are presented. Experimental results prove that the GAP-RBF image quality model does emulate the mean opinion score (MOS). The subjective test results of the proposed metric are compared with JPEG no-reference image quality index as well as full-reference structural similarity image quality index and it is observed to outperform both.
Resumo:
In this paper, we give a generalization of a result by Borkar and Meyn (2000) 1], on the stability and convergence of synchronous-update stochastic approximation algorithms, to the case of asynchronous stochastic approximations with delays. We then describe an interesting application of the result to asynchronous distributed temporal difference (TD) learning with function approximation and delays. (C) 2011 Elsevier B.V. All rights reserved.
Resumo:
Accurate system planning and performance evaluation requires knowledge of the joint impact of scheduling, interference, and fading. However, current analyses either require costly numerical simulations or make simplifying assumptions that limit the applicability of the results. In this paper, we derive analytical expressions for the spectral efficiency of cellular systems that use either the channel-unaware but fair round robin scheduler or the greedy, channel-aware but unfair maximum signal to interference ratio scheduler. As is the case in real deployments, non-identical co-channel interference at each user, both Rayleigh fading and lognormal shadowing, and limited modulation constellation sizes are accounted for in the analysis. We show that using a simple moment generating function-based lognormal approximation technique and an accurate Gaussian-Q function approximation leads to results that match simulations well. These results are more accurate than erstwhile results that instead used the moment-matching Fenton-Wilkinson approximation method and bounds on the Q function. The spectral efficiency of cellular systems is strongly influenced by the channel scheduler and the small constellation size that is typically used in third generation cellular systems.
Resumo:
We present four new reinforcement learning algorithms based on actor-critic and natural-gradient ideas, and provide their convergence proofs. Actor-critic rein- forcement learning methods are online approximations to policy iteration in which the value-function parameters are estimated using temporal difference learning and the policy parameters are updated by stochastic gradient descent. Methods based on policy gradients in this way are of special interest because of their com- patibility with function approximation methods, which are needed to handle large or infinite state spaces. The use of temporal difference learning in this way is of interest because in many applications it dramatically reduces the variance of the gradient estimates. The use of the natural gradient is of interest because it can produce better conditioned parameterizations and has been shown to further re- duce variance in some cases. Our results extend prior two-timescale convergence results for actor-critic methods by Konda and Tsitsiklis by using temporal differ- ence learning in the actor and by incorporating natural gradients, and they extend prior empirical studies of natural actor-critic methods by Peters, Vijayakumar and Schaal by providing the first convergence proofs and the first fully incremental algorithms.
Resumo:
Designing and optimizing high performance microprocessors is an increasingly difficult task due to the size and complexity of the processor design space, high cost of detailed simulation and several constraints that a processor design must satisfy. In this paper, we propose the use of empirical non-linear modeling techniques to assist processor architects in making design decisions and resolving complex trade-offs. We propose a procedure for building accurate non-linear models that consists of the following steps: (i) selection of a small set of representative design points spread across processor design space using latin hypercube sampling, (ii) obtaining performance measures at the selected design points using detailed simulation, (iii) building non-linear models for performance using the function approximation capabilities of radial basis function networks, and (iv) validating the models using an independently and randomly generated set of design points. We evaluate our model building procedure by constructing non-linear performance models for programs from the SPEC CPU2000 benchmark suite with a microarchitectural design space that consists of 9 key parameters. Our results show that the models, built using a relatively small number of simulations, achieve high prediction accuracy (only 2.8% error in CPI estimates on average) across a large processor design space. Our models can potentially replace detailed simulation for common tasks such as the analysis of key microarchitectural trends or searches for optimal processor design points.
Resumo:
In this paper, a model for composite beam with embedded de-lamination is developed using the wavelet based spectral finite element (WSFE) method particularly for damage detection using wave propagation analysis. The simulated responses are used as surrogate experimental results for the inverse problem of detection of damage using wavelet filtering. The WSFE technique is very similar to the fast fourier transform (FFT) based spectral finite element (FSFE) except that it uses compactly supported Daubechies scaling function approximation in time. Unlike FSFE formulation with periodicity assumption, the wavelet-based method allows imposition of initial values and thus is free from wrap around problems. This helps in analysis of finite length undamped structures, where the FSFE method fails to simulate accurate response. First, numerical experiments are performed to study the effect of de-lamination on the wave propagation characteristics. The responses are simulated for different de-lamination configurations for both broad-band and narrow-band excitations. Next, simulated responses are used for damage detection using wavelet analysis.
Resumo:
We propose for the first time two reinforcement learning algorithms with function approximation for average cost adaptive control of traffic lights. One of these algorithms is a version of Q-learning with function approximation while the other is a policy gradient actor-critic algorithm that incorporates multi-timescale stochastic approximation. We show performance comparisons on various network settings of these algorithms with a range of fixed timing algorithms, as well as a Q-learning algorithm with full state representation that we also implement. We observe that whereas (as expected) on a two-junction corridor, the full state representation algorithm shows the best results, this algorithm is not implementable on larger road networks. The algorithm PG-AC-TLC that we propose is seen to show the best overall performance.
Resumo:
Realistic and realtime computational simulation of soft biological organs (e.g., liver, kidney) is necessary when one tries to build a quality surgical simulator that can simulate surgical procedures involving these organs. Since the realistic simulation of these soft biological organs should account for both nonlinear material behavior and large deformation, achieving realistic simulations in realtime using continuum mechanics based numerical techniques necessitates the use of a supercomputer or a high end computer cluster which are costly. Hence there is a need to employ soft computing techniques like Support Vector Machines (SVMs) which can do function approximation, and hence could achieve physically realistic simulations in realtime by making use of just a desktop computer. Present work tries to simulate a pig liver in realtime. Liver is assumed to be homogeneous, isotropic, and hyperelastic. Hyperelastic material constants are taken from the literature. An SVM is employed to achieve realistic simulations in realtime, using just a desktop computer. The code for the SVM is obtained from [1]. The SVM is trained using the dataset generated by performing hyperelastic analyses on the liver geometry, using the commercial finite element software package ANSYS. The methodology followed in the present work closely follows the one followed in [2] except that [2] uses Artificial Neural Networks (ANNs) while the present work uses SVMs to achieve realistic simulations in realtime. Results indicate the speed and accuracy that is obtained by employing the SVM for the targeted realistic and realtime simulation of the liver.
Resumo:
We consider the problem of finding the best features for value function approximation in reinforcement learning and develop an online algorithm to optimize the mean square Bellman error objective. For any given feature value, our algorithm performs gradient search in the parameter space via a residual gradient scheme and, on a slower timescale, also performs gradient search in the Grassman manifold of features. We present a proof of convergence of our algorithm. We show empirical results using our algorithm as well as a similar algorithm that uses temporal difference learning in place of the residual gradient scheme for the faster timescale updates.
Resumo:
In this paper, we consider an intrusion detection application for Wireless Sensor Networks. We study the problem of scheduling the sleep times of the individual sensors, where the objective is to maximize the network lifetime while keeping the tracking error to a minimum. We formulate this problem as a partially-observable Markov decision process (POMDP) with continuous stateaction spaces, in a manner similar to Fuemmeler and Veeravalli (IEEE Trans Signal Process 56(5), 2091-2101, 2008). However, unlike their formulation, we consider infinite horizon discounted and average cost objectives as performance criteria. For each criterion, we propose a convergent on-policy Q-learning algorithm that operates on two timescales, while employing function approximation. Feature-based representations and function approximation is necessary to handle the curse of dimensionality associated with the underlying POMDP. Our proposed algorithm incorporates a policy gradient update using a one-simulation simultaneous perturbation stochastic approximation estimate on the faster timescale, while the Q-value parameter (arising from a linear function approximation architecture for the Q-values) is updated in an on-policy temporal difference algorithm-like fashion on the slower timescale. The feature selection scheme employed in each of our algorithms manages the energy and tracking components in a manner that assists the search for the optimal sleep-scheduling policy. For the sake of comparison, in both discounted and average settings, we also develop a function approximation analogue of the Q-learning algorithm. This algorithm, unlike the two-timescale variant, does not possess theoretical convergence guarantees. Finally, we also adapt our algorithms to include a stochastic iterative estimation scheme for the intruder's mobility model and this is useful in settings where the latter is not known. Our simulation results on a synthetic 2-dimensional network setting suggest that our algorithms result in better tracking accuracy at the cost of only a few additional sensors, in comparison to a recent prior work.
Resumo:
The vaporization of condensed materials in contact with high-current discharge plasmas is considered. A kinetic numerical method named direct simulation Monte Carlo (DSMC) and analytical kinetic approaches based on the bimodal distribution function approximation are employed. The solution of the kinetic layer problem depends upon the velocity at the outer boundary of the kinetic layer which varies from very small, corresponding to the high-density plasma near the evaporated surface, up to the sound speed, corresponding to evaporation into vacuum. The heavy particles density and temperature at the kinetic and hydrodynamic layer interface were obtained by the analytical method while DSMC calculation makes it possible to obtain the evolution of the particle distribution function within the kinetic layer and the layer thickness.
Resumo:
The past decade has seen a rise of interest in Laplacian eigenmaps (LEMs) for nonlinear dimensionality reduction. LEMs have been used in spectral clustering, in semisupervised learning, and for providing efficient state representations for reinforcement learning. Here, we show that LEMs are closely related to slow feature analysis (SFA), a biologically inspired, unsupervised learning algorithm originally designed for learning invariant visual representations. We show that SFA can be interpreted as a function approximation of LEMs, where the topological neighborhoods required for LEMs are implicitly defined by the temporal structure of the data. Based on this relation, we propose a generalization of SFA to arbitrary neighborhood relations and demonstrate its applicability for spectral clustering. Finally, we review previous work with the goal of providing a unifying view on SFA and LEMs. © 2011 Massachusetts Institute of Technology.