996 resultados para Stochastic convergence


Relevância:

20.00% 20.00%

Publicador:

Resumo:

We present four new reinforcement learning algorithms based on actor-critic and natural-gradient ideas, and provide their convergence proofs. Actor-critic rein- forcement learning methods are online approximations to policy iteration in which the value-function parameters are estimated using temporal difference learning and the policy parameters are updated by stochastic gradient descent. Methods based on policy gradients in this way are of special interest because of their com- patibility with function approximation methods, which are needed to handle large or infinite state spaces. The use of temporal difference learning in this way is of interest because in many applications it dramatically reduces the variance of the gradient estimates. The use of the natural gradient is of interest because it can produce better conditioned parameterizations and has been shown to further re- duce variance in some cases. Our results extend prior two-timescale convergence results for actor-critic methods by Konda and Tsitsiklis by using temporal differ- ence learning in the actor and by incorporating natural gradients, and they extend prior empirical studies of natural actor-critic methods by Peters, Vijayakumar and Schaal by providing the first convergence proofs and the first fully incremental algorithms.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A methodology termed the “filtered density function” (FDF) is developed and implemented for large eddy simulation (LES) of chemically reacting turbulent flows. In this methodology, the effects of the unresolved scalar fluctuations are taken into account by considering the probability density function (PDF) of subgrid scale (SGS) scalar quantities. A transport equation is derived for the FDF in which the effect of chemical reactions appears in a closed form. The influences of scalar mixing and convection within the subgrid are modeled. The FDF transport equation is solved numerically via a Lagrangian Monte Carlo scheme in which the solutions of the equivalent stochastic differential equations (SDEs) are obtained. These solutions preserve the Itô-Gikhman nature of the SDEs. The consistency of the FDF approach, the convergence of its Monte Carlo solution and the performance of the closures employed in the FDF transport equation are assessed by comparisons with results obtained by direct numerical simulation (DNS) and by conventional LES procedures in which the first two SGS scalar moments are obtained by a finite difference method (LES-FD). These comparative assessments are conducted by implementations of all three schemes (FDF, DNS and LES-FD) in a temporally developing mixing layer and a spatially developing planar jet under both non-reacting and reacting conditions. In non-reacting flows, the Monte Carlo solution of the FDF yields results similar to those via LES-FD. The advantage of the FDF is demonstrated by its use in reacting flows. In the absence of a closure for the SGS scalar fluctuations, the LES-FD results are significantly different from those based on DNS. The FDF results show a much closer agreement with filtered DNS results. © 1998 American Institute of Physics.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A class of model reference adaptive control system which make use of an augmented error signal has been introduced by Monopoli. Convergence problems in this attractive class of systems have been investigated in this paper using concepts from hyperstability theory. It is shown that the condition on the linear part of the system has to be stronger than the one given earlier. A boundedness condition on the input to the linear part of the system has been taken into account in the analysis - this condition appears to have been missed in the previous applications of hyperstability theory. Sufficient conditions for the convergence of the adaptive gain to the desired value are also given.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We analyze the AlApana of a Carnatic music piece without the prior knowledge of the singer or the rAga. AlApana is ameans to communicate to the audience, the flavor or the bhAva of the rAga through the permitted notes and its phrases. The input to our analysis is a recording of the vocal AlApana along with the accompanying instrument. The AdhAra shadja(base note) of the singer for that AlApana is estimated through a stochastic model of note frequencies. Based on the shadja, we identify the notes (swaras) used in the AlApana using a semi-continuous GMM. Using the probabilities of each note interval, we recognize swaras of the AlApana. For sampurNa rAgas, we can identify the possible rAga, based on the swaras. We have been able to achieve correct shadja identification, which is crucial to all further steps, in 88.8% of 55 AlApanas. Among them (48 AlApanas of 7 rAgas), we get 91.5% correct swara identification and 62.13% correct R (rAga) accuracy.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Stochastic hybrid systems arise in numerous applications of systems with multiple models; e.g., air traffc management, flexible manufacturing systems, fault tolerant control systems etc. In a typical hybrid system, the state space is hybrid in the sense that some components take values in a Euclidean space, while some other components are discrete. In this paper we propose two stochastic hybrid models, both of which permit diffusion and hybrid jump. Such models are essential for studying air traffic management in a stochastic framework.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Image segmentation is formulated as a stochastic process whose invariant distribution is concentrated at points of the desired region. By choosing multiple seed points, different regions can be segmented. The algorithm is based on the theory of time-homogeneous Markov chains and has been largely motivated by the technique of simulated annealing. The method proposed here has been found to perform well on real-world clean as well as noisy images while being computationally far less expensive than stochastic optimisation techniques

Relevância:

20.00% 20.00%

Publicador:

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper presents the design and performance analysis of a detector based on suprathreshold stochastic resonance (SSR) for the detection of deterministic signals in heavy-tailed non-Gaussian noise. The detector consists of a matched filter preceded by an SSR system which acts as a preprocessor. The SSR system is composed of an array of 2-level quantizers with independent and identically distributed (i.i.d) noise added to the input of each quantizer. The standard deviation sigma of quantizer noise is chosen to maximize the detection probability for a given false alarm probability. In the case of a weak signal, the optimum sigma also minimizes the mean-square difference between the output of the quantizer array and the output of the nonlinear transformation of the locally optimum detector. The optimum sigma depends only on the probability density functions (pdfs) of input noise and quantizer noise for weak signals, and also on the signal amplitude and the false alarm probability for non-weak signals. Improvement in detector performance stems primarily from quantization and to a lesser extent from the optimization of quantizer noise. For most input noise pdfs, the performance of the SSR detector is very close to that of the optimum detector. (C) 2012 Elsevier B.V. All rights reserved.