38 results for Pott, Mal de.
Abstract:
Learning automata arranged in a two-level hierarchy are considered. The automata operate in a stationary random environment and update their action probabilities according to the linear reward-ε-penalty algorithm at each level. Unlike some hierarchical systems previously proposed, no information transfer exists from one level to another, and yet the hierarchy possesses good convergence properties. Using weak-convergence concepts, it is shown that for large time and small values of the parameters in the algorithm, the evolution of the optimal path probability can be represented by a diffusion whose parameters can be computed explicitly.
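A minimal sketch of such a two-level hierarchy, assuming the update rule referred to above is the standard linear reward-ε-penalty scheme and that the same environment response is applied independently at both levels. The function names, step sizes `a` and `eps`, and the reward probabilities are illustrative assumptions, not values from the paper.

```python
import random


def lrep_update(p, chosen, rewarded, a=0.05, eps=0.01):
    """One linear reward-epsilon-penalty step on the probability vector p."""
    r = len(p)
    b = eps * a  # penalty step, assumed much smaller than the reward step a
    if rewarded:
        # Move probability mass towards the chosen action.
        return [pj + a * (1.0 - pj) if j == chosen else (1.0 - a) * pj
                for j, pj in enumerate(p)]
    # Move a small amount of mass away from the chosen action.
    return [(1.0 - b) * pj if j == chosen else b / (r - 1) + (1.0 - b) * pj
            for j, pj in enumerate(p)]


def run_hierarchy(reward_prob, steps=20000, seed=0, a=0.02, eps=0.05):
    """Two-level tree: one top automaton selects one of two bottom automata."""
    rng = random.Random(seed)
    top = [0.5, 0.5]
    bottom = [[0.5, 0.5], [0.5, 0.5]]
    for _ in range(steps):
        i = rng.choices([0, 1], weights=top)[0]
        j = rng.choices([0, 1], weights=bottom[i])[0]
        rewarded = rng.random() < reward_prob[i][j]
        # Each level updates on the environment response alone;
        # nothing is passed between levels (assumed reading of the abstract).
        top = lrep_update(top, i, rewarded, a, eps)
        bottom[i] = lrep_update(bottom[i], j, rewarded, a, eps)
    return top, bottom
```

Under these assumptions, the probability of a path (i, j) is `top[i] * bottom[i][j]`, which is the quantity whose evolution the abstract describes as a diffusion.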
Abstract:
The paper deals with the basic problem of adjusting a matrix gain in a discrete-time linear multivariable system. The objective is to obtain a global convergence criterion, i.e. conditions under which a specified error signal asymptotically approaches zero and other signals in the system remain bounded for arbitrary initial conditions and for any bounded input to the system. It is shown that for a class of updating algorithms for the adjustable gain matrix, global convergence is crucially dependent on a transfer matrix G(z) which has a simple block diagram interpretation. When w(z)G(z) is strictly discrete positive real for a scalar w(z) such that w^{-1}(z) is strictly proper with poles and zeros within the unit circle, an augmented error scheme is suggested and is proved to result in global convergence. The solution avoids feeding back a quadratic term as recommended in other schemes for single-input single-output systems.
Abstract:
Systems of learning automata have been studied by various researchers to evolve useful strategies for decision making under uncertainty. Considered in this paper is a class of hierarchical systems of learning automata where the system gets responses from its environment at each level of the hierarchy. A classification of such sequential learning tasks based on the complexity of the learning problem is presented. It is shown that none of the existing algorithms can perform in the most general type of hierarchical problem. An algorithm for learning the globally optimal path in this general setting is presented, and its convergence is established. This algorithm needs information transfer from the lower levels to the higher levels. Using the methodology of estimator algorithms, this model can be generalized to accommodate other kinds of hierarchical learning tasks.
Abstract:
A learning automaton operating in a random environment updates its action probabilities on the basis of the reactions of the environment, so that asymptotically it chooses the optimal action. When the number of actions is large the automaton becomes slow because there are too many updatings to be made at each instant. A hierarchical system of such automata with assured ε-optimality is suggested to overcome that problem. The learning algorithm for the hierarchical system turns out to be a simple modification of the absolutely expedient algorithm known in the literature. The parameters of the algorithm at each level in the hierarchy depend only on the parameters and the action probabilities of the previous level. It follows that to minimize the number of updatings per cycle each automaton in the hierarchy need have only two or three actions.
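The closing claim about two or three actions can be illustrated with a rough count. The sketch below assumes a uniform tree in which every automaton has the same number of actions, one automaton per level is updated in each cycle, and the number of levels is treated as a real number; this counting model and the function name are assumptions, not taken from the paper.

```python
import math


def updates_per_cycle(total_actions, branching):
    """Probabilities updated per cycle in a uniform tree (assumed model).

    One automaton with `branching` actions is updated at each of the
    log_branching(total_actions) levels along the chosen path.
    """
    levels = math.log(total_actions) / math.log(branching)
    return branching * levels


# Compare branching factors for a fixed number of leaf actions.
for r in range(2, 7):
    print(r, round(updates_per_cycle(4096, r), 2))
```

Under this model the count is proportional to r / ln r, which is smallest near r = e, so the best integer choices are r = 2 or r = 3.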
Abstract:
The problem of learning correct decision rules to minimize the probability of misclassification is a long-standing problem of supervised learning in pattern recognition. The problem of learning such optimal discriminant functions is considered for the class of problems where the statistical properties of the pattern classes are completely unknown. The problem is posed as a game with common payoff played by a team of mutually cooperating learning automata. This essentially results in a probabilistic search through the space of classifiers. The approach is inherently capable of learning discriminant functions that are nonlinear in their parameters also. A learning algorithm is presented for the team and convergence is established. It is proved that the team can obtain the optimal classifier to an arbitrary approximation. Simulation results with a few examples are presented where the team learns the optimal classifier.
Abstract:
A cooperative game played in a sequential manner by a pair of learning automata is investigated in this paper. The automata operate in an unknown random environment which gives a common pay-off to the automata. Necessary and sufficient conditions on the functions in the reinforcement scheme are given for absolute monotonicity which enables the expected pay-off to be monotonically increasing in any arbitrary environment. As each participating automaton operates with no information regarding the other partner, the results of the paper are relevant to decentralized control.
Abstract:
Multiaction learning automata which update their action probabilities on the basis of the responses they get from an environment are considered in this paper. The automata update the probabilities according to whether the environment responds with a reward or a penalty. Learning automata are said to possess ergodicity of the mean if the mean action probability is the state probability (or unconditional probability) of an ergodic Markov chain. In an earlier paper [11] we considered the problem of a two-action learning automaton being ergodic in the mean (EM). The family of such automata was characterized completely by proving the necessary and sufficient conditions for automata to be EM. In this paper, we generalize the results of [11] and obtain necessary and sufficient conditions for the multiaction learning automaton to be EM. These conditions involve two families of probability updating functions. It is shown that for the automaton to be EM the two families must be linearly dependent. The vector defining the linear dependence is the only vector parameter which controls the rate of convergence of the automaton. Further, the technique for reducing the variance of the limiting distribution is discussed. Just as in the two-action case, it is shown that the set of absolutely expedient schemes and the set of schemes which possess ergodicity of the mean are mutually disjoint.
Abstract:
This paper considers the on-line identification of a non-linear system in terms of a Hammerstein model, with a zero-memory non-linear gain followed by a linear system. The linear part is represented by a Laguerre expansion of its impulse response and the non-linear part by a polynomial. The identification procedure involves determination of the coefficients of the Laguerre expansion of correlation functions and an iterative adjustment of the parameters of the non-linear gain by gradient methods. The method is applicable to situations involving a wide class of input signals. Even in the presence of additive correlated noise, satisfactory performance is achieved with the variance of the error converging to a value close to the variance of the noise. Digital computer simulation establishes the practicability of the scheme in different situations.
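As an illustration of the model structure only, the sketch below simulates a Hammerstein system: a zero-memory polynomial gain followed by a linear part. For brevity the linear part is a truncated impulse response rather than the Laguerre expansion used in the paper, and the function name and coefficients are hypothetical.

```python
def hammerstein_output(u, poly_coeffs, impulse_response):
    """Simulate y = h * f(u) with f a polynomial without constant term.

    poly_coeffs[n] multiplies u**(n + 1); impulse_response is a truncated
    (finite) impulse response standing in for the linear part.
    """
    # Zero-memory non-linear gain applied sample by sample.
    v = [sum(c * x ** (n + 1) for n, c in enumerate(poly_coeffs)) for x in u]
    # Linear part: causal convolution of v with the impulse response.
    y = []
    for k in range(len(v)):
        y.append(sum(h * v[k - i]
                     for i, h in enumerate(impulse_response) if k - i >= 0))
    return y


y = hammerstein_output([1.0, 2.0, 0.0], [1.0, 0.5], [1.0, 0.5])
```

An identification scheme of the kind described would adjust `poly_coeffs` and the parameters of the linear part so that this simulated output matches the measured one.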
Abstract:
This paper is concerned with the analysis of the absolute stability of a non-linear autonomous system which consists of a single non-linearity belonging to a particular class, in an otherwise linear feedback loop. It is motivated by the earlier Popov-like frequency-domain criteria using the 'multiplier' concept and involves the construction of 'stability multipliers' with prescribed phase characteristics. A few computer-based methods by which this problem can be solved are indicated and it is shown that this constitutes a step-by-step procedure for testing the stability properties of a given system.
Abstract:
The positivity of operators in Hilbert spaces is an important concept finding wide application in various branches of Mathematical System Theory. A frequency-domain condition that ensures the positivity of time-varying operators in L2 with a state-space description is derived in this paper by using certain newly developed inequalities concerning the input-state relation of such operators. As an interesting application of these results, an L2 stability criterion for time-varying feedback systems consisting of a finite-sector non-linearity is also developed.
Abstract:
It is shown that a sufficient condition for the asymptotic stability-in-the-large of an autonomous system containing a linear part with transfer function $G(j\omega)$ and a non-linearity belonging to a class of power-law non-linearities with slope restriction $[0, K]$ in cascade in a negative feedback loop is $\operatorname{Re}\, Z(j\omega)\,[G(j\omega) + 1/K] \ge 0$ for all $\omega$, where the multiplier is given by $Z(j\omega) = 1 + \alpha j\omega + Y(j\omega) - Y(-j\omega)$ with $\alpha$ real, $y(t) = 0$ for $t < 0$ and $\int_0^\infty |y(t)|\,dt < 1/(2c_2)$, $c_2$ being a constant associated with the class of non-linearity. Any allowable multiplier can be converted to the above form and this form leads to fewer restrictions on the parameters in many cases. Criteria for the case of odd monotonic non-linearities and of linear gains are obtained as limiting cases of the criterion developed. A striking feature of the present result is that in the linear case it reduces to the necessary and sufficient conditions corresponding to the Nyquist criterion. An inequality of the type $|R(T) - R(-T)| \le 2c_2 R(0)$, where $R(T)$ is the input-output cross-correlation function of the non-linearity, is used in deriving the results.
Abstract:
Two optimal non-linear reinforcement schemes—the Reward-Inaction and the Penalty-Inaction—for the two-state automaton functioning in a stationary random environment are considered. Very simple conditions of symmetry of the non-linear function figuring in the reinforcement scheme are shown to be necessary and sufficient for optimality. General expressions for the variance and rate of learning are derived. These schemes are compared with the already existing optimal linear schemes in the light of average variance and average rate of learning.
Abstract:
Concerning the L2-stability of feedback systems containing a linear time-varying operator, some of the stringent restrictions imposed on the multiplier as well as the linear part of the system, in the criteria presented earlier, are relaxed.
Abstract:
Sufficient conditions are given for the L2-stability of a class of feedback systems consisting of a linear operator G and a nonlinear gain function, either odd monotone or restricted by a power-law, in cascade, in a negative feedback loop. The criterion takes the form of a frequency-domain inequality, $\operatorname{Re}\{[1 + Z(j\omega)]\, G(j\omega)\} \ge \delta > 0$, $\omega \in (-\infty, +\infty)$, where $Z(j\omega)$ is given by $Z(j\omega) = \beta[Y_1(j\omega) + Y_2(j\omega)] + (1 - \beta)[Y_3(j\omega) - Y_3(-j\omega)]$, with $0 \le \beta \le 1$ and the functions $y_1(\cdot)$, $y_2(\cdot)$ and $y_3(\cdot)$ satisfying the time-domain inequalities $\int_{-\infty}^{+\infty} |y_1(t) + y_2(t)|\,dt \le 1 - \varepsilon$, $y_1(\cdot) = 0$ for $t < 0$, $y_2(\cdot) = 0$ for $t > 0$ and $\varepsilon > 0$, and […], $c_2$ being a constant depending on the order of the power-law restricting the nonlinear function. The criterion is derived using Zames' passive operator theory and is shown to be more general than the existing criteria.