20 resultados para Hierarchical stochastic learning

em Indian Institute of Science - Bangalore - Índia


Relevância:

40.00% 40.00%

Publicador:

Resumo:

Learning automata arranged in a two-level hierarchy are considered. The automata operate in a stationary random environment and update their action probabilities according to the linear-reward- -penalty algorithm at each level. Unlike some hierarchical systems previously proposed, no information transfer exists from one level to another, and yet the hierarchy possesses good convergence properties. Using weak-convergence concepts it is shown that for large time and small values of parameters in the algorithm, the evolution of the optimal path probability can be represented by a diffusion whose parameters can be computed explicitly.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Systems of learning automata have been studied by various researchers to evolve useful strategies for decision making under uncertainity. Considered in this paper are a class of hierarchical systems of learning automata where the system gets responses from its environment at each level of the hierarchy. A classification of such sequential learning tasks based on the complexity of the learning problem is presented. It is shown that none of the existing algorithms can perform in the most general type of hierarchical problem. An algorithm for learning the globally optimal path in this general setting is presented, and its convergence is established. This algorithm needs information transfer from the lower levels to the higher levels. Using the methodology of estimator algorithms, this model can be generalized to accommodate other kinds of hierarchical learning tasks.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

A learning automaton operating in a random environment updates its action probabilities on the basis of the reactions of the environment, so that asymptotically it chooses the optimal action. When the number of actions is large the automaton becomes slow because there are too many updatings to be made at each instant. A hierarchical system of such automata with assured c-optimality is suggested to overcome that problem.The learning algorithm for the hierarchical system turns out to be a simple modification of the absolutely expedient algorithm known in the literature. The parameters of the algorithm at each level in the hierarchy depend only on the parameters and the action probabilities of the previous level. It follows that to minimize the number of updatings per cycle each automaton in the hierarchy need have only two or three actions.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

In this paper, we use reinforcement learning (RL) as a tool to study price dynamics in an electronic retail market consisting of two competing sellers, and price sensitive and lead time sensitive customers. Sellers, offering identical products, compete on price to satisfy stochastically arriving demands (customers), and follow standard inventory control and replenishment policies to manage their inventories. In such a generalized setting, RL techniques have not previously been applied. We consider two representative cases: 1) no information case, were none of the sellers has any information about customer queue levels, inventory levels, or prices at the competitors; and 2) partial information case, where every seller has information about the customer queue levels and inventory levels of the competitors. Sellers employ automated pricing agents, or pricebots, which use RL-based pricing algorithms to reset the prices at random intervals based on factors such as number of back orders, inventory levels, and replenishment lead times, with the objective of maximizing discounted cumulative profit. In the no information case, we show that a seller who uses Q-learning outperforms a seller who uses derivative following (DF). In the partial information case, we model the problem as a Markovian game and use actor-critic based RL to learn dynamic prices. We believe our approach to solving these problems is a new and promising way of setting dynamic prices in multiseller environments with stochastic demands, price sensitive customers, and inventory replenishments.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Relaxation labeling processes are a class of mechanisms that solve the problem of assigning labels to objects in a manner that is consistent with respect to some domain-specific constraints. We reformulate this using the model of a team of learning automata interacting with an environment or a high-level critic that gives noisy responses as to the consistency of a tentative labeling selected by the automata. This results in an iterative linear algorithm that is itself probabilistic. Using an explicit definition of consistency we give a complete analysis of this probabilistic relaxation process using weak convergence results for stochastic algorithms. Our model can accommodate a range of uncertainties in the compatibility functions. We prove a local convergence result and show that the point of convergence depends both on the initial labeling and the constraints. The algorithm is implementable in a highly parallel fashion.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Learning automata are adaptive decision making devices that are found useful in a variety of machine learning and pattern recognition applications. Although most learning automata methods deal with the case of finitely many actions for the automaton, there are also models of continuous-action-set learning automata (CALA). A team of such CALA can be useful in stochastic optimization problems where one has access only to noise-corrupted values of the objective function. In this paper, we present a novel formulation for noise-tolerant learning of linear classifiers using a CALA team. We consider the general case of nonuniform noise, where the probability that the class label of an example is wrong may be a function of the feature vector of the example. The objective is to learn the underlying separating hyperplane given only such noisy examples. We present an algorithm employing a team of CALA and prove, under some conditions on the class conditional densities, that the algorithm achieves noise-tolerant learning as long as the probability of wrong label for any example is less than 0.5. We also present some empirical results to illustrate the effectiveness of the algorithm.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We propose two algorithms for Q-learning that use the two-timescale stochastic approximation methodology. The first of these updates Q-values of all feasible state–action pairs at each instant while the second updates Q-values of states with actions chosen according to the ‘current’ randomized policy updates. A proof of convergence of the algorithms is shown. Finally, numerical experiments using the proposed algorithms on an application of routing in communication networks are presented on a few different settings.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Two optimal non-linear reinforcement schemes—the Reward-Inaction and the Penalty-Inaction—for the two-state automaton functioning in a stationary random environment are considered. Very simple conditions of symmetry of the non-linear function figuring in the reinforcement scheme are shown to be necessary and sufficient for optimality. General expressions for the variance and rate of learning are derived. These schemes are compared with the already existing optimal linear schemes in the light of average variance and average rate of learning.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper formulates the automatic generation control (AGC) problem as a stochastic multistage decision problem. A strategy for solving this new AGC problem formulation is presented by using a reinforcement learning (RL) approach This method of obtaining an AGC controller does not depend on any knowledge of the system model and more importantly it admits considerable flexibility in defining the control objective. Two specific RL based AGC algorithms are presented. The first algorithm uses the traditional control objective of limiting area control error (ACE) excursions, where as, in the second algorithm, the controller can restore the load-generation balance by only monitoring deviation in tie line flows and system frequency and it does not need to know or estimate the composite ACE signal as is done by all current approaches. The effectiveness and versatility of the approaches has been demonstrated using a two area AGC model. (C) 2002 Elsevier Science B.V. All rights reserved.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this paper, we give a generalization of a result by Borkar and Meyn (2000) 1], on the stability and convergence of synchronous-update stochastic approximation algorithms, to the case of asynchronous stochastic approximations with delays. We then describe an interesting application of the result to asynchronous distributed temporal difference (TD) learning with function approximation and delays. (C) 2011 Elsevier B.V. All rights reserved.