971 resultados para Q-Learning


Relevância:

100.00% 100.00%

Publicador:

Resumo:

We propose two algorithms for Q-learning that use the two-timescale stochastic approximation methodology. The first of these updates Q-values of all feasible state–action pairs at each instant while the second updates Q-values of states with actions chosen according to the ‘current’ randomized policy updates. A proof of convergence of the algorithms is shown. Finally, numerical experiments using the proposed algorithms on an application of routing in communication networks are presented on a few different settings.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We propose two variants of the Q-learning algorithm that (both) use two timescales. One of these updates Q-values of all feasible state-action pairs at each instant while the other updates Q-values of states with actions chosen according to the ‘current ’ randomized policy updates. A sketch of convergence of the algorithms is shown. Finally, numerical experiments using the proposed algorithms for routing on different network topologies are presented and performance comparisons with the regular Q-learning algorithm are shown.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We present a novel multi-timescale Q-learning algorithm for average cost control in a Markov decision process subject to multiple inequality constraints. We formulate a relaxed version of this problem through the Lagrange multiplier method. Our algorithm is different from Q-learning in that it updates two parameters - a Q-value parameter and a policy parameter. The Q-value parameter is updated on a slower time scale as compared to the policy parameter. Whereas Q-learning with function approximation can diverge in some cases, our algorithm is seen to be convergent as a result of the aforementioned timescale separation. We show the results of experiments on a problem of constrained routing in a multistage queueing network. Our algorithm is seen to exhibit good performance and the various inequality constraints are seen to be satisfied upon convergence of the algorithm.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper, we consider an intrusion detection application for Wireless Sensor Networks. We study the problem of scheduling the sleep times of the individual sensors, where the objective is to maximize the network lifetime while keeping the tracking error to a minimum. We formulate this problem as a partially-observable Markov decision process (POMDP) with continuous stateaction spaces, in a manner similar to Fuemmeler and Veeravalli (IEEE Trans Signal Process 56(5), 2091-2101, 2008). However, unlike their formulation, we consider infinite horizon discounted and average cost objectives as performance criteria. For each criterion, we propose a convergent on-policy Q-learning algorithm that operates on two timescales, while employing function approximation. Feature-based representations and function approximation is necessary to handle the curse of dimensionality associated with the underlying POMDP. Our proposed algorithm incorporates a policy gradient update using a one-simulation simultaneous perturbation stochastic approximation estimate on the faster timescale, while the Q-value parameter (arising from a linear function approximation architecture for the Q-values) is updated in an on-policy temporal difference algorithm-like fashion on the slower timescale. The feature selection scheme employed in each of our algorithms manages the energy and tracking components in a manner that assists the search for the optimal sleep-scheduling policy. For the sake of comparison, in both discounted and average settings, we also develop a function approximation analogue of the Q-learning algorithm. This algorithm, unlike the two-timescale variant, does not possess theoretical convergence guarantees. Finally, we also adapt our algorithms to include a stochastic iterative estimation scheme for the intruder's mobility model and this is useful in settings where the latter is not known. Our simulation results on a synthetic 2-dimensional network setting suggest that our algorithms result in better tracking accuracy at the cost of only a few additional sensors, in comparison to a recent prior work.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Multi-Agent Reinforcement Learning (MARL) algorithms face two main difficulties: the curse of dimensionality, and environment non-stationarity due to the independent learning processes carried out by the agents concurrently. In this paper we formalize and prove the convergence of a Distributed Round Robin Q-learning (D-RR-QL) algorithm for cooperative systems. The computational complexity of this algorithm increases linearly with the number of agents. Moreover, it eliminates environment non sta tionarity by carrying a round-robin scheduling of the action selection and execution. That this learning scheme allows the implementation of Modular State-Action Vetoes (MSAV) in cooperative multi-agent systems, which speeds up learning convergence in over-constrained systems by vetoing state-action pairs which lead to undesired termination states (UTS) in the relevant state-action subspace. Each agent's local state-action value function learning is an independent process, including the MSAV policies. Coordination of locally optimal policies to obtain the global optimal joint policy is achieved by a greedy selection procedure using message passing. We show that D-RR-QL improves over state-of-the-art approaches, such as Distributed Q-Learning, Team Q-Learning and Coordinated Reinforcement Learning in a paradigmatic Linked Multi-Component Robotic System (L-MCRS) control problem: the hose transportation task. L-MCRS are over-constrained systems with many UTS induced by the interaction of the passive linking element and the active mobile robots.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Electricity markets are complex environments, involving a large number of different entities, playing in a dynamic scene to obtain the best advantages and profits. MASCEM is a multi-agent electricity market simulator to model market players and simulate their operation in the market. Market players are entities with specific characteristics and objectives, making their decisions and interacting with other players. MASCEM is integrated with ALBidS, a system that provides several dynamic strategies for agents’ behavior. This paper presents a method that aims at enhancing ALBidS competence in endowing market players with adequate strategic bidding capabilities, allowing them to obtain the higher possible gains out of the market. This method uses a reinforcement learning algorithm to learn from experience how to choose the best from a set of possible actions. These actions are defined accordingly to the most probable points of bidding success. With the purpose of accelerating the convergence process, a simulated annealing based algorithm is included.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Techniques of optimization known as metaheuristics have achieved success in the resolution of many problems classified as NP-Hard. These methods use non deterministic approaches that reach very good solutions which, however, don t guarantee the determination of the global optimum. Beyond the inherent difficulties related to the complexity that characterizes the optimization problems, the metaheuristics still face the dilemma of xploration/exploitation, which consists of choosing between a greedy search and a wider exploration of the solution space. A way to guide such algorithms during the searching of better solutions is supplying them with more knowledge of the problem through the use of a intelligent agent, able to recognize promising regions and also identify when they should diversify the direction of the search. This way, this work proposes the use of Reinforcement Learning technique - Q-learning Algorithm - as exploration/exploitation strategy for the metaheuristics GRASP (Greedy Randomized Adaptive Search Procedure) and Genetic Algorithm. The GRASP metaheuristic uses Q-learning instead of the traditional greedy-random algorithm in the construction phase. This replacement has the purpose of improving the quality of the initial solutions that are used in the local search phase of the GRASP, and also provides for the metaheuristic an adaptive memory mechanism that allows the reuse of good previous decisions and also avoids the repetition of bad decisions. In the Genetic Algorithm, the Q-learning algorithm was used to generate an initial population of high fitness, and after a determined number of generations, where the rate of diversity of the population is less than a certain limit L, it also was applied to supply one of the parents to be used in the genetic crossover operator. Another significant change in the hybrid genetic algorithm is the proposal of a mutually interactive cooperation process between the genetic operators and the Q-learning algorithm. In this interactive/cooperative process, the Q-learning algorithm receives an additional update in the matrix of Q-values based on the current best solution of the Genetic Algorithm. The computational experiments presented in this thesis compares the results obtained with the implementation of traditional versions of GRASP metaheuristic and Genetic Algorithm, with those obtained using the proposed hybrid methods. Both algorithms had been applied successfully to the symmetrical Traveling Salesman Problem, which was modeled as a Markov decision process

Relevância:

70.00% 70.00%

Publicador:

Resumo:

In this paper, we use reinforcement learning (RL) as a tool to study price dynamics in an electronic retail market consisting of two competing sellers, and price sensitive and lead time sensitive customers. Sellers, offering identical products, compete on price to satisfy stochastically arriving demands (customers), and follow standard inventory control and replenishment policies to manage their inventories. In such a generalized setting, RL techniques have not previously been applied. We consider two representative cases: 1) no information case, were none of the sellers has any information about customer queue levels, inventory levels, or prices at the competitors; and 2) partial information case, where every seller has information about the customer queue levels and inventory levels of the competitors. Sellers employ automated pricing agents, or pricebots, which use RL-based pricing algorithms to reset the prices at random intervals based on factors such as number of back orders, inventory levels, and replenishment lead times, with the objective of maximizing discounted cumulative profit. In the no information case, we show that a seller who uses Q-learning outperforms a seller who uses derivative following (DF). In the partial information case, we model the problem as a Markovian game and use actor-critic based RL to learn dynamic prices. We believe our approach to solving these problems is a new and promising way of setting dynamic prices in multiseller environments with stochastic demands, price sensitive customers, and inventory replenishments.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

We propose for the first time two reinforcement learning algorithms with function approximation for average cost adaptive control of traffic lights. One of these algorithms is a version of Q-learning with function approximation while the other is a policy gradient actor-critic algorithm that incorporates multi-timescale stochastic approximation. We show performance comparisons on various network settings of these algorithms with a range of fixed timing algorithms, as well as a Q-learning algorithm with full state representation that we also implement. We observe that whereas (as expected) on a two-junction corridor, the full state representation algorithm shows the best results, this algorithm is not implementable on larger road networks. The algorithm PG-AC-TLC that we propose is seen to show the best overall performance.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

In this paper, we investigate the use of reinforcement learning (RL) techniques to the problem of determining dynamic prices in an electronic retail market. As representative models, we consider a single seller market and a two seller market, and formulate the dynamic pricing problem in a setting that easily generalizes to markets with more than two sellers. We first formulate the single seller dynamic pricing problem in the RL framework and solve the problem using the Q-learning algorithm through simulation. Next we model the two seller dynamic pricing problem as a Markovian game and formulate the problem in the RL framework. We solve this problem using actor-critic algorithms through simulation. We believe our approach to solving these problems is a promising way of setting dynamic prices in multi-agent environments. We illustrate the methodology with two illustrative examples of typical retail markets.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

This paper presents the design and implementation of a learning controller for the Automatic Generation Control (AGC) in power systems based on a reinforcement learning (RL) framework. In contrast to the recent RL scheme for AGC proposed by us, the present method permits handling of power system variables such as Area Control Error (ACE) and deviations from scheduled frequency and tie-line flows as continuous variables. (In the earlier scheme, these variables have to be quantized into finitely many levels). The optimal control law is arrived at in the RL framework by making use of Q-learning strategy. Since the state variables are continuous, we propose the use of Radial Basis Function (RBF) neural networks to compute the Q-values for a given input state. Since, in this application we cannot provide training data appropriate for the standard supervised learning framework, a reinforcement learning algorithm is employed to train the RBF network. We also employ a novel exploration strategy, based on a Learning Automata algorithm,for generating training samples during Q-learning. The proposed scheme, in addition to being simple to implement, inherits all the attractive features of an RL scheme such as model independent design, flexibility in control objective specification, robustness etc. Two implementations of the proposed approach are presented. Through simulation studies the attractiveness of this approach is demonstrated.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Optimal control of traffic lights at junctions or traffic signal control (TSC) is essential for reducing the average delay experienced by the road users amidst the rapid increase in the usage of vehicles. In this paper, we formulate the TSC problem as a discounted cost Markov decision process (MDP) and apply multi-agent reinforcement learning (MARL) algorithms to obtain dynamic TSC policies. We model each traffic signal junction as an independent agent. An agent decides the signal duration of its phases in a round-robin (RR) manner using multi-agent Q-learning with either is an element of-greedy or UCB 3] based exploration strategies. It updates its Q-factors based on the cost feedback signal received from its neighbouring agents. This feedback signal can be easily constructed and is shown to be effective in minimizing the average delay of the vehicles in the network. We show through simulations over VISSIM that our algorithms perform significantly better than both the standard fixed signal timing (FST) algorithm and the saturation balancing (SAT) algorithm 15] over two real road networks.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Electricity markets are complex environments with very particular characteristics. A critical issue regarding these specific characteristics concerns the constant changes they are subject to. This is a result of the electricity markets’ restructuring, which was performed so that the competitiveness could be increased, but it also had exponential implications in the increase of the complexity and unpredictability in those markets scope. The constant growth in markets unpredictability resulted in an amplified need for market intervenient entities in foreseeing market behaviour. The need for understanding the market mechanisms and how the involved players’ interaction affects the outcomes of the markets, contributed to the growth of usage of simulation tools. Multi-agent based software is particularly well fitted to analyze dynamic and adaptive systems with complex interactions among its constituents, such as electricity markets. This dissertation presents ALBidS – Adaptive Learning strategic Bidding System, a multiagent system created to provide decision support to market negotiating players. This system is integrated with the MASCEM electricity market simulator, so that its advantage in supporting a market player can be tested using cases based on real markets’ data. ALBidS considers several different methodologies based on very distinct approaches, to provide alternative suggestions of which are the best actions for the supported player to perform. The approach chosen as the players’ actual action is selected by the employment of reinforcement learning algorithms, which for each different situation, simulation circumstances and context, decides which proposed action is the one with higher possibility of achieving the most success. Some of the considered approaches are supported by a mechanism that creates profiles of competitor players. These profiles are built accordingly to their observed past actions and reactions when faced with specific situations, such as success and failure. The system’s context awareness and simulation circumstances analysis, both in terms of results performance and execution time adaptation, are complementary mechanisms, which endow ALBidS with further adaptation and learning capabilities.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

One major component of power system operation is generation scheduling. The objective of the work is to develop efficient control strategies to the power scheduling problems through Reinforcement Learning approaches. The three important active power scheduling problems are Unit Commitment, Economic Dispatch and Automatic Generation Control. Numerical solution methods proposed for solution of power scheduling are insufficient in handling large and complex systems. Soft Computing methods like Simulated Annealing, Evolutionary Programming etc., are efficient in handling complex cost functions, but find limitation in handling stochastic data existing in a practical system. Also the learning steps are to be repeated for each load demand which increases the computation time.Reinforcement Learning (RL) is a method of learning through interactions with environment. The main advantage of this approach is it does not require a precise mathematical formulation. It can learn either by interacting with the environment or interacting with a simulation model. Several optimization and control problems have been solved through Reinforcement Learning approach. The application of Reinforcement Learning in the field of Power system has been a few. The objective is to introduce and extend Reinforcement Learning approaches for the active power scheduling problems in an implementable manner. The main objectives can be enumerated as:(i) Evolve Reinforcement Learning based solutions to the Unit Commitment Problem.(ii) Find suitable solution strategies through Reinforcement Learning approach for Economic Dispatch. (iii) Extend the Reinforcement Learning solution to Automatic Generation Control with a different perspective. (iv) Check the suitability of the scheduling solutions to one of the existing power systems.First part of the thesis is concerned with the Reinforcement Learning approach to Unit Commitment problem. Unit Commitment Problem is formulated as a multi stage decision process. Q learning solution is developed to obtain the optimwn commitment schedule. Method of state aggregation is used to formulate an efficient solution considering the minimwn up time I down time constraints. The performance of the algorithms are evaluated for different systems and compared with other stochastic methods like Genetic Algorithm.Second stage of the work is concerned with solving Economic Dispatch problem. A simple and straight forward decision making strategy is first proposed in the Learning Automata algorithm. Then to solve the scheduling task of systems with large number of generating units, the problem is formulated as a multi stage decision making task. The solution obtained is extended in order to incorporate the transmission losses in the system. To make the Reinforcement Learning solution more efficient and to handle continuous state space, a fimction approximation strategy is proposed. The performance of the developed algorithms are tested for several standard test cases. Proposed method is compared with other recent methods like Partition Approach Algorithm, Simulated Annealing etc.As the final step of implementing the active power control loops in power system, Automatic Generation Control is also taken into consideration.Reinforcement Learning has already been applied to solve Automatic Generation Control loop. The RL solution is extended to take up the approach of common frequency for all the interconnected areas, more similar to practical systems. Performance of the RL controller is also compared with that of the conventional integral controller.In order to prove the suitability of the proposed methods to practical systems, second plant ofNeyveli Thennal Power Station (NTPS IT) is taken for case study. The perfonnance of the Reinforcement Learning solution is found to be better than the other existing methods, which provide the promising step towards RL based control schemes for practical power industry.Reinforcement Learning is applied to solve the scheduling problems in the power industry and found to give satisfactory perfonnance. Proposed solution provides a scope for getting more profit as the economic schedule is obtained instantaneously. Since Reinforcement Learning method can take the stochastic cost data obtained time to time from a plant, it gives an implementable method. As a further step, with suitable methods to interface with on line data, economic scheduling can be achieved instantaneously in a generation control center. Also power scheduling of systems with different sources such as hydro, thermal etc. can be looked into and Reinforcement Learning solutions can be achieved.