885 results for Q-learning algorithm


Relevance: 100.00%

Abstract:

Seat comfort has a major effect on how much a vehicle is used, so car manufacturers work to make car seats as comfortable as possible. Comfort testing and evaluation, however, still rely on exhaustive trial and error and on manual evaluation of the data. In this thesis, we use machine learning and Artificial Neural Networks (ANN) to develop an automated approach. A fully automated approach saves time and can exploit a large data set, but it takes decision-making freedom away from the engineer. This study fills the gap in a two-step comfort evaluation that uses pressure mapping with body regions, to evaluate the average pressure supported by specific body parts, together with Self-Assessment Exam (SAE) questions on the person's interest. The study has created a machine learning algorithm that leaves the engineer a degree of freedom in making decisions when mapping pressure values to body regions using ANN. The mapping is done with 92% accuracy and with the help of a Graphical User Interface (GUI) that facilitates the process during comfort evaluation testing of the car seat, which reduces the duration of the test analysis from days to hours.

Relevance: 100.00%

Abstract:

Thanks to their flexibility and ease of installation, wireless mesh networks (WMNs) allow infrastructure to be deployed at low cost. These networks extend the coverage of wired networks, allowing connection anytime and anywhere. However, their performance is degraded by interference and congestion, which cause packet losses and a drastic increase in transmission delay. In this thesis, we focus on adaptive routing and stability in this type of network. In the first part of the thesis, we address the design of a routing metric and the selection of gateways so as to improve WMN performance. In this context, we propose a source routing protocol based on a new metric. This metric captures not only link characteristics such as inter-flow and intra-flow interference and the packet loss rate, but also gateway overload. Numerical results show that this metric outperforms the solutions proposed in the literature. In the second part of the thesis, we focus on certain critical zones in WMNs. These zones lie around the gateways, which see a higher concentration of traffic; they are prone to interference and congestion. We therefore propose a proactive, adaptive routing protocol based on reinforcement learning that penalizes poor-quality links as paths approach the gateways. A path whose links around a gateway are of better quality is favored over paths of lower quality. We use the Q-learning algorithm to dynamically update path costs, select the next nodes to forward packets toward the chosen gateways, and explore other neighboring nodes. Numerical results show that our distributed protocol outperforms the protocols presented in the literature. In the third part of the thesis, we address instability in wireless mesh networks. Instability arises from frequent route changes caused by instantaneous variations in link quality due to interference and congestion. After analyzing this instability, we propose using the number of path changes in a routing table as an indicator of network disturbance, and we use the entropy function, known from measures of uncertainty and disorder in systems, to select stable routes. Numerical results show that our protocol performs better than other protocols in the literature in terms of throughput, delay, packet loss rate, and the Gini index.
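The Q-learning routing step described in this abstract (updating path costs, choosing a next hop toward a gateway, exploring other neighbors) can be illustrated with a minimal sketch. It is not the thesis's protocol: the four-node topology, the link costs, the function names, and the purely random exploration are all assumptions made for illustration.

```python
import random

# Hypothetical topology: link costs stand in for a link-quality metric.
# The B-G link next to the gateway is deliberately poor (cost 5).
LINKS = {("A", "B"): 1.0, ("A", "C"): 1.0, ("B", "G"): 5.0, ("C", "G"): 1.0}
GATEWAY = "G"


def build_neighbors(links):
    """Turn the undirected link list into a neighbor -> cost map per node."""
    nbrs = {}
    for (u, v), cost in links.items():
        nbrs.setdefault(u, {})[v] = cost
        nbrs.setdefault(v, {})[u] = cost
    return nbrs


def train(links, gateway, packets=2000, alpha=0.5, seed=0):
    """Learn q[x][y]: estimated cost from node x to the gateway via neighbor y."""
    rng = random.Random(seed)
    nbrs = build_neighbors(links)
    q = {x: {y: 0.0 for y in nbrs[x]} for x in nbrs if x != gateway}
    sources = sorted(q)
    for _ in range(packets):
        node = rng.choice(sources)
        while node != gateway:
            # Pure exploration: forward to a random neighbor (an assumption;
            # a real protocol would mostly exploit the current best hop).
            nxt = rng.choice(sorted(nbrs[node]))
            remaining = 0.0 if nxt == gateway else min(q[nxt].values())
            # Q-learning update with costs minimized and no discounting.
            q[node][nxt] += alpha * (nbrs[node][nxt] + remaining - q[node][nxt])
            node = nxt
    return q


def best_next_hop(q, node):
    """Greedy choice: the neighbor with the lowest estimated cost to the gateway."""
    return min(q[node], key=q[node].get)
```

In this toy network the learned table steers traffic at B back through A and C rather than over the poor link next to the gateway, which is the kind of penalization the abstract describes.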

Relevance: 100.00%

Abstract:

Recent developments in the area of reinforcement learning have yielded a number of new algorithms for the prediction and control of Markovian environments. These algorithms, including the TD(lambda) algorithm of Sutton (1988) and the Q-learning algorithm of Watkins (1989), can be motivated heuristically as approximations to dynamic programming (DP). In this paper we provide a rigorous proof of convergence of these DP-based learning algorithms by relating them to the powerful techniques of stochastic approximation theory via a new convergence theorem. The theorem establishes a general class of convergent algorithms to which both TD(lambda) and Q-learning belong.
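For reference, the Q-learning update that such convergence results concern is usually written as below. The notation is the standard textbook one, not necessarily the paper's: $\alpha_t$ is the step size, $\gamma$ the discount factor, and $r_t$ the reward. The step-size conditions are the usual stochastic-approximation requirements; the paper's exact assumptions are not stated in this abstract.

```latex
Q_{t+1}(s_t, a_t) = Q_t(s_t, a_t)
  + \alpha_t(s_t, a_t)\Bigl[ r_t + \gamma \max_{b} Q_t(s_{t+1}, b) - Q_t(s_t, a_t) \Bigr],
\qquad
\sum_{t} \alpha_t(s, a) = \infty, \quad \sum_{t} \alpha_t(s, a)^2 < \infty .
```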

Relevance: 100.00%

Abstract:

Metaheuristic techniques are known for solving optimization problems classified as NP-complete and are successful in obtaining good-quality solutions. They use non-deterministic approaches to generate solutions that are close to optimal, without guaranteeing that the global optimum is found. Motivated by the difficulty of solving these problems, this work proposes the development of parallel hybrid methods that combine reinforcement learning with the GRASP and Genetic Algorithm metaheuristics. With these techniques, we aim to improve the efficiency of obtaining good solutions. Instead of using the Q-learning reinforcement learning algorithm merely to generate initial solutions for the metaheuristics, we use it in a cooperative and competitive approach with the Genetic Algorithm and GRASP, in a parallel implementation. The implementations in this study showed satisfactory results under both strategies, that is, cooperation and competition between the individual methods and cooperation and competition between groups. In some instances the global optimum was found; in others, the implementations came close to it. A performance analysis of the proposed approach shows good results on the requirements that demonstrate the efficiency and speedup (the gain in speed from parallel processing) of the implementations.

Relevance: 100.00%

Abstract:

The use of wireless sensor and actuator networks in industry has been increasing over the past few years, bringing multiple benefits compared to wired systems, such as network flexibility and manageability. Such networks consist of a possibly large number of small, autonomous sensor and actuator devices with wireless communication capabilities. The data collected by the sensors are sent, directly or through intermediate nodes along the network, to a base station called the sink node. Data routing in this environment is an essential matter, since it is tightly bound to energy efficiency and thus to network lifetime. This work investigates the application of a routing technique based on the Q-Learning algorithm from Reinforcement Learning to a wireless sensor network, using an NS-2 simulated environment. Several metrics, such as energy consumption, data packet delivery rate, and delay, are used to validate the proposal by comparing it with other solutions in the literature.

Relevance: 100.00%

Abstract:

Beamforming is a technique widely used in various fields. With the aid of an antenna array, beamforming aims to minimize the contribution of interferers arriving from unknown directions while capturing the desired signal from a given direction. This thesis proposes beamforming techniques that use Reinforcement Learning (RL), through the Q-Learning algorithm, in antenna arrays. One proposal is to use RL to find the optimal policy for selecting between beamforming (BF) and power control (PC), in order to better leverage the individual characteristics of each for a given Signal to Interference plus Noise Ratio (SINR). Another proposal is to use RL to determine the optimal policy for choosing between the blind beamforming algorithms CMA (Constant Modulus Algorithm) and DD (Decision Directed) in multipath environments. Simulation results showed that the RL technique can be effective in achieving an optimal switching between the different techniques.

Relevance: 100.00%

Abstract:

One major component of power system operation is generation scheduling. The objective of this work is to develop efficient control strategies for power scheduling problems through Reinforcement Learning approaches. The three important active power scheduling problems are Unit Commitment, Economic Dispatch and Automatic Generation Control. The numerical methods proposed for power scheduling are insufficient for handling large and complex systems. Soft computing methods such as Simulated Annealing and Evolutionary Programming handle complex cost functions efficiently, but are limited in handling the stochastic data found in a practical system. In addition, the learning steps must be repeated for each load demand, which increases the computation time.

Reinforcement Learning (RL) is a method of learning through interaction with an environment. Its main advantage is that it does not require a precise mathematical formulation. It can learn either by interacting with the environment or by interacting with a simulation model. Several optimization and control problems have been solved through Reinforcement Learning, but its applications in the field of power systems have been few. The objective is to introduce and extend Reinforcement Learning approaches for the active power scheduling problems in an implementable manner. The main objectives can be enumerated as: (i) Evolve Reinforcement Learning based solutions to the Unit Commitment Problem. (ii) Find suitable solution strategies through the Reinforcement Learning approach for Economic Dispatch. (iii) Extend the Reinforcement Learning solution to Automatic Generation Control with a different perspective. (iv) Check the suitability of the scheduling solutions for one of the existing power systems.

The first part of the thesis is concerned with the Reinforcement Learning approach to the Unit Commitment problem. The Unit Commitment Problem is formulated as a multi-stage decision process. A Q-learning solution is developed to obtain the optimum commitment schedule. State aggregation is used to formulate an efficient solution that takes the minimum up time / down time constraints into account. The performance of the algorithms is evaluated for different systems and compared with other stochastic methods such as the Genetic Algorithm.

The second stage of the work is concerned with solving the Economic Dispatch problem. A simple and straightforward decision-making strategy is first proposed in the Learning Automata algorithm. Then, to solve the scheduling task for systems with a large number of generating units, the problem is formulated as a multi-stage decision-making task. The solution obtained is extended to incorporate the transmission losses in the system. To make the Reinforcement Learning solution more efficient and to handle a continuous state space, a function approximation strategy is proposed. The performance of the developed algorithms is tested on several standard test cases. The proposed method is compared with other recent methods such as the Partition Approach Algorithm and Simulated Annealing.

As the final step in implementing the active power control loops of a power system, Automatic Generation Control is also taken into consideration. Reinforcement Learning has already been applied to the Automatic Generation Control loop. The RL solution is extended to adopt a common frequency for all the interconnected areas, which is closer to practical systems. The performance of the RL controller is also compared with that of the conventional integral controller.

To demonstrate the suitability of the proposed methods for practical systems, the second plant of Neyveli Thermal Power Station (NTPS II) is taken as a case study. The performance of the Reinforcement Learning solution is found to be better than that of the other existing methods, which is a promising step towards RL-based control schemes for the practical power industry.

Reinforcement Learning is applied to solve the scheduling problems in the power industry and is found to give satisfactory performance. The proposed solution offers scope for greater profit, since the economic schedule is obtained instantaneously. Because the Reinforcement Learning method can take the stochastic cost data obtained from a plant from time to time, it gives an implementable method. As a further step, with suitable methods to interface with on-line data, economic scheduling can be achieved instantaneously in a generation control center. Power scheduling of systems with different sources, such as hydro and thermal, can also be looked into, and Reinforcement Learning solutions can be developed.

Relevance: 100.00%

Abstract:

This paper presents Reinforcement Learning (RL) approaches to the Economic Dispatch problem. Economic Dispatch is first formulated as a multi-stage decision-making problem, and two variants of RL algorithms are then presented. A third algorithm, which takes the transmission losses into consideration, is also explained. The efficiency and flexibility of the proposed algorithms are demonstrated on different representative systems: a three-generator system with a given generation cost table, the IEEE 30 bus system with quadratic cost functions, a 10-generator system with piecewise quadratic cost functions, and a 20-generator system considering transmission losses. A comparison of the computation times of the different algorithms is also carried out.
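To make the multi-stage formulation concrete, here is a minimal sketch that treats each generating unit as one decision stage and learns a dispatch with tabular Q-learning. It is not the paper's algorithm: the cost tables, the penalty value, the state encoding `(unit index, remaining demand)`, and the function names are hypothetical. The learning rate is fixed at 1 because this toy environment is deterministic.

```python
import random

# Hypothetical cost tables: COST[k][p] is the cost of running unit k at
# p blocks of power (index 0 means the unit produces nothing).
COST = [
    [0, 10, 22, 36],
    [0, 12, 20, 30],
    [0, 8, 25, 45],
]
PENALTY = 1000.0  # assumed charge when demand is unmet after the last unit


def actions(k, remaining):
    """Power blocks unit k may produce without exceeding the remaining demand."""
    return range(min(len(COST[k]) - 1, remaining) + 1)


def step(k, remaining, p):
    """Simulation model: reward is negative cost; one stage per unit."""
    reward = -float(COST[k][p])
    left = remaining - p
    done = k + 1 == len(COST)
    if done and left > 0:
        reward -= PENALTY
    return reward, (k + 1, left), done


def learn(demand, episodes=5000, seed=0):
    """Tabular Q-learning with random exploration and a learning rate of 1."""
    rng = random.Random(seed)
    q = {}
    for _ in range(episodes):
        state, done = (0, demand), False
        while not done:
            k, remaining = state
            p = rng.choice(list(actions(k, remaining)))
            reward, nxt, done = step(k, remaining, p)
            future = 0.0 if done else max(q.get((nxt, a), 0.0) for a in actions(*nxt))
            q[(state, p)] = reward + future
            state = nxt
    return q


def greedy_dispatch(q, demand):
    """Read the learned schedule by following the best action at each stage."""
    state, plan = (0, demand), []
    for k in range(len(COST)):
        p = max(actions(*state), key=lambda a: q.get((state, a), float("-inf")))
        plan.append(p)
        state = (k + 1, state[1] - p)
    return plan
```

Calling `greedy_dispatch(learn(5), 5)` returns one power level per unit. A stochastic cost model would need a learning rate below 1 so that noisy samples are averaged.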

Relevance: 100.00%

Abstract:

This paper presents a Reinforcement Learning (RL) approach to economic dispatch (ED) using a Radial Basis Function neural network. We formulate ED as an N-stage decision-making problem. We propose a novel architecture to store Q-values and present a learning algorithm to learn the weights of the neural network. Although many stochastic search techniques, such as simulated annealing, genetic algorithms and evolutionary programming, have been applied to ED, they require searching for the optimal solution for each load demand. They are also limited in handling stochastic cost functions. In our approach, once we learn the Q-values, we can find the dispatch for any load demand. We recently proposed an RL approach to ED, but that approach could find the optimum dispatch only for a set of specified discrete values of power demand. The performance of the proposed algorithm is validated on the IEEE 6 bus system, considering transmission losses.
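The idea of storing Q-values in a Radial Basis Function network, so that values generalize across continuous load demands, can be sketched as a linear combination of Gaussian features trained with a gradient step toward a target. The paper's architecture is not described in this abstract, so everything here is an assumption: the one-dimensional normalized input, the centers, the width, and the function names.

```python
import math

# Hypothetical RBF layout over a demand value normalized to [0, 1].
CENTERS = [0.0, 0.5, 1.0]
WIDTH = 0.25


def rbf_features(x, centers=CENTERS, width=WIDTH):
    """Gaussian activations of each basis function for input x."""
    return [math.exp(-((x - c) ** 2) / (2 * width ** 2)) for c in centers]


def q_value(weights, feats):
    """Q estimate as a weighted sum of the RBF activations."""
    return sum(w * f for w, f in zip(weights, feats))


def td_update(weights, feats, target, lr):
    """One gradient step moving the Q estimate toward the given target."""
    err = target - q_value(weights, feats)
    return [w + lr * err * f for w, f in zip(weights, feats)]
```

Because neighboring inputs share basis functions, updating the weights at one demand value also shifts the estimate at nearby values. This is what lets a learned approximator answer for demands it was never trained on.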

Relevance: 100.00%

Abstract:

On-line learning methods have been applied successfully in multi-agent systems to achieve coordination among agents. Learning in multi-agent systems implies a non-stationary scenario as perceived by each agent, since the behavior of other agents may change as they simultaneously learn how to improve their actions. Non-stationary scenarios can be modeled as Markov Games, which can be solved using the Minimax-Q algorithm, a combination of Q-learning (a Reinforcement Learning (RL) algorithm that directly learns an optimal control policy) and the Minimax algorithm. However, finding optimal control policies using any RL algorithm (Q-learning and Minimax-Q included) can be very time consuming. To improve the learning time of Q-learning, we considered the QS-algorithm, in which a single experience can update more than a single action value by using a spreading function. In this paper, we contribute a Minimax-QS algorithm, which combines the Minimax-Q algorithm and the QS-algorithm. We conduct a series of empirical evaluations of the algorithm in a simplified simulator of the soccer domain. We show that even with a very simple domain-dependent spreading function, the performance of the learning algorithm can be improved.
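The spreading idea, where one experience updates several action values, can be sketched for the single-agent case as follows. This is an illustration only. It uses the ordinary Q-learning target (a max over next actions) rather than the minimax value that Minimax-Q would use, and the integer states, the distance-based spreading function, and the names `spread` and `qs_update` are hypothetical.

```python
def spread(s, x):
    """Hypothetical spreading function: full weight on the visited state,
    half weight on its immediate neighbors, zero elsewhere."""
    d = abs(s - x)
    if d == 0:
        return 1.0
    return 0.5 if d == 1 else 0.0


def qs_update(Q, s, a, r, s_next, spread_fn, alpha=0.5, gamma=0.9):
    """Apply one experience (s, a, r, s_next) to every state x, scaled by
    how similar x is to s according to the spreading function."""
    # The target is computed once, before any entry is modified.
    target = r + gamma * max(Q[s_next].values())
    for x in Q:
        w = spread_fn(s, x)
        if w > 0.0:
            Q[x][a] += alpha * w * (target - Q[x][a])


# Usage: five states on a line, two actions, all values starting at zero.
Q = {x: {"left": 0.0, "right": 0.0} for x in range(5)}
qs_update(Q, s=2, a="right", r=1.0, s_next=3, spread_fn=spread)
```

After this single call the value of `"right"` has changed in states 1, 2 and 3, whereas plain Q-learning would have changed it in state 2 only.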

Relevance: 100.00%

Abstract:

Active machine learning algorithms are used when large numbers of unlabeled examples are available and getting labels for them is costly (e.g. requiring consulting a human expert). Many conventional active learning algorithms focus on refining the decision boundary, at the expense of exploring new regions that the current hypothesis misclassifies. We propose a new active learning algorithm that balances such exploration with refining of the decision boundary by dynamically adjusting the probability to explore at each step. Our experimental results demonstrate improved performance on data sets that require extensive exploration while remaining competitive on data sets that do not. Our algorithm also shows significant tolerance of noise.

Relevance: 100.00%

Abstract:

This thesis deals with the development of a function approximator and its use in methods for learning discrete and continuous actions: 1. A general function approximator, Locally Weighted Interpolating Growing Neural Gas (LWIGNG), is developed on the basis of a Growing Neural Gas (GNG). The topological neighborhood in the neuron structure is used to interpolate between neighboring neurons and to compute the approximation through local weighting. The performance of the approach, particularly with respect to changing target functions and changing input distributions, is demonstrated in various experiments. 2. For learning discrete actions, the LWIGNG method is combined with Q-learning to form the Q-LWIGNG method. This requires modifying the underlying GNG algorithm, because the input data in action learning arrive in a particular order. Q-LWIGNG achieves very good results on the pole-balancing and mountain-car problems and good results on the acrobot problem. 3. For learning continuous actions, a REINFORCE algorithm is combined with LWIGNG to form the ReinforceGNG method. An actor-critic architecture is used to learn from delayed rewards. LWIGNG approximates both the state-value function and the policy, which is represented in the form of situation-dependent parameters of a normal distribution. ReinforceGNG is successfully applied to learning movements for a simulated two-wheeled robot that must intercept a rolling ball under certain conditions.

Relevance: 100.00%

Abstract:

In this letter, we propose a class of self-stabilizing learning algorithms for minor component analysis (MCA), which includes a few well-known MCA learning algorithms. Self-stabilizing means that the sign of the change in the weight vector length is independent of the presented input vector. For these algorithms, a rigorous global convergence proof is given and the convergence rate is also discussed. By combining the positive properties of these algorithms, a new learning algorithm is proposed that can improve performance. Simulations are employed to confirm our theoretical results.

Relevance: 100.00%

Abstract:

* This research was partially supported by the Latvian Science Foundation under grant No.02-86d.