9 resultados para Robust Learning Algorithm
em Universidade Federal do Rio Grande do Norte(UFRN)
Resumo:
Techniques of optimization known as metaheuristics have achieved success in the resolution of many problems classified as NP-Hard. These methods use non deterministic approaches that reach very good solutions which, however, don t guarantee the determination of the global optimum. Beyond the inherent difficulties related to the complexity that characterizes the optimization problems, the metaheuristics still face the dilemma of xploration/exploitation, which consists of choosing between a greedy search and a wider exploration of the solution space. A way to guide such algorithms during the searching of better solutions is supplying them with more knowledge of the problem through the use of a intelligent agent, able to recognize promising regions and also identify when they should diversify the direction of the search. This way, this work proposes the use of Reinforcement Learning technique - Q-learning Algorithm - as exploration/exploitation strategy for the metaheuristics GRASP (Greedy Randomized Adaptive Search Procedure) and Genetic Algorithm. The GRASP metaheuristic uses Q-learning instead of the traditional greedy-random algorithm in the construction phase. This replacement has the purpose of improving the quality of the initial solutions that are used in the local search phase of the GRASP, and also provides for the metaheuristic an adaptive memory mechanism that allows the reuse of good previous decisions and also avoids the repetition of bad decisions. In the Genetic Algorithm, the Q-learning algorithm was used to generate an initial population of high fitness, and after a determined number of generations, where the rate of diversity of the population is less than a certain limit L, it also was applied to supply one of the parents to be used in the genetic crossover operator. Another significant change in the hybrid genetic algorithm is the proposal of a mutually interactive cooperation process between the genetic operators and the Q-learning algorithm. In this interactive/cooperative process, the Q-learning algorithm receives an additional update in the matrix of Q-values based on the current best solution of the Genetic Algorithm. The computational experiments presented in this thesis compares the results obtained with the implementation of traditional versions of GRASP metaheuristic and Genetic Algorithm, with those obtained using the proposed hybrid methods. Both algorithms had been applied successfully to the symmetrical Traveling Salesman Problem, which was modeled as a Markov decision process
Resumo:
We propose a new paradigm for collective learning in multi-agent systems (MAS) as a solution to the problem in which several agents acting over the same environment must learn how to perform tasks, simultaneously, based on feedbacks given by each one of the other agents. We introduce the proposed paradigm in the form of a reinforcement learning algorithm, nominating it as reinforcement learning with influence values. While learning by rewards, each agent evaluates the relation between the current state and/or action executed at this state (actual believe) together with the reward obtained after all agents that are interacting perform their actions. The reward is a result of the interference of others. The agent considers the opinions of all its colleagues in order to attempt to change the values of its states and/or actions. The idea is that the system, as a whole, must reach an equilibrium, where all agents get satisfied with the obtained results. This means that the values of the state/actions pairs match the reward obtained by each agent. This dynamical way of setting the values for states and/or actions makes this new reinforcement learning paradigm the first to include, naturally, the fact that the presence of other agents in the environment turns it a dynamical model. As a direct result, we implicitly include the internal state, the actions and the rewards obtained by all the other agents in the internal state of each agent. This makes our proposal the first complete solution to the conceptual problem that rises when applying reinforcement learning in multi-agent systems, which is caused by the difference existent between the environment and agent models. With basis on the proposed model, we create the IVQ-learning algorithm that is exhaustive tested in repetitive games with two, three and four agents and in stochastic games that need cooperation and in games that need collaboration. This algorithm shows to be a good option for obtaining solutions that guarantee convergence to the Nash optimum equilibrium in cooperative problems. Experiments performed clear shows that the proposed paradigm is theoretical and experimentally superior to the traditional approaches. Yet, with the creation of this new paradigm the set of reinforcement learning applications in MAS grows up. That is, besides the possibility of applying the algorithm in traditional learning problems in MAS, as for example coordination of tasks in multi-robot systems, it is possible to apply reinforcement learning in problems that are essentially collaborative
Resumo:
The metaheuristics techiniques are known to solve optimization problems classified as NP-complete and are successful in obtaining good quality solutions. They use non-deterministic approaches to generate solutions that are close to the optimal, without the guarantee of finding the global optimum. Motivated by the difficulties in the resolution of these problems, this work proposes the development of parallel hybrid methods using the reinforcement learning, the metaheuristics GRASP and Genetic Algorithms. With the use of these techniques, we aim to contribute to improved efficiency in obtaining efficient solutions. In this case, instead of using the Q-learning algorithm by reinforcement learning, just as a technique for generating the initial solutions of metaheuristics, we use it in a cooperative and competitive approach with the Genetic Algorithm and GRASP, in an parallel implementation. In this context, was possible to verify that the implementations in this study showed satisfactory results, in both strategies, that is, in cooperation and competition between them and the cooperation and competition between groups. In some instances were found the global optimum, in others theses implementations reach close to it. In this sense was an analyze of the performance for this proposed approach was done and it shows a good performance on the requeriments that prove the efficiency and speedup (gain in speed with the parallel processing) of the implementations performed
Resumo:
The use of wireless sensor and actuator networks in industry has been increasing past few years, bringing multiple benefits compared to wired systems, like network flexibility and manageability. Such networks consists of a possibly large number of small and autonomous sensor and actuator devices with wireless communication capabilities. The data collected by sensors are sent directly or through intermediary nodes along the network to a base station called sink node. The data routing in this environment is an essential matter since it is strictly bounded to the energy efficiency, thus the network lifetime. This work investigates the application of a routing technique based on Reinforcement Learning s Q-Learning algorithm to a wireless sensor network by using an NS-2 simulated environment. Several metrics like energy consumption, data packet delivery rates and delays are used to validate de proposal comparing it with another solutions existing in the literature
Resumo:
Traditional applications of feature selection in areas such as data mining, machine learning and pattern recognition aim to improve the accuracy and to reduce the computational cost of the model. It is done through the removal of redundant, irrelevant or noisy data, finding a representative subset of data that reduces its dimensionality without loss of performance. With the development of research in ensemble of classifiers and the verification that this type of model has better performance than the individual models, if the base classifiers are diverse, comes a new field of application to the research of feature selection. In this new field, it is desired to find diverse subsets of features for the construction of base classifiers for the ensemble systems. This work proposes an approach that maximizes the diversity of the ensembles by selecting subsets of features using a model independent of the learning algorithm and with low computational cost. This is done using bio-inspired metaheuristics with evaluation filter-based criteria
Resumo:
In the world we are constantly performing everyday actions. Two of these actions are frequent and of great importance: classify (sort by classes) and take decision. When we encounter problems with a relatively high degree of complexity, we tend to seek other opinions, usually from people who have some knowledge or even to the extent possible, are experts in the problem domain in question in order to help us in the decision-making process. Both the classification process as the process of decision making, we are guided by consideration of the characteristics involved in the specific problem. The characterization of a set of objects is part of the decision making process in general. In Machine Learning this classification happens through a learning algorithm and the characterization is applied to databases. The classification algorithms can be employed individually or by machine committees. The choice of the best methods to be used in the construction of a committee is a very arduous task. In this work, it will be investigated meta-learning techniques in selecting the best configuration parameters of homogeneous committees for applications in various classification problems. These parameters are: the base classifier, the architecture and the size of this architecture. We investigated nine types of inductors candidates for based classifier, two methods of generation of architecture and nine medium-sized groups for architecture. Dimensionality reduction techniques have been applied to metabases looking for improvement. Five classifiers methods are investigated as meta-learners in the process of choosing the best parameters of a homogeneous committee.
Resumo:
Beamforming is a technique widely used in various fields. With the aid of an antenna array, the beamforming aims to minimize the contribution of unknown interferents directions, while capturing the desired signal in a given direction. In this thesis are proposed beamforming techniques using Reinforcement Learning (RL) through the Q-Learning algorithm in antennas array. One proposal is to use RL to find the optimal policy selection between the beamforming (BF) and power control (PC) in order to better leverage the individual characteristics of each of them for a certain amount of Signal to Interference plus noise Ration (SINR). Another proposal is to use RL to determine the optimal policy between blind beamforming algorithm of CMA (Constant Modulus Algorithm) and DD (Decision Direct) in multipath environments. Results from simulations showed that the RL technique could be effective in achieving na optimal of switching between different techniques.
Resumo:
In multi-robot systems, both control architecture and work strategy represent a challenge for researchers. It is important to have a robust architecture that can be easily adapted to requirement changes. It is also important that work strategy allows robots to complete tasks efficiently, considering that robots interact directly in environments with humans. In this context, this work explores two approaches for robot soccer team coordination for cooperative tasks development. Both approaches are based on a combination of imitation learning and reinforcement learning. Thus, in the first approach was developed a control architecture, a fuzzy inference engine for recognizing situations in robot soccer games, a software for narration of robot soccer games based on the inference engine and the implementation of learning by imitation from observation and analysis of others robotic teams. Moreover, state abstraction was efficiently implemented in reinforcement learning applied to the robot soccer standard problem. Finally, reinforcement learning was implemented in a form where actions are explored only in some states (for example, states where an specialist robot system used them) differently to the traditional form, where actions have to be tested in all states. In the second approach reinforcement learning was implemented with function approximation, for which an algorithm called RBF-Sarsa($lambda$) was created. In both approaches batch reinforcement learning algorithms were implemented and imitation learning was used as a seed for reinforcement learning. Moreover, learning from robotic teams controlled by humans was explored. The proposal in this work had revealed efficient in the robot soccer standard problem and, when implemented in other robotics systems, they will allow that these robotics systems can efficiently and effectively develop assigned tasks. These approaches will give high adaptation capabilities to requirements and environment changes.
Resumo:
The great interest in nonlinear system identification is mainly due to the fact that a large amount of real systems are complex and need to have their nonlinearities considered so that their models can be successfully used in applications of control, prediction, inference, among others. This work evaluates the application of Fuzzy Wavelet Neural Networks (FWNN) to identify nonlinear dynamical systems subjected to noise and outliers. Generally, these elements cause negative effects on the identification procedure, resulting in erroneous interpretations regarding the dynamical behavior of the system. The FWNN combines in a single structure the ability to deal with uncertainties of fuzzy logic, the multiresolution characteristics of wavelet theory and learning and generalization abilities of the artificial neural networks. Usually, the learning procedure of these neural networks is realized by a gradient based method, which uses the mean squared error as its cost function. This work proposes the replacement of this traditional function by an Information Theoretic Learning similarity measure, called correntropy. With the use of this similarity measure, higher order statistics can be considered during the FWNN training process. For this reason, this measure is more suitable for non-Gaussian error distributions and makes the training less sensitive to the presence of outliers. In order to evaluate this replacement, FWNN models are obtained in two identification case studies: a real nonlinear system, consisting of a multisection tank, and a simulated system based on a model of the human knee joint. The results demonstrate that the application of correntropy as the error backpropagation algorithm cost function makes the identification procedure using FWNN models more robust to outliers. However, this is only achieved if the gaussian kernel width of correntropy is properly adjusted.