784 resultados para Wireless Sensor and Actuator Networks. Simulation. Reinforcement Learning. Routing Techniques


Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper proposes a field application of a high-level reinforcement learning (RL) control system for solving the action selection problem of an autonomous robot in cable tracking task. The learning system is characterized by using a direct policy search method for learning the internal state/action mapping. Policy only algorithms may suffer from long convergence times when dealing with real robotics. In order to speed up the process, the learning phase has been carried out in a simulated environment and, in a second step, the policy has been transferred and tested successfully on a real robot. Future steps plan to continue the learning process on-line while on the real robot while performing the mentioned task. We demonstrate its feasibility with real experiments on the underwater robot ICTINEU AUV

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Autonomous underwater vehicles (AUV) represent a challenging control problem with complex, noisy, dynamics. Nowadays, not only the continuous scientific advances in underwater robotics but the increasing number of subsea missions and its complexity ask for an automatization of submarine processes. This paper proposes a high-level control system for solving the action selection problem of an autonomous robot. The system is characterized by the use of reinforcement learning direct policy search methods (RLDPS) for learning the internal state/action mapping of some behaviors. We demonstrate its feasibility with simulated experiments using the model of our underwater robot URIS in a target following task

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper proposes a high-level reinforcement learning (RL) control system for solving the action selection problem of an autonomous robot. Although the dominant approach, when using RL, has been to apply value function based algorithms, the system here detailed is characterized by the use of direct policy search methods. Rather than approximating a value function, these methodologies approximate a policy using an independent function approximator with its own parameters, trying to maximize the future expected reward. The policy based algorithm presented in this paper is used for learning the internal state/action mapping of a behavior. In this preliminary work, we demonstrate its feasibility with simulated experiments using the underwater robot GARBI in a target reaching task

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Darrerament, l'interès pel desenvolupament d'aplicacions amb robots submarins autònoms (AUV) ha crescut de forma considerable. Els AUVs són atractius gràcies al seu tamany i el fet que no necessiten un operador humà per pilotar-los. Tot i això, és impossible comparar, en termes d'eficiència i flexibilitat, l'habilitat d'un pilot humà amb les escasses capacitats operatives que ofereixen els AUVs actuals. L'utilització de AUVs per cobrir grans àrees implica resoldre problemes complexos, especialment si es desitja que el nostre robot reaccioni en temps real a canvis sobtats en les condicions de treball. Per aquestes raons, el desenvolupament de sistemes de control autònom amb l'objectiu de millorar aquestes capacitats ha esdevingut una prioritat. Aquesta tesi tracta sobre el problema de la presa de decisions utilizant AUVs. El treball presentat es centra en l'estudi, disseny i aplicació de comportaments per a AUVs utilitzant tècniques d'aprenentatge per reforç (RL). La contribució principal d'aquesta tesi consisteix en l'aplicació de diverses tècniques de RL per tal de millorar l'autonomia dels robots submarins, amb l'objectiu final de demostrar la viabilitat d'aquests algoritmes per aprendre tasques submarines autònomes en temps real. En RL, el robot intenta maximitzar un reforç escalar obtingut com a conseqüència de la seva interacció amb l'entorn. L'objectiu és trobar una política òptima que relaciona tots els estats possibles amb les accions a executar per a cada estat que maximitzen la suma de reforços totals. Així, aquesta tesi investiga principalment dues tipologies d'algoritmes basats en RL: mètodes basats en funcions de valor (VF) i mètodes basats en el gradient (PG). Els resultats experimentals finals mostren el robot submarí Ictineu en una tasca autònoma real de seguiment de cables submarins. Per portar-la a terme, s'ha dissenyat un algoritme anomenat mètode d'Actor i Crític (AC), fruit de la fusió de mètodes VF amb tècniques de PG.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This document represents a doctoral thesis held under the Brazilian School of Public and Business Administration of Getulio Vargas Foundation (EBAPE/FGV), developed through the elaboration of three articles. The research that resulted in the articles is within the scope of the project entitled “Windows of opportunities and knowledge networks: implications for catch-up in developing countries”, funded by Support Programme for Research and Academic Production of Faculty (ProPesquisa) of Brazilian School of Public and Business Administration (EBAPE) of Getulio Vargas Foundation.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

On-line learning methods have been applied successfully in multi-agent systems to achieve coordination among agents. Learning in multi-agent systems implies in a non-stationary scenario perceived by the agents, since the behavior of other agents may change as they simultaneously learn how to improve their actions. Non-stationary scenarios can be modeled as Markov Games, which can be solved using the Minimax-Q algorithm a combination of Q-learning (a Reinforcement Learning (RL) algorithm which directly learns an optimal control policy) and the Minimax algorithm. However, finding optimal control policies using any RL algorithm (Q-learning and Minimax-Q included) can be very time consuming. Trying to improve the learning time of Q-learning, we considered the QS-algorithm. in which a single experience can update more than a single action value by using a spreading function. In this paper, we contribute a Minimax-QS algorithm which combines the Minimax-Q algorithm and the QS-algorithm. We conduct a series of empirical evaluation of the algorithm in a simplified simulator of the soccer domain. We show that even using a very simple domain-dependent spreading function, the performance of the learning algorithm can be improved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper presents a new non-destructive testing (NDT) for reinforced concrete structures, in order to identify the components of their reinforcement. A time varying electromagnetic field is generated close to the structure by electromagnetic devices specially designed for this purpose. The presence of ferromagnetic materials (the steel bars of the reinforcement) immersed in the concrete disturbs the magnetic field at the surface of the structure. These field alterations are detected by sensors coils placed on the concrete surface. Variations in position and cross section (the size) of steel bars immersed in concrete originate slightly different values for the induced voltages at the coils.. The values for the induced voltages were obtained in laboratory tests, and multi-layer perceptron artificial neural networks with Levemberg-Marquardt training algorithm were used to identify the location and size of the bar. Preliminary results can be considered very good.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Includes bibliography

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Faced with an imminent restructuring of the electric power system, over the past few years many countries have invested in a new paradigm known as Smart Grid. This paradigm targets optimization and automation of electric power network, using advanced information and communication technologies. Among the main communication protocols for Smart Grids we have the DNP3 protocol, which provides secure data transmission with moderate rates. The IEEE 802.15.4 is another communication protocol also widely used in Smart Grid, especially in the so-called Home Area Network (HAN). Thus, many applications of Smart Grid depends on the interaction of these two protocols. This paper proposes modeling, in the traditional network simulator NS-2, the integration of DNP3 protocol and the IEEE 802.15.4 wireless standard for low cost simulations of Smart Grid applications.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The grinding operation gives workpieces their final finish, minimizing surface roughness through the interaction between the abrasive grains of a tool (grinding wheel) and the workpiece. However, excessive grinding wheel wear due to friction renders the tool unsuitable for further use, thus requiring the dressing operation to remove and/or sharpen the cutting edges of the worn grains to render them reusable. The purpose of this study was to monitor the dressing operation using the acoustic emission (AE) signal and statistics derived from this signal, classifying the grinding wheel as sharp or dull by means of artificial neural networks. An aluminum oxide wheel installed on a surface grinding machine, a signal acquisition system, and a single-point dresser were used in the experiments. Tests were performed varying overlap ratios and dressing depths. The root mean square values and two additional statistics were calculated based on the raw AE data. A multilayer perceptron neural network was used with the Levenberg-Marquardt learning algorithm, whose inputs were the aforementioned statistics. The results indicate that this method was successful in classifying the conditions of the grinding wheel in the dressing process, identifying the tool as "sharp''(with cutting capacity) or "dull''(with loss of cutting capacity), thus reducing the time and cost of the operation and minimizing excessive removal of abrasive material from the grinding wheel.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper aims to provide an improved NSGA-II (Non-Dominated Sorting Genetic Algorithm-version II) which incorporates a parameter-free self-tuning approach by reinforcement learning technique, called Non-Dominated Sorting Genetic Algorithm Based on Reinforcement Learning (NSGA-RL). The proposed method is particularly compared with the classical NSGA-II when applied to a satellite coverage problem. Furthermore, not only the optimization results are compared with results obtained by other multiobjective optimization methods, but also guarantee the advantage of no time-spending and complex parameter tuning.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Die vorliegende Arbeit beschäftigt sich mit der Entwicklung eines Funktionsapproximators und dessen Verwendung in Verfahren zum Lernen von diskreten und kontinuierlichen Aktionen: 1. Ein allgemeiner Funktionsapproximator – Locally Weighted Interpolating Growing Neural Gas (LWIGNG) – wird auf Basis eines Wachsenden Neuralen Gases (GNG) entwickelt. Die topologische Nachbarschaft in der Neuronenstruktur wird verwendet, um zwischen benachbarten Neuronen zu interpolieren und durch lokale Gewichtung die Approximation zu berechnen. Die Leistungsfähigkeit des Ansatzes, insbesondere in Hinsicht auf sich verändernde Zielfunktionen und sich verändernde Eingabeverteilungen, wird in verschiedenen Experimenten unter Beweis gestellt. 2. Zum Lernen diskreter Aktionen wird das LWIGNG-Verfahren mit Q-Learning zur Q-LWIGNG-Methode verbunden. Dafür muss der zugrunde liegende GNG-Algorithmus abgeändert werden, da die Eingabedaten beim Aktionenlernen eine bestimmte Reihenfolge haben. Q-LWIGNG erzielt sehr gute Ergebnisse beim Stabbalance- und beim Mountain-Car-Problem und gute Ergebnisse beim Acrobot-Problem. 3. Zum Lernen kontinuierlicher Aktionen wird ein REINFORCE-Algorithmus mit LWIGNG zur ReinforceGNG-Methode verbunden. Dabei wird eine Actor-Critic-Architektur eingesetzt, um aus zeitverzögerten Belohnungen zu lernen. LWIGNG approximiert sowohl die Zustands-Wertefunktion als auch die Politik, die in Form von situationsabhängigen Parametern einer Normalverteilung repräsentiert wird. ReinforceGNG wird erfolgreich zum Lernen von Bewegungen für einen simulierten 2-rädrigen Roboter eingesetzt, der einen rollenden Ball unter bestimmten Bedingungen abfangen soll.