153 results for Reinforcement-Learning
Abstract:
This paper proposes a hybrid coordination method for behavior-based control architectures. The hybrid method takes advantage of the robustness and modularity of competitive approaches as well as the optimized trajectories of cooperative ones. This paper shows the feasibility of applying this hybrid method to the 3D navigation of an autonomous underwater vehicle (AUV). The behaviors are learnt online by means of reinforcement learning. A continuous Q-learning algorithm, implemented with a feed-forward neural network, is employed. Realistic simulations were carried out. The results obtained show the good performance of the hybrid method on behavior coordination as well as the convergence of the behaviors.
Abstract:
This paper presents a hybrid behavior-based scheme using reinforcement learning for high-level control of autonomous underwater vehicles (AUVs). Two main features of the presented approach are hybrid behavior coordination and semi-online neural-Q_learning (SONQL). Hybrid behavior coordination takes advantage of the robustness and modularity of the competitive approach as well as the efficient trajectories of the cooperative approach. SONQL, a new continuous variant of the Q_learning algorithm using a multilayer neural network, is used to learn the behavior state/action mapping online. Experimental results show the feasibility of the presented approach for AUVs.
Abstract:
This paper proposes a field application of a high-level reinforcement learning (RL) control system for solving the action selection problem of an autonomous robot in a cable tracking task. The learning system is characterized by using a direct policy search method for learning the internal state/action mapping. Policy-only algorithms may suffer from long convergence times when dealing with real robotics. In order to speed up the process, the learning phase was carried out in a simulated environment and, in a second step, the policy was transferred to and tested successfully on a real robot. Future work plans to continue the learning process online on the real robot while it performs the mentioned task. We demonstrate the feasibility of the approach with real experiments on the underwater robot ICTINEU AUV.
Abstract:
Autonomous underwater vehicles (AUVs) represent a challenging control problem with complex, noisy dynamics. Nowadays, not only the continuous scientific advances in underwater robotics but also the increasing number and complexity of subsea missions call for the automation of submarine processes. This paper proposes a high-level control system for solving the action selection problem of an autonomous robot. The system is characterized by the use of reinforcement learning direct policy search methods (RLDPS) for learning the internal state/action mapping of some behaviors. We demonstrate its feasibility with simulated experiments using the model of our underwater robot URIS in a target following task.
Abstract:
This paper proposes a high-level reinforcement learning (RL) control system for solving the action selection problem of an autonomous robot. Although the dominant approach when using RL has been to apply value-function-based algorithms, the system detailed here is characterized by the use of direct policy search methods. Rather than approximating a value function, these methodologies approximate a policy using an independent function approximator with its own parameters, trying to maximize the future expected reward. The policy-based algorithm presented in this paper is used for learning the internal state/action mapping of a behavior. In this preliminary work, we demonstrate its feasibility with simulated experiments using the underwater robot GARBI in a target reaching task.
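As a point of reference for the direct policy search idea described above, the following is a minimal REINFORCE-style sketch in Python: a policy with its own parameters is adjusted to maximize expected return, and no value function is approximated. REINFORCE is a standard policy-gradient baseline, not necessarily the exact algorithm of the paper; the environment interface, network sizes and learning rate are illustrative assumptions.

# Illustrative REINFORCE-style sketch of direct policy search.
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(4, 32), nn.Tanh(), nn.Linear(32, 2))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-2)

def run_episode(env, gamma=0.99):
    log_probs, rewards = [], []
    state, done = env.reset(), False
    while not done:
        logits = policy(torch.as_tensor(state, dtype=torch.float32))
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        state, reward, done = env.step(action.item())  # assumed interface
        rewards.append(reward)
    # Discounted return from each step; the gradient raises the
    # log-probability of each action in proportion to the return it preceded.
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.insert(0, g)
    loss = -sum(lp * g for lp, g in zip(log_probs, returns))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()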
Abstract:
A reinforcement learning (RL) method was used to train a virtual character to move participants to a specified location. The virtual environment depicted an alleyway displayed through a wide field-of-view, head-tracked, stereo head-mounted display. Based on proxemics theory, we predicted that when the character approached within a personal or intimate distance of the participants, they would be inclined to move backwards out of the way. We carried out a between-groups experiment with 30 female participants, with 10 assigned arbitrarily to each of the following three groups: in the Intimate condition the character could approach within 0.38 m; in the Social condition no nearer than 1.2 m; and in the Random condition the actions of the virtual character were chosen randomly from among the same set as in the RL method, and the virtual character could approach within 0.38 m. The experiment continued in each case until the participant either reached the target or 7 minutes had elapsed. The distributions of the times taken to reach the target showed significant differences between the three groups, with 9 out of 10 in the Intimate condition reaching the target significantly faster than the 6 out of 10 who reached the target in the Social condition. Only 1 out of 10 in the Random condition reached the target. The experiment is an example of applied presence theory: we rely on the many findings that people tend to respond realistically in immersive virtual environments, and use this to get people to achieve a task of which they had been unaware. This method opens the door to many such applications where the virtual environment adapts to the responses of the human participants with the aim of achieving particular goals.
Abstract:
Reinforcement learning (RL) is a very suitable technique for robot learning, as it can learn in unknown environments and with real-time computation. The main difficulties in adapting classic RL algorithms to robotic systems are the generalization problem and the correct observation of the Markovian state. This paper attempts to solve the generalization problem by proposing the semi-online neural-Q_learning algorithm (SONQL). The algorithm uses the classic Q_learning technique with two modifications. First, a neural network (NN) approximates the Q_function, allowing the use of continuous states and actions. Second, a database of the most representative learning samples accelerates and stabilizes the convergence. The term semi-online refers to the fact that the algorithm uses not only the current but also past learning samples. Nevertheless, the algorithm is able to learn in real time while the robot is interacting with the environment. The paper shows simulated results with the "mountain-car" benchmark as well as real results with an underwater robot in a target-following behavior.
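A minimal Python sketch of the two modifications described above: a feed-forward network approximates the Q_function over continuous state/action pairs, and a database of learning samples is replayed on every step. All names, sizes and the finite candidate-action set used to approximate the max over actions are illustrative assumptions, not the authors' implementation.

# Hypothetical sketch of the SONQL idea: NN-approximated Q-values plus a
# replayed database of learning samples to stabilize convergence.
import random
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, GAMMA = 4, 1, 0.95

# Q-network: maps a (state, action) pair to a scalar Q-value, which
# allows continuous states and actions.
q_net = nn.Sequential(
    nn.Linear(STATE_DIM + ACTION_DIM, 32), nn.ReLU(),
    nn.Linear(32, 1),
)
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
database = []  # most representative learning samples (s, a, r, s')

def q_value(state, action):
    return q_net(torch.cat([state, action], dim=-1))

def learn_step(sample, candidate_actions):
    database.append(sample)
    batch = random.sample(database, min(len(database), 32))
    loss = 0.0
    for s, a, r, s2 in batch:
        # Target: r + gamma * max_a' Q(s', a'); the max over a continuous
        # action space is approximated here by a finite candidate set.
        with torch.no_grad():
            target = r + GAMMA * max(q_value(s2, a2) for a2 in candidate_actions)
        loss = loss + (q_value(s, a) - target) ** 2
    optimizer.zero_grad()
    (loss / len(batch)).backward()
    optimizer.step()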
Abstract:
Utilizing the well-known Ultimatum Game, this note presents the following phenomenon. If we start with simple stimulus-response agents, learning through naive reinforcement, and then grant them some introspective capabilities, we get outcomes that are not closer to, but farther away from, the fully introspective game-theoretic solution. The cause is the following: there is an asymmetry in the information that agents can deduce from their experience, and this leads to a bias in their learning process.
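As one plausible reading of "simple stimulus-response agents learning through naive reinforcement", here is a small Roth-Erev-style sketch for the Ultimatum Game in Python: each strategy's propensity grows by the payoff it actually earned, and nothing is learned about strategies not played, which is exactly where an informational asymmetry between proposer and responder can enter. The pie size, initial propensities and strategy grids are illustrative assumptions; the note's exact model may differ.

# Hypothetical Roth-Erev-style naive reinforcement in the Ultimatum Game.
import random

PIE = 10
prop_p = {o: 1.0 for o in range(PIE + 1)}  # proposer: amount offered
prop_r = {t: 1.0 for t in range(PIE + 1)}  # responder: minimum acceptable

def choose(propensities):
    # Sample a strategy with probability proportional to its propensity.
    total = sum(propensities.values())
    r, acc = random.uniform(0, total), 0.0
    for s, w in propensities.items():
        acc += w
        if r <= acc:
            return s
    return s  # float-rounding fallback

for _ in range(10000):
    offer = choose(prop_p)
    threshold = choose(prop_r)
    accepted = offer >= threshold
    payoff_p = PIE - offer if accepted else 0
    payoff_r = offer if accepted else 0
    # Naive reinforcement: only the strategy actually played is updated,
    # and only with the agent's own realized payoff.
    prop_p[offer] += payoff_p
    prop_r[threshold] += payoff_r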
Abstract:
Agent-based computational economics is becoming widely used in practice. This paper explores the consistency of some of its standard techniques. We focus in particular on prevailing wholesale electricity trading simulation methods. We include different supply and demand representations and propose the Experience-Weighted Attraction method to include several behavioural algorithms. We compare the results across assumptions and to economic theory predictions. The match is good under best-response and reinforcement learning but not under fictitious play. The simulations perform well under flat and upward-sloping supply bidding, and also for plausible demand elasticity assumptions. Learning is influenced by the number of bids per plant and the initial conditions. The overall conclusion is that agent-based simulation assumptions are far from innocuous. We link their performance to underlying features, and identify those that are better suited to model wholesale electricity markets.
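For reference, a Python sketch of the Experience-Weighted Attraction (EWA) update (Camerer and Ho, 1999) named above; parameter values are illustrative. With delta = 0 only realized payoffs reinforce the chosen strategy (reinforcement-learning-like), while delta = 1 also weights forgone payoffs (belief-learning-like), which is how EWA nests several behavioural algorithms.

# Sketch of the EWA attraction update and logit choice rule.
import math

PHI, RHO, DELTA, LAM = 0.9, 0.9, 0.5, 1.0  # illustrative parameters

def ewa_update(attractions, N, played, payoffs):
    """One EWA step.
    attractions: A_j(t-1) for each strategy j; N: experience weight N(t-1);
    played: index of the strategy actually chosen;
    payoffs: payoff each strategy j would have earned this round."""
    N_new = RHO * N + 1.0
    new_attractions = []
    for j, a in enumerate(attractions):
        # Realized payoff gets full weight; forgone payoffs are weighted by delta.
        weight = 1.0 if j == played else DELTA
        new_attractions.append((PHI * N * a + weight * payoffs[j]) / N_new)
    return new_attractions, N_new

def choice_probs(attractions):
    # Logit response: P_j proportional to exp(lambda * A_j).
    exps = [math.exp(LAM * a) for a in attractions]
    total = sum(exps)
    return [e / total for e in exps]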
Abstract:
An assortment of human behaviors is thought to be driven by rewards, including reinforcement learning, novelty processing, learning, decision making, economic choice, incentive motivation, and addiction. In each case the ventral tegmental area/ventral striatum (nucleus accumbens) (VTA/VS) system has been implicated as a key structure by functional imaging studies, mostly on the basis of standard, univariate analyses. Here we propose that standard functional magnetic resonance imaging analysis needs to be complemented by methods that take into account the differential connectivity of the VTA/VS system in the different behavioral contexts in order to describe reward-based processes more appropriately. We first consider the wider network for reward processing as it emerged from animal experimentation. Subsequently, an example of a method to assess functional connectivity is given. Finally, we illustrate the usefulness of such analyses with examples regarding reward valuation, reward expectation and the role of reward in addiction.
Abstract:
The EVS4CSCL project starts in the context of a Computer Supported Collaborative Learning (CSCL) environment. Previous UOC projects created a generic CSCL platform (CLPL) to facilitate the development of CSCL applications. A discussion forum (DF) was the first application developed over the framework. This discussion forum was different from other products on the marketplace because of its focus on the learning process. The DF carried out the specification and elaboration phases of the discussion learning process, but the consensus phase was lacking. The consensus phase in a learning environment is not something to be achieved but something to be tested. Such tests are commonly done with Electronic Voting System (EVS) tools, but a consensus test is not an assessment test: we are not evaluating our students by their answers but by their discussion activity. Our educational EVS could be used as a discussion catalyst, proposing a discussion about the results after an initial query, or it could be used after a discussion period in order to show how the discussion changed the students' minds (consensus). It could also be used by the teacher as a quick way to know where the student needs some reinforcement. That is important in a distance-learning environment, where there is no direct contact between the teacher and the student and it is difficult to detect learning gaps. In an educational environment, assessment is a must, and the EVS will provide direct assessment through peer usefulness evaluation and teacher marks on every query created, as well as indirect assessment from statistics regarding user activity.
Abstract:
In recent years, a great number of multimedia materials for language learning have been published, most of them CD-ROMs designed as self-study courses. With these materials, learners can work independently without the guidance of a teacher, and for this reason it has been claimed that they promote and facilitate autonomous learning. This relationship, however, is not a given, as Phil Benson and Peter Voller (1997:10) have aptly pointed out: "(…) Such claims are often dubious, however, because of the limited range of options and roles offered to the learner. Nevertheless, technologies of education in the broadest sense can be considered to be either more or less supportive of autonomy. The question is what kind of criteria do we apply in evaluating them?" In this article we present a joint study that defines the criteria that can be used to evaluate multimedia materials with respect to how well they allow for autonomous learning. These criteria form the basis of a questionnaire that was used to evaluate a selection of CD-ROMs for self-directed language learning. The structure of this article is as follows: an introduction to the study; the criteria used to create the questionnaire; the overall results of the evaluation; and the conclusions drawn and their relevance for multimedia instructional design.
Abstract:
Research on language learning strategies has shown that learners who use metacognitive strategies (planning, monitoring and evaluation) develop more effective cognitive strategies (Anderson, 2002). This article describes the activities that 43 foreign language students at the Universitat de Vic undertook independently, and infers the metacognitive strategies they used without any prior strategy training. The students completed a dossier in which they recorded their learning needs, the planning and monitoring of the activities and, finally, the evaluation of the learning they had carried out independently outside class hours. The first phase of the data analysis reveals that, although the students were able to express their learning needs in general terms, the formulation of objectives and the monitoring of the activities were scarce. The discussion focuses on training foreign language students in metacognitive strategies and on integrating autonomous learning into the curriculum.