14 results for Learning behavior

in Consorci de Serveis Universitaris de Catalunya (CSUC), Spain


Relevance:

40.00%

Publisher:

Abstract:

This paper presents a hybrid behavior-based scheme using reinforcement learning for high-level control of autonomous underwater vehicles (AUVs). The two main features of the approach are hybrid behavior coordination and semi-online neural-Q_learning (SONQL). Hybrid behavior coordination takes advantage of the robustness and modularity of the competitive approach as well as the efficient trajectories of the cooperative approach. SONQL, a new continuous version of the Q_learning algorithm implemented with a multilayer neural network, is used to learn the behavior state/action mapping online. Experimental results show the feasibility of the presented approach for AUVs.

Relevance:

40.00%

Publisher:

Abstract:

Peer-reviewed

Relevance:

40.00%

Publisher:

Abstract:

Peer-reviewed

Relevance:

30.00%

Publisher:

Abstract:

We analyze the classical Bertrand model when consumers exhibit some strategic behavior in deciding from which seller they will buy. We use two related but different tools. Both consider a probabilistic learning (or evolutionary) mechanism, and in both, consumers' behavior influences the competition between the sellers. The results obtained show that, in general, developing some sort of loyalty is a good strategy for the buyers, as it works in their best interest. First, we consider a learning procedure described by a deterministic dynamic system and, using strong simplifying assumptions, we can produce a description of the process behavior. Second, we use finite automata to represent the strategies played by the agents and an adaptive process based on genetic algorithms to simulate the stochastic process of learning. By doing so we can relax some of the strong assumptions used in the first approach and still obtain the same basic results. It is suggested that the limitations of the first (analytical) approach provide a good motivation for the second (Agent-Based) approach. Indeed, although both approaches address the same problem, the use of Agent-Based computational techniques allows us to relax hypotheses and overcome the limitations of the analytical approach.

Relevance:

30.00%

Publisher:

Abstract:

Initiatives to stimulate the development and propagation of open educational resources (OER) need a sufficiently large community that can be mobilized to participate in this endeavour. Failure to achieve this could lead to underuse of OER. In the context of the Wikiwijs initiative, a large-scale survey was undertaken amongst primary and secondary school teachers to explore possible determinants of the educational use of digital learning materials (DLMs). Based on the Integrative Model of Behaviour Prediction, it was conjectured that self-efficacy, attitude and perceived norm would take a central role in explaining the intention to use DLMs. Several other predictors were added to the model as well, whose effects were hypothesized to be mediated by the three central variables. All conjectured relationships were found using path analysis on survey data from 1484 teachers. Intention to use DLMs was most strongly determined by self-efficacy, followed by attitude. ICT proficiency was in turn the strongest predictor of self-efficacy. Perceived norm played only a limited role in the intention to use DLMs. In conclusion, it seems paramount for the success of projects such as Wikiwijs to train teachers in the use of digital learning materials and ICT (e.g. the digital blackboard) and to influence their attitude.

Relevance:

30.00%

Publisher:

Abstract:

This paper proposes a hybrid coordination method for behavior-based control architectures. The hybrid method takes advantage of the robustness and modularity of competitive approaches as well as the optimized trajectories of cooperative ones. This paper shows the feasibility of applying this hybrid method to the 3D navigation of an autonomous underwater vehicle (AUV). The behaviors are learnt online by means of reinforcement learning. A continuous Q-learning algorithm implemented with a feed-forward neural network is employed. Realistic simulations were carried out. The results obtained show the good performance of the hybrid method on behavior coordination as well as the convergence of the behaviors.

Relevance:

30.00%

Publisher:

Abstract:

The purpose of this paper is to propose a Neural-Q_learning approach designed for online learning of simple and reactive robot behaviors. In this approach, the Q_function is generalized by a multi-layer neural network, allowing the use of continuous states and actions. The algorithm uses a database of the most recent learning samples to accelerate and guarantee the convergence. Each Neural-Q_learning function represents an independent, reactive and adaptive behavior which maps sensorial states to robot control actions. A group of these behaviors constitutes a reactive control scheme designed to fulfill simple missions. The paper centers on the description of the Neural-Q_learning based behaviors, showing their performance with an underwater robot in a target following task. Real experiments demonstrate the convergence and stability of the learning system, pointing out its suitability for online robot learning. Advantages and limitations are discussed.

Relevance:

30.00%

Publisher:

Abstract:

Reinforcement learning (RL) is a very suitable technique for robot learning, as it can learn in unknown environments and in real time. The main difficulties in adapting classic RL algorithms to robotic systems are the generalization problem and the correct observation of the Markovian state. This paper attempts to solve the generalization problem by proposing the semi-online neural-Q_learning algorithm (SONQL). The algorithm uses the classic Q_learning technique with two modifications. First, a neural network (NN) approximates the Q_function, allowing the use of continuous states and actions. Second, a database of the most representative learning samples accelerates and stabilizes the convergence. The term semi-online refers to the fact that the algorithm uses not only the current but also past learning samples. Nevertheless, the algorithm is able to learn in real time while the robot is interacting with the environment. The paper shows simulated results with the "mountain-car" benchmark and also real results with an underwater robot in a target following behavior.
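
The idea of combining the Q_learning update with a database of past samples can be illustrated with a minimal sketch. It is not the SONQL algorithm itself: a lookup table stands in for the neural network, the database simply keeps the most recent samples rather than the most representative ones, and the toy chain environment, the function name `q_learning_with_sample_db` and all parameter values are assumptions made for illustration.

```python
import random
from collections import deque


def q_learning_with_sample_db(n_states=5, episodes=200, alpha=0.5, gamma=0.9,
                              epsilon=0.2, db_size=50, seed=0):
    """Tabular Q-learning on a toy chain, replaying a bounded sample database.

    Hypothetical setup: states 0..n_states-1 on a line, actions 0 (left) and
    1 (right), reward 1.0 on reaching the last state, 0.0 otherwise.
    """
    rng = random.Random(seed)
    moves = (-1, +1)
    q = [[0.0, 0.0] for _ in range(n_states)]
    db = deque(maxlen=db_size)  # the oldest samples drop out automatically

    def update(s, a, r, s2, done):
        # Classic Q-learning target: r + gamma * max_a' Q(s', a')
        target = r if done else r + gamma * max(q[s2])
        q[s][a] += alpha * (target - q[s][a])

    for _ in range(episodes):
        s = 0
        for _ in range(100):  # cap on episode length
            if rng.random() < epsilon:
                a = rng.randrange(2)  # explore
            else:
                best = max(q[s])      # exploit, breaking ties at random
                a = rng.choice([i for i in range(2) if q[s][i] == best])
            s2 = min(max(s + moves[a], 0), n_states - 1)
            done = s2 == n_states - 1
            r = 1.0 if done else 0.0
            db.append((s, a, r, s2, done))
            # "Semi-online": learn from the current sample and the stored ones
            for sample in db:
                update(*sample)
            s = s2
            if done:
                break
    return q


q_table = q_learning_with_sample_db()
for state, (left, right) in enumerate(q_table[:-1]):
    print(state, round(left, 3), round(right, 3))
```

Replaying stored samples at every step lets the reward propagate back along the chain after only a few successful episodes, which is the acceleration effect the abstract attributes to the sample database.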

Relevance:

30.00%

Publisher:

Abstract:

This paper proposes a high-level reinforcement learning (RL) control system for solving the action selection problem of an autonomous robot. Although the dominant approach when using RL has been to apply value function based algorithms, the system detailed here is characterized by the use of direct policy search methods. Rather than approximating a value function, these methodologies approximate a policy using an independent function approximator with its own parameters, trying to maximize the expected future reward. The policy based algorithm presented in this paper is used for learning the internal state/action mapping of a behavior. In this preliminary work, we demonstrate its feasibility with simulated experiments using the underwater robot GARBI in a target reaching task.

Relevance:

30.00%

Publisher:

Abstract:

This paper investigates the role of learning by private agents and the central bank (two-sided learning) in a New Keynesian framework in which both sides of the economy have asymmetric and imperfect knowledge about the true data generating process. We assume that all agents employ the data that they observe (which may be distinct for different sets of agents) to form beliefs about unknown aspects of the true model of the economy, use their beliefs to decide on actions, and revise these beliefs through a statistical learning algorithm as new information becomes available. We study the short-run dynamics of our model and derive its policy recommendations, particularly with respect to central bank communications. We demonstrate that two-sided learning can generate substantial increases in volatility and persistence, and alter the behavior of the variables in the model in a significant way. Our simulations do not converge to a symmetric rational expectations equilibrium, and we highlight one source that invalidates the convergence results of Marcet and Sargent (1989). Finally, we identify a novel aspect of central bank communication in models of learning: communication can be harmful if the central bank's model is substantially mis-specified.

Relevance:

30.00%

Publisher:

Abstract:

We consider an oligopolistic market game in which the players are competing firms in the same market for a homogeneous consumption good. The consumer side is represented by a fixed demand function. The firms decide how much to produce of a perishable consumption good, and they decide upon a number of information signals to be sent into the population in order to attract customers. Due to the minimal information provided, the players do not have a well-specified model of their environment. Our main objective is to characterize the adaptive behavior of the players in such a situation.

Relevance:

30.00%

Publisher:

Abstract:

Confidence in decision making is an important dimension of managerial behavior. However, what is the relation between confidence, on the one hand, and the fact of receiving or expecting to receive feedback on decisions taken, on the other hand? To explore this and related issues in the context of everyday decision making, use was made of the ESM (Experience Sampling Method) to sample decisions taken by undergraduates and business executives. For several days, participants received 4 or 5 SMS messages daily (on their mobile telephones) at random moments, at which point they completed brief questionnaires about their current decision making activities. Issues considered here include differences between the types of decisions faced by the two groups, their structure, feedback (received and expected), and confidence in decisions taken as well as in the validity of feedback. No relation was found between confidence in decisions and whether participants received or expected to receive feedback on those decisions. In addition, although participants are clearly aware that feedback can provide both confirming and disconfirming evidence, their ability to specify appropriate feedback is imperfect. Finally, difficulties experienced in using the ESM are discussed, as are possibilities for further research using this methodology.

Relevance:

30.00%

Publisher:

Abstract:

Many experiments have shown that human subjects do not necessarily behave in line with game theoretic assumptions and solution concepts. The reasons for this non-conformity are multiple. In this paper we examine whether deviations from game theory arise because subjects are rational but doubt that others are rational as well, or because subjects are, in general, boundedly rational themselves. To distinguish these two hypotheses, we study behavior in repeated 2-person and many-person Beauty-Contest-Games, which are strategically different from one another. We analyze four different treatments and observe that convergence toward equilibrium is driven by learning through the information about the other player's choice and adaptation rather than self-initiated rational reasoning.

Relevance:

30.00%

Publisher:

Abstract:

Feedback-related negativity (FRN) is an ERP component that distinguishes positive from negative feedback. FRN has been hypothesized to be the product of an error signal that may be used to adjust future behavior. In addition, associative learning models assume that the trial-to-trial learning of cue-outcome mappings involves the minimization of an error term. This study evaluated whether FRN is a possible electrophysiological correlate of this error term in a predictive learning task where human subjects were asked to learn different cue-outcome relationships. Specifically, we evaluated the sensitivity of the FRN to the course of learning when different stimuli interact or compete to become a predictor of certain outcomes. Importantly, some of these cues were blocked by more informative or predictive cues (i.e., the blocking effect). Interestingly, the present results show that both learning and blocking affect the amplitude of the FRN component. Furthermore, independent analyses of positive and negative feedback event-related signals showed that the learning effect was restricted to the ERP component elicited by positive feedback. The blocking test showed differences in the FRN magnitude between a predictive and a blocked cue. Overall, the present results show that ERPs that are related to feedback processing correspond to the main predictions of associative learning models.
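
The error term and the blocking effect mentioned in this abstract can be illustrated with a short sketch. The abstract does not name a specific model; the code below assumes the Rescorla-Wagner rule as one standard associative learning model, and the cue names, learning rate and trial counts are hypothetical.

```python
def rescorla_wagner(trials, alpha=0.3, lam=1.0):
    """Error-driven cue-outcome learning (Rescorla-Wagner rule, assumed here).

    trials: sequence of (cues, outcome_present) pairs.
    Returns the final associative weights and the per-trial prediction errors.
    """
    weights = {}
    errors = []
    for cues, outcome_present in trials:
        # The prediction is the summed strength of all cues present on the trial
        prediction = sum(weights.get(c, 0.0) for c in cues)
        error = (lam if outcome_present else 0.0) - prediction
        errors.append(error)
        # Every present cue is updated with the same shared error term
        for c in cues:
            weights[c] = weights.get(c, 0.0) + alpha * error
    return weights, errors


# Blocking design: cue A alone predicts the outcome, then A and B together
phase_1 = [(("A",), True)] * 30
phase_2 = [(("A", "B"), True)] * 30
weights, errors = rescorla_wagner(phase_1 + phase_2)
print({cue: round(w, 4) for cue, w in weights.items()})
```

Because cue A already predicts the outcome by the end of the first phase, the error term is close to zero when cue B is introduced, so B acquires almost no associative strength. That is the blocking effect in this model.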