1000 results for Reinforcement Value


Relevance:

70.00%

Publisher:

Abstract:

The reinforcement omission effect (ROE) has been attributed to both motivational and attentional consequences of surprising reinforcement omission. Recent evidence suggests that the basolateral complex of the amygdala is involved in motivational components related to reinforcement value, whereas the central nucleus of the amygdala is involved in processing the attentional consequences of surprise. This study was designed to verify whether the mechanisms involved in the ROE depend on the integrity of either the basolateral complex or the central nucleus of the amygdala. The ROE was evaluated in rats with lesions of either the central nucleus or the basolateral complex of the amygdala, trained on a fixed-interval schedule (Experiment 1) or a fixed-interval schedule with a signaled limited hold (Experiment 2). The results of Experiment 1 showed that sham-operated rats and rats with lesions of either the central nucleus or the basolateral area displayed the ROE. In contrast, in Experiment 2, subjects with lesions of the central nucleus or basolateral complex of the amygdala exhibited a smaller ROE than sham-operated subjects. Thus, the effects of selective lesions of amygdala subregions on the ROE in rats depended on the training procedure. Furthermore, the absence of differences between the lesioned groups in either experiment did not allow the attentional and motivational components of the ROE to be dissociated and attributed to specific areas of the amygdala. Thus, the results did not show a functional double dissociation between the central nucleus and the basolateral area in the ROE.

Relevance:

60.00%

Publisher:

Abstract:

The associations of physical activity and sedentary behavior with barriers, enjoyment, and preferences were examined in a population-based mail survey of 1,332 adults. Respondents reporting high enjoyment of and preference for physical activity were more likely to report high levels of activity. Those reporting cost, the weather, and personal barriers to physical activity were less likely to be physically active. A preference for sedentary behavior was associated with a decreased likelihood of being physically active, and the weather as a barrier to physical activity was associated with an increased likelihood of sedentary behavior. These constructs can be used to examine individual and environmental influences on physical activity and sedentary behavior in specific populations and could inform the development of targeted interventions.

Relevance:

30.00%

Publisher:

Abstract:

Reinforcement Learning is an area of Machine Learning that deals with how an agent should take actions in an environment so as to maximize accumulated reward. This type of learning is inspired by the way humans learn and has led to the creation of various reinforcement learning algorithms. These algorithms focus on how an agent's behaviour can be improved, treating the agent independently of its surroundings. The current work studies the application of reinforcement learning methods to the inverted pendulum problem. The importance of environmental variability (factors external to the agent) for the performance of reinforcement learning agents is studied using a model that must achieve equilibrium (stability) through motion: the Cart-Pole system, or inverted pendulum. We sought to improve the behaviour of the autonomous agents by changing the information passed to them while keeping the agent's internal parameters (learning rate, discount factor, decay rate, etc.) constant, instead of the classical approach of tuning those internal parameters. The influence of changes to the state set and the action set on an agent's ability to solve the Cart-Pole problem was studied. We studied the typical behaviour of reinforcement learning agents applied to the classic BOXES model, and a new way of characterizing the environment was proposed using the notion of convergence towards a reference value. We demonstrate the performance gain of this new method applied to a Q-Learning agent.
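As an illustration of the setup described above, the following is a minimal sketch of a tabular Q-learning agent with a BOXES-style discretization of the Cart-Pole state. The bin edges, hyperparameters, and use of the Gymnasium CartPole-v1 environment are illustrative assumptions, not the configuration studied in the work.

```python
# Minimal sketch: tabular Q-learning on a BOXES-style discretization of the
# Cart-Pole state.  Bin edges and hyperparameters are illustrative assumptions.
import numpy as np
import gymnasium as gym

# Coarse "boxes": bin edges for cart position, cart velocity, pole angle, pole velocity.
BINS = [
    np.linspace(-2.4, 2.4, 5),
    np.linspace(-3.0, 3.0, 5),
    np.linspace(-0.21, 0.21, 7),
    np.linspace(-3.0, 3.0, 5),
]

def discretize(obs):
    """Map the continuous observation to a tuple of box indices."""
    return tuple(int(np.digitize(x, edges)) for x, edges in zip(obs, BINS))

env = gym.make("CartPole-v1")
n_actions = env.action_space.n
Q = {}                                   # (state tuple, action) -> estimated return
alpha, gamma, epsilon = 0.1, 0.99, 0.1

for episode in range(2000):
    obs, _ = env.reset()
    state, done = discretize(obs), False
    while not done:
        # epsilon-greedy action selection over the discretized state
        if np.random.rand() < epsilon:
            action = env.action_space.sample()
        else:
            action = max(range(n_actions), key=lambda a: Q.get((state, a), 0.0))
        obs, reward, terminated, truncated, _ = env.step(action)
        next_state = discretize(obs)
        done = terminated or truncated
        # one-step Q-learning update
        best_next = max(Q.get((next_state, a), 0.0) for a in range(n_actions))
        target = reward + (0.0 if terminated else gamma * best_next)
        old = Q.get((state, action), 0.0)
        Q[(state, action)] = old + alpha * (target - old)
        state = next_state
```

Changing the BINS edges (the information passed to the agent) while holding alpha, gamma, and epsilon fixed mirrors the kind of experiment the abstract describes.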

Relevance:

30.00%

Publisher:

Abstract:

This paper proposes a high-level reinforcement learning (RL) control system for solving the action selection problem of an autonomous robot. Although the dominant approach when using RL has been to apply value-function-based algorithms, the system detailed here is characterized by the use of direct policy search methods. Rather than approximating a value function, these methods approximate a policy using an independent function approximator with its own parameters, trying to maximize the expected future reward. The policy-based algorithm presented in this paper is used for learning the internal state/action mapping of a behavior. In this preliminary work, we demonstrate its feasibility with simulated experiments using the underwater robot GARBI in a target-reaching task.
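The direct policy search idea mentioned above can be sketched with a REINFORCE-style update of a parameterized Gaussian policy on a toy 2-D reaching task. Everything below (the toy environment, the linear policy, the hyperparameters) is an illustrative assumption, not the GARBI control system.

```python
# Hedged sketch of direct policy search: a Gaussian policy, linear in the
# relative target position, updated with REINFORCE on a toy 2-D reaching task.
import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros((2, 2))          # policy mean = theta @ state
sigma = 0.1                       # fixed exploration noise
alpha, gamma = 0.01, 0.99

def run_episode(max_steps=50):
    pos, target = np.zeros(2), rng.uniform(-1.0, 1.0, size=2)
    states, actions, rewards = [], [], []
    for _ in range(max_steps):
        state = target - pos                              # target position relative to the agent
        action = theta @ state + sigma * rng.standard_normal(2)
        pos = pos + 0.1 * action
        rewards.append(-np.linalg.norm(target - pos))     # closer to the target = higher reward
        states.append(state)
        actions.append(action)
    return states, actions, rewards

for episode in range(500):
    states, actions, rewards = run_episode()
    # discounted return from each step onward
    G, returns = 0.0, []
    for r in reversed(rewards):
        G = r + gamma * G
        returns.append(G)
    returns.reverse()
    baseline = np.mean(returns)
    for s, a, G in zip(states, actions, returns):
        # gradient of log N(a; theta @ s, sigma^2 I) with respect to theta
        grad_log_pi = np.outer((a - theta @ s) / sigma**2, s)
        theta += alpha * (G - baseline) * grad_log_pi     # REINFORCE update
```

The key point, as in the abstract, is that no value function is estimated: the policy parameters theta are adjusted directly to increase the expected return.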

Relevance:

30.00%

Publisher:

Abstract:

Learning what to approach, and what to avoid, involves assigning value to environmental cues that predict positive and negative events. Studies in animals indicate that the lateral habenula encodes the previously learned negative motivational value of stimuli. However, involvement of the habenula in dynamic trial-by-trial aversive learning has not been assessed, and the functional role of this structure in humans remains poorly characterized, in part, due to its small size. Using high-resolution functional neuroimaging and computational modeling of reinforcement learning, we demonstrate positive habenula responses to the dynamically changing values of cues signaling painful electric shocks, which predict behavioral suppression of responses to those cues across individuals. By contrast, negative habenula responses to monetary reward cue values predict behavioral invigoration. Our findings show that the habenula plays a key role in an online aversive learning system and in generating associated motivated behavior in humans.
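For context, the computational modelling referred to above typically fits a trial-by-trial delta-rule update to each cue's value. Below is a minimal, generic sketch of that update; the learning rate and outcome coding are illustrative assumptions, not the parameters estimated in the study.

```python
# Minimal sketch of the kind of trial-by-trial reinforcement-learning model
# typically fitted to cue values in studies like this one (a Rescorla-Wagner /
# delta-rule update).  Learning rate and outcome coding are illustrative.
def update_cue_value(value, outcome, learning_rate=0.2):
    """One trial: move the cue's value toward the observed outcome."""
    prediction_error = outcome - value
    return value + learning_rate * prediction_error

# Example: a cue that starts neutral and is repeatedly paired with a painful
# shock (coded here as -1) acquires an increasingly negative value.
v = 0.0
for trial in range(10):
    v = update_cue_value(v, outcome=-1.0)
print(round(v, 3))   # approaches -1.0 as the aversive value is learned
```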

Relevance:

30.00%

Publisher:

Abstract:

This is the second part of the final report submitted to the Iowa Department of Transportation. Part 1 contained a comparison of unaged fiber composite and steel dowels and the derivation of the appropriate theoretical model for analyzing the results. Part 2 of this final report covers the theoretical and experimental models for accelerated aging of fiber composite reinforcing bars and dowels cast in a concrete environment. Part 2 contains results from testing of unaged and aged fiber composite dowels and steel dowels, in addition to unaged and aged fiber composite reinforcing bars. Additional tests have been performed on unaged dowels (both steel and fiber composite) to verify results from Part 1 and to keep the testing program consistent. Slight modifications have been made to the dowel specimens presented in Part 1; these modifications are noted in Section 3.4 of this report. The flexural modulus of elasticity for the FC dowel bar given in Part 1 of the final report (Table 3.2) was for the incorrect structural shape (non-circular cross section). The value is corrected and given in Part 2 of the final report (Table 3.4 for the modulus of elasticity supplied by the manufacturer, and Tables 3.5 and 3.6 for the experimentally determined moduli of elasticity). The value in Part 1 was not used for any analysis of the FC dowel bars.

Relevance:

30.00%

Publisher:

Abstract:

Orienting attention in space recruits fronto-parietal networks whose damage results in unilateral spatial neglect. However, attention orienting may also be governed by emotional and motivational factors, and it remains unknown whether these factors act through a modulation of the fronto-parietal attentional systems or through distinct neural pathways. Here we asked whether attentional orienting is affected by learning about the reward value of targets in a visual search task, in a spatially specific manner, and whether these effects are preserved in right-brain-damaged patients with left spatial neglect. We found that associating rewards with left-sided (but not right-sided) targets during search led to progressive exploration biases towards left space, in both healthy people and neglect patients. Such spatially specific biases occurred even without any conscious awareness of the asymmetric reward contingencies. These results show that reward-induced modulations of space representation are preserved despite the dysfunction of fronto-parietal networks associated with neglect, and therefore suggest that they may arise through spared subcortical networks acting directly on sensory processing and/or oculomotor circuits. These effects could be usefully exploited to potentiate rehabilitation strategies in neglect patients.

Relevance:

30.00%

Publisher:

Abstract:

This paper proposes a high-level reinforcement learning (RL) control system for solving the action selection problem of an autonomous robot. Although the dominant approach when using RL has been to apply value-function-based algorithms, the system detailed here is characterized by the use of direct policy search methods. Rather than approximating a value function, these methods approximate a policy using an independent function approximator with its own parameters, trying to maximize the expected future reward. The policy-based algorithm presented in this paper is used for learning the internal state/action mapping of a behavior. In this preliminary work, we demonstrate its feasibility with simulated experiments using the underwater robot GARBI in a target-reaching task.

Relevance:

30.00%

Publisher:

Abstract:

In this paper, we employ techniques from artificial intelligence, such as reinforcement learning and agent-based modeling, as building blocks of a computational model for an economy based on conventions. First, we model the interaction among firms in the private sector. These firms behave in an information environment based on conventions, meaning that a firm is likely to behave as its neighbors do if it observes that their actions lead to a good payoff. On the other hand, we propose the use of reinforcement learning as a computational model for the role of the government in the economy, as the agent that determines fiscal policy and whose objective is to maximize the growth of the economy. We present the implementation of a simulator of the proposed model based on SWARM, which employs the SARSA(λ) algorithm combined with a multilayer perceptron as the function approximator for the action-value function.
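A hedged sketch of the learning component is given below: SARSA(λ) with eligibility traces over a linear function approximator (the paper combines SARSA(λ) with a multilayer perceptron; a linear model is substituted here for brevity). The feature function and the environment interface are assumptions introduced for illustration.

```python
# Hedged sketch of SARSA(lambda) with accumulating eligibility traces over a
# linear action-value approximator Q(s, a) = w . phi(s, a).  The paper uses a
# multilayer perceptron; the interface below is an illustrative assumption.
import numpy as np

def sarsa_lambda(env, features, n_actions, episodes=200,
                 alpha=0.01, gamma=0.95, lam=0.9, epsilon=0.1):
    n_feat = features(env.reset(), 0).size
    w = np.zeros(n_feat)                    # weights of the linear Q-function
    rng = np.random.default_rng(0)

    def q(state, action):
        return w @ features(state, action)

    def policy(state):
        if rng.random() < epsilon:
            return int(rng.integers(n_actions))
        return int(np.argmax([q(state, a) for a in range(n_actions)]))

    for _ in range(episodes):
        state = env.reset()
        action = policy(state)
        z = np.zeros(n_feat)                # eligibility trace
        done = False
        while not done:
            next_state, reward, done = env.step(action)   # assumed interface
            next_action = policy(next_state)
            delta = reward - q(state, action)
            if not done:
                delta += gamma * q(next_state, next_action)
            z = gamma * lam * z + features(state, action)  # accumulating trace
            w += alpha * delta * z
            state, action = next_state, next_action
    return w
```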

Relevance:

30.00%

Publisher:

Abstract:

Interest in developing applications for autonomous underwater vehicles (AUVs) has grown considerably in recent years. AUVs are attractive because of their size and because they do not need a human operator to pilot them. Even so, it is impossible to compare, in terms of efficiency and flexibility, the ability of a human pilot with the limited operational capabilities offered by current AUVs. Using AUVs to cover large areas involves solving complex problems, especially if the robot is expected to react in real time to sudden changes in working conditions. For these reasons, the development of autonomous control systems that improve these capabilities has become a priority. This thesis addresses the problem of decision making with AUVs. The work presented focuses on the study, design, and application of behaviours for AUVs using reinforcement learning (RL) techniques. The main contribution of this thesis is the application of several RL techniques to improve the autonomy of underwater robots, with the final goal of demonstrating the feasibility of these algorithms for learning autonomous underwater tasks in real time. In RL, the robot tries to maximize a scalar reinforcement obtained as a consequence of its interaction with the environment. The goal is to find an optimal policy that maps every possible state to the action to be executed in that state so as to maximize the total sum of reinforcements. This thesis therefore investigates mainly two families of RL algorithms: value-function (VF) methods and policy-gradient (PG) methods. The final experimental results show the underwater robot Ictineu performing a real autonomous underwater cable-tracking task. To carry it out, an algorithm called the Actor-Critic (AC) method was designed, resulting from the fusion of VF methods with PG techniques.
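The Actor-Critic combination of value-function and policy-gradient methods described above can be sketched generically as follows; the softmax policy, linear critic, and environment interface are illustrative assumptions, not the controller deployed on Ictineu.

```python
# Hedged sketch of a one-step actor-critic update: a softmax actor with linear
# preferences and a linear critic, both driven by the same TD error.
import numpy as np

def actor_critic_episode(env, phi, n_actions, theta, w,
                         alpha_actor=0.01, alpha_critic=0.05, gamma=0.99,
                         rng=np.random.default_rng(0)):
    state, done = env.reset(), False
    while not done:
        # actor: softmax policy over linear preferences theta[a] . phi(state)
        prefs = np.array([theta[a] @ phi(state) for a in range(n_actions)])
        probs = np.exp(prefs - prefs.max())
        probs /= probs.sum()
        action = int(rng.choice(n_actions, p=probs))

        next_state, reward, done = env.step(action)   # assumed interface

        # critic: TD error from the linear state-value estimate w . phi(state)
        v = w @ phi(state)
        v_next = 0.0 if done else w @ phi(next_state)
        td_error = reward + gamma * v_next - v

        # critic and actor updates share the same TD error
        w += alpha_critic * td_error * phi(state)
        for a in range(n_actions):
            grad = ((1.0 if a == action else 0.0) - probs[a]) * phi(state)
            theta[a] += alpha_actor * td_error * grad

        state = next_state
    return theta, w
```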

Relevance:

30.00%

Publisher:

Abstract:

This paper discusses a study testing two methods of hearing screening for infants: visual reinforcement audiometry and auditory brainstem responses.

Relevance:

30.00%

Publisher:

Abstract:

On-line learning methods have been applied successfully in multi-agent systems to achieve coordination among agents. Learning in multi-agent systems implies a non-stationary scenario as perceived by the agents, since the behavior of other agents may change as they simultaneously learn how to improve their own actions. Non-stationary scenarios can be modeled as Markov Games, which can be solved using the Minimax-Q algorithm, a combination of Q-learning (a Reinforcement Learning (RL) algorithm that directly learns an optimal control policy) and the Minimax algorithm. However, finding optimal control policies using any RL algorithm (Q-learning and Minimax-Q included) can be very time consuming. To improve the learning time of Q-learning, we considered the QS-algorithm, in which a single experience can update more than one action value by means of a spreading function. In this paper, we propose the Minimax-QS algorithm, which combines the Minimax-Q algorithm and the QS-algorithm. We conduct a series of empirical evaluations of the algorithm in a simplified simulator of the soccer domain. We show that, even using a very simple domain-dependent spreading function, the performance of the learning algorithm can be improved.
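The core of the QS idea, a single experience updating several action values through a spreading function, can be sketched as follows. The dictionary representation and similarity-based spreading are illustrative assumptions, and the target shown is the plain Q-learning target; Minimax-QS would replace it with the minimax value of the next state.

```python
# Hedged sketch of a QS-style update: one observed transition updates every
# sufficiently similar state-action pair, weighted by a spreading function.
# The representation and spreading function are illustrative, not the paper's.
def qs_update(Q, spread, experience, alpha=0.1, gamma=0.9):
    """Q          : dict mapping (state, action) -> value
    spread     : function (s, a, s2, a2) -> similarity weight in [0, 1]
    experience : tuple (state, action, reward, next_state, next_actions)
    """
    state, action, reward, next_state, next_actions = experience
    # plain Q-learning target; Minimax-QS would use the minimax value here
    best_next = max(Q.get((next_state, a), 0.0) for a in next_actions)
    target = reward + gamma * best_next

    # spread the update over every known pair similar to the experienced one
    for (s2, a2) in set(Q.keys()) | {(state, action)}:
        weight = spread(state, action, s2, a2)
        if weight > 0.0:
            old = Q.get((s2, a2), 0.0)
            Q[(s2, a2)] = old + alpha * weight * (target - old)
    return Q
```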

Relevance:

30.00%

Publisher:

Abstract:

This thesis deals with the development of a function approximator and its use in methods for learning discrete and continuous actions: 1. A general function approximator, the Locally Weighted Interpolating Growing Neural Gas (LWIGNG), is developed on the basis of a Growing Neural Gas (GNG). The topological neighbourhood in the neuron structure is used to interpolate between neighbouring neurons and to compute the approximation by local weighting. The performance of the approach, particularly with respect to changing target functions and changing input distributions, is demonstrated in several experiments. 2. For learning discrete actions, LWIGNG is combined with Q-Learning to form the Q-LWIGNG method. To this end the underlying GNG algorithm has to be modified, because the input data in action learning arrive in a particular order. Q-LWIGNG achieves very good results on the pole-balancing and mountain-car problems and good results on the acrobot problem. 3. For learning continuous actions, a REINFORCE algorithm is combined with LWIGNG to form the ReinforceGNG method. An actor-critic architecture is used to learn from delayed rewards. LWIGNG approximates both the state-value function and the policy, the latter represented by the situation-dependent parameters of a normal distribution. ReinforceGNG is applied successfully to learn movements for a simulated two-wheeled robot that has to intercept a rolling ball under certain conditions.
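The locally weighted interpolation at the heart of LWIGNG can be illustrated with a simple sketch: the value at a query point is a distance-weighted blend of the values stored at nearby nodes. A plain k-nearest-node weighting is used here in place of the GNG topological neighbourhood, so this is a simplified stand-in rather than the actual algorithm.

```python
# Hedged sketch of locally weighted interpolation over network nodes: the
# output at a query point is a distance-weighted blend of nearby node values.
import numpy as np

def locally_weighted_value(query, node_positions, node_values, k=3, eps=1e-8):
    """Interpolate a value at `query` from the k nearest nodes."""
    dists = np.linalg.norm(node_positions - query, axis=1)
    nearest = np.argsort(dists)[:k]
    weights = 1.0 / (dists[nearest] + eps)       # closer nodes weigh more
    weights /= weights.sum()
    return float(weights @ node_values[nearest])

# Example with hypothetical nodes approximating f(x) = x^2 on [0, 1]
nodes = np.linspace(0.0, 1.0, 6).reshape(-1, 1)
values = nodes[:, 0] ** 2
print(locally_weighted_value(np.array([0.35]), nodes, values))
```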

Relevance:

30.00%

Publisher:

Abstract:

In the present work, four main problems have been addressed within the framework of non-linear elasticity based on representative constitutive models, namely problems related to loss-of-stability phenomena associated with boundary value problems for fibre-reinforced materials. Each of the problems considered is formulated and analysed separately in a different chapter. We first analyse discontinuous deformation gradients for a transversely isotropic material under plane deformation. In particular, the material model is an augmented neo-Hookean base with a simple unidirectional reinforcement characterised by a single parameter. The solution of this problem is related to material instabilities and is associated with a shear-band-type failure mode. The loss of ellipticity of the governing differential equations is a necessary condition for the existence of these material instabilities. The second problem involves a detailed analysis of the combined non-linear extension, inflation and torsion of a thick-walled circular cylindrical tube, where it is found that this deformation is controllable only for certain preferred directions of transverse isotropy. Numerical results are presented to illustrate the elastic behaviour of the tube for the admissible preferred directions under the considered deformation. The third problem deals with the analysis of a doubly fibre-reinforced thick-walled circular cylindrical tube undergoing pure azimuthal shear for a special class of reinforcing models for which multiple non-smooth solutions emerge. The associated instability phenomena are found to occur prior to the point where the nominal stress tensor changes monotonicity in a particular direction. It is also shown that the loss-of-ellipticity condition arising from the equilibrium equation and the condition W'' = 0 (the vanishing of the second derivative of the strain-energy function with respect to the deformation) are equivalent necessary conditions for the emergence of multiple solutions for the considered material. Finally, a detailed analysis based on the loss of ellipticity of the governing differential equations is carried out for combined helical, axial and radial elastic deformations of a fibre-reinforced circular cylindrical tube.
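For reference, the strong-ellipticity condition whose failure ("loss of ellipticity") is invoked above can be stated in the standard form below for a material with strain-energy function W(F); this is the textbook condition, quoted here as background rather than taken from the thesis.

```latex
% Standard strong-ellipticity condition for an elastic material with
% strain-energy function W(F), F being the deformation gradient.
\[
  \frac{\partial^2 W}{\partial F_{i\alpha}\,\partial F_{j\beta}}\,
  m_i\, N_\alpha\, m_j\, N_\beta \;>\; 0
  \qquad \text{for all nonzero vectors } \mathbf{m},\ \mathbf{N}.
\]
% Loss of ellipticity occurs when this quadratic form vanishes for some pair
% (m, N); this is the necessary condition for discontinuous deformation
% gradients (shear-band-type solutions) referred to in the abstract.
```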

Relevance:

30.00%

Publisher:

Abstract:

To solve multi-objective problems, multiple reward signals are often scalarized into a single value and further processed using established single-objective problem-solving techniques. While the field of multi-objective optimization has made many advances in applying scalarization techniques to obtain good solution trade-offs, the utility of applying these techniques in the multi-objective multi-agent learning domain has not yet been thoroughly investigated. Agents learn the value of their decisions by linearly scalarizing their reward signals at the local level, while acceptable system-wide behaviour results. However, the non-linear relationship between the weighting parameters of the scalarization function and the learned policy makes the discovery of system-wide trade-offs time-consuming. Our first contribution is a thorough analysis of well-known scalarization schemes within the multi-objective multi-agent reinforcement learning setup. The analysed approaches intelligently explore the weight space in order to find a wider range of system trade-offs. In our second contribution, we propose a novel adaptive weight algorithm which interacts with the underlying local multi-objective solvers and allows for better coverage of the Pareto front. Our third contribution is the experimental validation of our approach by learning bi-objective policies in self-organising smart camera networks. We note that our algorithm (i) explores the objective space faster on many problem instances, (ii) obtains solutions that exhibit a larger hypervolume, and (iii) achieves a greater spread in the objective space.
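A hedged sketch of the linear scalarization step described above: a vector-valued reward is collapsed to a scalar with a weight vector, after which any single-objective learner can be applied locally (one-step Q-learning is used here). The weights, rewards, and state encoding are illustrative assumptions.

```python
# Hedged sketch of linear scalarization in multi-objective reinforcement
# learning, followed by a standard single-objective Q-learning update.
import numpy as np

def scalarize(reward_vector, weights):
    """Collapse a vector of objective rewards into a single scalar."""
    return float(np.dot(weights, reward_vector))

def q_update(Q, state, action, reward_vector, next_state, n_actions,
             weights, alpha=0.1, gamma=0.95):
    """One-step Q-learning on the scalarized reward."""
    r = scalarize(reward_vector, weights)
    best_next = max(Q.get((next_state, a), 0.0) for a in range(n_actions))
    old = Q.get((state, action), 0.0)
    Q[(state, action)] = old + alpha * (r + gamma * best_next - old)
    return Q

# Different weight vectors trade the objectives off differently; sweeping or
# adapting them, as the abstract proposes, is what explores the Pareto front.
Q = {}
Q = q_update(Q, state=0, action=1, reward_vector=[0.8, 0.2],
             next_state=1, n_actions=2, weights=[0.5, 0.5])
```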