7 results for GFRP reinforcement

in Boston University Digital Common


Relevance:

20.00%

Publisher:

Abstract:

Recent electrophysiological data inspired the claim that dopaminergic neurons adapt their mismatch sensitivities to reflect variances of expected rewards. This contradicts reward prediction error theory and most basal ganglia models. Application of learning principles points to a testable alternative interpretation of the same data that is compatible with existing theory.
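For orientation, the reward prediction error idea that this abstract refers to can be sketched in its simplest textbook (Rescorla-Wagner style) form. This is an illustrative assumption, not the model from the paper: the function names, the learning rate, and the reward sequence are all hypothetical. In this basic form the error signal is not rescaled by the variance of the rewards, which is the kind of adaptation the cited data were claimed to show.

```python
def update_value(value, reward, learning_rate=0.1):
    """One prediction-error update (hypothetical helper, textbook form).

    Returns the updated reward prediction and the prediction error.
    """
    delta = reward - value  # reward prediction error: received minus expected
    return value + learning_rate * delta, delta


def learn(rewards, learning_rate=0.1, initial_value=0.0):
    """Run the update over a sequence of rewards, recording each error."""
    value = initial_value
    errors = []
    for reward in rewards:
        value, delta = update_value(value, reward, learning_rate)
        errors.append(delta)
    return value, errors


if __name__ == "__main__":
    # A constant reward of 1.0: the prediction approaches 1.0 and the
    # prediction error shrinks toward zero as the reward becomes expected.
    final_value, errors = learn([1.0] * 200)
    print(final_value, errors[0], errors[-1])
```

With a fully predicted reward the error approaches zero, which is the property that makes the error usable as a teaching signal.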

Relevance:

20.00%

Publisher:

Abstract:

A neural model is described of how adaptively timed reinforcement learning occurs. The adaptive timing circuit is suggested to exist in the hippocampus, and to involve convergence of dentate granule cells on CA3 pyramidal cells, and NMDA receptors. This circuit forms part of a model neural system for the coordinated control of recognition learning, reinforcement learning, and motor learning, whose properties clarify how an animal can learn to acquire a delayed reward. Behavioral and neural data are summarized in support of each processing stage of the system. The relevant anatomical sites are in thalamus, neocortex, hippocampus, hypothalamus, amygdala, and cerebellum. Cerebellar influences on motor learning are distinguished from hippocampal influences on adaptive timing of reinforcement learning. The model simulates how damage to the hippocampal formation disrupts adaptive timing, eliminates attentional blocking, and causes symptoms of medial temporal amnesia. It suggests how normal acquisition of subcortical emotional conditioning can occur after cortical ablation, even though extinction of emotional conditioning is retarded by cortical ablation. The model simulates how increasing the duration of an unconditioned stimulus increases the amplitude of emotional conditioning, but does not change adaptive timing; and how an increase in the intensity of a conditioned stimulus "speeds up the clock", but an increase in the intensity of an unconditioned stimulus does not. Computer simulations of the model fit parametric conditioning data, including a Weber law property and an inverted U property. Both primary and secondary adaptively timed conditioning are simulated, as are data concerning conditioning using multiple interstimulus intervals (ISIs), gradually or abruptly changing ISIs, partial reinforcement, and multiple stimuli that lead to time-averaging of responses. Neurobiologically testable predictions are made to facilitate further tests of the model.

Relevance:

10.00%

Publisher:

Abstract:

Both animals and mobile robots, or animats, need adaptive control systems to guide their movements through a novel environment. Such control systems need reactive mechanisms for exploration, and learned plans to efficiently reach goal objects once the environment is familiar. How reactive and planned behaviors interact together in real time, and are released at the appropriate times, during autonomous navigation remains a major unsolved problem. This work presents an end-to-end model to address this problem, named SOVEREIGN: A Self-Organizing, Vision, Expectation, Recognition, Emotion, Intelligent, Goal-oriented Navigation system. The model comprises several interacting subsystems, governed by systems of nonlinear differential equations. As the animat explores the environment, a vision module processes visual inputs using networks that are sensitive to visual form and motion. Targets processed within the visual form system are categorized by real-time incremental learning. Simultaneously, visual target position is computed with respect to the animat's body. Estimates of target position activate a motor system to initiate approach movements toward the target. Motion cues from animat locomotion can elicit orienting head or camera movements to bring a new target into view. Approach and orienting movements are alternately performed during animat navigation. Cumulative estimates of each movement, based on both visual and proprioceptive cues, are stored within a motor working memory. Sensory cues are stored in a parallel sensory working memory. These working memories trigger learning of sensory and motor sequence chunks, which together control planned movements. Effective chunk combinations are selectively enhanced via reinforcement learning when the animat is rewarded. The planning chunks effect a gradual transition from reactive to planned behavior. The model can read out different motor sequences under different motivational states and learns more efficient paths to rewarded goals as exploration proceeds. Several volitional signals automatically gate the interactions between model subsystems at appropriate times. A 3-D visual simulation environment reproduces the animat's sensory experiences as it moves through a simplified spatial environment. The SOVEREIGN model exhibits robust goal-oriented learning of sequential motor behaviors. Its biomimetic structure explicates a number of brain processes which are involved in spatial navigation.

Relevance:

10.00%

Publisher:

Abstract:

The "teaching signal" that modulates reinforcement learning at cortico-striatal synapses may be a sequence composed of an adaptively scaled DA burst, a brief ACh burst, and a scaled ACh pause. Such an interpretation is consistent with recent data on cholinergic control of LTD of cortical synapses onto striatal spiny projection neurons. The giant cholinergic interneurons of the striatum are tonically active neurons (TANs) that respond with characteristic pauses to novel events and to appetitive and aversive conditioned stimuli. Fluctuations in acetylcholine release by TANs modulate performance- and learning-related dynamics in the striatum. Whereas tonic activity emerges from intrinsic properties of these neurons, glutamatergic inputs from thalamic centromedian-parafascicular nuclei, and dopaminergic inputs from midbrain, are required for the generation of pause responses. No prior computational models encompass both intrinsic and synaptically-gated dynamics. We present a mathematical model that robustly accounts for behavior-related electrophysiological properties of TANs in terms of their intrinsic physiological properties and known afferents. In the model, balanced intrinsic hyperpolarizing and depolarizing currents engender tonic firing, and glutamatergic inputs from thalamus (and cortex) both directly excite and indirectly inhibit TANs. If the latter inhibition, probably mediated by GABAergic NOS interneurons, exceeds a threshold, its effect is amplified by a KIR current to generate a prolonged pause. In the model, the intrinsic mechanisms and external inputs are both modulated by learning-dependent dopamine (DA) signals, and our simulations revealed that many learning-dependent behaviors of TANs are explicable without recourse to learning-dependent changes in synapses onto TANs.

Relevance:

10.00%

Publisher:

Abstract:

How do reactive and planned behaviors interact in real time? How are sequences of such behaviors released at appropriate times during autonomous navigation to realize valued goals? Controllers for both animals and mobile robots, or animats, need reactive mechanisms for exploration, and learned plans to reach goal objects once an environment becomes familiar. The SOVEREIGN (Self-Organizing, Vision, Expectation, Recognition, Emotion, Intelligent, Goal-oriented Navigation) animat model embodies these capabilities, and is tested in a 3D virtual reality environment. SOVEREIGN includes several interacting subsystems which model complementary properties of cortical What and Where processing streams and which clarify similarities between mechanisms for navigation and arm movement control. As the animat explores an environment, visual inputs are processed by networks that are sensitive to visual form and motion in the What and Where streams, respectively. Position-invariant and size-invariant recognition categories are learned by real-time incremental learning in the What stream. Estimates of target position relative to the animat are computed in the Where stream, and can activate approach movements toward the target. Motion cues from animat locomotion can elicit head-orienting movements to bring a new target into view. Approach and orienting movements are alternately performed during animat navigation. Cumulative estimates of each movement are derived from interacting proprioceptive and visual cues. Movement sequences are stored within a motor working memory. Sequences of visual categories are stored in a sensory working memory. These working memories trigger learning of sensory and motor sequence categories, or plans, which together control planned movements. Predictively effective chunk combinations are selectively enhanced via reinforcement learning when the animat is rewarded. Selected planning chunks effect a gradual transition from variable reactive exploratory movements to efficient goal-oriented planned movement sequences. Volitional signals gate interactions between model subsystems and the release of overt behaviors. The model can control different motor sequences under different motivational states and learns more efficient sequences to rewarded goals as exploration proceeds.

Relevance:

10.00%

Publisher:

Abstract:

The concepts of declarative memory and procedural memory have been used to distinguish two basic types of learning. A neural network model suggests how such memory processes work together as recognition learning, reinforcement learning, and sensory-motor learning take place during adaptive behaviors. To coordinate these processes, the hippocampal formation and cerebellum each contain circuits that learn to adaptively time their outputs. Within the model, hippocampal timing helps to maintain attention on motivationally salient goal objects during variable task-related delays, and cerebellar timing controls the release of conditioned responses. This property is part of the model's description of how cognitive-emotional interactions focus attention on motivationally valued cues, and how this process breaks down due to hippocampal ablation. The model suggests that the hippocampal mechanisms that help to rapidly draw attention to salient cues could prematurely release motor commands if the release of these commands were not adaptively timed by the cerebellum. The model hippocampal system modulates cortical recognition learning without actually encoding the representational information that the cortex encodes. These properties avoid the difficulties faced by several models that propose a direct hippocampal role in recognition learning. Learning within the model hippocampal system controls adaptive timing and spatial orientation. Model properties thereby clarify how hippocampal ablations cause amnesic symptoms and difficulties with tasks which combine task delays, novelty detection, and attention towards goal objects amid distractions. When these model recognition, reinforcement, sensory-motor, and timing processes work together, they suggest how the brain can accomplish conditioning of multiple sensory events to delayed rewards, as during serial compound conditioning.

Relevance:

10.00%

Publisher:

Abstract:

The giant cholinergic interneurons of the striatum are tonically active neurons (TANs) that respond with characteristic pauses to novel events and to appetitive and aversive conditioned stimuli. Fluctuations in acetylcholine release by TANs modulate performance- and learning-related dynamics in the striatum. Whereas tonic activity emerges from intrinsic properties of these neurons, glutamatergic inputs from thalamic centromedian-parafascicular nuclei, and dopaminergic inputs from midbrain, are required for the generation of pause responses. No prior computational models encompass both intrinsic and synaptically-gated dynamics. We present a mathematical model that robustly accounts for behavior-related electrophysiological properties of TANs in terms of their intrinsic physiological properties and known afferents. In the model, balanced intrinsic hyperpolarizing and depolarizing currents engender tonic firing, and glutamatergic inputs from thalamus (and cortex) both directly excite and indirectly inhibit TANs. If the latter inhibition, presumably mediated by GABAergic interneurons, exceeds a threshold, its effect is amplified by a KIR current to generate a prolonged pause. In the model, the intrinsic mechanisms and external inputs are both modulated by learning-dependent dopamine (DA) signals and our simulations revealed that many learning-dependent behaviors of TANs are explicable without recourse to learning-dependent changes in synapses onto TANs. The "teaching signal" that modulates reinforcement learning at cortico-striatal synapses may be a sequence composed of an adaptively scaled DA burst, a brief ACh burst, and a scaled ACh pause. Such an interpretation is consistent with recent data on cholinergic control of LTD of cortical synapses onto striatal spiny projection neurons.