984 results for ACTION SELECTION
Abstract:
Population coding is widely regarded as a key mechanism for achieving reliable behavioral decisions. We previously introduced reinforcement learning for population-based decision making by spiking neurons. Here we generalize population reinforcement learning to spike-based plasticity rules that take account of the postsynaptic neural code. We consider spike/no-spike, spike count and spike latency codes. The multi-valued and continuous-valued features of the postsynaptic code allow a generalization of binary decision making to multi-valued decision making and continuous-valued action selection. We show that code-specific learning rules speed up learning both for discrete classification and for continuous regression tasks. The suggested learning rules also speed up learning with increasing population size, unlike standard reinforcement learning rules. Continuous action selection is further shown to explain realistic learning speeds in the Morris water maze. Finally, we introduce the concept of action perturbation, as opposed to classical weight or node perturbation, as an exploration mechanism underlying reinforcement learning. Exploration in the action space greatly increases the speed of learning compared with exploration in the neuron or weight space.
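As a rough, hypothetical illustration of exploration in the action space (the paper's models use spiking neurons and spike-based codes, which are not reproduced here), the following Python sketch perturbs the population-decoded continuous action rather than individual weights or neurons, and updates weights with a reward-modulated rule; the network, task, and learning rate are all made up for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: a rate-based population with a fixed linear readout.
n_inputs, n_neurons = 10, 50
W = rng.normal(0.0, 0.1, (n_neurons, n_inputs))
readout = rng.normal(0.0, 1.0, n_neurons) / n_neurons
eta, sigma = 0.05, 0.2
baseline = 0.0

for trial in range(1000):
    x = rng.normal(size=n_inputs)
    target = x[0]                      # toy continuous "correct action"
    h = np.tanh(W @ x)                 # population activity
    a_mean = readout @ h               # population-decoded action
    a = a_mean + sigma * rng.normal()  # exploration in ACTION space
    r = -(a - target) ** 2             # scalar reward
    # Reward-modulated update: credit the action perturbation,
    # propagated to the weights through the (fixed) readout.
    grad_a = (r - baseline) * (a - a_mean) / sigma**2
    W += eta * grad_a * np.outer(readout * (1 - h**2), x)
    baseline += 0.05 * (r - baseline)  # running reward baseline
```

Because the single perturbation is applied to the one-dimensional action rather than to each of the n_neurons x n_inputs weights, the variance of the learning signal does not grow with population size, which is the intuition behind the speed-up claimed above.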
Abstract:
Phenomenal states are generally considered the ultimate sources of intrinsic motivation for autonomous biological agents. In this article, we address whether exploiting these states is necessary for the design and implementation of robust goal-directed artificial systems. We provide an analysis of consciousness in terms of a precise definition of how an agent "understands" the informational flows entering it and its own action possibilities. This abstract model of consciousness and understanding is based on the analysis and evaluation of phenomenal states along potential future trajectories in the state space of the agents. This implies that a potential strategy for building autonomous but still customer-useful systems is to embed them with the particular, ad hoc phenomenality that captures the system-external requirements defining the system's usefulness from a customer-based, requirements-strict engineering viewpoint.
Abstract:
Action selection and organization are very complex processes that need to exploit contextual information and the retrieval of previously memorized information, as well as the integration of these different types of data. On the basis of its anatomical connections with premotor and parietal areas involved in action goal coding, and of data from the literature, the prefrontal cortex appears to be one of the strongest candidates for selecting the neuronal pools underlying the selection and organization of intentional actions. We recorded single ventrolateral prefrontal (VLPF) neuron activity while monkeys performed simple and complex manipulative actions aimed at distinct final goals, employing a modified and more strictly controlled version of the grasp-to-eat (a food pellet)/grasp-to-place (an object) paradigm used in previous studies on parietal (Fogassi et al., 2005) and premotor neurons (Bonini et al., 2010). With this task we were able both to evaluate the processing and integration of distinct (visual and auditory) contextual information, presented sequentially, used to select the forthcoming action, and to examine the possible presence of goal-related activity in this portion of cortex. Moreover, we performed an observation task to clarify the possible contribution of VLPF neurons to the understanding of others' goal-directed actions.
Simple Visuo Motor Task (sVMT). We found four main types of neurons: unimodal sensory-driven, motor-related, unimodal sensory-and-motor, and multisensory neurons. A substantial number of VLPF neurons showed both a motor-related discharge and a visual presentation response (sensory-and-motor neurons), with remarkable visuo-motor congruence for the preferred target. Interestingly, the discharge of multisensory neurons reflected a behavioural decision independently of the sensory modality of the stimulus that enabled the monkey to make it: most encoded a decision to act or to refrain from acting, while others specified one among the four behavioural alternatives.
Complex Visuo Motor Task (cVMT). The cVMT was similar to the sVMT, but included a further grasping motor act (grasping a lid in order to remove it, before grasping the target) and was run in two modalities: randomized and in blocks. Motor-related and sensory-and-motor neurons tested in the randomized cVMT were already active during the first grasping motor act, but selectivity for one of the two graspable targets emerged only during execution of the second grasping. In contrast, when the cVMT was run in blocks, almost all these neurons not only discharged during the first grasping motor act, but also displayed the same target selectivity shown at the moment of hand contact with the target.
Observation Task (OT). A large proportion of the neurons active during the OT showed firing-rate modulation in correspondence with the action performed by the experimenter. Among them, we found neurons significantly activated during observation of the experimenter's action (action observation-related neurons) and neurons responding not only to action observation but also to the presented cue stimuli (sensory-and-action observation-related neurons). Among the neurons of the first set, almost half displayed target selectivity, with no clear preference between the two presented targets. Concerning the second set, the sensory-and-action observation-related neurons, we found low target selectivity and no strict congruence between the selectivity exhibited in the visual response and in the action observation response.
Abstract:
We examined the relations between selection for perception and selection for action in patient FK, who has bilateral damage to the temporal and medial frontal cortices. The task required a simple grasp response to a common object (a cup) in the presence of a distractor (another cup). The target was cued by colour or location, and FK made manual responses. We examined the effects on performance of the cued and uncued dimensions of both the target and the distractor. FK was impaired at perceptually selecting the target when it was cued by colour and the target's colour, but not its location, changed on successive trials. The effect was sensitive to the relative orientations of targets and distractors, indicating an effect of action selection on perceptual selection when perceptual selection was weakly instantiated. The dimension-specific carry-over effect on reaching was enhanced when there was a temporal delay between cue and response, and it disappeared when there was a between-trial delay. The results indicate that perceptual and action selection systems interact to determine the efficiency with which actions are selected to particular objects.
Abstract:
The ability to decide among several possible actions, based on available sensory information, is essential for an organism interacting with a complex environment. Current models of action selection hold that the brain continuously processes sensory information in order to plan several possible actions in parallel. In this view, the motor representations associated with each possible action are in constant competition with one another. For one alternative to be selected for movement, a weighting value, integrating a multitude of factors, must be associated with each motor plan in order to modulate the competition. Several studies have investigated the different factors modulating action selection, such as the layout of the environment, the cost of the actions, the level of reward, etc. However, it seems that no study has reported what happens when the weighting value of each possible action is identical. In that context, what is the element that modulates action selection? Accordingly, the main objective of my master's project is to investigate the factor that allows the brain to select an action when all the factors reported in the literature are controlled. Recent data have shown that slow cortical oscillations in the delta band can serve as an instrument of attentional selection by modulating the amplitude of the neuronal response. Thus, stimuli arriving in the cortex during a delta phase of high excitability are amplified, whereas those arriving during a delta phase of low excitability are attenuated. It is therefore possible that the delta phase the brain is in at the moment of action selection influences the decision. Using a hand-selection task, this study tests the hypothesis that hand selection is associated with the delta phase of the neuronal ensembles coding the movement of each hand, when all known factors influencing the decision are controlled. Electroencephalography (EEG) was used to record cortical signals while participants performed a hand-selection task in which, on each trial, they had to reach a visual target as quickly as possible using the hand of their choice. The task was designed so that spatial and biomechanical factors were controlled. This was achieved by first identifying, on an individual basis, the target location for which the right and left hands had an equal probability of being chosen (point of subjective equality, PSE). Then, in the main experiment, participants made many reaching movements toward targets positioned near and far from the PSE, always with the hand of their choice. The use of five targets very close to the PSE made it possible to collect many trials in which the right and the left hand were selected in response to the same visual stimulus. This in turn made it possible to analyze the signals of the two cortices under conditions of right- and left-hand use, while controlling for the other factors that can modulate hand selection.
The results of this research reveal that the cortical hemisphere in the most excitable delta phase (near the negative peak) at stimulus onset is associated with both the hand that will be selected and the reaction time. These results show that momentary cortical excitability (signal phase) may act as a factor modulating the selection of an action. In this light, these data considerably extend current models by showing that action selection is partly determined by the state of the brain at the moment of choice, in a manner that is independent of all known decision variables.
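A minimal sketch of how the delta-band phase at stimulus onset could be extracted from a single EEG channel, assuming SciPy is available; the sampling rate, filter band (here 1-4 Hz), and event times are placeholders rather than the thesis's actual pipeline:

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

fs = 500.0                             # assumed sampling rate (Hz)
eeg = np.random.randn(60 * int(fs))    # placeholder single-channel EEG
onsets = np.array([2.0, 7.5, 13.1])    # stimulus onsets in seconds

# Band-pass in the delta band (the exact band is an analysis choice).
b, a = butter(3, [1.0, 4.0], btype="bandpass", fs=fs)
delta = filtfilt(b, a, eeg)

# Instantaneous phase via the analytic (Hilbert) signal.
phase = np.angle(hilbert(delta))

# Delta phase of this channel at each stimulus onset.
onset_phase = phase[(onsets * fs).astype(int)]
print(onset_phase)
```

Repeating this per hemisphere and sorting trials by which hand was chosen would give the kind of phase-vs-choice comparison described above.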
Abstract:
This paper describes an application of decoupled probabilistic world modeling to achieve team planning. The research is based on the principle that the action selection mechanism of a member of a robot team can select an effective action if a global world model is available to all team members. In the real world, sensors are imprecise and individual to each robot, giving each robot a partial and unique view of the environment. We address this problem by creating a probabilistic global view on each agent by combining the perceptual information from all robots. This probabilistic view forms the basis for selecting actions to achieve the team goal in a dynamic environment. Experiments were carried out to investigate the effectiveness of this principle using custom-built robots for real-world performance, in addition to extensive simulation. The results show an improvement in team effectiveness when using probabilistic world modeling based on perception sharing for team planning.
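One common way to realize such a shared probabilistic view is to fuse each robot's local occupancy probabilities in log-odds space; the sketch below is a generic illustration under that assumption, not necessarily the authors' method:

```python
import numpy as np

def fuse_occupancy(local_probs):
    """Fuse per-robot occupancy probabilities for the same grid cells.

    local_probs: array of shape (n_robots, n_cells), each entry the
    probability that a cell is occupied according to one robot.
    Treating the robots' evidence as independent, log-odds are summed.
    """
    p = np.clip(local_probs, 1e-6, 1 - 1e-6)
    log_odds = np.log(p / (1 - p)).sum(axis=0)
    return 1 / (1 + np.exp(-log_odds))

# Three robots, four cells: agreement sharpens the global estimate,
# while uninformative 0.5 readings leave a cell undecided.
views = np.array([[0.9, 0.5, 0.2, 0.5],
                  [0.8, 0.5, 0.3, 0.5],
                  [0.7, 0.5, 0.4, 0.5]])
print(fuse_occupancy(views))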
Abstract:
Autonomous development of sensorimotor coordination enables a robot to adapt and change its action choices as it interacts with the world throughout its lifetime. The Experience Network is a structure that rapidly learns coordination between visual and haptic inputs and motor action. This paper presents methods that handle the high dimensionality of the network state-space arising from the simultaneous detection of multiple sensory features. The methods add no significant complexity to the underlying representations and also allow emergent, task-specific, semantic information to inform action selection. Experimental results show rapid learning in a real robot: beginning with no sensorimotor mappings, it becomes a mobile robot capable of wall avoidance and target acquisition.
Abstract:
This thesis presents a novel framework for state estimation in the context of robotic grasping and manipulation. The overall estimation approach is based on fusing various visual cues for manipulator tracking, namely appearance- and feature-based, shape-based, and silhouette-based visual cues. Similarly, a framework is developed to fuse not only the above visual cues but also kinesthetic cues such as force-torque and tactile measurements, for in-hand object pose estimation. The cues are extracted from multiple sensor modalities and are fused in a variety of Kalman filters.
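For context on this kind of cue fusion, here is a generic scalar Kalman measurement update that folds two noisy cues of the same quantity into one estimate; the state, noise levels, and cue values are illustrative and not the thesis's filters:

```python
def kalman_update(x, P, z, R, H=1.0):
    """Scalar Kalman measurement update: fold one cue into the estimate."""
    S = H * P * H + R          # innovation covariance
    K = P * H / S              # Kalman gain
    x = x + K * (z - H * x)    # corrected state
    P = (1.0 - K * H) * P      # corrected covariance
    return x, P

x, P = 0.0, 1.0                            # prior over one pose coordinate
x, P = kalman_update(x, P, z=0.9, R=0.2)   # e.g. a shape-based cue
x, P = kalman_update(x, P, z=1.1, R=0.4)   # e.g. a silhouette-based cue
print(x, P)                                # fused estimate, reduced variance
```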
A hybrid estimator is developed to estimate both a continuous state (robot and object states) and discrete states, called contact modes, which specify how each finger contacts a particular object surface. A static multiple model estimator is used to compute and maintain the probability of each contact mode. The thesis also develops an estimation framework for estimating model parameters associated with object grasping. Dual and joint state-parameter estimation is explored for parameter estimation of a grasped object's mass and center of mass. Experimental results demonstrate simultaneous object localization and center of mass estimation.
Dual-arm estimation is developed for two-arm robotic manipulation tasks. Two types of filters are explored: the first is an augmented filter that contains both arms in the state vector, while the second runs two filters in parallel, one for each arm. The two frameworks and their performance are compared in a dual-arm task of removing a wheel from a hub.
This thesis also presents a new method for action selection involving touch. This next best touch method selects the available action for interacting with an object that will gain the most information. The algorithm employs information theory to compute an information gain metric based on a probabilistic belief suitable for the task. An estimation framework is used to maintain this belief over time. Kinesthetic measurements, such as contact and tactile measurements, are used to update the state belief after every interactive action. Simulation and experimental results are demonstrated using next best touch for object localization, specifically of a door handle on a door. The next best touch theory is then extended to model parameter determination. Since many objects within a particular object category share the same rough shape, principal component analysis may be used to parametrize the object mesh models. These parameters can be estimated using the action selection technique, which selects the touching action that best both localizes the object and estimates these parameters. Simulation results are then presented involving localizing and determining a parameter of a screwdriver.
Lastly, the next best touch theory is further extended to model classes. Instead of estimating parameters, object class determination is incorporated into the information gain metric calculation. The best touching action is selected in order to best discern between the possible model classes. Simulation results are presented to validate the theory.
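As a schematic of the information-gain criterion behind next best touch, the sketch below scores each candidate touching action by the expected reduction in entropy of a discrete belief; the belief, the candidate actions, and the per-action observation models are hypothetical:

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def expected_info_gain(belief, likelihoods):
    """Expected entropy reduction for one touching action.

    belief: P(pose), shape (n_poses,)
    likelihoods: P(observation | pose) for this action,
                 shape (n_obs, n_poses), columns summing to 1.
    """
    h_prior = entropy(belief)
    p_obs = likelihoods @ belief               # P(obs) under the belief
    gain = h_prior
    for o in range(likelihoods.shape[0]):
        if p_obs[o] > 0:
            posterior = likelihoods[o] * belief / p_obs[o]
            gain -= p_obs[o] * entropy(posterior)
    return gain

# Pick the action with the largest expected gain (made-up sensor models).
belief = np.array([0.5, 0.3, 0.2])
actions = {"touch_left":  np.array([[0.9, 0.1, 0.5], [0.1, 0.9, 0.5]]),
           "touch_right": np.array([[0.6, 0.5, 0.4], [0.4, 0.5, 0.6]])}
best = max(actions, key=lambda a: expected_info_gain(belief, actions[a]))
print(best)
```

The extensions described above, to shape parameters and to model classes, amount to letting the hypotheses in `belief` range over parameter values or object classes instead of (or alongside) poses.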
Abstract:
These studies explore how, where, and when representations of variables critical to decision-making are represented in the brain. In order to produce a decision, humans must first determine the relevant stimuli, actions, and possible outcomes before applying an algorithm that selects an action from those available. When choosing amongst alternative stimuli, the framework of value-based decision-making proposes that values are assigned to the stimuli and that these values are then compared in an abstract "value space" in order to produce a decision. Despite much progress, in particular the pinpointing of ventromedial prefrontal cortex (vmPFC) as a region that encodes value, many basic questions remain. In Chapter 2, I show that distributed BOLD signaling in vmPFC represents the value of stimuli under consideration in a manner that is independent of stimulus type. Thus the open question of whether value is represented abstractly, a key tenet of value-based decision-making, is answered in the affirmative. However, I also show that stimulus-dependent value representations are present in the brain during decision-making and suggest a potential neural pathway for stimulus-to-value transformations that integrates these two results.
More broadly speaking, there is both neural and behavioral evidence that two distinct control systems are at work during action selection: the "goal-directed" system, which selects actions based on an internal model of the environment, and the "habitual" system, which generates responses based on antecedent stimuli alone. Computational characterizations of these two systems imply that they have different informational requirements in terms of input stimuli, actions, and possible outcomes. Associative learning theory predicts that the habitual system should utilize stimulus and action information only, while goal-directed behavior requires that outcomes as well as stimuli and actions be processed. In Chapter 3, I test whether areas of the brain hypothesized to be involved in habitual versus goal-directed control represent the corresponding theorized variables.
The question of whether one or both of these neural systems drives Pavlovian conditioning is less well studied. Chapter 4 describes an experiment in which subjects were scanned while engaged in a Pavlovian task with a simple but non-trivial structure. After comparing a variety of model-based and model-free learning algorithms (thought to underpin goal-directed and habitual decision-making, respectively), it was found that subjects' reaction times were better explained by a model-based system. In addition, neural signaling of precision, a variable based on a representation of a world model, was found in the amygdala. These data indicate that the influence of model-based representations of the environment can extend even to the most basic learning processes.
Knowledge of the state of hidden variables in an environment is required for optimal inference regarding the abstract decision structure of a given environment and therefore can be crucial to decision-making in a wide range of situations. Inferring the state of an abstract variable requires the generation and manipulation of an internal representation of beliefs over the values of the hidden variable. In Chapter 5, I describe behavioral and neural results regarding the learning strategies employed by human subjects in a hierarchical state-estimation task. In particular, a comprehensive model fit and comparison process pointed to the use of "belief thresholding". This implies that subjects tended to eliminate low-probability hypotheses regarding the state of the environment from their internal model and ceased to update the corresponding variables. Thus, in concert with incremental Bayesian learning, humans explicitly manipulate their internal model of the generative process during hierarchical inference consistent with a serial hypothesis testing strategy.
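A minimal sketch of the belief-thresholding idea on a discrete Bayesian belief; the threshold value and the likelihoods are illustrative, not the fitted model:

```python
import numpy as np

THRESH = 0.05  # hypotheses below this posterior probability are pruned

def update_belief(belief, likelihood):
    """Incremental Bayesian update with belief thresholding."""
    posterior = belief * likelihood
    posterior /= posterior.sum()
    # Eliminate low-probability hypotheses; they are no longer updated.
    posterior[posterior < THRESH] = 0.0
    return posterior / posterior.sum()

belief = np.ones(4) / 4    # uniform prior over four hidden states
for like in ([0.8, 0.4, 0.1, 0.1], [0.9, 0.3, 0.2, 0.2]):
    belief = update_belief(belief, np.array(like))
print(belief)              # pruned hypotheses stay at zero
```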
Abstract:
Multi-Agent Reinforcement Learning (MARL) algorithms face two main difficulties: the curse of dimensionality, and environment non-stationarity due to the independent learning processes carried out by the agents concurrently. In this paper we formalize and prove the convergence of a Distributed Round Robin Q-learning (D-RR-QL) algorithm for cooperative systems. The computational complexity of this algorithm increases linearly with the number of agents. Moreover, it eliminates environment non-stationarity by carrying out a round-robin scheduling of action selection and execution. This learning scheme allows the implementation of Modular State-Action Vetoes (MSAV) in cooperative multi-agent systems, which speeds up learning convergence in over-constrained systems by vetoing state-action pairs that lead to undesired termination states (UTS) in the relevant state-action subspace. Each agent's local state-action value function learning is an independent process, including the MSAV policies. Coordination of locally optimal policies to obtain the global optimal joint policy is achieved by a greedy selection procedure using message passing. We show that D-RR-QL improves over state-of-the-art approaches, such as Distributed Q-Learning, Team Q-Learning and Coordinated Reinforcement Learning, in a paradigmatic Linked Multi-Component Robotic System (L-MCRS) control problem: the hose transportation task. L-MCRS are over-constrained systems with many UTS induced by the interaction of the passive linking element and the active mobile robots.
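A toy Python sketch of the two ingredients named above, round-robin scheduling of action selection and state-action vetoes; the environment, veto rule, and restart logic are stand-ins for the hose transportation task, not the paper's algorithm in full:

```python
import numpy as np

rng = np.random.default_rng(1)
n_states, n_actions, n_agents = 8, 4, 3
Q = np.zeros((n_agents, n_states, n_actions))      # one table per agent
veto = np.zeros((n_agents, n_states, n_actions), bool)
alpha, gamma, eps = 0.1, 0.9, 0.1

def step(state, action):
    """Toy environment stand-in; returns (next_state, reward, undesired)."""
    nxt = (state + action) % n_states
    undesired = (nxt == 0)                 # pretend state 0 is a UTS
    return nxt, (1.0 if nxt == n_states - 1 else 0.0), undesired

state = 3
for t in range(5000):
    agent = t % n_agents                   # round-robin: one agent per step
    allowed = np.flatnonzero(~veto[agent, state])
    if rng.random() < eps:
        action = rng.choice(allowed)       # epsilon-greedy over non-vetoed
    else:
        action = allowed[np.argmax(Q[agent, state, allowed])]
    nxt, r, undesired = step(state, action)
    if undesired:                          # veto pairs that lead to a UTS
        veto[agent, state, action] = True
    Q[agent, state, action] += alpha * (
        r + gamma * Q[agent, nxt].max() - Q[agent, state, action])
    state = nxt if not undesired else 3    # restart after termination
```

Because only one agent acts at each step, each agent sees a stationary environment during its own updates, which is the property the convergence proof exploits.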
Abstract:
Decision making in an uncertain environment poses a conflict between the opposing demands of gathering and exploiting information. In a classic illustration of this 'exploration-exploitation' dilemma, a gambler choosing between multiple slot machines balances the desire to select what seems, on the basis of accumulated experience, the richest option, against the desire to choose a less familiar option that might turn out more advantageous (and thereby provide information for improving future decisions). Far from representing idle curiosity, such exploration is often critical for organisms to discover how best to harvest resources such as food and water. In appetitive choice, substantial experimental evidence, underpinned by computational reinforcement learning (RL) theory, indicates that a dopaminergic, striatal and medial prefrontal network mediates learning to exploit. In contrast, although exploration has been well studied from both theoretical and ethological perspectives, its neural substrates are much less clear. Here we show, in a gambling task, that human subjects' choices can be characterized by a computationally well-regarded strategy for addressing the explore/exploit dilemma. Furthermore, using this characterization to classify decisions as exploratory or exploitative, we employ functional magnetic resonance imaging to show that the frontopolar cortex and intraparietal sulcus are preferentially active during exploratory decisions. In contrast, regions of striatum and ventromedial prefrontal cortex exhibit activity characteristic of an involvement in value-based exploitative decision making. The results suggest a model of action selection under uncertainty that involves switching between exploratory and exploitative behavioural modes, and provide a computationally precise characterization of the contribution of key decision-related brain systems to each of these functions.
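The "computationally well-regarded strategy" is a choice rule over learned values; a softmax rule is one standard example, under which a decision can be labelled exploratory when it deviates from the highest-valued option. A sketch with made-up values (not necessarily the study's exact fitted model):

```python
import numpy as np

rng = np.random.default_rng(2)

def softmax_choice(values, beta):
    """Choose a slot machine; higher beta means more exploitative."""
    p = np.exp(beta * (values - values.max()))   # stable softmax
    p /= p.sum()
    return rng.choice(len(values), p=p), p

values = np.array([0.6, 0.4, 0.1, 0.3])   # current value estimates
choice, p = softmax_choice(values, beta=3.0)
# Classify the decision as in the fMRI analysis described above:
exploratory = (choice != values.argmax())
print(choice, exploratory)
```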
Abstract:
We consider the question "How should one act when the only goal is to learn as much as possible?" Building on the theoretical results of Fedorov [1972] and MacKay [1992], we apply techniques from Optimal Experiment Design (OED) to guide the query/action selection of a neural network learner. We demonstrate that these techniques allow the learner to minimize its generalization error by exploring its domain efficiently and completely. We conclude that, while not a panacea, OED-based query/action selection has much to offer, especially in domains where its high computational costs can be tolerated.
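A rough sketch of the general idea, approximating the learner's uncertainty with an ensemble's predictive variance rather than the closed-form OED criteria of Fedorov and MacKay; the models and candidate queries are made up:

```python
import numpy as np

rng = np.random.default_rng(3)

# An "ensemble" of small random linear models stands in for a trained
# neural network learner with an estimate of its own uncertainty.
models = [rng.normal(size=3) for _ in range(10)]

def features(x):
    return np.array([1.0, x, x * x])

candidates = np.linspace(-1, 1, 21)      # possible queries/actions
preds = np.array([[w @ features(x) for x in candidates] for w in models])
variance = preds.var(axis=0)             # disagreement = predictive variance

query = candidates[variance.argmax()]    # ask where the learner is unsure
print(query)
```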
Abstract:
Before choosing, it helps to know both the expected value signaled by a predictive cue and the associated uncertainty that the reward will be forthcoming. Recently, Fiorillo et al. (2003) found that the dopamine (DA) neurons of the SNc exhibit sustained responses related to the uncertainty that a cue will be followed by reward, in addition to phasic responses related to reward prediction errors (RPEs). This suggests that cue-dependent anticipations of the timing, magnitude, and uncertainty of rewards are learned and reflected in components of the DA signals broadcast by SNc neurons. What is the minimal local circuit model that can explain such multifaceted reward-related learning? A new computational model shows how learned uncertainty responses emerge robustly on single trials along with phasic RPE responses, such that both types of DA responses exhibit the empirically observed dependence on conditional probability, expected value of reward, and time since onset of the reward-predicting cue. The model includes three major pathways for computing: immediate expected values of cues, timed predictions of reward magnitudes (and RPEs), and the uncertainty associated with these predictions. The first two model pathways refine those previously modeled by Brown et al. (1999). A third, newly modeled, pathway is formed by medium spiny projection neurons (MSPNs) of the matrix compartment of the striatum, whose axons co-release GABA and a neuropeptide, substance P, both at synapses with GABAergic neurons in the SNr and with the dendrites (in SNr) of DA neurons whose somas are in ventral SNc. Co-release enables efficient computation of sustained DA uncertainty responses that are a non-monotonic function of the conditional probability that a reward will follow the cue. The new model's incorporation of a striatal microcircuit allowed it to reveal that variability in striatal cholinergic transmission can explain observed differences between monkeys in the amplitude of the non-monotonic uncertainty function. The involvement of matrix MSPNs and striatal cholinergic transmission implies a relation between uncertainty in the cue-reward contingency and the action-selection functions of the basal ganglia. The model synthesizes anatomical, electrophysiological and behavioral data regarding the midbrain DA system in a novel way, by relating the ability to compute uncertainty, in parallel with other aspects of reward contingencies, to the unique distribution of SP inputs in ventral SN.
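The non-monotonic uncertainty function mentioned above has a simple normative counterpart: the variance p(1-p) of a Bernoulli reward peaks at p = 0.5. A sketch of that relation (not the model's circuit-level computation):

```python
import numpy as np

p = np.linspace(0.0, 1.0, 11)      # conditional probability of reward
uncertainty = p * (1 - p)          # Bernoulli reward variance
for pi, u in zip(p, uncertainty):
    print(f"P(reward|cue) = {pi:.1f}  ->  uncertainty {u:.2f}")
# Peaks at p = 0.5 and vanishes at p = 0 and p = 1, the same shape as
# the sustained DA response reported by Fiorillo et al. (2003).
```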
Abstract:
This study used a virtual simulated 3vs3 rugby task to investigate whether gaps opening in particular running channels promote different actions by the ball-carrier and whether there is an effect of rugby expertise. We manipulated emergent gaps in three different locations: gap 1 in the participant's own running channel, gap 2 in the 1st receiver's running channel, and gap 3 in the 2nd receiver's running channel. Recreational, intermediate, professional and non-rugby players performed the task. They could i) run with the ball, ii) make a short pass, or iii) make a long pass. All actions were digitally recorded. Results revealed that the emergence of gaps in the defensive line, relative to the participant's own position, significantly influenced action selection: 'run' was most often performed in gap 1 trials, 'short pass' in gap 2 trials, and 'long pass' in gap 3 trials. Furthermore, a strong positive relationship between expertise and task achievement was found.