997 results for population reinforcement


Relevance:

60.00%

Abstract:

Humans and animals face decision tasks in an uncertain multi-agent environment where an agent's strategy may change over time due to the co-adaptation of others' strategies. The neuronal substrate and the computational algorithms underlying such adaptive decision making, however, are largely unknown. We propose a population coding model of spiking neurons with a policy gradient procedure that successfully acquires optimal strategies for classical game-theoretical tasks. The suggested population reinforcement learning reproduces data from human behavioral experiments for the blackjack and the inspector game. It performs optimally according to a pure (deterministic) and mixed (stochastic) Nash equilibrium, respectively. In contrast, temporal-difference (TD) learning, covariance learning, and basic reinforcement learning fail to perform optimally for the stochastic strategy. Spike-based population reinforcement learning, shown to follow the stochastic reward gradient, is therefore a viable candidate to explain automated decision learning of a Nash equilibrium in two-player games.
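To illustrate what a mixed (stochastic) Nash equilibrium means in a two-player game such as the inspector game, the following is a minimal sketch that solves a 2x2 game by the standard indifference conditions. The payoff numbers and the function name `mixed_equilibrium_2x2` are hypothetical and chosen only for illustration; they are not taken from the paper, and the sketch assumes the game has no pure-strategy equilibrium.

```python
from fractions import Fraction


def mixed_equilibrium_2x2(row_payoffs, col_payoffs):
    """Fully mixed equilibrium of a 2x2 game (assumes no pure equilibrium).

    Returns (p, q): p is the probability that the row player plays row 0,
    q is the probability that the column player plays column 0.
    """
    (a, b), (c, d) = row_payoffs  # row player's payoffs
    (e, f), (g, h) = col_payoffs  # column player's payoffs
    # q makes the row player indifferent: q*a + (1-q)*b == q*c + (1-q)*d
    q = Fraction(d - b, a - b - c + d)
    # p makes the column player indifferent: p*e + (1-p)*g == p*f + (1-p)*h
    p = Fraction(h - g, e - g - f + h)
    return p, q


# Hypothetical inspector-game payoffs (illustrative only).
# Rows: worker plays (work, shirk); columns: inspector plays (inspect, not inspect).
worker = ((1, 1), (0, 2))
inspector = ((1, 2), (-1, -2))
p_work, q_inspect = mixed_equilibrium_2x2(worker, inspector)
print(p_work, q_inspect)
```

With these made-up payoffs each player is indifferent between its two actions only when the other randomizes, which is why a learner that converges to a deterministic policy cannot be optimal here.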

Relevance:

60.00%

Abstract:

Population coding is widely regarded as a key mechanism for achieving reliable behavioral decisions. We previously introduced reinforcement learning for population-based decision making by spiking neurons. Here we generalize population reinforcement learning to spike-based plasticity rules that take account of the postsynaptic neural code. We consider spike/no-spike, spike count and spike latency codes. The multi-valued and continuous-valued features in the postsynaptic code allow for a generalization of binary decision making to multi-valued decision making and continuous-valued action selection. We show that code-specific learning rules speed up learning both for the discrete classification and the continuous regression tasks. The suggested learning rules also speed up with increasing population size as opposed to standard reinforcement learning rules. Continuous action selection is further shown to explain realistic learning speeds in the Morris water maze. Finally, we introduce the concept of action perturbation as opposed to the classical weight- or node-perturbation as an exploration mechanism underlying reinforcement learning. Exploration in the action space greatly increases the speed of learning as compared to exploration in the neuron or weight space.

Relevance:

30.00%

Abstract:

The aim of this thesis was to study the links between parental psychopathy and the parenting practices used. The sample consisted of 65 French-speaking parents, men or women, with at least one child aged between 6 and 10 years. Parents were met at their home, at their child's school, or at a community organization. The Self Report Psychopathy Scale R12-III (Paulhus, Hemphill & Hare, in press) was translated into French for this study to measure parental psychopathy. The French version of the Alabama Parenting Questionnaire (Pauzé & al., 2004) was used to measure five parenting practices: positive parenting practices, lack of supervision, involvement, inconsistent discipline, and corporal punishment. The French version of the abridged Marlowe-Crowe social desirability scale (Crowe-Marlowe, 1960) was added to the two other questionnaires (Bergeron, Valla & Breton, 1992). Simple regressions were run between the global psychopathy score and each of the five parenting practices listed above. Multiple regressions were then run to determine which psychopathy factor best predicted each parenting practice. The results showed that psychopathy was significantly and negatively associated with positive parenting practices and with involvement. A significant positive link was found between psychopathy and the use of corporal punishment. Secondary analyses showed that the interpersonal factor of psychopathy explained a significant proportion of positive parenting practices. The antisocial factor, for its part, predicted a small but significant part of involvement beyond the explanation provided by social desirability. The lifestyle factor of psychopathy contributed a significant proportion of the explained variance in corporal punishment. It would seem appropriate to intervene as early as possible with both parent and child to prevent poor practices and psychopathic traits from recurring in future generations. Intervention methods were suggested. The strengths and weaknesses of the study were discussed.

Relevance:

30.00%

Abstract:

In an evolutionary model, players from a given population meet randomly in pairs each instant to play a coordination game. At each instant, the learning model used is determined via some replicator dynamics that respects payoff fitness. We allow for two such models: a belief-based best-response model that uses a costly predictor, and a costless reinforcement-based one. This generates dynamics over the choice of learning models and the consequent choices of endogenous variables. We report conditions under which the long run outcomes are efficient (or inefficient) and they support the exclusive use of either of the models (or their co-existence).
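As a rough illustration of how a payoff-respecting replicator dynamic shifts a population between two learning models, here is a minimal discrete-time sketch. The abstract does not specify the exact dynamic, so the update rule (share times relative fitness), the function name `replicator_step`, and the fitness numbers are all assumptions made for illustration; fitness values are assumed to be strictly positive.

```python
def replicator_step(shares, fitness):
    """One discrete-time replicator update.

    Each type's population share grows in proportion to its fitness
    relative to the population mean. Assumes strictly positive fitness.
    """
    mean_fitness = sum(x * f for x, f in zip(shares, fitness))
    return [x * f / mean_fitness for x, f in zip(shares, fitness)]


# Hypothetical setup: index 0 is the belief-based model (pays a prediction
# cost), index 1 is the costless reinforcement-based model.
shares = [0.5, 0.5]
fitness = [1.0, 3.0]  # illustrative net payoffs, not from the paper
shares = replicator_step(shares, fitness)
print(shares)
```

In the paper's setting the fitness of each learning model would itself depend on the current population mix and on play in the coordination game; the constant fitness vector here only shows the direction of the update.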

Relevance:

30.00%

Abstract:

Learning by reinforcement is important in shaping animal behavior, and in particular in behavioral decision making. Such decision making is likely to involve the integration of many synaptic events in space and time. However, when a single reinforcement signal is used to modulate synaptic plasticity, as suggested in classical reinforcement learning algorithms, a twofold problem arises. Different synapses will have contributed differently to the behavioral decision, and even for one and the same synapse, releases at different times may have had different effects. Here we present a plasticity rule which solves this spatio-temporal credit assignment problem in a population of spiking neurons. The learning rule is spike-time dependent and maximizes the expected reward by following its stochastic gradient. Synaptic plasticity is modulated not only by the reward, but also by a population feedback signal. While this additional signal solves the spatial component of the problem, the temporal one is solved by means of synaptic eligibility traces. In contrast to temporal difference (TD) based approaches to reinforcement learning, our rule is explicit with regard to the assumed biophysical mechanisms. Neurotransmitter concentrations determine plasticity and learning occurs fully online. Further, it works even if the task to be learned is non-Markovian, i.e. when reinforcement is not determined by the current state of the system but may also depend on past events. The performance of the model is assessed by studying three non-Markovian tasks. In the first task, the reward is delayed beyond the last action, with unrelated stimuli and actions appearing in between. The second task involves an action sequence which is itself extended in time, and reward is only delivered at the last action, as is the case in any type of board game. The third task is the inspection game that has been studied in neuroeconomics, where an inspector tries to prevent a worker from shirking. Applying our algorithm to this game yields a learning behavior which is consistent with behavioral data from humans and monkeys, which themselves reveal properties of a mixed Nash equilibrium. The examples show that our neuronal implementation of reward-based learning copes with delayed and stochastic reward delivery, and also with the learning of mixed strategies in two-opponent games.

Relevance:

30.00%

Abstract:

Learning by reinforcement is important in shaping animal behavior. But behavioral decision making is likely to involve the integration of many synaptic events in space and time. So in using a single reinforcement signal to modulate synaptic plasticity a twofold problem arises. Different synapses will have contributed differently to the behavioral decision and, even for one and the same synapse, releases at different times may have had different effects. Here we present a plasticity rule which solves this spatio-temporal credit assignment problem in a population of spiking neurons. The learning rule is spike time dependent and maximizes the expected reward by following its stochastic gradient. Synaptic plasticity is modulated not only by the reward but by a population feedback signal as well. While this additional signal solves the spatial component of the problem, the temporal one is solved by means of synaptic eligibility traces. In contrast to temporal difference based approaches to reinforcement learning, our rule is explicit with regard to the assumed biophysical mechanisms. Neurotransmitter concentrations determine plasticity and learning occurs fully online. Further, it works even if the task to be learned is non-Markovian, i.e. when reinforcement is not determined by the current state of the system but may also depend on past events. The performance of the model is assessed by studying three non-Markovian tasks. In the first task the reward is delayed beyond the last action with non-related stimuli and actions appearing in between. The second one involves an action sequence which is itself extended in time and reward is only delivered at the last action, as is the case in any type of board-game. The third is the inspection game that has been studied in neuroeconomics. It only has a mixed Nash equilibrium and exemplifies that the model also copes with stochastic reward delivery and the learning of mixed strategies.
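The role of a synaptic eligibility trace in bridging the delay between activity and reward can be sketched in a few lines. This is a generic reward-modulated trace update under simplifying assumptions, not the paper's plasticity rule: it omits the spike-timing dependence and the population feedback signal, and the function name `reward_modulated_update` and its parameters `decay` and `learning_rate` are hypothetical.

```python
def reward_modulated_update(activity, rewards, decay=0.5, learning_rate=1.0):
    """Online weight update for one synapse with an eligibility trace.

    At each time step the trace decays and is incremented by the synapse's
    activity; the weight changes by reward times the current trace, so a
    delayed reward still credits earlier activity (with a discount).
    """
    weight = 0.0
    trace = 0.0
    for x, r in zip(activity, rewards):
        trace = decay * trace + x            # leaky memory of past activity
        weight += learning_rate * r * trace  # reward gates the stored trace
    return weight


# The synapse is active at step 0; the reward arrives two steps later.
print(reward_modulated_update([1, 0, 0], [0, 0, 1]))
```

Without the trace (`decay=0`) the delayed reward would produce no weight change at all, which is the temporal credit assignment problem the abstract refers to.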

Relevance:

30.00%

Abstract:

We present a model for plasticity induction in reinforcement learning which is based on a cascade of synaptic memory traces. In the cascade of these so-called eligibility traces, presynaptic input is first correlated with postsynaptic events, next with the behavioral decisions and finally with the external reinforcement. A population of leaky integrate-and-fire neurons endowed with this plasticity scheme is studied by simulation on different tasks. For operant conditioning with delayed reinforcement, learning succeeds even when the delay is so large that the delivered reward reflects the appropriateness, not of the immediately preceding response, but of a decision made earlier on in the stimulus-decision sequence. So the proposed model does not rely on the temporal contiguity between decision and pertinent reward and thus provides a viable means of addressing the temporal credit assignment problem. In the same task, learning speeds up with increasing population size, showing that the plasticity cascade simultaneously addresses the spatial problem of assigning credit to the different population neurons. Simulations on other tasks, such as sequential decision making, serve to highlight the robustness of the proposed scheme and, further, contrast its performance to that of temporal difference based approaches to reinforcement learning.

Relevance:

30.00%

Abstract:

In learning from trial and error, animals need to relate behavioral decisions to environmental reinforcement even though it may be difficult to assign credit to a particular decision when outcomes are uncertain or subject to delays. When considering the biophysical basis of learning, the credit-assignment problem is compounded because the behavioral decisions themselves result from the spatio-temporal aggregation of many synaptic releases. We present a model of plasticity induction for reinforcement learning in a population of leaky integrate-and-fire neurons which is based on a cascade of synaptic memory traces. Each synaptic cascade correlates presynaptic input first with postsynaptic events, next with the behavioral decisions and finally with external reinforcement. For operant conditioning, learning succeeds even when reinforcement is delivered with a delay so large that temporal contiguity between decision and pertinent reward is lost due to intervening decisions which are themselves subject to delayed reinforcement. This shows that the model provides a viable mechanism for temporal credit assignment. Further, learning speeds up with increasing population size, so the plasticity cascade simultaneously addresses the spatial problem of assigning credit to synapses in different population neurons. Simulations on other tasks, such as sequential decision making, serve to contrast the performance of the proposed scheme to that of temporal difference-based learning. We argue that, due to their comparative robustness, synaptic plasticity cascades are attractive basic models of reinforcement learning in the brain.

Relevance:

30.00%

Abstract:

Total hip arthroplasty (THA) still carries a higher failure rate in patients with avascular necrosis of the femoral head (AVN) than in a similar patient population with THA for other reasons. This is particularly true for the acetabular component. One of the major factors accounting for this is the compromised acetabular bone quality with structural defects subsequent to collapse of the femoral head in high-grade AVN. In this study we implanted an acetabular reinforcement ring with hook (ARRH), which had been used successfully for other indications with acetabular bone stock deficiency, in 32 consecutive THAs in 29 patients with AVN. Five patients died during the observation period of causes unrelated to the surgery, one patient was lost to follow-up and one patient could not be followed up due to chronic illness, leaving 25 hips (23 patients) with a minimum follow-up of ten years (mean: 11.8; range: 10-15). The mean Merle d'Aubigne score increased significantly from 7.7 preoperatively to 16.6 postoperatively (p < 0.001). One revision was performed for aseptic stem loosening. Of the unrevised hips, one acetabular component was classified as definitively loose. The cumulative 12-year survivorship for THA with ARRH in AVN was 95.2% (confidence interval: 86.1-100%) for both components, 100% for the cup and 95.2% for the stem (86.1-100%).

Relevance:

30.00%

Abstract:

The artificial pancreas is at the forefront of research towards automatic insulin infusion for patients with type 1 diabetes. Due to the high inter- and intra-patient variability of the diabetic population, the need for personalized approaches has been raised. This study presents an adaptive, patient-specific control strategy for glucose regulation based on reinforcement learning and more specifically on the Actor-Critic (AC) learning approach. The control algorithm provides daily updates of the basal rate and insulin-to-carbohydrate (IC) ratio in order to optimize glucose regulation. A method for the automatic and personalized initialization of the control algorithm is designed based on the estimation of the transfer entropy (TE) between insulin and glucose signals. The algorithm has been evaluated in silico in adults, adolescents and children for 10 days. Three initialization scenarios, i) zero values, ii) random values and iii) TE-based values, have been comparatively assessed. The results have shown that when the TE-based initialization is used, the algorithm achieves faster learning, with 98%, 90% and 73% in the A+B zones of the Control Variability Grid Analysis for adults, adolescents and children respectively after five days, compared to 95%, 78%, 41% for random initialization and 93%, 88%, 41% for zero initial values. Furthermore, in the case of children, the daily Low Blood Glucose Index decreases much faster when the TE-based tuning is applied. The results imply that automatic and personalized tuning based on TE reduces the learning period and improves the overall performance of the AC algorithm.

Relevance:

30.00%

Abstract:

Allopatric speciation results from geographic isolation between populations. In the absence of gene flow, reproductive isolation arises gradually and incidentally as a result of mutation, genetic drift and the indirect effects of natural selection driving local adaptation(1-3). In contrast, speciation by reinforcement is driven directly by natural selection against maladaptive hybridization(1,4). This gives individuals that choose the traits of their own lineage greater fitness, potentially leading to rapid speciation between the lineages(1,4). Reinforcing natural selection on a population of one of the lineages in a mosaic contact zone could also result in divergence of the population from the allopatric range of its own lineage outside the zone(4-6). Here we test this with molecular data, experimental crosses, field measurements and mate choice experiments in a mosaic contact zone between two lineages of a rainforest frog. We show that reinforcing natural selection has resulted in significant premating isolation of a population in the contact zone not only from the other lineage but also, incidentally, from the closely related main range of its own lineage. Thus we show the potential for reinforcement to drive rapid allopatric speciation.

Relevance:

30.00%

Abstract:

BACKGROUND: Previous research has found accumulating evidence for atypical reward processing in autism spectrum disorders (ASD), particularly in the context of social rewards. Yet, this line of research has focused largely on positive social reinforcement, while little is known about the processing of negative reinforcement in individuals with ASD. METHODS: The present study examined neural responses to social negative reinforcement (a face displaying negative affect) and non-social negative reinforcement (monetary loss) in children with ASD relative to typically developing children, using functional magnetic resonance imaging (fMRI). RESULTS: We found that children with ASD demonstrated hypoactivation of the right caudate nucleus while anticipating non-social negative reinforcement and hypoactivation of a network of frontostriatal regions (including the nucleus accumbens, caudate nucleus, and putamen) while anticipating social negative reinforcement. In addition, activation of the right caudate nucleus during non-social negative reinforcement was associated with individual differences in social motivation. CONCLUSIONS: These results suggest that atypical responding to negative reinforcement in children with ASD may contribute to social motivational deficits in this population.

Relevance:

30.00%

Abstract:

That humans and animals learn from interaction with the environment is a foundational idea underlying nearly all theories of learning and intelligence. Learning that certain outcomes are associated with specific actions or stimuli (both internal and external) is at the very core of the capacity to adapt behaviour to environmental changes. In the present work, appetitive and aversive reinforcement learning paradigms have been used to investigate the fronto-striatal loops and behavioural correlates of adaptive and maladaptive reinforcement learning processes, aiming at a deeper understanding of how cortical and subcortical substrates interact with each other and with other brain systems to support learning. By combining a large variety of neuroscientific approaches, including behavioural and psychophysiological methods, EEG and neuroimaging techniques, these studies aim at clarifying and advancing the knowledge of the neural bases and computational mechanisms of reinforcement learning, both in normal and neurologically impaired populations.