83 results for Credit limits
in BORIS: Bern Open Repository and Information System - Bern - Switzerland
Abstract:
Learning by reinforcement is important in shaping animal behavior, and in particular in behavioral decision making. Such decision making is likely to involve the integration of many synaptic events in space and time. However, when a single reinforcement signal is used to modulate synaptic plasticity, as suggested in classical reinforcement learning algorithms, a twofold problem arises. Different synapses will have contributed differently to the behavioral decision, and even for one and the same synapse, releases at different times may have had different effects. Here we present a plasticity rule which solves this spatio-temporal credit assignment problem in a population of spiking neurons. The learning rule is spike-time dependent and maximizes the expected reward by following its stochastic gradient. Synaptic plasticity is modulated not only by the reward, but also by a population feedback signal. While this additional signal solves the spatial component of the problem, the temporal one is solved by means of synaptic eligibility traces. In contrast to temporal difference (TD) based approaches to reinforcement learning, our rule is explicit with regard to the assumed biophysical mechanisms. Neurotransmitter concentrations determine plasticity and learning occurs fully online. Further, it works even if the task to be learned is non-Markovian, i.e. when reinforcement is not determined by the current state of the system but may also depend on past events. The performance of the model is assessed by studying three non-Markovian tasks. In the first task, the reward is delayed beyond the last action, with unrelated stimuli and actions appearing in between. The second task involves an action sequence which is itself extended in time, and reward is only delivered at the last action, as is the case in any type of board game. The third task is the inspection game that has been studied in neuroeconomics, where an inspector tries to prevent a worker from shirking. Applying our algorithm to this game yields a learning behavior which is consistent with behavioral data from humans and monkeys, which themselves reveal properties of a mixed Nash equilibrium. The examples show that our neuronal implementation of reward-based learning copes with delayed and stochastic reward delivery, and also with the learning of mixed strategies in two-opponent games.
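The role of a synaptic eligibility trace in bridging the delay between a synaptic event and a later reward can be illustrated with a minimal sketch. This is not the plasticity rule of the paper: the function name `delayed_reward_update`, the exponential decay constant `tau`, the learning rate `eta`, and the scalar "credit" values are all assumptions chosen for illustration. The sketch only shows that an event preceding the reward still changes the weight, discounted by the elapsed time.

```python
import math


def delayed_reward_update(events, reward_step, reward, tau=0.5, eta=0.1,
                          dt=0.01, n_steps=200):
    """Return the weight change of one hypothetical synapse.

    events      -- dict mapping a time-step index to a credit value
                   (stand-in for a spike-timing-dependent term; assumed)
    reward_step -- time-step index at which the reward arrives
    reward      -- scalar reinforcement signal
    """
    decay = math.exp(-dt / tau)  # per-step decay of the eligibility trace
    trace = 0.0
    dw = 0.0
    for k in range(n_steps):
        # The trace decays and is bumped by any synaptic event at this step.
        trace = trace * decay + events.get(k, 0.0)
        # Plasticity happens only when reward arrives, gated by the trace.
        if k == reward_step:
            dw += eta * reward * trace
    return dw


# Event at step 10, reward at step 100: the trace has decayed over 90 steps.
dw_before = delayed_reward_update({10: 1.0}, reward_step=100, reward=1.0)
# Event after the reward: it cannot receive credit for that reward.
dw_after = delayed_reward_update({150: 1.0}, reward_step=100, reward=1.0)
```

In this toy setting an event that precedes the reward receives credit scaled by `exp(-elapsed / tau)`, while an event that follows the reward receives none.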
Abstract:
Learning by reinforcement is important in shaping animal behavior. But behavioral decision making is likely to involve the integration of many synaptic events in space and time. So in using a single reinforcement signal to modulate synaptic plasticity a twofold problem arises. Different synapses will have contributed differently to the behavioral decision and, even for one and the same synapse, releases at different times may have had different effects. Here we present a plasticity rule which solves this spatio-temporal credit assignment problem in a population of spiking neurons. The learning rule is spike-time dependent and maximizes the expected reward by following its stochastic gradient. Synaptic plasticity is modulated not only by the reward but by a population feedback signal as well. While this additional signal solves the spatial component of the problem, the temporal one is solved by means of synaptic eligibility traces. In contrast to temporal-difference-based approaches to reinforcement learning, our rule is explicit with regard to the assumed biophysical mechanisms. Neurotransmitter concentrations determine plasticity and learning occurs fully online. Further, it works even if the task to be learned is non-Markovian, i.e. when reinforcement is not determined by the current state of the system but may also depend on past events. The performance of the model is assessed by studying three non-Markovian tasks. In the first task the reward is delayed beyond the last action, with unrelated stimuli and actions appearing in between. The second one involves an action sequence which is itself extended in time, and reward is only delivered at the last action, as is the case in any type of board game. The third is the inspection game that has been studied in neuroeconomics. It only has a mixed Nash equilibrium and exemplifies that the model also copes with stochastic reward delivery and the learning of mixed strategies.
Abstract:
We present a model for plasticity induction in reinforcement learning which is based on a cascade of synaptic memory traces. In the cascade of these so-called eligibility traces, presynaptic input is first correlated with postsynaptic events, next with the behavioral decisions and finally with the external reinforcement. A population of leaky integrate-and-fire neurons endowed with this plasticity scheme is studied by simulation on different tasks. For operant conditioning with delayed reinforcement, learning succeeds even when the delay is so large that the delivered reward reflects the appropriateness, not of the immediately preceding response, but of a decision made earlier on in the stimulus-decision sequence. So the proposed model does not rely on the temporal contiguity between decision and pertinent reward and thus provides a viable means of addressing the temporal credit assignment problem. In the same task, learning speeds up with increasing population size, showing that the plasticity cascade simultaneously addresses the spatial problem of assigning credit to the different population neurons. Simulations on other tasks, such as sequential decision making, serve to highlight the robustness of the proposed scheme and, further, contrast its performance to that of temporal-difference-based approaches to reinforcement learning.
Abstract:
In learning from trial and error, animals need to relate behavioral decisions to environmental reinforcement even though it may be difficult to assign credit to a particular decision when outcomes are uncertain or subject to delays. When considering the biophysical basis of learning, the credit-assignment problem is compounded because the behavioral decisions themselves result from the spatio-temporal aggregation of many synaptic releases. We present a model of plasticity induction for reinforcement learning in a population of leaky integrate-and-fire neurons which is based on a cascade of synaptic memory traces. Each synaptic cascade correlates presynaptic input first with postsynaptic events, next with the behavioral decisions and finally with external reinforcement. For operant conditioning, learning succeeds even when reinforcement is delivered with a delay so large that temporal contiguity between decision and pertinent reward is lost due to intervening decisions which are themselves subject to delayed reinforcement. This shows that the model provides a viable mechanism for temporal credit assignment. Further, learning speeds up with increasing population size, so the plasticity cascade simultaneously addresses the spatial problem of assigning credit to synapses in different population neurons. Simulations on other tasks, such as sequential decision making, serve to contrast the performance of the proposed scheme to that of temporal-difference-based learning. We argue that, due to their comparative robustness, synaptic plasticity cascades are attractive basic models of reinforcement learning in the brain.
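The three-stage ordering of the cascade described above (presynaptic input correlated with postsynaptic events, then with the decision, then with reinforcement) can be sketched in discrete time as follows. This is a toy illustration under stated assumptions, not the model of the paper: the function name `cascade_weight_change`, the decay factors `decay1` and `decay2`, the learning rate `eta`, and the use of simple products as correlation terms are all hypothetical.

```python
def cascade_weight_change(steps, decay1=0.9, decay2=0.99, eta=0.05):
    """Return the weight change of one hypothetical synapse.

    steps -- iterable of (pre, post, decision, reward) tuples, one per
             time step; all four signals are scalars (assumed encoding)
    """
    e1 = 0.0  # stage 1: presynaptic input x postsynaptic event
    e2 = 0.0  # stage 2: stage-1 trace x behavioral decision
    dw = 0.0
    for pre, post, decision, reward in steps:
        e1 = decay1 * e1 + pre * post
        e2 = decay2 * e2 + e1 * decision
        # Stage 3: the external reinforcement gates the actual weight change.
        dw += eta * reward * e2
    return dw


# Pre/post coincidence, then a decision, then a delayed reward.
episode = [(1, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 0), (0, 0, 0, 1)]
dw_full = cascade_weight_change(episode)

# Same reward, but no decision ever tagged the synapse: no credit.
no_decision = [(1, 1, 0, 0), (0, 0, 0, 0), (0, 0, 0, 0), (0, 0, 0, 1)]
dw_none = cascade_weight_change(no_decision)
```

In this sketch a synapse is credited only if every stage of the cascade was activated in order, and the slower decay of the second trace is what lets the reward arrive late.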
Abstract:
A major goal of antiretroviral therapy (ART) for HIV-1-infected persons is the recovery of CD4 T lymphocytes, resulting in thorough protection against opportunistic complications. Interruptions of ART are still frequent. The long-term effect on CD4 T-cell recovery and clinical events remains unknown.
Abstract:
Cellular immune responses during acute hepatitis C virus (HCV) and HIV infection are a known correlate of infection outcome. Viral adaptation to these responses via mutation(s) within CD8+ T-cell epitopes allows these viruses to subvert host immune control. This study examined HCV evolution in 21 HCV genotype 1-infected subjects to characterise the level of viral adaptation during acute and early HCV infection. Of the total mutations observed, 25% were within described CD8+ T-cell epitopes or at viral adaptation sites. Most mutations were maintained into the chronic phase of HCV infection (75%). The lack of reversion of adaptations and the high proportion of silent substitutions suggest that HCV has structural and functional limitations that constrain evolution. These results were compared to the pattern of viral evolution observed in 98 subjects during a similar phase in HIV infection from a previous study. In contrast to HCV, evolution during acute HIV infection is marked by high levels of amino acid change relative to silent substitutions, including a higher proportion of adaptations, likely reflecting strong and continued CD8+ T-cell pressure combined with greater plasticity of the virus. Understanding viral escape dynamics for these two viruses is important for effective T-cell vaccine design.
Abstract:
In order to achieve host cell entry, the apicomplexan parasite Neospora caninum relies on the contents of distinct organelles, named micronemes, rhoptries and dense granules, which are secreted at defined timepoints during and after host cell entry. It was shown previously that a vaccine composed of a mixture of three recombinant antigens, corresponding to the two microneme antigens NcMIC1 and NcMIC3 and the rhoptry protein NcROP2, prevented disease and limited cerebral infection and transplacental transmission in mice. In this study, we selected predicted immunogenic domains of each of these proteins and created four different chimeric antigens, with the respective domains incorporated into these chimeras in different orders. Following vaccination, mice were challenged intraperitoneally with 2 × 10^6 N. caninum tachyzoites and were then carefully monitored for clinical symptoms for 4 weeks post-infection. Of the four chimeric antigens, only recNcMIC3-1-R provided complete protection against disease, with 100% survival, compared to 40-80% survival in the other groups. Serology did not show any clear differences in total IgG, IgG1 and IgG2a levels between the different treatment groups. Vaccination with all four chimeric variants generated an IL-4-biased cytokine expression, which then shifted to an IFN-γ-dominated response following experimental infection. Sera of recNcMIC3-1-R vaccinated mice reacted with each individual recombinant antigen, as well as with three distinct bands in Neospora extracts with similar Mr as NcMIC1, NcMIC3 and NcROP2, and exhibited distinct apical labeling in tachyzoites. These results suggest that recNcMIC3-1-R is an interesting chimeric vaccine candidate and should be followed up in subsequent studies in a fetal infection model.