87 results for Kabatek, Johannes
Abstract:
Learning by reinforcement is important in shaping animal behavior, and in particular in behavioral decision making. Such decision making is likely to involve the integration of many synaptic events in space and time. However, when a single reinforcement signal is used to modulate synaptic plasticity, as suggested in classical reinforcement learning algorithms, a twofold problem arises. Different synapses will have contributed differently to the behavioral decision, and even for one and the same synapse, releases at different times may have had different effects. Here we present a plasticity rule which solves this spatio-temporal credit assignment problem in a population of spiking neurons. The learning rule is spike-time dependent and maximizes the expected reward by following its stochastic gradient. Synaptic plasticity is modulated not only by the reward, but also by a population feedback signal. While this additional signal solves the spatial component of the problem, the temporal one is solved by means of synaptic eligibility traces. In contrast to temporal difference (TD) based approaches to reinforcement learning, our rule is explicit with regard to the assumed biophysical mechanisms. Neurotransmitter concentrations determine plasticity and learning occurs fully online. Further, it works even if the task to be learned is non-Markovian, i.e. when reinforcement is not determined by the current state of the system but may also depend on past events. The performance of the model is assessed by studying three non-Markovian tasks. In the first task, the reward is delayed beyond the last action, with unrelated stimuli and actions appearing in between. The second task involves an action sequence which is itself extended in time, and reward is only delivered at the last action, as is the case in any type of board game. The third task is the inspection game that has been studied in neuroeconomics, where an inspector tries to prevent a worker from shirking.
Applying our algorithm to this game yields a learning behavior which is consistent with behavioral data from humans and monkeys, which themselves reveal properties of a mixed Nash equilibrium. The examples show that our neuronal implementation of reward-based learning copes with delayed and stochastic reward delivery, and also with the learning of mixed strategies in two-opponent games.
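The role of an eligibility trace in bridging the gap between synaptic activity and a delayed reward can be illustrated with a minimal sketch. This is not the rule from the abstract: it omits the population feedback signal and the spiking neuron model, and the function name `run_trial`, the parameters `tau_e`, `eta` and `dt`, and the scalar "local term" are all hypothetical choices made for illustration.

```python
import math

def run_trial(local_terms, rewards, tau_e=20.0, eta=0.1, dt=1.0, w0=0.0):
    """Reward-modulated weight update for one synapse (illustrative only).

    local_terms[t]: assumed local pre/post correlation term at step t
    rewards[t]:     scalar reinforcement delivered at step t (0 if none)
    tau_e:          assumed decay time constant of the eligibility trace
    eta:            assumed learning rate
    """
    decay = math.exp(-dt / tau_e)
    w, e = w0, 0.0
    for h, r in zip(local_terms, rewards):
        e = decay * e + h   # trace keeps a fading memory of past activity
        w += eta * r * e    # the weight changes only while reward is present
    return w, e

# Synaptic activity at step 0, reward arriving 10 steps later.
steps = 11
local_terms = [1.0] + [0.0] * (steps - 1)
rewards = [0.0] * (steps - 1) + [1.0]
w_final, e_final = run_trial(local_terms, rewards)
```

In this sketch the weight still changes at step 10 because the trace has decayed but not vanished, so the delayed reward is credited to the earlier activity. With no reward the weight stays at its initial value.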
Abstract:
Learning by reinforcement is important in shaping animal behavior. But behavioral decision making is likely to involve the integration of many synaptic events in space and time. So when a single reinforcement signal is used to modulate synaptic plasticity, a twofold problem arises. Different synapses will have contributed differently to the behavioral decision and, even for one and the same synapse, releases at different times may have had different effects. Here we present a plasticity rule which solves this spatio-temporal credit assignment problem in a population of spiking neurons. The learning rule is spike-time dependent and maximizes the expected reward by following its stochastic gradient. Synaptic plasticity is modulated not only by the reward but by a population feedback signal as well. While this additional signal solves the spatial component of the problem, the temporal one is solved by means of synaptic eligibility traces. In contrast to temporal-difference-based approaches to reinforcement learning, our rule is explicit with regard to the assumed biophysical mechanisms. Neurotransmitter concentrations determine plasticity and learning occurs fully online. Further, it works even if the task to be learned is non-Markovian, i.e. when reinforcement is not determined by the current state of the system but may also depend on past events. The performance of the model is assessed by studying three non-Markovian tasks. In the first task the reward is delayed beyond the last action, with unrelated stimuli and actions appearing in between. The second one involves an action sequence which is itself extended in time, and reward is only delivered at the last action, as is the case in any type of board game. The third is the inspection game that has been studied in neuroeconomics. It only has a mixed Nash equilibrium and exemplifies that the model also copes with stochastic reward delivery and the learning of mixed strategies.
Abstract:
We present a model for plasticity induction in reinforcement learning which is based on a cascade of synaptic memory traces. In the cascade of these so-called eligibility traces, presynaptic input is first correlated with postsynaptic events, next with the behavioral decisions and finally with the external reinforcement. A population of leaky integrate-and-fire neurons endowed with this plasticity scheme is studied by simulation on different tasks. For operant conditioning with delayed reinforcement, learning succeeds even when the delay is so large that the delivered reward reflects the appropriateness, not of the immediately preceding response, but of a decision made earlier on in the stimulus-decision sequence. So the proposed model does not rely on the temporal contiguity between decision and pertinent reward and thus provides a viable means of addressing the temporal credit assignment problem. In the same task, learning speeds up with increasing population size, showing that the plasticity cascade simultaneously addresses the spatial problem of assigning credit to the different population neurons. Simulations on other tasks, such as sequential decision making, serve to highlight the robustness of the proposed scheme and, further, contrast its performance to that of temporal-difference-based approaches to reinforcement learning.
Abstract:
PURPOSE: To evaluate diffusion-weighted magnetic resonance (MR) imaging of the human placenta in fetuses with and fetuses without intrauterine growth restriction (IUGR) who were suspected of having placental insufficiency. MATERIALS AND METHODS: The study was approved by the local ethics committee, and written informed consent was obtained. The authors retrospectively evaluated 1.5-T fetal MR images from 102 singleton pregnancies (mean gestation ± standard deviation, 29 weeks ± 5; range, 21-41 weeks). Morphologic and diffusion-weighted MR imaging were performed. A region of interest analysis of the apparent diffusion coefficient (ADC) of the placenta was independently performed by two observers who were blinded to clinical data and outcome. Placental insufficiency was diagnosed if flattening of the growth curve was detected at obstetric ultrasonography (US), if the birth weight was in the 10th percentile or less, or if fetal weight estimated with US was below the 10th percentile. Abnormal findings at Doppler US of the umbilical artery and histopathologic examination of specimens from the placenta were recorded. The ADCs in fetuses with placental insufficiency were compared with those in fetuses of the same gestational age without placental insufficiency and tested for normal distribution. The t tests and Pearson correlation coefficients were used to compare these results at 5% levels of significance. RESULTS: Thirty-three of the 102 pregnancies were ultimately categorized as having an insufficient placenta. MR imaging depicted morphologic changes (eg, infarction or bleeding) in 27 fetuses. Placental dysfunction was suspected in 33 fetuses at diffusion-weighted imaging (mean ADC, 146.4 sec/mm² ± 10.63 for fetuses with placental insufficiency vs 177.1 sec/mm² ± 18.90 for fetuses without placental insufficiency; P < .01, with one false-positive case).
The use of diffusion-weighted imaging in addition to US increased sensitivity for the detection of placental insufficiency from 73% to 100%, increased accuracy from 91% to 99%, and preserved specificity at 99%. CONCLUSION: Placental dysfunction associated with growth restriction is associated with restricted diffusion and reduced ADC. A decreased ADC used as an early marker of placental damage might be indicative of pregnancy complications such as IUGR.
Abstract:
To evaluate whether it is feasible to measure the segmental flux of small bowel content using MR phase-contrast (PC) pulse sequences.
Abstract:
The purpose of this study was to assess the efficacy and midterm results of endovascular treatment of acute, complicated type B aortic dissection.
Abstract:
This study evaluated long-term results of thoracic endovascular aortic repair for atherosclerotic aneurysms involving descending aorta.
Abstract:
Midterm results of TEVAR (thoracic endovascular aortic repair) in patients with aneurysms involving the descending aorta originating from chronic type B dissections are not known.