17 results for action learning.

in BORIS: Bern Open Repository and Information System - Bern - Switzerland


Relevance:

40.00%

Abstract:

Population coding is widely regarded as a key mechanism for achieving reliable behavioral decisions. We previously introduced reinforcement learning for population-based decision making by spiking neurons. Here we generalize population reinforcement learning to spike-based plasticity rules that take account of the postsynaptic neural code. We consider spike/no-spike, spike count and spike latency codes. The multi-valued and continuous-valued features in the postsynaptic code allow for a generalization of binary decision making to multi-valued decision making and continuous-valued action selection. We show that code-specific learning rules speed up learning both for the discrete classification and the continuous regression tasks. The suggested learning rules also speed up with increasing population size as opposed to standard reinforcement learning rules. Continuous action selection is further shown to explain realistic learning speeds in the Morris water maze. Finally, we introduce the concept of action perturbation as opposed to the classical weight- or node-perturbation as an exploration mechanism underlying reinforcement learning. Exploration in the action space greatly increases the speed of learning as compared to exploration in the neuron or weight space.
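The reliability gain that motivates population coding can be illustrated with a short calculation. The sketch below is not the spiking model described in the abstract: it assumes independent neurons that each vote for the correct decision with the same probability, and the function name `majority_correct_prob` is hypothetical.

```python
from math import comb


def majority_correct_prob(n_neurons: int, p_single: float) -> float:
    """Probability that a strict majority of independent neurons votes correctly.

    Assumption (not from the abstract): every neuron votes independently and is
    correct with the same probability p_single.
    """
    if n_neurons % 2 == 0:
        raise ValueError("use an odd population size so that ties cannot occur")
    threshold = n_neurons // 2 + 1  # smallest number of votes that forms a majority
    return sum(
        comb(n_neurons, k) * p_single**k * (1 - p_single) ** (n_neurons - k)
        for k in range(threshold, n_neurons + 1)
    )


if __name__ == "__main__":
    for n in (1, 3, 11, 101):
        print(n, round(majority_correct_prob(n, 0.6), 4))
```

Under these assumptions the decision becomes more reliable as the population grows, even though each neuron is only slightly better than chance.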

Relevance:

30.00%

Abstract:

Over the last decade, the end-state comfort effect (e.g., Rosenbaum et al., 2006) has received a considerable amount of attention. However, some of the underlying mechanisms are still to be investigated, among them how sequential planning affects end-state comfort and how this effect develops over learning. In a two-step sequencing task, for example, postural comfort can be planned on the intermediate position (next state) or on the actual end position (final state). It might be hypothesized that, in initial acquisition, the next state’s comfort is crucial for action planning but that, in the course of learning, the final state’s comfort is taken more and more into account. To test this hypothesis, a variant of Rosenbaum’s vertical stick transportation task was used. Participants (N = 16, right-handed) received extensive practice on a two-step transportation task (10,000 trials over 12 sessions). From the initial position on the middle stair of a staircase in front of the participant, the stick had to be transported either 20 cm upwards and then 40 cm downwards or 20 cm downwards and then 40 cm upwards (N = 8 per subgroup). Participants were supposed to produce fluid movements without changing grasp. In the pre- and posttest, participants were tested on both two-step sequencing tasks as well as on 20 cm single-step upwards and downwards movements (10 trials per condition). For the test trials, grasp height was calculated kinematographically. In the pretest, large end/next/final-state comfort effects for single-step transportation tasks and large next-state comfort effects for sequenced tasks were found. However, no change in grasp height from pre- to posttest was found. Results show that, in vertical stick transportation sequences, the final state is not taken into account when planning grasp height. Instead, action planning seems to be solely based on aspects of the next action goal that is to be reached.

Relevance:

30.00%

Abstract:

Learning by reinforcement is important in shaping animal behavior, and in particular in behavioral decision making. Such decision making is likely to involve the integration of many synaptic events in space and time. However, using a single reinforcement signal to modulate synaptic plasticity, as suggested in classical reinforcement learning algorithms, a twofold problem arises. Different synapses will have contributed differently to the behavioral decision, and even for one and the same synapse, releases at different times may have had different effects. Here we present a plasticity rule which solves this spatio-temporal credit assignment problem in a population of spiking neurons. The learning rule is spike-time dependent and maximizes the expected reward by following its stochastic gradient. Synaptic plasticity is modulated not only by the reward, but also by a population feedback signal. While this additional signal solves the spatial component of the problem, the temporal one is solved by means of synaptic eligibility traces. In contrast to temporal difference (TD) based approaches to reinforcement learning, our rule is explicit with regard to the assumed biophysical mechanisms. Neurotransmitter concentrations determine plasticity and learning occurs fully online. Further, it works even if the task to be learned is non-Markovian, i.e. when reinforcement is not determined by the current state of the system but may also depend on past events. The performance of the model is assessed by studying three non-Markovian tasks. In the first task, the reward is delayed beyond the last action with non-related stimuli and actions appearing in between. The second task involves an action sequence which is itself extended in time and reward is only delivered at the last action, as is the case in any type of board-game. The third task is the inspection game that has been studied in neuroeconomics, where an inspector tries to prevent a worker from shirking. Applying our algorithm to this game yields a learning behavior which is consistent with behavioral data from humans and monkeys, which themselves reveal properties of a mixed Nash equilibrium. The examples show that our neuronal implementation of reward-based learning copes with delayed and stochastic reward delivery, and also with the learning of mixed strategies in two-opponent games.
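The role of the eligibility trace in bridging the delay between synaptic activity and reward can be sketched in a few lines. This is a generic reward-gated trace update, not the rule derived in the paper: the population feedback signal is omitted, time is discretized, and the names `eligibility_update`, `local_terms`, `decay` and `lr` are assumptions made for illustration.

```python
def eligibility_update(local_terms, rewards, decay=0.9, lr=0.1, w0=0.0):
    """Apply reward-gated weight updates through a decaying eligibility trace.

    local_terms: per-step synaptic activity term (hypothetical stand-in for
                 the spike-dependent quantity a real rule would compute)
    rewards:     per-step reward signal, possibly delayed relative to activity
    """
    w, trace = w0, 0.0
    history = []
    for local, reward in zip(local_terms, rewards):
        trace = decay * trace + local  # the trace remembers recent activity
        w += lr * reward * trace       # reward converts the trace into a weight change
        history.append((trace, w))
    return w, history


if __name__ == "__main__":
    # Activity at step 0, reward only at step 2: the decayed trace still
    # credits the synapse when the reward arrives.
    w, history = eligibility_update([1, 0, 0], [0, 0, 1], decay=0.5, lr=1.0)
    print(w, history)
```

Without the trace (`decay=0`), the delayed reward in this example would leave the weight unchanged, which is the temporal credit assignment problem in miniature.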

Relevance:

30.00%

Abstract:

Learning by reinforcement is important in shaping animal behavior. But behavioral decision making is likely to involve the integration of many synaptic events in space and time. So in using a single reinforcement signal to modulate synaptic plasticity a twofold problem arises. Different synapses will have contributed differently to the behavioral decision and, even for one and the same synapse, releases at different times may have had different effects. Here we present a plasticity rule which solves this spatio-temporal credit assignment problem in a population of spiking neurons. The learning rule is spike time dependent and maximizes the expected reward by following its stochastic gradient. Synaptic plasticity is modulated not only by the reward but by a population feedback signal as well. While this additional signal solves the spatial component of the problem, the temporal one is solved by means of synaptic eligibility traces. In contrast to temporal difference based approaches to reinforcement learning, our rule is explicit with regard to the assumed biophysical mechanisms. Neurotransmitter concentrations determine plasticity and learning occurs fully online. Further, it works even if the task to be learned is non-Markovian, i.e. when reinforcement is not determined by the current state of the system but may also depend on past events. The performance of the model is assessed by studying three non-Markovian tasks. In the first task the reward is delayed beyond the last action with non-related stimuli and actions appearing in between. The second one involves an action sequence which is itself extended in time and reward is only delivered at the last action, as is the case in any type of board-game. The third is the inspection game that has been studied in neuroeconomics. It only has a mixed Nash equilibrium and exemplifies that the model also copes with stochastic reward delivery and the learning of mixed strategies.
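To make the notion of a game with only a mixed Nash equilibrium concrete, the sketch below solves a 2x2 inspection game by the usual indifference conditions. The payoff numbers are hypothetical (wage 2, effort cost 1, inspection cost 1, output value 3) and are not taken from the paper; `mixed_equilibrium_2x2` is an illustrative helper, valid only when the game has no pure-strategy equilibrium.

```python
def mixed_equilibrium_2x2(row_payoffs, col_payoffs):
    """Return (p, q): probability of row 0 and of column 0 in the mixed equilibrium.

    Each player mixes so that the opponent is indifferent between both actions.
    Assumes a 2x2 game without a pure-strategy equilibrium (nonzero denominators).
    """
    a, b = row_payoffs, col_payoffs
    # The row player's mix p makes the column player indifferent between columns.
    p = (b[1][1] - b[1][0]) / (b[0][0] - b[1][0] - b[0][1] + b[1][1])
    # The column player's mix q makes the row player indifferent between rows.
    q = (a[1][1] - a[0][1]) / (a[0][0] - a[0][1] - a[1][0] + a[1][1])
    return p, q


# Rows: worker chooses work (0) or shirk (1).
# Columns: inspector chooses inspect (0) or not inspect (1).
WORKER = [[1, 1],     # work: wage minus effort cost, inspected or not
          [0, 2]]     # shirk: no wage if caught, full wage otherwise
INSPECTOR = [[0, 1],    # worker works: output minus wage (minus inspection cost)
             [-1, -2]]  # worker shirks: pays inspection cost or the wage for nothing

if __name__ == "__main__":
    p_work, q_inspect = mixed_equilibrium_2x2(WORKER, INSPECTOR)
    print(p_work, q_inspect)
```

With these payoffs every pure strategy pair gives one player a reason to deviate, so stable behavior requires randomizing, which is what a learner in this game has to discover.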

Relevance:

30.00%

Abstract:

The discovery of binary dendritic events such as local NMDA spikes in dendritic subbranches led to the suggestion that dendritic trees could be computationally equivalent to a 2-layer network of point neurons, with a single output unit represented by the soma, and input units represented by the dendritic branches. Although this interpretation endows a neuron with high computational power, it is functionally not clear why nature would have preferred the dendritic solution with a single but complex neuron, as opposed to the network solution with many but simple units. We show that the dendritic solution has a distinct advantage over the network solution when considering different learning tasks. Its key property is that the dendritic branches receive immediate feedback from the somatic output spike, while in the corresponding network architecture the feedback would require additional backpropagating connections to the input units. Assuming a reinforcement learning scenario we formally derive a learning rule for the synaptic contacts on the individual dendritic trees which depends on the presynaptic activity, the local NMDA spikes, the somatic action potential, and a delayed reinforcement signal. We test the model for two scenarios: the learning of binary classifications and of precise spike timings. We show that the immediate feedback represented by the backpropagating action potential supplies the individual dendritic branches with enough information to efficiently adapt their synapses and to speed up the learning process.
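The two-layer view of a dendritic tree can be sketched as thresholded branches feeding a thresholded soma. This is a minimal deterministic caricature, not the stochastic model or learning rule of the paper; the weights, the thresholds and the use of negative weights as a stand-in for inhibition are all assumptions chosen so that the neuron computes XOR, a function no single threshold unit can represent.

```python
def branch_spikes(inputs, branch_weights, branch_threshold=1.0):
    """Binary 'NMDA spike' per branch: 1 if the weighted input reaches threshold."""
    return [
        1 if sum(w * x for w, x in zip(weights, inputs)) >= branch_threshold else 0
        for weights in branch_weights
    ]


def soma_output(inputs, branch_weights, soma_threshold=1.0):
    """Somatic spike: 1 if enough branches produced a dendritic spike."""
    return 1 if sum(branch_spikes(inputs, branch_weights)) >= soma_threshold else 0


# Hypothetical weights: branch 0 detects "x0 and not x1", branch 1 the reverse.
XOR_BRANCHES = [[1.0, -1.0], [-1.0, 1.0]]

if __name__ == "__main__":
    for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
        print(x, soma_output(x, XOR_BRANCHES))
```

In this picture the somatic output is available to every branch of the same cell, whereas an equivalent network of point neurons would need extra connections to send that signal back to its input units.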

Relevance:

30.00%

Abstract:

We study synaptic plasticity in a complex neuronal cell model where NMDA-spikes can arise in certain dendritic zones. In the context of reinforcement learning, two kinds of plasticity rules are derived, zone reinforcement (ZR) and cell reinforcement (CR), which both optimize the expected reward by stochastic gradient ascent. For ZR, the synaptic plasticity response to the external reward signal is modulated exclusively by quantities which are local to the NMDA-spike initiation zone in which the synapse is situated. CR, in addition, uses nonlocal feedback from the soma of the cell, provided by mechanisms such as the backpropagating action potential. Simulation results show that, compared to ZR, the use of nonlocal feedback in CR can drastically enhance learning performance. We suggest that the availability of nonlocal feedback for learning is a key advantage of complex neurons over networks of simple point neurons, which have previously been found to be largely equivalent with regard to computational capability.
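Stochastic gradient ascent on the expected reward, the principle shared by both rules, can be shown on the smallest possible example. The sketch below is neither ZR nor CR: it is a single stochastic unit with one parameter, trained with a score-function (REINFORCE-style) update on a hypothetical task in which action 1 is always rewarded. All names and parameter values are assumptions.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def train_bernoulli_unit(steps: int = 2000, lr: float = 0.5, seed: int = 0) -> float:
    """Train one stochastic binary unit by stochastic gradient ascent on reward.

    Returns the final probability of choosing action 1.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    theta = 0.0                # single parameter; P(action = 1) = sigmoid(theta)
    for _ in range(steps):
        p = sigmoid(theta)
        action = 1 if rng.random() < p else 0
        reward = 1.0 if action == 1 else 0.0  # hypothetical task: action 1 is rewarded
        # For a Bernoulli unit, d/dtheta log P(action) = action - p, so
        # reward * (action - p) is an unbiased sample of the reward gradient.
        theta += lr * reward * (action - p)
    return sigmoid(theta)


if __name__ == "__main__":
    print(train_bernoulli_unit())
```

The update uses only the unit's own activity and a global reward; how much additional feedback a synapse receives beyond that is the point on which the two rules in the abstract differ.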

Relevance:

30.00%

Abstract:

Sustainable natural resource use requires that multiple actors reassess their situation in a systemic perspective. This can be conceptualised as a social learning process between actors from rural communities and the experts from outside organisations. A specifically designed workshop oriented towards a systemic view of natural resource use and the enhancement of mutual learning between local and external actors provided the background for evaluating the potentials and constraints of intensified social learning processes. Case studies in rural communities in India, Bolivia, Peru and Mali showed that changes in the narratives of the participants of the workshop followed a similar temporal sequence relatively independently of their specific contexts. Social learning processes were found to be more likely to be successful if they 1) opened new space for communicative action, allowing for an intersubjective re-definition of the present situation, 2) contributed to rebalancing the relationships between social capital and social, emotional and cognitive competencies within and between local and external actors.

Relevance:

30.00%

Abstract:

This article provides a selective overview of the functional neuroimaging literature with an emphasis on emotional activation processes. Emotions are fast and flexible response systems that provide basic tendencies for adaptive action. From the range of involved component functions, we first discuss selected automatic mechanisms that control basic adaptational changes. Second, we illustrate how neuroimaging work has contributed to the mapping of the network components associated with basic emotion families (fear, anger, disgust, happiness), and secondary dimensional concepts that organise the meaning space for subjective experience and verbal labels (emotional valence, activity/intensity, approach/withdrawal, etc.). Third, results and methodological difficulties are discussed in view of our own neuroimaging experiments that investigated the component functions involved in emotional learning. The amygdala, prefrontal cortex, and striatum form a network of reciprocal connections that show topographically distinct patterns of activity as a correlate of up and down regulation processes during an emotional episode. Emotional modulations of other brain systems have attracted recent research interest. Emotional neuroimaging calls for more representative designs that highlight the modulatory influences of regulation strategies and socio-cultural factors responsible for inhibitory control and extinction. We conclude by emphasising the relevance of the temporal process dynamics of emotional activations that may provide improved prediction of individual differences in emotionality.

Relevance:

30.00%

Abstract:

We present a model of spike-driven synaptic plasticity inspired by experimental observations and motivated by the desire to build an electronic hardware device that can learn to classify complex stimuli in a semisupervised fashion. During training, patterns of activity are sequentially imposed on the input neurons, and an additional instructor signal drives the output neurons toward the desired activity. The network is made of integrate-and-fire neurons with constant leak and a floor. The synapses are bistable, and they are modified by the arrival of presynaptic spikes. The sign of the change is determined by both the depolarization and the state of a variable that integrates the postsynaptic action potentials. Following the training phase, the instructor signal is removed, and the output neurons are driven purely by the activity of the input neurons weighted by the plastic synapses. In the absence of stimulation, the synapses preserve their internal state indefinitely. Memories are also very robust to the disruptive action of spontaneous activity. A network of 2000 input neurons is shown to be able to classify correctly a large number (thousands) of highly overlapping patterns (300 classes of preprocessed LaTeX characters, 30 patterns per class, and a subset of the NIST characters data set) and to generalize with performances that are better than or comparable to those of artificial neural networks. Finally, we show that the synaptic dynamics is compatible with many of the experimental observations on the induction of long-term modifications (spike-timing-dependent plasticity and its dependence on both the postsynaptic depolarization and the frequency of pre- and postsynaptic neurons).
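An integrate-and-fire neuron with constant leak and a floor differs from the more common leaky variant in that the leak does not depend on the membrane potential. The discrete-time sketch below illustrates only that neuron model, with hypothetical parameter values; the bistable synapses and the learning rule from the abstract are not modeled, and `simulate_if_neuron` is an illustrative name.

```python
def simulate_if_neuron(input_current, leak=0.25, threshold=1.0,
                       v_floor=0.0, v_reset=0.0):
    """Simulate an integrate-and-fire neuron with constant leak and a floor.

    Returns (spikes, potentials), one entry per time step. The potential
    recorded for a spiking step is the value after the reset.
    """
    v = v_reset
    spikes, potentials = [], []
    for current in input_current:
        # Constant leak: a fixed amount is subtracted at every step,
        # independent of v. The floor keeps v from drifting below v_floor.
        v = max(v_floor, v + current - leak)
        if v >= threshold:
            spikes.append(1)
            v = v_reset  # reset after the spike
        else:
            spikes.append(0)
        potentials.append(v)
    return spikes, potentials


if __name__ == "__main__":
    spikes, potentials = simulate_if_neuron([0.75, 0.75, 0.0, 0.0])
    print(spikes, potentials)
```

With zero input the potential stays pinned at the floor instead of decaying toward it exponentially, which is one reason this neuron type is convenient for electronic hardware.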

Relevance:

30.00%

Abstract:

The present paper discusses a conceptual, methodological and practical framework within which the limitations of the conventional notion of natural resource management (NRM) can be overcome. NRM is understood as the application of scientific ecological knowledge to resource management. By including a consideration of the normative imperatives that arise from scientific ecological knowledge and submitting them to public scrutiny, ‘sustainable management of natural resources’ can be recontextualised as ‘sustainable governance of natural resources’. This in turn makes it possible to place the politically neutralising discourse of ‘management’ in a space for wider societal debate, in which the different actors involved can deliberate and negotiate the norms, rules and power relations related to natural resource use and sustainable development. The transformation of sustainable management into sustainable governance of natural resources can be conceptualised as a social learning process involving scientists, experts, politicians and local actors, and their corresponding scientific and non-scientific knowledges. The social learning process is the result of what Habermas has described as ‘communicative action’, in contrast to ‘strategic action’. Sustainable governance of natural resources thus requires a new space for communicative action aiming at shared, intersubjectively validated definitions of actual situations and the goals and means required for transforming current norms, rules and power relations in order to achieve sustainable development. Case studies from rural India, Bolivia and Mali explore the potentials and limitations for broadening communicative action through an intensification of social learning processes at the interface of local and external knowledge. Key factors that enable or hinder the transformation of sustainable management into sustainable governance of natural resources through social learning processes and communicative action are discussed.

Relevance:

30.00%

Abstract:

The present research is based on the notion that disengagement from goals is not a discrete event but a process (Klinger, 1975). A critical phase in this process is when difficulties and setbacks in striving for a goal accumulate. This critical phase is termed here an action crisis. Given the profound effects that people's thoughts have on their self-regulatory efficiency, it is essential to understand the cognitive correlates of an action crisis. In two experimental lab and two correlational field studies, the hypothesis that goal-related costs and benefits become cognitively highly accessible during an action crisis was tested and supported. Participants who were experiencing an action crisis in such diverse goal areas as intimate relationships, sports, and university studies thought about goal-related costs and benefits more intensively and frequently than participants who were not in an action crisis. In an incidental learning task they recognized more cost-benefit items and fewer implementation items than the control group. Results are interpreted in terms of action-phase-specific mindsets (Gollwitzer, 1990, 2012).

Relevance:

30.00%

Abstract:

Competing water demands for household consumption as well as the production of food, energy, and other uses pose challenges for water supply and sustainable development in many parts of the world. Designing creative strategies and learning processes for sustainable water governance is thus of prime importance. While this need is uncontested, suitable approaches still have to be found. In this article we present and evaluate a conceptual approach to scenario building aimed at transdisciplinary learning for sustainable water governance. The approach combines normative, explorative, and participatory scenario elements. This combination allows for adequate consideration of stakeholders’ and scientists’ systems, target, and transformation knowledge. Application of the approach in the MontanAqua project in the Swiss Alps confirmed its high potential for co-producing new knowledge and establishing a meaningful and deliberative dialogue between all actors involved. The iterative and combined approach ensured that stakeholders’ knowledge was adequately captured, fed into scientific analysis, and brought back to stakeholders in several cycles, thereby facilitating learning and co-production of new knowledge relevant for both stakeholders and scientists. However, the approach also revealed a number of constraints, including the enormous flexibility required of stakeholders and scientists in order for them to truly engage in the co-production of new knowledge. Overall, the study showed that shifts from strategic to communicative action are possible in an environment of mutual trust. This ultimately depends on creating conditions of interaction that place scientists’ and stakeholders’ knowledge on an equal footing.