23 resultados para Nash equilibrium

em BORIS: Bern Open Repository and Information System - Berna - Suiça


Relevância:

70.00% 70.00%

Publicador:

Resumo:

Humans and animals face decision tasks in an uncertain multi-agent environment where an agent's strategy may change in time due to the co-adaptation of others strategies. The neuronal substrate and the computational algorithms underlying such adaptive decision making, however, is largely unknown. We propose a population coding model of spiking neurons with a policy gradient procedure that successfully acquires optimal strategies for classical game-theoretical tasks. The suggested population reinforcement learning reproduces data from human behavioral experiments for the blackjack and the inspector game. It performs optimally according to a pure (deterministic) and mixed (stochastic) Nash equilibrium, respectively. In contrast, temporal-difference(TD)-learning, covariance-learning, and basic reinforcement learning fail to perform optimally for the stochastic strategy. Spike-based population reinforcement learning, shown to follow the stochastic reward gradient, is therefore a viable candidate to explain automated decision learning of a Nash equilibrium in two-player games.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Learning by reinforcement is important in shaping animal behavior, and in particular in behavioral decision making. Such decision making is likely to involve the integration of many synaptic events in space and time. However, using a single reinforcement signal to modulate synaptic plasticity, as suggested in classical reinforcement learning algorithms, a twofold problem arises. Different synapses will have contributed differently to the behavioral decision, and even for one and the same synapse, releases at different times may have had different effects. Here we present a plasticity rule which solves this spatio-temporal credit assignment problem in a population of spiking neurons. The learning rule is spike-time dependent and maximizes the expected reward by following its stochastic gradient. Synaptic plasticity is modulated not only by the reward, but also by a population feedback signal. While this additional signal solves the spatial component of the problem, the temporal one is solved by means of synaptic eligibility traces. In contrast to temporal difference (TD) based approaches to reinforcement learning, our rule is explicit with regard to the assumed biophysical mechanisms. Neurotransmitter concentrations determine plasticity and learning occurs fully online. Further, it works even if the task to be learned is non-Markovian, i.e. when reinforcement is not determined by the current state of the system but may also depend on past events. The performance of the model is assessed by studying three non-Markovian tasks. In the first task, the reward is delayed beyond the last action with non-related stimuli and actions appearing in between. The second task involves an action sequence which is itself extended in time and reward is only delivered at the last action, as it is the case in any type of board-game. The third task is the inspection game that has been studied in neuroeconomics, where an inspector tries to prevent a worker from shirking. Applying our algorithm to this game yields a learning behavior which is consistent with behavioral data from humans and monkeys, revealing themselves properties of a mixed Nash equilibrium. The examples show that our neuronal implementation of reward based learning copes with delayed and stochastic reward delivery, and also with the learning of mixed strategies in two-opponent games.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Learning by reinforcement is important in shaping animal behavior. But behavioral decision making is likely to involve the integration of many synaptic events in space and time. So in using a single reinforcement signal to modulate synaptic plasticity a twofold problem arises. Different synapses will have contributed differently to the behavioral decision and, even for one and the same synapse, releases at different times may have had different effects. Here we present a plasticity rule which solves this spatio-temporal credit assignment problem in a population of spiking neurons. The learning rule is spike time dependent and maximizes the expected reward by following its stochastic gradient. Synaptic plasticity is modulated not only by the reward but by a population feedback signal as well. While this additional signal solves the spatial component of the problem, the temporal one is solved by means of synaptic eligibility traces. In contrast to temporal difference based approaches to reinforcement learning, our rule is explicit with regard to the assumed biophysical mechanisms. Neurotransmitter concentrations determine plasticity and learning occurs fully online. Further, it works even if the task to be learned is non-Markovian, i.e. when reinforcement is not determined by the current state of the system but may also depend on past events. The performance of the model is assessed by studying three non-Markovian tasks. In the first task the reward is delayed beyond the last action with non-related stimuli and actions appearing in between. The second one involves an action sequence which is itself extended in time and reward is only delivered at the last action, as is the case in any type of board-game. The third is the inspection game that has been studied in neuroeconomics. It only has a mixed Nash equilibrium and exemplifies that the model also copes with stochastic reward delivery and the learning of mixed strategies.

Relevância:

20.00% 20.00%

Publicador:

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Intestinal bacterial overgrowth and increased permeability are features of non alcoholic steatohepatitis (NASH). Bacterial endotoxin has been shown to promote NASH progression. Application of dextran sulfate sodium (DSS) is a colitis model in mice characterized by damage of the intestinal barrier. This study was designed to investigate if application of DSS aggravates experimental NASH.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Non-alcoholic steatohepatitis (NASH) has a prevalence of 1% in Western countries. Its causes as well as its medical treatment are, to date, still debated. Recently, studies of agents suggested to have antiapoptotic, insulin-sensitizing or anti-inflammatory effects in patients with NASH have been conducted, one of which is ursodeoxycholic acid (UDCA), a tertiary bile acid. Between 1994 and 2008, four prospective randomized, double-blind, placebo-controlled studies of the treatment of NASH with UDCA were conducted. The first study, by Lindor et al., compared the impact of 13-15 mg/kg/day of UDCA to a placebo. The second study by Dufour et al. had an additional third arm that administered combination therapy with UDCA and vitamin E. The third and fourth studies by Leuschner et al. and by Ratziu et al. evaluated high doses of UDCA at 25-35 mg/kg/day, and used liver biopsies and serum liver enzyme levels to evaluate the impact of UDCA. With the exception of Ratziu et al.'s study, which was lacking a second liver biopsy, none of these studies showed any significant differences in the treatment of NASH with UDCA compared with a placebo. However, Dufour et al. did observe a significant improvement of NASH with the combination (UDCA/VitE) vs placebo therapy, whereas UDCA monotherapy was not effective in the treatment of NASH. Nevertheless, the effects of other bile acids and combination therapies need to be explored.

Relevância:

20.00% 20.00%

Publicador: