848 results for Kalai, Ehud. Rational learning lead to Nash equilibrium
Abstract:
Kalai and Lehrer (93a, b) have recently shown that for the case of infinitely repeated games, a coordination assumption on beliefs and optimal strategies ensures convergence to Nash equilibrium. In this paper, we show that for the case of repeated games with long (but finite) horizon, their condition does not imply approximate Nash equilibrium play. Recently Kalai and Lehrer (93a, b) proved that a coordination assumption on beliefs and optimal strategies ensures that players of an infinitely repeated game eventually play "ε-close" to an ε-Nash equilibrium. Their coordination assumption requires that if players believe that a certain set of outcomes has positive probability, then this set of outcomes must, in fact, have positive probability. This coordination assumption is called absolute continuity. For the case of finitely repeated games, absolute continuity is a quite innocuous assumption that just ensures that players can revise their priors by Bayes' Law. However, for the case of infinitely repeated games, absolute continuity is a stronger requirement because it also refers to events that can never be observed in finite time.
Abstract:
We study opinion dynamics in a population of interacting adaptive agents voting on a set of issues represented by vectors. We consider agents who can classify issues into one of two categories and can arrive at their opinions using an adaptive algorithm. Adaptation comes from learning and the information for the learning process comes from interacting with other neighboring agents and trying to change the internal state in order to concur with their opinions. The change in the internal state is driven by the information contained in the issue and in the opinion of the other agent. We present results in a simple yet rich context where each agent uses a Boolean perceptron to state their opinion. If the update occurs with information asynchronously exchanged among pairs of agents, then the typical case, if the number of issues is kept small, is the evolution into a society torn by the emergence of factions with extreme opposite beliefs. This occurs even when seeking consensus with agents with opposite opinions. If the number of issues is large, the dynamics becomes trapped, the society does not evolve into factions and a distribution of moderate opinions is observed. The synchronous case is technically simpler and is studied by formulating the problem in terms of differential equations that describe the evolution of order parameters that measure the consensus between pairs of agents. We show that for a large number of issues and unidirectional information flow, global consensus is a fixed point; however, the approach to this consensus is glassy for large societies.
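The asynchronous pairwise dynamics described in this abstract can be sketched as follows. This is only a minimal illustration under stated assumptions: the abstract does not give the learning algorithm, so the classic perceptron correction rule, the learning rate `eta`, and all function and parameter names here are hypothetical choices, not the authors' model.

```python
import random


def opinion(weights, issue):
    """Boolean perceptron: opinion is the sign of the dot product w . x."""
    s = sum(w * x for w, x in zip(weights, issue))
    return 1 if s >= 0 else -1


def overlap(w_a, w_b):
    """Cosine overlap between two internal states, a simple consensus measure."""
    dot = sum(a * b for a, b in zip(w_a, w_b))
    norm_a = sum(a * a for a in w_a) ** 0.5
    norm_b = sum(b * b for b in w_b) ** 0.5
    return dot / (norm_a * norm_b)


def simulate(n_agents=20, dim=5, n_issues=3, steps=5000, eta=0.1, seed=0):
    """Asynchronous pairwise updates: one listener learns from one speaker per step."""
    rng = random.Random(seed)
    agents = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_agents)]
    issues = [[rng.choice((-1, 1)) for _ in range(dim)] for _ in range(n_issues)]
    for _ in range(steps):
        listener, speaker = rng.sample(range(n_agents), 2)
        issue = rng.choice(issues)
        target = opinion(agents[speaker], issue)
        if opinion(agents[listener], issue) != target:
            # Assumed rule: classic perceptron correction toward the speaker's opinion.
            agents[listener] = [
                w + eta * target * x for w, x in zip(agents[listener], issue)
            ]
    return agents
```

Inspecting `overlap` for every pair of the agents returned by `simulate` is one way to look for groups of agents with similar or opposed internal states; whether such a toy rule reproduces the reported behaviour is not claimed here.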
Abstract:
Humans and animals face decision tasks in an uncertain multi-agent environment where an agent's strategy may change in time due to the co-adaptation of others' strategies. The neuronal substrate and the computational algorithms underlying such adaptive decision making, however, are largely unknown. We propose a population coding model of spiking neurons with a policy gradient procedure that successfully acquires optimal strategies for classical game-theoretical tasks. The suggested population reinforcement learning reproduces data from human behavioral experiments for the blackjack and the inspector game. It performs optimally according to a pure (deterministic) and mixed (stochastic) Nash equilibrium, respectively. In contrast, temporal-difference (TD) learning, covariance learning, and basic reinforcement learning fail to perform optimally for the stochastic strategy. Spike-based population reinforcement learning, shown to follow the stochastic reward gradient, is therefore a viable candidate to explain automated decision learning of a Nash equilibrium in two-player games.
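To make the notion of a mixed (stochastic) Nash equilibrium concrete, the sketch below solves a 2x2 game by the standard indifference conditions. It is not the spiking-neuron model of the abstract. The inspector-game payoffs are hypothetical numbers chosen for illustration, and the function name is an assumption.

```python
from fractions import Fraction


def mixed_equilibrium_2x2(A, B):
    """Fully mixed Nash equilibrium of a 2x2 bimatrix game.

    A[i][j] is the row player's payoff and B[i][j] the column player's payoff
    when row i and column j are played. Returns (p, q), where p is the
    probability of row 0 and q the probability of column 0.
    """
    denom_q = A[0][0] - A[0][1] - A[1][0] + A[1][1]
    denom_p = B[0][0] - B[0][1] - B[1][0] + B[1][1]
    if denom_q == 0 or denom_p == 0:
        raise ValueError("indifference conditions are degenerate")
    # q makes the row player indifferent between the two rows.
    q = Fraction(A[1][1] - A[0][1], denom_q)
    # p makes the column player indifferent between the two columns.
    p = Fraction(B[1][1] - B[1][0], denom_p)
    if not (0 <= p <= 1 and 0 <= q <= 1):
        raise ValueError("no fully mixed equilibrium for these payoffs")
    return p, q


# Hypothetical inspector game: rows = (shirk, work), columns = (inspect, do not inspect).
employee = [[0, 2], [1, 1]]
inspector = [[-1, -2], [0, 1]]
p_shirk, q_inspect = mixed_equilibrium_2x2(employee, inspector)
```

With these assumed payoffs no pure-strategy equilibrium exists, and both the shirking and the inspection probabilities come out as 1/2.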
Abstract:
Background - The literature is not univocal about the effects of Peer Review (PR) within the context of constructivist learning. Due to the predominant focus on using PR as an assessment tool, rather than a constructivist learning activity, and because most studies implicitly assume that the benefits of PR are limited to the reviewee, little is known about the effects upon students who are required to review their peers. Much of the theoretical debate in the literature is focused on explaining how and why constructivist learning is beneficial. At the same time these discussions are marked by an underlying presupposition of a causal relationship between reviewing and deep learning. Objectives - The purpose of the study is to investigate whether the writing of PR feedback causes students to benefit in terms of: perceived utility about statistics, actual use of statistics, better understanding of statistical concepts and associated methods, changed attitudes towards market risks, and outcomes of decisions that were made. Methods - We conducted a randomized experiment, assigning students randomly to receive PR or non–PR treatments and used two cohorts with a different time span. The paper discusses the experimental design and all the software components that we used to support the learning process: Reproducible Computing technology which allows students to reproduce or re–use statistical results from peers, Collaborative PR, and an AI–enhanced Stock Market Engine. Results - The results establish that the writing of PR feedback messages causes students to experience benefits in terms of Behavior, Non–Rote Learning, and Attitudes, provided the sequence of PR activities is maintained for a period that is sufficiently long.
Abstract:
Mobile applications are being increasingly deployed on a massive scale in various mobile sensor grid database systems. With limited resources from the mobile devices, how to process the huge number of queries from mobile users with distributed sensor grid databases becomes a critical problem for such mobile systems. While the fundamental semantic cache technique has been investigated for query optimization in sensor grid database systems, the problem is still difficult because more realistic multi-dimensional constraints have not been considered in existing methods. To solve the problem, a new semantic cache scheme is presented in this paper for location-dependent data queries in distributed sensor grid database systems. It considers multi-dimensional constraints or factors in a unified cost model architecture, determines the parameters of the cost model in the scheme by using the concept of Nash equilibrium from game theory, and makes semantic cache decisions from the established cost model. The scenarios of three factors, namely semantics, time, and location, are investigated as special cases, which improve on existing methods. Experiments are conducted to demonstrate the semantic cache scheme presented in this paper for distributed sensor grid database systems.
Abstract:
We present a model for the self-organized formation of place cells, head-direction cells, and spatial-view cells in the hippocampal formation based on unsupervised learning on quasi-natural visual stimuli. The model comprises a hierarchy of Slow Feature Analysis (SFA) nodes, which were recently shown to reproduce many properties of complex cells in the early visual system []. The system extracts a distributed grid-like representation of position and orientation, which is transcoded into a localized place-field, head-direction, or view representation, by sparse coding. The type of cells that develops depends solely on the relevant input statistics, i.e., the movement pattern of the simulated animal. The numerical simulations are complemented by a mathematical analysis that allows us to accurately predict the output of the top SFA layer.
Abstract:
We define Nash equilibrium for two-person normal form games in the presence of uncertainty, in the sense of Knight (1921). We use the formalization of uncertainty due to Schmeidler and Gilboa. We show that there exist Nash equilibria for any degree of uncertainty, as measured by the uncertainty aversion (Dow and Werlang (1992a)). We show by example that prudent behaviour (maxmin) can be obtained as an outcome even when it is not rationalizable in the usual sense. Next, we break down backward induction in the twice repeated prisoner's dilemma. We link these results with those on cooperation in the finitely repeated prisoner's dilemma obtained by Kreps-Milgrom-Roberts-Wilson (1982), and with the literature on epistemological conditions underlying Nash equilibrium. The knowledge notion implicit in this model of equilibrium does not display logical omniscience.
Abstract:
We present two alternative definitions of Nash equilibrium for two-person games in the presence of uncertainty, in the sense of Knight. We use the formalization of uncertainty due to Schmeidler and Gilboa. We show that, with one of the definitions, prudent behaviour (maxmin) can be obtained as an outcome even when it is not rationalizable in the usual sense. Most striking is that with the same definition we break down backward induction in the twice repeated prisoner's dilemma. We also link these results with the Kreps-Milgrom-Roberts-Wilson explanation of cooperation in the finitely repeated prisoner's dilemma.
Abstract:
We define a subgame perfect Nash equilibrium under Knightian uncertainty for two players, by means of a recursive backward induction procedure. We prove an extension of the Zermelo-von Neumann-Kuhn Theorem for games of perfect information, i.e., that the recursive procedure generates a Nash equilibrium under uncertainty (Dow and Werlang (1994)) of the whole game. We apply the notion to two well-known games: the chain store and the centipede. On the one hand, we show that subgame perfection under Knightian uncertainty explains the chain store paradox in a one-shot version. On the other hand, we show that subgame perfection under uncertainty does not account for the leaving behavior observed in the centipede game. This is in contrast to Dow, Orioli and Werlang (1996), where we explain by means of Nash equilibria under uncertainty (but not subgame perfect) the experiments of McKelvey and Palfrey (1992). Finally, we show that there may be nontrivial subgame perfect equilibria under uncertainty in more complex extensive form games, as in the case of the finitely repeated prisoner's dilemma, which accounts for cooperation in early stages of the game.
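For orientation, classical backward induction (without Knightian uncertainty) on a centipede game can be sketched as below. This is the textbook expected-payoff procedure, not the recursive procedure under uncertainty defined in the abstract. The payoff numbers, the tie-breaking rule, and the function name are assumptions made for illustration.

```python
def backward_induction(take_payoffs, final_payoffs):
    """Classical backward induction for a two-player centipede game.

    take_payoffs[t] is the payoff pair (player 0, player 1) if the mover at
    node t takes; players alternate, with player t % 2 moving at node t.
    final_payoffs is the payoff pair if every node is passed.
    Returns the action chosen at each node and the resulting payoff pair.
    """
    continuation = tuple(final_payoffs)
    plan = []
    for t in reversed(range(len(take_payoffs))):
        mover = t % 2
        # Assumed tie-breaking: the mover takes when indifferent.
        if take_payoffs[t][mover] >= continuation[mover]:
            plan.append("take")
            continuation = tuple(take_payoffs[t])
        else:
            plan.append("pass")
    plan.reverse()
    return plan, continuation


# Hypothetical four-node centipede: total payoffs grow, but each mover gains by taking.
plan, outcome = backward_induction(
    take_payoffs=[(1, 0), (0, 2), (3, 1), (2, 4)],
    final_payoffs=(5, 3),
)
```

With these assumed payoffs the procedure prescribes taking at every node, so play stops at the first node with payoffs (1, 0). This is the classical prediction against which behavior observed in centipede experiments is usually compared.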
Abstract:
We show that for a large class of competitive nonlinear pricing games with adverse selection, the property of better-reply security is naturally satisfied - thus, resolving via a result due to Reny (1999) the issue of existence of Nash equilibrium for a large class of competitive nonlinear pricing games.
Abstract:
Management and organization literature has extensively noted the crucial role that improvisation assumes in organizations, as a learning process (Miner, Bassoff & Moorman, 2001), a creative process (Fisher & Amabile, 2008), a capability (Vera & Crossan, 2005), and a personal disposition (Hmieleski & Corbett, 2006; 2008). My dissertation aims to contribute to the existing literature on improvisation, addressing two general research questions: 1) How does improvisation unfold at an individual level? 2) What are the potential antecedents and consequences of individual proclivity to improvise? This dissertation is based on a mixed methodology that allowed me to deal with these two general research questions and enabled a constant interaction between the theoretical framework and the empirical results. The selected empirical field is haute cuisine and the respondents are the executive chefs of the restaurants awarded by the Michelin Guide in 2010 in Italy. The qualitative section of the dissertation is based on the analysis of 26 inductive case studies and offers a multifaceted contribution. First, I describe how improvisation works both as a learning and creative process. Second, I introduce a new categorization of individual improvisational scenarios (demanded creative improvisation, problem solving improvisation, and pure creative improvisation). Third, I describe the differences between improvisation and other creative processes detected in the field (experimentation, brainstorming, trial and error through analytical procedure, trial and error, and imagination). The quantitative inquiry is founded on a Structural Equation Model, which allowed me to test simultaneously the relationships between proclivity to improvise and its antecedents and consequences. In particular, using a newly developed scale to measure individual proclivity to improvise, I test the positive influence of industry experience, self-efficacy, and age on proclivity to improvise and the negative impact of proclivity to improvise on outcome deviation. Theoretical contributions and practical implications of the results are discussed.
Abstract:
A within-subject design was used to test whether repeatedly drinking a novel-flavoured and coloured drink while thirsty would influence subsequent liking for or consumption of that drink, compared to a different flavoured and coloured drink repeatedly consumed while less thirsty. Each participant was given 300 ml of one flavoured drink (H) after consuming a high salt meal (5.27 g of salt), and 300 ml of another flavoured drink (L) after consuming a low salt meal (1.27 g of salt). Participants had 4 sessions with each meal-type/drink combination, in an intermixed order. Pre- and post-training assessments of the drinks were conducted to determine the impact of the training regime on pleasantness and perceived thirst-quenching effect of the drinks. The final session included a choice test, and ad libitum access to the chosen drink, after either a high or low salt meal. In this final choice session, people drank almost twice as much H as L; however, there were no differential effects of past training on rated liking or choice. The increased consumption of H might reflect greater liking for H which was not detected by the rating scales; or it might reflect the learning of greater "conditioned thirst" in response to the flavour of H. © 2002 Elsevier Science Ltd. All rights reserved.
Abstract:
This article seeks necessary and sufficient conditions on the demand curve that guarantee the existence of a pure strategy Nash equilibrium in a Bertrand-Edgeworth game with capacity constraints.