848 results for Kalai, Ehud. Rational learning lead to Nash equilibrium


Relevance:

100.00%

Abstract:

In wireless ad hoc networks, nodes communicate with far-off destinations using intermediate nodes as relays. Since wireless nodes are energy constrained, it may not be in the best interest of a node to always accept relay requests. On the other hand, if all nodes decide not to expend energy in relaying, then network throughput will drop dramatically. Both of these extreme scenarios (complete cooperation and complete noncooperation) are inimical to the interests of a user. In this paper, we address the issue of user cooperation in ad hoc networks. We assume that nodes are rational, i.e., their actions are strictly determined by self-interest, and that each node is associated with a minimum lifetime constraint. Given these lifetime constraints and the assumption of rational behavior, we are able to determine the optimal share of service that each node should receive. We define this to be the rational Pareto optimal operating point. We then propose a distributed and scalable acceptance algorithm called Generous TIT-FOR-TAT (GTFT). The acceptance algorithm is used by the nodes to decide whether to accept or reject a relay request. We show that GTFT results in a Nash equilibrium and prove that the system converges to the rational and optimal operating point.

Relevance:

100.00%

Abstract:

Channel assignment in multi-channel multi-radio wireless networks poses a significant challenge because few channels are available in the wireless spectrum. Further, additional care has to be taken to account for the interference characteristics of the nodes in the network, especially when nodes are in different collision domains. This work views the problem of channel assignment in multi-channel multi-radio networks with multiple collision domains as a non-cooperative game in which each player's objective is to maximize its individual utility by minimizing its interference. Necessary and sufficient conditions are derived for the channel assignment to be a Nash Equilibrium (NE), and the efficiency of the NE is analyzed by deriving the lower bound of the price of anarchy of this game. A new fairness measure in the multiple collision domain context is proposed, and necessary and sufficient conditions for NE outcomes to be fair are derived. The equilibrium conditions are then applied to solve the channel assignment problem by proposing three algorithms, based on perfect/imperfect information, which rely on explicit communication between the players to arrive at an NE. A no-regret learning algorithm known as the Freund and Schapire Informed algorithm, which has the additional advantage of low overhead in terms of information exchange, is proposed and its convergence to the stabilizing outcomes is studied. New performance metrics are proposed and extensive simulations are carried out in Matlab to obtain a thorough understanding of the performance of these algorithms on various topologies with respect to these metrics. The proposed algorithms were observed to converge well to an NE, resulting in efficient channel assignment strategies.
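
The no-regret learning idea mentioned in this abstract can be illustrated with the generic full-information Hedge (multiplicative weights) update introduced by Freund and Schapire. The following is only a minimal sketch under stated assumptions: the function name `hedge`, the learning rate `eta`, and the static interference values are hypothetical, and the exact variant and utility model used in the paper may differ.

```python
import math


def hedge(loss_rounds, num_actions, eta=0.5):
    """Full-information Hedge: return the final mixed strategy over actions.

    loss_rounds is an iterable of per-round loss lists, one loss in [0, 1]
    per action (here: a hypothetical normalized interference per channel).
    """
    weights = [1.0 / num_actions] * num_actions
    for losses in loss_rounds:
        # Exponentially down-weight each action by the loss it would have incurred.
        weights = [w * math.exp(-eta * loss) for w, loss in zip(weights, losses)]
        # Renormalize every round so the weights stay a probability distribution.
        total = sum(weights)
        weights = [w / total for w in weights]
    return weights


# Hypothetical static interference levels observed on three channels.
interference = [0.9, 0.2, 0.6]
strategy = hedge([interference] * 200, num_actions=3)
best_channel = max(range(3), key=lambda c: strategy[c])
```

With these assumed losses the strategy concentrates on the channel with the least interference. In a real multi-player setting each player's losses would change as the other players adapt, which is where the convergence analysis in the paper comes in.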

Relevance:

100.00%

Abstract:

In a computational grid, the presence of grid resource providers who are rational and intelligent could lead to an overall degradation in the efficiency of the grid. In this paper, we design incentive compatible grid resource procurement mechanisms which ensure that the efficiency of the grid is not affected by the rational behavior of resource providers. In particular, we offer three incentive compatible mechanisms for this purpose: (1) the G-DSIC (Grid-Dominant Strategy Incentive Compatible) mechanism; (2) the G-BIC (Grid-Bayesian Nash Incentive Compatible) mechanism; and (3) the G-OPT (Grid-Optimal) mechanism, which minimizes the cost to the grid user while satisfying (a) Bayesian incentive compatibility and (b) individual rationality. We evaluate the relative merits and demerits of these three mechanisms using game theoretical analysis and numerical experiments.
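
To make dominant strategy incentive compatibility concrete, here is a minimal sketch of a textbook second-price (Vickrey) procurement auction for a single job. It is not the G-DSIC mechanism from the paper, whose details the abstract does not give. The provider names, costs, and helper names are hypothetical.

```python
def second_price_procurement(bids):
    """Select the lowest-cost provider and pay it the second-lowest bid.

    bids maps a provider name to its reported cost; at least two bids are needed.
    """
    if len(bids) < 2:
        raise ValueError("need at least two providers")
    ranked = sorted(bids.items(), key=lambda item: item[1])
    winner = ranked[0][0]
    payment = ranked[1][1]
    return winner, payment


def utility(provider, true_cost, bids):
    """Payoff of one provider: payment minus true cost if it wins, else zero."""
    winner, payment = second_price_procurement(bids)
    return payment - true_cost if winner == provider else 0.0


# Hypothetical true costs of three resource providers.
true_costs = {"A": 9.0, "B": 5.0, "C": 7.0}
truthful_utility_b = utility("B", true_costs["B"], true_costs)

# Provider B tries a grid of misreports while the others stay truthful.
misreport_utilities_b = [
    utility("B", true_costs["B"], {**true_costs, "B": r / 2})
    for r in range(0, 25)
]
```

In this example no misreport gives provider B more than it earns by reporting its true cost, which is the defining property of a dominant strategy incentive compatible mechanism.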

Relevance:

100.00%

Abstract:

There is a growing amount of experimental evidence that suggests people often deviate from the predictions of game theory. Some scholars attempt to explain the observations by introducing errors into behavioral models. However, most of these modifications are situation dependent and do not generalize. A new theory, called the rational novice model, is introduced as an attempt to provide a general theory that takes account of erroneous behavior. The rational novice model is based on two central principles. The first is that people systematically make inaccurate guesses when they are evaluating their options in a game-like situation. The second is that people treat their decisions similarly to a portfolio problem. As a result, actions that are non-optimal in a game theoretic sense may be included in the rational novice strategy profile with positive weights.

The rational novice model can be divided into two parts: the behavioral model and the equilibrium concept. In a theoretical chapter, the mathematics of the behavioral model and the equilibrium concept are introduced. The existence of the equilibrium is established. In addition, the Nash equilibrium is shown to be a special case of the rational novice equilibrium. In another chapter, the rational novice model is applied to a voluntary contribution game. Numerical methods were used to obtain the solution. The model is estimated with data obtained from the Palfrey and Prisbrey experimental study of the voluntary contribution game. It is found that the rational novice model explains the data better than the Nash model. Although a formal statistical test was not used, pseudo R^2 analysis indicates that the rational novice model is better than a Probit model similar to the one used in the Palfrey and Prisbrey study.

The rational novice model is also applied to a first price sealed bid auction. Again, computing techniques were used to obtain a numerical solution. The data obtained from the Chen and Plott study were used to estimate the model. The rational novice model outperforms the CRRAM, the primary Nash model studied in the Chen and Plott study. However, the rational novice model is not the best amongst all models. A sophisticated rule-of-thumb, called the SOPAM, offers the best explanation of the data.

Relevance:

100.00%

Abstract:

We introduce Collocation Games as the basis of a general framework for modeling, analyzing, and facilitating the interactions between the various stakeholders in distributed systems in general, and in cloud computing environments in particular. Cloud computing enables fixed-capacity (processing, communication, and storage) resources to be offered by infrastructure providers as commodities for sale at a fixed cost in an open marketplace to independent, rational parties (players) interested in setting up their own applications over the Internet. Virtualization technologies enable the partitioning of such fixed-capacity resources so as to allow each player to dynamically acquire appropriate fractions of the resources for unencumbered use. In such a paradigm, the resource management problem reduces to that of partitioning the entire set of applications (players) into subsets, each of which is assigned to fixed-capacity cloud resources. If the infrastructure and the various applications are under a single administrative domain, this partitioning reduces to an optimization problem whose objective is to minimize the overall deployment cost. In a marketplace, in which the infrastructure provider is interested in maximizing its own profit, and in which each player is interested in minimizing its own cost, it should be evident that a global optimization is precisely the wrong framework. Rather, in this paper we use a game-theoretic framework in which the assignment of players to fixed-capacity resources is the outcome of a strategic "Collocation Game". Although we show that determining the existence of an equilibrium for collocation games in general is NP-hard, we present a number of simplified, practically-motivated variants of the collocation game for which we establish convergence to a Nash Equilibrium, and for which we derive convergence and price of anarchy bounds. 
In addition to these analytical results, we present an experimental evaluation of implementations of some of these variants for cloud infrastructures consisting of a collection of multidimensional resources of homogeneous or heterogeneous capacities. Experimental results using trace-driven simulations and synthetically generated datasets corroborate our analytical results and also illustrate how collocation games offer a feasible distributed resource management alternative for autonomic/self-organizing systems, in which the adoption of a global optimization approach (centralized or distributed) would be neither practical nor justifiable.

Relevance:

100.00%

Abstract:

Nearest neighbor retrieval is the task of identifying, given a database of objects and a query object, the objects in the database that are the most similar to the query. Retrieving nearest neighbors is a necessary component of many practical applications, in fields as diverse as computer vision, pattern recognition, multimedia databases, bioinformatics, and computer networks. At the same time, finding nearest neighbors accurately and efficiently can be challenging, especially when the database contains a large number of objects, and when the underlying distance measure is computationally expensive. This thesis proposes new methods for improving the efficiency and accuracy of nearest neighbor retrieval and classification in spaces with computationally expensive distance measures. The proposed methods are domain-independent, and can be applied in arbitrary spaces, including non-Euclidean and non-metric spaces. In this thesis particular emphasis is given to computer vision applications related to object and shape recognition, where expensive non-Euclidean distance measures are often needed to achieve high accuracy. The first contribution of this thesis is the BoostMap algorithm for embedding arbitrary spaces into a vector space with a computationally efficient distance measure. Using this approach, an approximate set of nearest neighbors can be retrieved efficiently - often orders of magnitude faster than retrieval using the exact distance measure in the original space. The BoostMap algorithm has two key distinguishing features with respect to existing embedding methods. First, embedding construction explicitly maximizes the amount of nearest neighbor information preserved by the embedding. Second, embedding construction is treated as a machine learning problem, in contrast to existing methods that are based on geometric considerations. 
The second contribution is a method for constructing query-sensitive distance measures for the purposes of nearest neighbor retrieval and classification. In high-dimensional spaces, query-sensitive distance measures allow for automatic selection of the dimensions that are the most informative for each specific query object. It is shown theoretically and experimentally that query-sensitivity increases the modeling power of embeddings, allowing embeddings to capture a larger amount of the nearest neighbor structure of the original space. The third contribution is a method for speeding up nearest neighbor classification by combining multiple embedding-based nearest neighbor classifiers in a cascade. In a cascade, computationally efficient classifiers are used to quickly classify easy cases, and classifiers that are more computationally expensive and also more accurate are only applied to objects that are harder to classify. An interesting property of the proposed cascade method is that, under certain conditions, classification time actually decreases as the size of the database increases, a behavior that is in stark contrast to the behavior of typical nearest neighbor classification systems. The proposed methods are evaluated experimentally in several different applications: hand shape recognition, off-line character recognition, online character recognition, and efficient retrieval of time series. In all datasets, the proposed methods lead to significant improvements in accuracy and efficiency compared to existing state-of-the-art methods. In some datasets, the general-purpose methods introduced in this thesis even outperform domain-specific methods that have been custom-designed for such datasets.
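
The efficiency argument in this abstract rests on retrieving candidates cheaply in an embedded vector space and applying the expensive distance only to those candidates. The sketch below illustrates that filter-and-refine pattern with a simple reference-object embedding. It is not the BoostMap algorithm, which learns its embedding with boosting. The edit distance stands in for an expensive measure, and the strings, reference objects, and function names are assumptions made for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance, standing in for an expensive distance measure."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def embed(obj, references, dist):
    """Map an object to the vector of its distances to a few reference objects."""
    return [dist(obj, r) for r in references]


def l1(u, v):
    """Cheap distance used in the embedded space."""
    return sum(abs(x - y) for x, y in zip(u, v))


def filter_and_refine(query, database, db_vectors, references, dist, p):
    """Return the index of the best match among the top-p embedded candidates."""
    q_vec = embed(query, references, dist)
    # Filter step: rank all objects by the cheap embedded distance, keep p.
    order = sorted(range(len(database)), key=lambda i: l1(q_vec, db_vectors[i]))
    candidates = order[:p]
    # Refine step: apply the expensive distance to the p candidates only.
    return min(candidates, key=lambda i: dist(query, database[i]))


database = ["kitten", "sitting", "mitten", "bitten", "flask", "task"]
references = ["kitten", "flask"]  # hypothetical reference objects
# Database vectors are computed once, offline.
db_vectors = [embed(s, references, edit_distance) for s in database]

match = filter_and_refine("fitten", database, db_vectors, references, edit_distance, p=3)
exact_best = min(edit_distance("fitten", s) for s in database)
```

At query time this costs one expensive distance per reference object plus `p` in the refine step, rather than one per database object. The result is approximate in general, since the true nearest neighbor can be filtered out when `p` is small.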

Relevance:

100.00%

Abstract:

In Boolean games, agents try to reach a goal formulated as a Boolean formula. These games are attractive because of their compact representations. However, few methods are available to compute the solutions and they are either limited or do not take privacy or communication concerns into account. In this paper we propose the use of an algorithm related to reinforcement learning to address this problem. Our method is decentralized in the sense that agents try to achieve their goals without knowledge of the other agents’ goals. We prove that this is a sound method to compute a Pareto optimal pure Nash equilibrium for an interesting class of Boolean games. Experimental results are used to investigate the performance of the algorithm.
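
For readers unfamiliar with Boolean games, the sketch below shows what a pure Nash equilibrium means in this setting by brute-force enumeration. It is deliberately centralized and exponential, so it is not the decentralized learning method proposed in the paper. The two-agent game, the variable names, and the function names are hypothetical.

```python
from itertools import product


def can_improve(agent, assignment, control, goals):
    """True if the agent's goal is unmet but reachable by changing only its own variables."""
    if goals[agent](assignment):
        return False
    own = control[agent]
    for values in product([False, True], repeat=len(own)):
        deviation = {**assignment, **dict(zip(own, values))}
        if goals[agent](deviation):
            return True
    return False


def pure_nash_equilibria(control, goals):
    """Enumerate all assignments in which no agent has a profitable deviation.

    control maps each agent to the variables it controls;
    goals maps each agent to a predicate over a full assignment.
    """
    variables = [v for owned in control.values() for v in owned]
    equilibria = []
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if not any(can_improve(a, assignment, control, goals) for a in control):
            equilibria.append(assignment)
    return equilibria


# Hypothetical game: agent "a" controls x and wants x <-> y,
# agent "b" controls y and wants x AND y.
control = {"a": ["x"], "b": ["y"]}
goals = {
    "a": lambda s: s["x"] == s["y"],
    "b": lambda s: s["x"] and s["y"],
}
equilibria = pure_nash_equilibria(control, goals)
```

In this toy game both all-false and all-true are pure Nash equilibria, but only the all-true assignment satisfies both goals, which illustrates why a method that finds a Pareto optimal equilibrium is of interest.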

Relevance:

100.00%

Abstract:

This research work aims to study the use of peanut hulls, an agricultural and food industry waste, for copper and lead removal through the evaluation of equilibrium and kinetic parameters. Equilibrium studies were performed in a batch adsorber. The influence of initial pH was evaluated over the range 3–5, and a value between 4.0 and 4.5 was selected. The maximum sorption capacities obtained for the Langmuir model were 0.21 ± 0.03 and 0.18 ± 0.02 mmol/g for copper and lead, respectively. In bi-component systems, competitive sorption of copper and lead was verified, the total amount adsorbed being around 0.21 mmol of metal per gram of material in both mono- and bi-component systems. In the kinetic studies, equilibrium was reached after 200 min of contact time using a 400 rpm stirring rate, achieving 78% and 58% removal in the mono-component system for copper and lead, respectively. Their removal follows pseudo-second-order kinetics. These studies show that most of the metal removal occurred in the first 20 min of contact, which indicates a good uptake rate in all systems.
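
The two models named in this abstract have standard closed forms: the Langmuir isotherm, q_e = q_max K_L C_e / (1 + K_L C_e), and the integrated pseudo-second-order rate law, q_t = k2 q_e^2 t / (1 + k2 q_e t). The sketch below evaluates both. Only the copper capacity of 0.21 mmol/g comes from the abstract. The Langmuir constant `K_L`, the rate constant `K2`, and the use of 0.21 mmol/g as the kinetic equilibrium uptake are hypothetical placeholders, since the abstract does not report them.

```python
def langmuir(c_eq, q_max, k_l):
    """Equilibrium uptake q_e (mmol/g) at equilibrium concentration c_eq (mmol/L)."""
    return q_max * k_l * c_eq / (1.0 + k_l * c_eq)


def pseudo_second_order(t, q_e, k2):
    """Uptake q_t (mmol/g) after contact time t (min) under pseudo-second-order kinetics."""
    return k2 * q_e**2 * t / (1.0 + k2 * q_e * t)


Q_MAX_CU = 0.21  # mmol/g, Langmuir capacity for copper reported in the abstract
K_L = 2.0        # L/mmol, hypothetical Langmuir constant
K2 = 1.0         # g/(mmol*min), hypothetical rate constant

# Half of the capacity is reached at c_eq = 1 / K_L.
half_capacity = langmuir(1.0 / K_L, Q_MAX_CU, K_L)

# Half of the equilibrium uptake is reached at t = 1 / (K2 * q_e).
half_time = 1.0 / (K2 * Q_MAX_CU)
half_uptake = pseudo_second_order(half_time, Q_MAX_CU, K2)
```

Both curves saturate: the isotherm approaches `q_max` at high concentration, and the kinetic curve approaches `q_e` at long contact times, with most of the uptake occurring early.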

Relevance:

100.00%

Abstract:

Abstract based on that of the publication

Relevance:

100.00%

Abstract:

This paper shows that a competitive equilibrium model, in which a representative agent maximizes welfare, expectations are rational, and markets are in equilibrium, can account for several stylized facts of hyperinflation. The theory is built by combining two hypotheses, namely, a fiscal crisis that requires printing money to finance an increasing public deficit and a predicted change in an unsustainable fiscal regime.

Relevance:

100.00%

Abstract:

Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)

Relevance:

100.00%

Abstract:

Learning by reinforcement is important in shaping animal behavior, and in particular in behavioral decision making. Such decision making is likely to involve the integration of many synaptic events in space and time. However, using a single reinforcement signal to modulate synaptic plasticity, as suggested in classical reinforcement learning algorithms, a twofold problem arises. Different synapses will have contributed differently to the behavioral decision, and even for one and the same synapse, releases at different times may have had different effects. Here we present a plasticity rule which solves this spatio-temporal credit assignment problem in a population of spiking neurons. The learning rule is spike-time dependent and maximizes the expected reward by following its stochastic gradient. Synaptic plasticity is modulated not only by the reward, but also by a population feedback signal. While this additional signal solves the spatial component of the problem, the temporal one is solved by means of synaptic eligibility traces. In contrast to temporal difference (TD) based approaches to reinforcement learning, our rule is explicit with regard to the assumed biophysical mechanisms. Neurotransmitter concentrations determine plasticity and learning occurs fully online. Further, it works even if the task to be learned is non-Markovian, i.e. when reinforcement is not determined by the current state of the system but may also depend on past events. The performance of the model is assessed by studying three non-Markovian tasks. In the first task, the reward is delayed beyond the last action with non-related stimuli and actions appearing in between. The second task involves an action sequence which is itself extended in time and reward is only delivered at the last action, as is the case in any type of board-game. The third task is the inspection game that has been studied in neuroeconomics, where an inspector tries to prevent a worker from shirking.
Applying our algorithm to this game yields a learning behavior which is consistent with behavioral data from humans and monkeys, data that themselves exhibit properties of a mixed Nash equilibrium. The examples show that our neuronal implementation of reward-based learning copes with delayed and stochastic reward delivery, and also with the learning of mixed strategies in two-opponent games.

Relevance:

100.00%

Abstract:

Learning by reinforcement is important in shaping animal behavior. But behavioral decision making is likely to involve the integration of many synaptic events in space and time. So in using a single reinforcement signal to modulate synaptic plasticity a twofold problem arises. Different synapses will have contributed differently to the behavioral decision and, even for one and the same synapse, releases at different times may have had different effects. Here we present a plasticity rule which solves this spatio-temporal credit assignment problem in a population of spiking neurons. The learning rule is spike time dependent and maximizes the expected reward by following its stochastic gradient. Synaptic plasticity is modulated not only by the reward but by a population feedback signal as well. While this additional signal solves the spatial component of the problem, the temporal one is solved by means of synaptic eligibility traces. In contrast to temporal difference based approaches to reinforcement learning, our rule is explicit with regard to the assumed biophysical mechanisms. Neurotransmitter concentrations determine plasticity and learning occurs fully online. Further, it works even if the task to be learned is non-Markovian, i.e. when reinforcement is not determined by the current state of the system but may also depend on past events. The performance of the model is assessed by studying three non-Markovian tasks. In the first task the reward is delayed beyond the last action with non-related stimuli and actions appearing in between. The second one involves an action sequence which is itself extended in time and reward is only delivered at the last action, as is the case in any type of board-game. The third is the inspection game that has been studied in neuroeconomics. It only has a mixed Nash equilibrium and exemplifies that the model also copes with stochastic reward delivery and the learning of mixed strategies.
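
The statement that the inspection game has only a mixed Nash equilibrium can be checked on a small example. The sketch below uses a hypothetical payoff parameterization (wage 2, effort cost 1, output value 4, inspection cost 1), which is an assumption and not the payoff table used in the paper. It solves the standard indifference conditions for a 2x2 game.

```python
from fractions import Fraction

# Rows: worker plays (work, shirk). Columns: employer plays (inspect, not inspect).
# Hypothetical payoffs: wage 2, effort cost 1, output value 4, inspection cost 1.
WORKER = [[1, 1], [0, 2]]
EMPLOYER = [[1, 2], [-1, -2]]


def pure_equilibria(a, b):
    """All cells where neither player gains by switching its own action."""
    return [
        (i, j)
        for i in (0, 1)
        for j in (0, 1)
        if a[i][j] >= a[1 - i][j] and b[i][j] >= b[i][1 - j]
    ]


def mixed_equilibrium(a, b):
    """Solve the indifference conditions of a 2x2 game with no pure equilibrium.

    Returns (p, q): p is the probability of row 0, chosen so that the column
    player is indifferent; q is the probability of column 0, chosen so that
    the row player is indifferent.
    """
    p = Fraction(b[1][1] - b[1][0], b[0][0] - b[0][1] - b[1][0] + b[1][1])
    q = Fraction(a[1][1] - a[0][1], a[0][0] - a[0][1] - a[1][0] + a[1][1])
    return p, q


no_pure = pure_equilibria(WORKER, EMPLOYER)
p_work, q_inspect = mixed_equilibrium(WORKER, EMPLOYER)
```

With these assumed payoffs there is no pure equilibrium, and in the mixed equilibrium the worker works and the employer inspects with probability one half each. A learning rule that copes with this game therefore has to sustain stochastic behavior rather than settle on a single action.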

Relevance:

100.00%

Abstract:

While many studies confirm that positive emotions, including enjoyment, lead to better student achievement, less empirical evidence exists about possible mediator variables that link achievement to enjoyment. It is proposed that achievement and enjoyment form a circular dependency: enjoyment in learning leads to higher achievement, but a degree of achievement is required to enjoy learning. This study provides insight into the reverse of the much-studied enjoyment-to-achievement link and provides practical recommendations on how to use these findings. Grounded in control-value theory, which suggests that control and value cognitions are important variables that mediate the connection between enjoyment and achievement, this study explores the reciprocal achievement-cognition-enjoyment link. The reciprocal link was investigated by applying a one-year longitudinal design to students in grades 6 and 7 (N = 356). This age group was chosen because early adolescence represents a critical period during which a strong decrease in positive learning emotions is observed. Part of the work involved identifying factors that might be responsible for this negative development. Results of cross-lagged path analysis identified reciprocal effects between student achievement and enjoyment, with control and value cognitions functioning as partial mediators. High achievement goes with high control and value cognitions, which in turn positively affect enjoyment. However, cross-lagged correlations could only be partly confirmed. The results are discussed in terms of theoretical and practical implications.
