3 results for Adaptive game AI

in CaltechTHESIS


Relevance:

20.00%

Abstract:

This thesis discusses various methods for learning and optimization in adaptive systems. Overall, it emphasizes the relationship between optimization, learning, and adaptive systems; and it illustrates the influence of underlying hardware upon the construction of efficient algorithms for learning and optimization. Chapter 1 provides a summary and an overview.

Chapter 2 discusses a method for using feed-forward neural networks to filter noise out of noise-corrupted signals. The networks use back-propagation learning, but in a way that qualifies as unsupervised learning: they adapt based only on the raw input data, and no external teacher provides information on correct operation during training. The chapter analyzes the learning and develops a simple expression that predicts performance based only on the geometry of the network.
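
The abstract does not specify the network, so the following is only a minimal sketch of one well-known way to use gradient-based learning without a teacher: an adaptive predictor (sometimes called an adaptive line enhancer) that learns to predict each noisy sample from the samples before it. It uses a single linear layer, for which back-propagation reduces to the LMS (delta) rule. The function `line_enhancer`, its parameters `taps` and `mu`, and the synthetic test signal are hypothetical; this is not claimed to be the thesis's architecture.

```python
import math
import random


def line_enhancer(noisy, taps=10, mu=0.01):
    """Unsupervised denoising by one-step-ahead prediction (illustrative sketch).

    The training target is the next *noisy* sample, so no clean reference or
    external teacher is needed. White noise cannot be predicted from the past,
    so the prediction converges toward the predictable (clean) component.
    """
    w = [0.0] * taps                 # weights of a single linear layer
    out = [0.0] * len(noisy)         # the first `taps` outputs stay at zero
    for t in range(taps, len(noisy)):
        window = noisy[t - taps:t]   # the `taps` raw samples before time t
        y = sum(wi * xi for wi, xi in zip(w, window))
        err = noisy[t] - y           # error measured against the raw input itself
        # gradient step on err**2 / 2 (what back-propagation gives for one linear layer)
        w = [wi + mu * err * xi for wi, xi in zip(w, window)]
        out[t] = y                   # prediction made before seeing noisy[t]
    return out


# Usage: a sine wave corrupted by Gaussian noise (synthetic, for illustration only).
rng = random.Random(1)
n = 5000
clean = [math.sin(2 * math.pi * t / 20) for t in range(n)]
noisy = [c + rng.gauss(0, 0.5) for c in clean]
est = line_enhancer(noisy)
tail = range(n - 1000, n)
mse_est = sum((est[t] - clean[t]) ** 2 for t in tail) / len(tail)
mse_raw = sum((noisy[t] - clean[t]) ** 2 for t in tail) / len(tail)
print(f"error vs clean signal: filtered {mse_est:.3f}, raw {mse_raw:.3f}")
```

Because the noise at time t is independent of the past, minimizing the error against the noisy target is equivalent to minimizing the error against the unseen clean signal, which is why no teacher is required.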

Chapter 3 explains a simple model of the piriform cortex, an area in the brain involved in the processing of olfactory information. The model was used to explore the possible effect of acetylcholine on learning and on odor classification. According to the model, the piriform cortex can classify odors better when acetylcholine is present during learning but not present during recall. This is interesting since it suggests that learning and recall might be separate neurochemical modes (corresponding to whether or not acetylcholine is present). When acetylcholine is turned off at all times, even during learning, the model exhibits behavior somewhat similar to Alzheimer's disease, a disease associated with the degeneration of cells that distribute acetylcholine.

Chapters 4, 5, and 6 discuss algorithms appropriate for adaptive systems implemented entirely in analog hardware. The algorithms inject noise into the systems and correlate the noise with the outputs of the systems. This allows them to estimate gradients and to implement noisy versions of gradient descent, without having to calculate gradients explicitly. The methods require only noise generators, adders, multipliers, integrators, and differentiators; and the number of devices needed scales linearly with the number of adjustable parameters in the adaptive systems. With the exception of one global signal, the algorithms require only local information exchange.
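
To make the idea concrete, here is a discrete-time sketch of gradient estimation by noise injection (often called weight or parameter perturbation). It is a simplification under stated assumptions: the thesis targets continuous-time analog circuits built from noise generators, integrators, and differentiators, whereas this version works in discrete steps, uses ±σ binary noise, and evaluates the cost twice per step. The function `perturbative_descent` and its parameters are hypothetical. The measured change in cost plays the role of the single global signal; apart from it, each parameter needs only its own noise sample.

```python
import random


def perturbative_descent(cost, w, sigma=0.01, eta=0.05, steps=2000, seed=0):
    """Noisy gradient descent without computing the gradient explicitly.

    Every step injects independent +/-sigma noise into each parameter,
    measures the resulting change in cost, and correlates that change with
    each parameter's own noise to estimate the gradient.
    """
    rng = random.Random(seed)
    w = list(w)
    for _ in range(steps):
        noise = [rng.choice((-sigma, sigma)) for _ in w]
        delta = cost([wi + ni for wi, ni in zip(w, noise)]) - cost(w)  # global scalar
        # E[n_i * n_j] = sigma**2 if i == j else 0, so for small sigma the product
        # delta * n_i / sigma**2 approximates dC/dw_i on average.
        w = [wi - eta * delta * ni / sigma ** 2 for wi, ni in zip(w, noise)]
    return w


# Usage: minimize a quadratic bowl whose minimum is at (1, -2).
w = perturbative_descent(lambda v: (v[0] - 1) ** 2 + (v[1] + 2) ** 2, [0.0, 0.0])
print([round(x, 2) for x in w])
```

Each update needs one noise source, one multiplication, and one accumulator per parameter, so the hardware cost grows linearly with the number of parameters, as the abstract describes.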

Relevance:

20.00%

Abstract:

In the quest for a descriptive theory of decision-making, the rational actor model in economics imposes rather unrealistic expectations and abilities on human decision makers. The further we move from idealized scenarios, such as perfectly competitive markets, and the more ambitiously we extend the theory to describe everyday decision-making situations, the less sense these assumptions make. Behavioural economics has instead proposed models based on more psychologically realistic assumptions, with the aim of gaining precision and descriptive power. Increased psychological realism, however, comes at the cost of more parameters and greater model complexity. There is now a plethora of models, based on different assumptions and applicable in different contextual settings, and selecting the right model tends to be an ad hoc process. In this thesis, we develop optimal experimental design methods and evaluate different behavioral theories against evidence from lab and field experiments.

We look at evidence from controlled laboratory experiments in which subjects choose between monetary gambles, or lotteries. Different decision-making theories evaluate the choices differently and make distinct predictions about the subjects' choices, so theories whose predictions are inconsistent with the actual choices can be systematically eliminated. Behavioural theories can have multiple parameters, requiring complex experimental designs with a very large number of possible choice tests. This imposes computational and economic constraints on classical experimental design methods. We develop a methodology of adaptive tests, Bayesian Rapid Optimal Adaptive Designs (BROAD), that sequentially chooses the "most informative" test at each stage and, based on the response, updates its posterior beliefs over the theories, which in turn determines the next most informative test to run. BROAD uses the Equivalence Class Edge Cutting (EC2) criterion to select tests. We prove that the EC2 criterion is adaptively submodular, which yields theoretical guarantees relative to the Bayes-optimal testing sequence even in the presence of noisy responses. In simulated ground-truth experiments, we find that the EC2 criterion recovers the true hypotheses with significantly fewer tests than more widely used criteria such as Information Gain and Generalized Binary Search. We show, theoretically as well as experimentally, that these popular criteria can, surprisingly, perform poorly in the presence of noise or subject errors. Furthermore, we use the adaptive submodularity of EC2 to implement an accelerated greedy version of BROAD, which yields a speedup of orders of magnitude over other methods.
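
The EC2 criterion can be sketched for the simplest, noiseless case. Each hypothesis (a theory together with specific parameter values) belongs to an equivalence class (its theory), and every pair of hypotheses from different classes is joined by an edge weighted by the product of their prior probabilities. Observing a test outcome "cuts" every edge that touches a hypothesis inconsistent with that outcome, and the greedy rule runs the test with the largest expected cut weight. This is a hedged illustration only: it assumes deterministic predictions, does not model noisy responses or the accelerated (lazy) greedy evaluation, and all names (`edge_weight`, `ec2_score`, `run_ec2`, the toy hypotheses and tests) are hypothetical rather than BROAD's actual implementation.

```python
from itertools import combinations


def edge_weight(hyps, prior, cls):
    """Total weight of edges joining hypotheses from different classes."""
    return sum(prior[a] * prior[b]
               for a, b in combinations(hyps, 2) if cls[a] != cls[b])


def ec2_score(test, hyps, prior, cls, outcome):
    """Expected weight of edges cut by running `test` (noiseless EC2)."""
    total = edge_weight(hyps, prior, cls)
    mass = sum(prior[h] for h in hyps)
    score = 0.0
    for o in {outcome[h][test] for h in hyps}:
        consistent = [h for h in hyps if outcome[h][test] == o]
        p_o = sum(prior[h] for h in consistent) / mass
        # only edges between surviving hypotheses remain after observing o
        score += p_o * (total - edge_weight(consistent, prior, cls))
    return score


def run_ec2(tests, hyps, prior, cls, outcome, respond):
    """Greedily run tests until the surviving hypotheses share one class."""
    hyps, remaining, asked = list(hyps), list(tests), []
    while len({cls[h] for h in hyps}) > 1 and remaining:
        best = max(remaining, key=lambda t: ec2_score(t, hyps, prior, cls, outcome))
        remaining.remove(best)
        answer = respond(best)
        hyps = [h for h in hyps if outcome[h][best] == answer]
        asked.append(best)
    return {cls[h] for h in hyps}, asked


# Toy example: two theories (A, B), two parameter settings each, two candidate tests.
cls = {"h1": "A", "h2": "A", "h3": "B", "h4": "B"}
prior = {h: 0.25 for h in cls}
outcome = {"h1": {"t1": 0, "t2": 0}, "h2": {"t1": 0, "t2": 1},
           "h3": {"t1": 1, "t2": 0}, "h4": {"t1": 1, "t2": 1}}
truth = "h3"  # simulated subject
classes, asked = run_ec2(["t2", "t1"], list(cls), prior, cls, outcome,
                         respond=lambda t: outcome[truth][t])
print(classes, asked)
```

In this toy case, `t1` and `t2` are equally informative about *which hypothesis* is true (each splits the prior in half), but only `t1` separates the two theories. EC2 scores `t1` higher (0.25 versus 0.1875) because it targets the classes rather than individual hypotheses, and it identifies theory B after a single test.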

We use BROAD to perform two experiments. First, we compare the main classes of theories of decision-making under risk, namely expected value, prospect theory, constant relative risk aversion (CRRA), and moments models. Subjects are given an initial endowment and are sequentially presented with choices between two lotteries, with the possibility of losses. The lotteries are selected using BROAD, and 57 subjects from Caltech and UCLA are incentivized by randomly realizing one of the lotteries they chose. Aggregate posterior probabilities over the theories show limited evidence in favour of CRRA and moments models. Classifying the subjects into types showed that most subjects are described by prospect theory, followed by expected value. Adaptive experimental design raises the possibility of strategic manipulation: subjects could mask their true preferences and choose differently in order to obtain more favourable tests in later rounds, thereby increasing their payoffs. We pay close attention to this problem and rule out strategic manipulation, both because it is infeasible in practice and because we find no signatures of it in our data.
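
To illustrate how these theories can make distinct predictions for the same pair of lotteries, here is a minimal sketch. The functional forms are common textbook versions (a power value function with loss aversion and a one-parameter probability weighting function for prospect theory, CRRA utility over final wealth, and a mean-variance form as the simplest moments model), but they and all parameter values are illustrative assumptions, not the specifications or estimates used in the thesis.

```python
import math

# A lottery is a list of (probability, outcome) pairs; outcomes may be losses.


def expected_value(lottery):
    return sum(p * x for p, x in lottery)


def crra(lottery, r=0.5, wealth=10.0):
    """Expected CRRA utility of final wealth; assumes wealth + x > 0."""
    def u(z):
        return math.log(z) if r == 1 else z ** (1 - r) / (1 - r)
    return sum(p * u(wealth + x) for p, x in lottery)


def prospect_theory(lottery, alpha=0.88, lam=2.25, gamma=0.61):
    """Loss-averse power value function with inverse-S probability weighting.

    Weights are applied to each outcome's own probability, which matches
    cumulative prospect theory only for lotteries with at most one gain and
    one loss (as in the example below).
    """
    def v(x):
        return x ** alpha if x >= 0 else -lam * (-x) ** alpha

    def w(p):
        return p ** gamma / (p ** gamma + (1 - p) ** gamma) ** (1 / gamma)

    return sum(w(p) * v(x) for p, x in lottery)


def mean_variance(lottery, a=0.05):
    """Simplest 'moments' model: mean minus a penalty on variance."""
    mu = expected_value(lottery)
    return mu - a * sum(p * (x - mu) ** 2 for p, x in lottery)


risky = [(0.5, 10.0), (0.5, -5.0)]  # 50/50: win 10 or lose 5
safe = [(1.0, 0.0)]                 # nothing happens
for name, model in [("EV", expected_value), ("CRRA", crra),
                    ("PT", prospect_theory), ("mean-variance", mean_variance)]:
    choice = "risky" if model(risky) > model(safe) else "safe"
    print(f"{name:>13}: {choice}")
```

With these parameters, expected value and CRRA choose the risky lottery while prospect theory (through loss aversion) and mean-variance choose the safe one, so observing this single choice already separates the models into two groups.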

In the second experiment, we compare the main theories of time preference: exponential discounting, hyperbolic discounting, "present bias" models (quasi-hyperbolic (α, β) discounting and fixed cost discounting), and generalized-hyperbolic discounting. Forty subjects from UCLA were given choices between two options: a smaller but more immediate payoff versus a larger but later payoff. We found very limited evidence for present bias models and hyperbolic discounting; most subjects were classified as generalized-hyperbolic discounting types, followed by exponential discounting.
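
The discount functions compared in this experiment can be written in a few lines, which also makes the key behavioural difference easy to check: hyperbolic, quasi-hyperbolic, and generalized-hyperbolic discounting produce preference reversals when both options are pushed into the future, while exponential discounting does not. This is a sketch under stated assumptions: the forms are standard textbook ones, the parameter values are arbitrary, the quasi-hyperbolic function uses the common (β, δ) parameter names (which may not match the thesis's labels), and fixed cost discounting is omitted.

```python
def exponential(t, delta=0.9):
    return delta ** t


def hyperbolic(t, k=0.2):
    return 1.0 / (1.0 + k * t)


def quasi_hyperbolic(t, beta=0.7, delta=0.95):
    """'Present bias': any delay at all costs an extra factor beta."""
    return 1.0 if t == 0 else beta * delta ** t


def generalized_hyperbolic(t, a=1.0, b=0.5):
    """Generalized hyperbola (1 + a t)^(-b / a)."""
    return (1.0 + a * t) ** (-b / a)


def prefers_sooner(discount, small, t_small, large, t_large):
    """True if the discounted smaller-sooner payoff beats the larger-later one."""
    return small * discount(t_small) > large * discount(t_large)


# 100 at t versus 110 at t + 1, evaluated now and with both options delayed by 10.
for name, d in [("exponential", exponential), ("hyperbolic", hyperbolic),
                ("quasi-hyperbolic", quasi_hyperbolic),
                ("generalized hyperbolic", generalized_hyperbolic)]:
    print(name, prefers_sooner(d, 100, 0, 110, 1), prefers_sooner(d, 100, 10, 110, 11))
```

Under exponential discounting the ratio of the two discount factors depends only on the one-period gap, so the choice never flips; under the other three forms the smaller-sooner option wins when it is immediate but loses once both options are delayed, which is the temporal choice inconsistency the experiment is designed to detect.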

In these models the passage of time is linear. We instead consider a psychological model where the perception of time is subjective. We prove that when the biological (subjective) time is positively dependent, it gives rise to hyperbolic discounting and temporal choice inconsistency.

We also test the predictions of behavioral theories "in the wild", focusing on prospect theory, which emerged as the dominant theory in our lab experiments on risky choice. Loss aversion and reference dependence predict that consumers will behave in a way distinctly different from what the standard rational model predicts. Specifically, loss aversion predicts that when an item is offered at a discount, demand for it will be greater than its price elasticity alone explains. Even more importantly, when the item is no longer discounted, demand for its close substitute will increase excessively. We tested this prediction using a discrete choice model with a loss-averse utility function on data from a large e-commerce retailer. Not only did we identify loss aversion, but we also found that the effect decreased with consumers' experience. We outline the policy implications of consumer loss aversion, as well as strategies for competitive pricing.
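
A minimal sketch of this mechanism, assuming a multinomial logit choice model in which each product's utility has a consumption term and a reference-dependent price term, with prices above the reference price weighted by a loss-aversion coefficient λ > 1. The two-product scenario, the assumption that the reference price has adapted to the discounted price, and all parameter values (`eta`, `mu`, `lam`, prices) are hypothetical; this is not the model estimated on the retailer's data.

```python
import math


def utility(base, price, ref, eta=0.3, mu=0.5, lam=2.0):
    """Consumption utility plus a gain/loss term relative to a reference price."""
    gap = ref - price                     # > 0: paying less than expected (gain)
    gain_loss = gap if gap >= 0 else lam * gap
    return base - eta * price + mu * gain_loss


def logit_shares(utilities):
    """Multinomial logit choice probabilities (max-subtracted for stability)."""
    m = max(utilities)
    exps = [math.exp(u - m) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]


def substitute_share_after_discount(lam):
    """Product A's discount (8 instead of 10) has ended, but its reference price
    has adapted to 8, so paying 10 again is coded as a loss. Product B, the close
    substitute, has always cost 10."""
    u_a = utility(base=5.0, price=10.0, ref=8.0, lam=lam)
    u_b = utility(base=5.0, price=10.0, ref=10.0, lam=lam)
    return logit_shares([u_a, u_b])[1]


print(substitute_share_after_discount(lam=1.0), substitute_share_after_discount(lam=2.0))
```

Even without loss aversion (λ = 1) the substitute gains share, because the reference price of the formerly discounted item has moved; λ > 1 amplifies the shift (here from about 0.73 to about 0.88), which is the excessive substitution the abstract describes.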

In future work, BROAD could be applied widely to test other behavioural models, e.g. in social preferences and game theory, and in different contextual settings. Additional measurements beyond choice data, including biological measurements such as skin conductance, could be used to eliminate hypotheses more rapidly and speed up model comparison. Discrete choice models also provide a framework for testing behavioural models with field data and encourage combined lab-field experiments.

Relevance:

20.00%

Abstract:

How animals use sensory information to weigh the risks and benefits of behavioral decisions remains poorly understood. Inter-male aggression is triggered when animals perceive both the presence of an appetitive resource, such as food or females, and the presence of competing conspecific males. How such signals are detected and integrated to control the decision to fight is not clear. Here we use the vinegar fly, Drosophila melanogaster, to investigate how food and females promote aggression.

In the first chapter, we explore how food controls aggression. As in many other species, food promotes aggression in flies, but it is not clear whether food increases aggression per se, or whether aggression is a secondary consequence of increased social interactions caused by flies aggregating on food. Furthermore, nothing is known about how animals evaluate the quality and quantity of food in the context of competition. We show that food promotes aggression independently of any effect on the frequency of contact between males. Food increases aggression but not courtship between males, suggesting that the effect of food on aggression is specific. Next, we show that flies tune their level of aggression to the absolute amount of food rather than to other parameters, such as the area or concentration of food. Sucrose, a sugar present in many fruits, is sufficient to promote aggression, and detection of sugar via gustatory receptor neurons is necessary for food-promoted aggression. Furthermore, we show that while food is necessary for aggression, too much food decreases aggression. Finally, we show that flies exhibit behavior consistent with a territorial strategy. These data suggest that flies use sweet-sensing gustatory information to guide their decision to fight over a limited quantity of a food resource.

Following up on the findings of the first chapter, we asked how the presence of conspecific females promotes male-male aggression. In the absence of food, group-housed male flies, which normally do not fight even in the presence of food, fight in the presence of females. Unlike food, the presence of females strongly influences proximity between flies. Nevertheless, because group-housed flies do not fight even in small chambers, it is unlikely that females increase aggression indirectly by first increasing proximity. Also unlike food, the presence of females leads to large increases in locomotion and in male-female courtship behaviors, suggesting that females may influence aggression as well as general arousal. Female cuticular hydrocarbons (CHs) are required for this effect, as females that do not produce CH pheromones are unable to promote male-male aggression. In particular, 7,11-HD, a female-specific cuticular hydrocarbon pheromone critical for male-female courtship, is sufficient to mediate this effect when it is perfumed onto pheromone-deficient females or males. Recent studies showed that ppk23+ gustatory receptor neurons (GRNs) comprise two populations: one detects male cuticular hydrocarbons, and another, labeled by both ppk23 and ppk25, detects female cuticular hydrocarbons. I show that both of these GRN populations control aggression, presumably via detection of female or male pheromones. To further investigate how these two classes of GRNs control aggression, I developed new genetic tools to manipulate the male-sensing and female-sensing GRNs independently. I show that ppk25-LexA and ppk25-GAL80 faithfully recapitulate the expression pattern of ppk25-GAL4 and label a subset of ppk23+ GRNs. These tools can be used in future studies to dissect the respective functions of male-sensing and female-sensing GRNs in male social behaviors.

Finally, in the last chapter, I discuss quantitative approaches to describing how varying quantities of food and females could control the level of aggression. Flies show an inverse-U-shaped aggressive response to varying quantities of food and a flat aggressive response to varying quantities of females. I show how two simple game-theoretic models, the “prisoner’s dilemma” and the “coordination game”, could be used to describe the level of aggression we observe. These results suggest that flies may make strategic decisions based on simple comparisons of costs and benefits.
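
The two game-theoretic models named here differ in their equilibrium structure, which a short sketch can make explicit. The code enumerates the pure-strategy Nash equilibria of a two-player game; the strategy labels ("share", "fight"), the function name `pure_nash`, and all payoff numbers are invented for illustration and are not fitted to the fly data or taken from the thesis.

```python
from itertools import product


def pure_nash(payoff, strategies=("share", "fight")):
    """Pure-strategy Nash equilibria of a two-player game.

    payoff[(s1, s2)] = (payoff to player 1, payoff to player 2).
    A profile is an equilibrium if neither player gains by deviating alone.
    """
    equilibria = []
    for s1, s2 in product(strategies, repeat=2):
        u1, u2 = payoff[(s1, s2)]
        best1 = all(u1 >= payoff[(d, s2)][0] for d in strategies)
        best2 = all(u2 >= payoff[(s1, d)][1] for d in strategies)
        if best1 and best2:
            equilibria.append((s1, s2))
    return equilibria


# Hypothetical payoffs, chosen only to have the textbook structure of each game.
prisoners_dilemma = {("share", "share"): (3, 3), ("share", "fight"): (0, 5),
                     ("fight", "share"): (5, 0), ("fight", "fight"): (1, 1)}
coordination = {("share", "share"): (3, 3), ("share", "fight"): (0, 0),
                ("fight", "share"): (0, 0), ("fight", "fight"): (2, 2)}

print(pure_nash(prisoners_dilemma))
print(pure_nash(coordination))
```

In the prisoner's dilemma, fighting is each player's best reply whatever the other does, so mutual fighting is the only equilibrium; in the coordination game both mutual sharing and mutual fighting are equilibria, so the outcome depends on what each player expects the other to do. How resource quantity would shift payoffs between these regimes is not modeled here.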

In conclusion, male-male aggression in Drosophila is controlled by simple gustatory cues from food and females, which are detected by gustatory receptor neurons. Different quantities of resource cues lead to different levels of aggression, and flies show putative territorial behavior, suggesting that fly aggression is a highly strategic adaptive behavior. How these resource cues are integrated with male pheromone cues and give rise to this complex behavior is an interesting subject, which should keep researchers busy in the coming years.