866 results for Decision-processes
Abstract:
This work describes the state of the art in automatic dialogue strategy management using Markov decision processes (MDPs) with reinforcement learning (RL); partially observable Markov decision processes (POMDPs) are also covered. To test the validity of these methods, two spoken dialogue systems were developed: the first provides weather forecasts, and the second is a more complex train-information system. With the first system, a rule-based strategy was compared against an automatically trained one, using a real corpus to train the automatic strategy. With the second system, the scalability of these methods to larger systems was tested.
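The MDP-with-RL formulation these systems rely on can be made concrete with a small sketch. Below is a minimal tabular Q-learning loop over a toy slot-filling dialogue; the states, actions, and reward values are illustrative assumptions, not the weather-forecast or train-information systems described above.

```python
import random
from collections import defaultdict

# Toy slot-filling dialogue MDP: the state tracks which slots are filled.
# States, actions, and rewards are illustrative assumptions.
ACTIONS = ["ask_city", "ask_date", "give_forecast"]

def step(state, action):
    """Toy transition: fill both slots, then answering ends the dialogue."""
    city, date = state
    if action == "ask_city":
        return (True, date), -1, False        # small per-turn penalty
    if action == "ask_date":
        return (city, True), -1, False
    if action == "give_forecast" and city and date:
        return state, 20, True                # successful dialogue
    return state, -5, False                   # premature or useless action

def q_learning(episodes=5000, alpha=0.1, gamma=0.95, eps=0.1):
    Q = defaultdict(float)
    for _ in range(episodes):
        state, done = (False, False), False
        while not done:
            if random.random() < eps:         # epsilon-greedy exploration
                action = random.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: Q[(state, a)])
            nxt, reward, done = step(state, action)
            target = reward if done else reward + gamma * max(
                Q[(nxt, a)] for a in ACTIONS)
            Q[(state, action)] += alpha * (target - Q[(state, action)])
            state = nxt
    return Q
```

Under this reward scheme the learned policy asks for each missing slot once and then answers, which is the kind of strategy an automatically trained manager discovers from a corpus or a user simulator.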
Abstract:
Decisions about noisy stimuli require evidence integration over time. Traditionally, evidence integration and decision making are described as a one-stage process: a decision is made when evidence for the presence of a stimulus crosses a threshold. Here, we show that one-stage models cannot explain psychophysical experiments on feature fusion, where two visual stimuli are presented in rapid succession. Paradoxically, the second stimulus biases decisions more strongly than the first one, contrary to predictions of one-stage models and intuition. We present a two-stage model where sensory information is integrated and buffered before it is fed into a drift diffusion process. The model is tested in a series of psychophysical experiments and explains both accuracy and reaction time distributions. © 2012 Rüter et al.
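As a rough illustration of the two-stage idea, the sketch below first integrates a stimulus stream into a leaky buffer and only then runs a drift diffusion process to threshold. All parameter values are illustrative assumptions, not the fitted model from the paper.

```python
import numpy as np

def two_stage_trial(stim, dt=0.001, tau=0.05, threshold=1.0, noise=1.0,
                    rng=None):
    """stim: sequence of momentary evidence samples (e.g. -1 / +1)."""
    rng = rng or np.random.default_rng()
    # Stage 1: leaky integration of the stimulus stream into a buffer.
    buffer = 0.0
    for s in stim:
        buffer += dt * (s - buffer) / tau
    # Stage 2: drift diffusion driven by the buffered evidence.
    x, t = 0.0, 0.0
    while abs(x) < threshold:
        x += buffer * dt + noise * np.sqrt(dt) * rng.standard_normal()
        t += dt
    return (x > 0), t                    # (choice, reaction time)
```

Feeding in two successive stimuli, e.g. `two_stage_trial(np.r_[np.full(40, -1.0), np.full(40, 1.0)])`, weights the later stimulus more heavily because the buffer is leaky; that is one way (an assumption of this sketch, not necessarily the paper's mechanism) to produce the bias toward the second stimulus described above.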
Abstract:
This work shows how a dialogue model can be represented as a Partially Observable Markov Decision Process (POMDP) with observations composed of a discrete and continuous component. The continuous component enables the model to directly incorporate a confidence score for automated planning. Using a testbed simulated dialogue management problem, we show how recent optimization techniques are able to find a policy for this continuous POMDP which outperforms a traditional MDP approach. Further, we present a method for automatically improving handcrafted dialogue managers by incorporating POMDP belief state monitoring, including confidence score information. Experiments on the testbed system show significant improvements for several example handcrafted dialogue managers across a range of operating conditions.
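The mixed discrete/continuous observation can be folded into a standard POMDP belief update, as in the hedged sketch below: the discrete component is the recognized user act and the continuous component an automatic speech recognition confidence score. The factored observation model (`O_act`, `conf_pdf`) is an illustrative assumption.

```python
import numpy as np

def belief_update(belief, action, obs_act, confidence, T, O_act, conf_pdf):
    """belief: P(s); T[a][s, s']: transitions; O_act[a][s', act]: discrete
    observation model; conf_pdf(c, s', act): confidence-score density."""
    predicted = belief @ T[action]                     # prediction step
    likelihood = np.array([
        O_act[action][s, obs_act] * conf_pdf(confidence, s, obs_act)
        for s in range(len(predicted))
    ])
    new_belief = likelihood * predicted                # correction step
    return new_belief / new_belief.sum()               # normalize
```

A handcrafted manager can then consult this belief (for example, re-confirm when the mass on the top hypothesis is low), which is the spirit of the belief-monitoring improvement described above.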
Abstract:
We consider the inverse reinforcement learning problem, that is, the problem of learning from, and then predicting or mimicking, a controller based on state/action data. We propose a statistical model for such data, derived from the structure of a Markov decision process. Adopting a Bayesian approach to inference, we show how the latent variables of the model can be estimated and how predictions about actions can be made in a unified framework. A new Markov chain Monte Carlo (MCMC) sampler is devised for simulation from the posterior distribution. The sampler includes a parameter-expansion step, which is shown to be essential for its good convergence properties. As an illustration, the method is applied to learning a human controller.
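A minimal version of this setup links the unknown reward to the observed state/action pairs through a softmax (Boltzmann) policy and samples rewards with random-walk Metropolis. The sketch below omits the paper's parameter-expansion step and assumes a flat prior and a black-box `solve_q` that returns the MDP's Q-values for a candidate reward; all of these are illustrative assumptions.

```python
import numpy as np

def log_likelihood(reward, data, solve_q, beta=5.0):
    Q = solve_q(reward)                    # Q-values of the MDP under `reward`
    ll = 0.0
    for s, a in data:                      # observed state/action pairs
        ll += beta * Q[s, a] - np.log(np.exp(beta * Q[s]).sum())
    return ll

def metropolis(data, solve_q, n_states, iters=2000, step=0.1, rng=None):
    rng = rng or np.random.default_rng()
    r = np.zeros(n_states)
    ll = log_likelihood(r, data, solve_q)
    samples = []
    for _ in range(iters):
        prop = r + step * rng.standard_normal(n_states)
        ll_prop = log_likelihood(prop, data, solve_q)
        if np.log(rng.random()) < ll_prop - ll:   # flat prior assumed
            r, ll = prop, ll_prop
        samples.append(r.copy())
    return samples                         # posterior draws of the reward
```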
Abstract:
Externalizing behavior problems of 124 adolescents were assessed across Grades 7-11. In Grade 9, participants were also assessed across social-cognitive domains after imagining themselves as the object of provocations portrayed in six videotaped vignettes. Participants responded to vignette-based questions representing multiple processes of the response decision step of social information processing. Phase 1 of our investigation supported a two-factor model of the response evaluation process of response decision (response valuation and outcome expectancy). Phase 2 showed significant relations between these response decision processes, together with response selection, measured in Grade 9 and (a) externalizing behavior in Grade 9 and (b) externalizing behavior in Grades 10-11, even after controlling for externalizing behavior in Grades 7-8. These findings suggest that on-line behavioral judgments about aggression play a crucial role in the maintenance and growth of aggressive response tendencies in adolescence.
Abstract:
Markov Decision Processes (MDPs) are extensively used to encode sequences of decisions with probabilistic effects. Markov Decision Processes with Imprecise Probabilities (MDPIPs) encode sequences of decisions whose effects are modeled using sets of probability distributions. In this paper we examine the computation of Γ-maximin policies for MDPIPs using multilinear and integer programming. We discuss the application of our algorithms to “factored” models and to a recent proposal, Markov Decision Processes with Set-valued Transitions (MDPSTs), that unifies the fields of probabilistic and “nondeterministic” planning in artificial intelligence research.
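When each credal set is given by finitely many extreme transition matrices, the Γ-maximin backup reduces to a worst case over those extreme points, since the inner objective is linear in the transition probabilities. The sketch below uses that reduction instead of the multilinear or integer programs discussed in the paper; all names are illustrative assumptions.

```python
import numpy as np

def gamma_maximin_vi(P_extremes, R, gamma=0.9, iters=200):
    """P_extremes[a]: list of |S|x|S| extreme transition matrices for
    action a (the credal set's vertices); R: |S|x|A| reward matrix."""
    n_states, n_actions = R.shape
    V = np.zeros(n_states)
    for _ in range(iters):
        Q = np.empty((n_states, n_actions))
        for a in range(n_actions):
            # Worst-case expected value over the credal set's extreme points.
            worst = np.min([P @ V for P in P_extremes[a]], axis=0)
            Q[:, a] = R[:, a] + gamma * worst
        V = Q.max(axis=1)                  # maximin backup: best worst case
    return V, Q.argmax(axis=1)             # value function and greedy policy
```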
Abstract:
M. R. Banaji and A. G. Greenwald (1995) demonstrated a gender bias in fame judgments—that is, an increase in judged fame due to prior processing that was larger for male than for female names. They suggested that participants shift criteria between judging men and women, using the more liberal criterion for judging men. This "criterion-shift" account appeared problematic for a number of reasons. In this article, 3 experiments are reported that were designed to evaluate the criterion-shift account of the gender bias in the false-fame effect against a distribution-shift account. The results were consistent with the criterion-shift account, and they helped to define more precisely the situations in which people may be ready to shift their response criterion on an item-by-item basis. In addition, the results were incompatible with an interpretation of the criterion shift as an artifact of the experimental situation in the experiments reported by M. R. Banaji and A. G. Greenwald. (PsycINFO Database Record (c) 2010 APA, all rights reserved)
Abstract:
Before signing electronic contracts, a rational agent should estimate their expected utilities and calculate the violation risks related to them. To perform such pre-signing procedures, the agent must be able to compute a policy that takes into account the norms and sanctions in the contracts. The contribution of this work is threefold. First, we present the Normative Markov Decision Process, an extension of the Markov Decision Process that explicitly represents norms; to illustrate the usage of our framework, we model an example in a simulated aerospace aftermarket. Second, we specify an algorithm for identifying the states of the process that characterize the violation of norms. Finally, we show how to compute policies with our framework and how to calculate the risk of violating the norms in the contracts when adopting a particular policy.
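Once the violation states have been identified, the risk of violating the norms under a fixed policy satisfies a simple linear fixed point over the policy-induced Markov chain. The sketch below computes a finite-horizon version of that risk; the API is an illustrative assumption, not the paper's framework.

```python
import numpy as np

def violation_risk(P_pi, violating, horizon=100):
    """P_pi: |S|x|S| transition matrix of the chain induced by the policy;
    violating: boolean mask of norm-violation states.
    Returns per-state probability of reaching a violation within `horizon`."""
    risk = violating.astype(float)
    for _ in range(horizon):
        # Either the state already violates a norm, or we transition
        # somewhere and may violate later.
        risk = np.where(violating, 1.0, P_pi @ risk)
    return risk
```

With such a risk estimate in hand, an agent can weigh each candidate policy's expected utility against its violation probability before signing.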
Abstract:
This paper studies the average control problem of discrete-time Markov Decision Processes (MDPs for short) with general state space, Feller transition probabilities, and possibly non-compact control constraint sets $A(x)$. Two hypotheses are considered: either the cost function $c$ is strictly unbounded, or the multifunctions $A_r(x) = \{a \in A(x) : c(x,a) \le r\}$ are upper-semicontinuous and compact-valued for each real $r$. For these two cases we provide new results for the existence of a solution to the average-cost optimality equality and inequality using the vanishing discount approach. We also study the convergence of the policy iteration approach under these conditions. It should be pointed out that we make no assumptions regarding the convergence or continuity of the limit function generated by the sequence of relative differences of the $\alpha$-discounted value functions and the Poisson equations, as is often encountered in the literature. (C) 2012 Elsevier Inc. All rights reserved.
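For reference, a standard statement of the average-cost optimality equality that the vanishing-discount approach targets, written in generic notation that may differ from the paper's: $Q(dy \mid x,a)$ is the transition kernel, $\rho$ the optimal average cost, $h$ the relative value function, $V_\alpha$ the $\alpha$-discounted value functions, and $x_0$ a fixed reference state.

```latex
\rho + h(x) = \min_{a \in A(x)} \Big\{ c(x,a) + \int_X h(y)\, Q(dy \mid x,a) \Big\},
\qquad
\rho = \lim_{\alpha \uparrow 1} (1-\alpha) V_\alpha(x_0),
\qquad
h(x) = \liminf_{\alpha \uparrow 1} \bigl( V_\alpha(x) - V_\alpha(x_0) \bigr).
```

The functions $V_\alpha(\cdot) - V_\alpha(x_0)$ are exactly the relative differences whose convergence and continuity the paper avoids assuming.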
Abstract:
The studies in the present thesis focus on post-decision processes, using the theoretical framework of Differentiation and Consolidation Theory. The thesis consists of three studies; in all of them, pre-decision evaluations are compared with post-decision evaluations to explore how evaluations of decision alternatives differ before and after a decision. The main aim was to describe and better understand how people re-evaluate information after a decision whose outcome they have experienced. The studies examine how the attractiveness evaluations of important attributes are restructured from the pre-decision to the post-decision phase, particularly the restructuring of value conflicts. Value-conflict attributes are those on which the information speaks against the chosen alternative. The first study investigates an important real-life decision and illustrates different post-decision (consolidation) processes following the decision. The second study tests whether decisions with value conflicts follow the same consolidation (post-decision restructuring) processes when the conflict is controlled experimentally as in earlier studies of less controlled real-life decisions. The third study investigates consolidation and value conflicts in decisions whose consequences are controlled and of different magnitudes. Together, the studies show how attractiveness restructuring of conflicting attributes occurs in the post-decision phase. Results indicated that this restructuring was stronger for important real-life decisions (Study 1) and in situations in which real consequences followed the decision (Study 3) than in more controlled, hypothetical decision situations (Study 2). Finally, some proposals for future research are suggested, including studies of how outcomes and consequences affect the consolidation of prior decisions and how a decision maker's involvement affects his or her pre- and post-decision processes.