942 results for Markov decision processes


Abstract:

Before signing electronic contracts, a rational agent should estimate the expected utilities of these contracts and calculate the violation risks related to them. In order to perform such pre-signing procedures, this agent has to be capable of computing a policy taking into account the norms and sanctions in the contracts. In relation to this, the contribution of this work is threefold. First, we present the Normative Markov Decision Process, an extension of the Markov Decision Process for explicitly representing norms. In order to illustrate the usage of our framework, we model an example in a simulated aerospace aftermarket. Second, we specify an algorithm for identifying the states of the process which characterize the violation of norms. Finally, we show how to compute policies with our framework and how to calculate the risk of violating the norms in the contracts by adopting a particular policy.
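
The following is a minimal Python sketch of the two pre-signing computations described above. It assumes a toy contract model that is not taken from the paper: the states, actions, probabilities, rewards and sanction value are all invented. Norm violation is encoded simply as membership in a hypothetical set `VIOLATION_STATES`, with a fixed hypothetical `SANCTION` charged on entry. Value iteration picks a policy with the sanctions priced in. `violation_risk` then computes the probability of reaching a violation state within a given horizon under that policy. This is not the authors' Normative MDP formalism, nor their violation-state identification algorithm.

```python
from typing import Dict, List, Tuple

# Hypothetical contract model (all names and numbers are illustrative assumptions).
# TRANSITIONS[state][action] -> list of (next_state, probability)
TRANSITIONS: Dict[str, Dict[str, List[Tuple[str, float]]]] = {
    "order": {
        "repair_fast": [("delivered", 0.9), ("late", 0.1)],
        "repair_cheap": [("delivered", 0.6), ("late", 0.4)],
    },
    "delivered": {"stay": [("delivered", 1.0)]},
    "late": {"stay": [("late", 1.0)]},
}
REWARD = {("order", "repair_fast"): 5.0, ("order", "repair_cheap"): 8.0}
VIOLATION_STATES = {"late"}  # states that characterize a norm violation
SANCTION = -20.0             # penalty charged when a violation state is entered


def step_reward(s: str, a: str, s_next: str) -> float:
    """Contract reward plus the sanction for newly entering a violation state."""
    r = REWARD.get((s, a), 0.0)
    if s_next in VIOLATION_STATES and s not in VIOLATION_STATES:
        r += SANCTION
    return r


def value_iteration(
    gamma: float = 0.95, tol: float = 1e-9
) -> Tuple[Dict[str, float], Dict[str, str]]:
    """Greedy policy maximizing expected discounted reward, sanctions included."""
    values = {s: 0.0 for s in TRANSITIONS}
    while True:
        new_values: Dict[str, float] = {}
        policy: Dict[str, str] = {}
        for s, actions in TRANSITIONS.items():
            q = {
                a: sum(p * (step_reward(s, a, s2) + gamma * values[s2]) for s2, p in outcomes)
                for a, outcomes in actions.items()
            }
            policy[s] = max(q, key=q.__getitem__)
            new_values[s] = q[policy[s]]
        if max(abs(new_values[s] - values[s]) for s in values) < tol:
            return new_values, policy
        values = new_values


def violation_risk(policy: Dict[str, str], start: str, horizon: int) -> float:
    """Probability of entering a violation state within `horizon` steps under `policy`."""
    reach = {s: 1.0 if s in VIOLATION_STATES else 0.0 for s in TRANSITIONS}
    for _ in range(horizon):
        reach = {
            s: 1.0 if s in VIOLATION_STATES
            else sum(p * reach[s2] for s2, p in TRANSITIONS[s][policy[s]])
            for s in TRANSITIONS
        }
    return reach[start]


values, policy = value_iteration()
risk = violation_risk(policy, "order", horizon=10)
```

With these made-up numbers, the cheaper repair pays more up front. Once its 0.4 chance of lateness is priced at the sanction, however, the faster repair becomes optimal, with a violation risk of 0.1.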

Abstract:

This paper studies the average control problem of discrete-time Markov Decision Processes (MDPs for short) with general state space, Feller transition probabilities, and possibly non-compact control constraint sets $A(x)$. Two hypotheses are considered: either the cost function $c$ is strictly unbounded, or the multifunctions $A_r(x) = \{a \in A(x) : c(x, a) \le r\}$ are upper-semicontinuous and compact-valued for each real $r$. For these two cases we provide new results for the existence of a solution to the average-cost optimality equality and inequality using the vanishing discount approach. We also study the convergence of the policy iteration approach under these conditions. It should be pointed out that we do not make any assumptions regarding the convergence and the continuity of the limit function generated by the sequence of relative differences of the $\alpha$-discounted value functions and the Poisson equations, as is often done in the literature. (C) 2012 Elsevier Inc. All rights reserved.
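
The paper itself works with general state spaces and non-compact action sets. As a finite-state illustration only, the vanishing discount approach can be sketched as follows. Compute the $\alpha$-discounted cost $V_\alpha$. Then, for a fixed reference state $z$ and $\alpha \to 1$, take $\rho \approx (1-\alpha)V_\alpha(z)$ and $h(x) \approx V_\alpha(x) - V_\alpha(z)$. The pair $(\rho, h)$ should satisfy the average-cost optimality equation $\rho + h(x) = \min_a [c(x,a) + \sum_y P(y \mid x,a)\, h(y)]$. The two-state `MDP` below is a hypothetical example. Its best stationary policy alternates between the two states at costs 1 and 0, so $\rho = 0.5$.

```python
from typing import Dict, List, Tuple

# Hypothetical two-state MDP: MDP[state][action] = (cost, [(next_state, prob), ...]).
MDP: Dict[int, Dict[str, Tuple[float, List[Tuple[int, float]]]]] = {
    0: {"a": (1.0, [(1, 1.0)]), "b": (3.0, [(0, 1.0)])},
    1: {"a": (0.0, [(0, 1.0)]), "b": (2.0, [(1, 1.0)])},
}


def discounted_values(alpha: float, tol: float = 1e-10) -> Dict[int, float]:
    """Minimal alpha-discounted expected cost V_alpha, by value iteration."""
    v = {x: 0.0 for x in MDP}
    while True:
        new_v = {
            x: min(c + alpha * sum(p * v[y] for y, p in nxt) for c, nxt in acts.values())
            for x, acts in MDP.items()
        }
        if max(abs(new_v[x] - v[x]) for x in MDP) < tol:
            return new_v
        v = new_v


def vanishing_discount(alpha: float, ref: int = 0) -> Tuple[float, Dict[int, float]]:
    """Approximate (rho, h) by ((1 - alpha) V_alpha(ref), V_alpha - V_alpha(ref))."""
    v = discounted_values(alpha)
    rho = (1.0 - alpha) * v[ref]
    h = {x: v[x] - v[ref] for x in MDP}
    return rho, h


for alpha in (0.9, 0.99, 0.999):
    rho, h = vanishing_discount(alpha)
    print(f"alpha={alpha}: rho~{rho:.4f}, h(1)~{h[1]:.4f}")
```

For this chain $(1-\alpha)V_\alpha(0) = 1/(1+\alpha)$, so the printed estimates approach $\rho = 0.5$ and $h(1) = -0.5$ as $\alpha$ grows. Value iteration is a convenient but slow way to obtain $V_\alpha$ when $\alpha$ is close to 1.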

Abstract:

We apply diffusion strategies to propose a cooperative reinforcement learning algorithm in which agents in a network communicate with their neighbors to improve predictions about their environment. The algorithm is suitable for off-policy learning even in large state spaces. We provide a mean-square-error performance analysis under constant step-sizes. The gain from cooperation, in the form of greater stability and lower bias and variance in the prediction error, is illustrated in the context of a classical model. We show that the improvement in performance is especially significant when the behavior policy of the agents differs from the target policy under evaluation.
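
Diffusion strategies combine a local adaptation step with an averaging step over each agent's neighborhood. A minimal sketch of the adapt-then-combine form follows. To keep it short, it is applied to a plain linear-regression (LMS) estimation problem rather than to the paper's off-policy reinforcement learning setting. The ring topology, uniform combination weights, step-size `mu` and noise level are assumptions chosen for illustration.

```python
import random
from typing import List


def diffusion_lms(
    w_true: List[float],
    neighbors: List[List[int]],
    mu: float = 0.05,
    steps: int = 2000,
    noise_std: float = 0.1,
    seed: int = 0,
) -> List[List[float]]:
    """Adapt-then-combine (ATC) diffusion LMS with uniform neighbor weights.

    neighbors[k] lists the agents (including k itself) whose intermediate
    estimates agent k averages; each agent observes d = u . w_true + noise.
    """
    rng = random.Random(seed)
    n_agents, dim = len(neighbors), len(w_true)
    w = [[0.0] * dim for _ in range(n_agents)]
    for _ in range(steps):
        # Adapt: every agent takes one LMS step on its own fresh noisy sample.
        psi = []
        for k in range(n_agents):
            u = [rng.gauss(0.0, 1.0) for _ in range(dim)]
            d = sum(ui * wi for ui, wi in zip(u, w_true)) + rng.gauss(0.0, noise_std)
            err = d - sum(ui * wi for ui, wi in zip(u, w[k]))
            psi.append([wi + mu * err * ui for wi, ui in zip(w[k], u)])
        # Combine: each agent averages the intermediate estimates of its neighborhood.
        w = [
            [sum(psi[nb][i] for nb in neighbors[k]) / len(neighbors[k]) for i in range(dim)]
            for k in range(n_agents)
        ]
    return w


ring = [[0, 1, 3], [1, 0, 2], [2, 1, 3], [3, 2, 0]]  # 4 agents on a ring, plus self-loops
estimates = diffusion_lms([1.0, -2.0], ring)
```

Each agent's estimate ends close to `w_true`. Averaging over neighbors in the combine step is what typically lowers the steady-state error compared with each agent running LMS alone.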

Abstract:

This paper studies the asymptotic optimality of discrete-time Markov decision processes (MDPs) with general state and action spaces and with weak and strong interactions. Using an approach similar to that developed by Liu, Zhang, and Yin [Appl. Math. Optim., 44 (2001), pp. 105-129], the idea in this paper is to consider an MDP with general state and action spaces and to reduce the dimension of the state space by considering an averaged model. This formulation is often described by introducing a small parameter $\varepsilon > 0$ in the definition of the transition kernel, leading to a singularly perturbed Markov model with two time scales. Our objective is twofold. First, it is shown that the value function of the control problem for the perturbed system converges to the value function of a limit averaged control problem as $\varepsilon$ goes to zero. In the second part of the paper, it is proved that a feedback control policy for the original control problem, defined by using an optimal feedback policy for the limit problem, is asymptotically optimal. Our work extends existing results in the literature in two directions: the underlying MDP is defined on general state and action spaces, and we do not impose strong conditions on the recurrence structure of the MDP, such as Doeblin's condition.
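
A common concrete form of the two-time-scale model is assumed here, stated for a finite state space without controls. It writes the transition matrix as $P^\varepsilon = \tilde P + \varepsilon Q$. Here $\tilde P = \mathrm{diag}(P^1, \dots, P^l)$ is block-diagonal with irreducible stochastic blocks (fast transitions within groups), and $Q$ has zero row sums (slow transitions between groups). The averaged model then lives on the $l$ groups, with generator $\bar Q = \mathrm{diag}(\nu^1, \dots, \nu^l)\, Q\, \mathrm{diag}(\mathbf{1}_{m_1}, \dots, \mathbf{1}_{m_l})$. In this formula $\nu^k$ is the stationary distribution of $P^k$ and $m_k$ is the size of group $k$. The Python sketch below computes $\bar Q$, and all matrices in its usage lines are hypothetical. The paper's setting (general spaces, controls, weaker recurrence assumptions) is considerably more general.

```python
from typing import List

Matrix = List[List[float]]


def stationary(p: Matrix, iters: int = 10_000) -> List[float]:
    """Stationary distribution of an irreducible stochastic matrix p.

    Power iteration is run on the lazy chain (I + p) / 2, which has the same
    stationary distribution as p but is aperiodic, so the iteration converges.
    """
    n = len(p)
    nu = [1.0 / n] * n
    for _ in range(iters):
        nu = [0.5 * nu[j] + 0.5 * sum(nu[i] * p[i][j] for i in range(n)) for j in range(n)]
    return nu


def averaged_generator(groups: List[List[int]], fast_blocks: List[Matrix], q: Matrix) -> Matrix:
    """Q_bar[k][l] = sum over i in group k of nu_k(i) * sum over j in group l of q[i][j]."""
    nus = [stationary(block) for block in fast_blocks]
    return [
        [sum(nu_k[a] * sum(q[i][j] for j in g_l) for a, i in enumerate(g_k)) for g_l in groups]
        for g_k, nu_k in zip(groups, nus)
    ]


groups = [[0, 1], [2]]                                # two groups of original states
fast = [[[0.9, 0.1], [0.2, 0.8]], [[1.0]]]            # within-group fast blocks P^1, P^2
q = [[-1.0, 0.0, 1.0], [0.0, -3.0, 3.0], [2.0, 0.0, -2.0]]  # slow part, zero row sums
q_bar = averaged_generator(groups, fast, q)
```

Here $\nu^1 = (2/3, 1/3)$, so the first row of $\bar Q$ is approximately $(-5/3, 5/3)$ and the second is $(2, -2)$. The rows sum to zero, as a generator's rows should.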

Abstract:

Although emotion is an important factor in everyday life, it is often overlooked in the development of systems intended for people. In this work we present an architecture for a ubiquitous group decision support system able to support people in group decision processes. The system considers the emotional factors of the participants involved, as well as the argumentation between them. Particular attention is given to one of the system's components: the multi-agent simulator, which models the human participants, takes their emotional characteristics into account, and allows the exchange of hypothetical arguments among them.

Abstract:

This thesis consists mainly of three articles dealing with Markov additive processes, Lévy processes, and applications in finance and insurance. The first chapter is an introduction to Markov additive processes (MAPs), together with a presentation of the ruin problem and of fundamental notions of financial mathematics. The second chapter is essentially the article "Lévy Systems and the Time Value of Ruin for Markov Additive Processes", written in collaboration with Manuel Morales and published in the European Actuarial Journal. This article studies the ruin problem for a Markov additive risk process. An identification of Lévy systems is obtained and used to give an expression for the expected discounted penalty function when the MAP is a regime-switching Lévy process. This generalizes existing results in the literature for Lévy risk processes and for Markov additive risk processes with phase-type jumps. The third chapter contains the article "On a Generalization of the Expected Discounted Penalty Function to Include Deficits at and Beyond Ruin", which has been submitted for publication. This article presents an extension of the expected discounted penalty function for a subordinator risk process perturbed by a Brownian motion. The extension involves a series of expected discounted functions of the successive minima caused by the jumps of the risk process after ruin. It has important applications in risk management and is used to determine the expected value of the discounted capital injection. Finally, the fourth chapter contains the article "The Minimal entropy martingale measure (MEMM) for a Markov-modulated exponential Lévy model", written in collaboration with Romuald Hervé Momeya and published in the journal Asia-Pacific Financial Market.
This article presents new results related to the problem of incompleteness in a financial market where the price process of the risky asset is described by an exponential Markov additive model. These results consist in characterizing the martingale measure that satisfies the entropy criterion. This measure is used to compute option prices, as well as hedging portfolios, in a regime-switching exponential Lévy model.

Abstract:

M. R. Banaji and A. G. Greenwald (1995) demonstrated a gender bias in fame judgments—that is, an increase in judged fame due to prior processing that was larger for male than for female names. They suggested that participants shift criteria between judging men and women, using the more liberal criterion for judging men. This "criterion-shift" account appeared problematic for a number of reasons. In this article, three experiments are reported that were designed to evaluate the criterion-shift account of the gender bias in the false-fame effect against a distribution-shift account. The results were consistent with the criterion-shift account, and they helped to define more precisely the situations in which people may be ready to shift their response criterion on an item-by-item basis. In addition, the results were incompatible with an interpretation of the criterion shift as an artifact of the experimental situation in the experiments reported by M. R. Banaji and A. G. Greenwald. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

Abstract:

The studies in the present thesis focus on post-decision processes, using the theoretical framework of Differentiation and Consolidation Theory. The thesis consists of three studies. In all of them, pre-decision evaluations are compared with post-decision evaluations in order to explore differences in how decision alternatives are evaluated before and after a decision. The main aim was to describe, and gain a clearer understanding of, how people re-evaluate information following a decision whose outcome they have experienced. The studies examine how the attractiveness evaluations of important attributes are restructured from the pre-decision to the post-decision phase, particularly the restructuring of value conflicts. Value-conflict attributes are those on which the information speaks against the chosen alternative. The first study investigates an important real-life decision and illustrates different post-decision (consolidation) processes following it. The second study tests whether decisions with value conflicts, when the conflict is controlled experimentally, follow the same consolidation (post-decision restructuring) processes as those found in earlier studies of less controlled real-life decisions. The third study investigates consolidation and value conflicts in decisions whose consequences are controlled and of different magnitudes. The studies have shown how attractiveness restructuring of attributes in conflict occurs in the post-decision phase. Results from the three studies indicated that this restructuring was stronger for important real-life decisions (Study 1) and in situations in which real consequences followed a decision (Study 3) than in more controlled, hypothetical decision situations (Study 2).
Finally, some proposals for future research are suggested, including studies of the effects of outcomes and consequences on consolidation of prior decisions and how a decision maker’s involvement affects his or her pre- and post-decision processes.

Abstract:

Task classification is introduced as a method for the evaluation of monitoring behaviour in different task situations. On the basis of an analysis of different monitoring tasks, a task classification system comprising four task 'dimensions' is proposed. The perceptual speed and flexibility of closure categories, which are identified with signal discrimination type, comprise the principal dimension in this taxonomy, the others being sense modality, the time course of events, and source complexity. It is also proposed that decision theory provides the most complete method for the analysis of performance in monitoring tasks. Several different aspects of decision theory in relation to monitoring behaviour are described. A method is also outlined whereby both accuracy and latency measures of performance may be analysed within the same decision theory framework. Eight experiments and an organizational study are reported. The results show that a distinction can be made between the perceptual efficiency (sensitivity) of a monitor and his criterial level of response, and that in most monitoring situations, there is no decrement in efficiency over the work period, but an increase in the strictness of the response criterion. The range of tasks exhibiting either or both of these performance trends can be specified within the task classification system. In particular, it is shown that a sensitivity decrement is only obtained for 'speed' tasks with a high stimulation rate. A distinctive feature of 'speed' tasks is that target detection requires the discrimination of a change in a stimulus relative to preceding stimuli, whereas in 'closure' tasks, the information required for the discrimination of targets is presented at the same point in time. In the final study, the specification of tasks yielding sensitivity decrements is shown to be consistent with a task classification analysis of the monitoring literature.
It is also demonstrated that the signal type dimension has a major influence on the consistency of individual differences in performance in different tasks. The results provide an empirical validation for the 'speed' and 'closure' categories, and suggest that individual differences are not completely task specific but are dependent on the demands common to different tasks. Task classification is therefore shown to enable improved generalizations to be made of the factors affecting 1) performance trends over time, and 2) the consistency of performance in different tasks. A decision theory analysis of response latencies is shown to support the view that criterion shifts are obtained in some tasks, while sensitivity shifts are obtained in others. The results of a psychophysiological study also suggest that evoked potential latency measures may provide temporal correlates of criterion shifts in monitoring tasks. Among other results, the finding that the latencies of negative responses do not increase over time is taken to invalidate arousal-based theories of performance trends over a work period. An interpretation in terms of expectancy, however, provides a more reliable explanation of criterion shifts. Although the mechanisms underlying the sensitivity decrement are not completely clear, the results rule out 'unitary' theories such as observing response and coupling theory. It is suggested that an interpretation in terms of the memory data limitations on information processing provides the most parsimonious explanation of all the results in the literature relating to sensitivity decrement. Task classification therefore enables the refinement and selection of theories of monitoring behaviour in terms of their reliability in generalizing predictions to a wide range of tasks. It is thus concluded that task classification and decision theory provide a reliable basis for the assessment and analysis of monitoring behaviour in different task situations.
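
The distinction drawn above between sensitivity and response criterion is often quantified with the equal-variance Gaussian signal detection indices. These are $d' = z(H) - z(F)$ and $c = -\tfrac{1}{2}[z(H) + z(F)]$, where $H$ and $F$ are the hit and false-alarm rates and $z$ is the inverse standard normal CDF. This is a standard textbook choice, not necessarily the measure used in the thesis. In the hypothetical early and late watch periods below, sensitivity stays at $d' = 2$ while $c$ rises from 0 to 0.5. That is the pattern of unchanged efficiency with a stricter response criterion.

```python
from statistics import NormalDist
from typing import Tuple


def sdt_indices(hit_rate: float, false_alarm_rate: float) -> Tuple[float, float]:
    """Return (d_prime, c) under the equal-variance Gaussian signal detection model."""
    if not (0.0 < hit_rate < 1.0 and 0.0 < false_alarm_rate < 1.0):
        raise ValueError("rates must lie strictly between 0 and 1")
    z = NormalDist().inv_cdf
    d_prime = z(hit_rate) - z(false_alarm_rate)       # sensitivity
    c = -0.5 * (z(hit_rate) + z(false_alarm_rate))    # criterion: larger means stricter
    return d_prime, c


# Hypothetical early and late watch periods, with rates built from chosen z-scores.
phi = NormalDist().cdf
early = sdt_indices(hit_rate=phi(1.0), false_alarm_rate=phi(-1.0))
late = sdt_indices(hit_rate=phi(0.5), false_alarm_rate=phi(-1.5))
```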

Abstract:

2010 Mathematics Subject Classification: 60J80.

Abstract:

Thesis (Ph.D.)--University of Washington, 2016-08

Abstract:

Of all the learning paradigms identified to date, Reinforcement Learning is of particular interest and applicability in the countless processes around us. These range from the solitary probe exploring the most remote planet, through the expert program that learns from acquired experience to support medical decision-making, to the toy dog that delights a child by interacting with it and adapting to its preferences. A whole new world around us increasingly calls on us to do more and better in this area. Since the concept of reinforcement learning first appeared, different methods have been proposed to realize it, each addressing specific aspects. Two distinct but complementary strands stand out as key characteristics of the reinforcement learning process: gaining experience by exploring the state space, and exploiting the knowledge obtained through that experience. This dissertation sets out to select some of the most promising methods proposed in both strands, exploration and exploitation. It implements each of them on a modular platform that supports the simulation of intelligent agents. By applying them to different configurations of standard environments, it then generates functional statistics from which conclusions can be drawn about, among other aspects, their comparative efficiency and effectiveness under specific conditions.
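
The exploration/exploitation trade-off described above can be illustrated with tabular Q-learning under an ε-greedy rule. This is one of the simplest pairings of an exploration method with an exploitation method, and not necessarily one of those selected in the dissertation. The environment is a hypothetical five-state corridor in which entering the right-most state ends the episode with reward 1. The learning rate, discount and ε are illustrative assumptions.

```python
import random
from typing import List


def q_learning_corridor(
    n_states: int = 5,
    episodes: int = 500,
    alpha: float = 0.1,
    gamma: float = 0.9,
    epsilon: float = 0.1,
    seed: int = 0,
) -> List[List[float]]:
    """Tabular Q-learning with epsilon-greedy exploration on a 1-D corridor.

    Actions: 0 = left, 1 = right. Moving left from state 0 leaves the agent in
    place; entering state n_states - 1 gives reward 1 and ends the episode.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1

    def choose(s: int) -> int:
        if rng.random() < epsilon:                # explore: random action
            return rng.randrange(2)
        best = max(q[s])                          # exploit, breaking ties at random
        return rng.choice([a for a in (0, 1) if q[s][a] == best])

    for _ in range(episodes):
        s = 0
        for _step in range(1000):                 # safety cap on episode length
            a = choose(s)
            s_next = max(s - 1, 0) if a == 0 else s + 1
            # Terminal transitions do not bootstrap from the next state.
            target = 1.0 if s_next == goal else gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            if s_next == goal:
                break
            s = s_next
    return q


q_table = q_learning_corridor()
greedy = ["left" if row[0] > row[1] else "right" for row in q_table[:-1]]
```

After training, the greedy policy should move right in every non-terminal state. Ties are broken at random, so the initial all-zero table does not bias the agent toward one action.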

Abstract:

One objective of artificial intelligence is to model the behavior of an intelligent agent interacting with its environment. The environment's transformations can be modeled as a Markov chain, whose state is partially observable to the agent and affected by its actions; such processes are known as partially observable Markov decision processes (POMDPs). While the environment's dynamics are assumed to obey certain rules, the agent does not know them and must learn them. In this dissertation we focus on the agent's adaptation as captured by the reinforcement learning framework. This means learning a policy (a mapping from observations to actions) based on feedback from the environment. The learning can be viewed as browsing a set of policies while evaluating them by trial through interaction with the environment. The set of policies is constrained by the architecture of the agent's controller. POMDPs require the controller to have memory. We investigate controllers with memory, including controllers with external memory, finite state controllers and distributed controllers for multi-agent systems. For these various controllers we work out the details of the algorithms which learn by ascending the gradient of expected cumulative reinforcement. Building on statistical learning theory and experiment design theory, a policy evaluation algorithm is developed for the case of experience re-use. We address the question of sufficient experience for uniform convergence of policy evaluation and obtain sample complexity bounds for various estimators. Finally, we demonstrate the performance of the proposed algorithms on several domains, the most complex of which is simulated adaptive packet routing in a telecommunication network.
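
One standard way to evaluate a policy while re-using experience gathered under another policy is importance sampling. Each episode's return is reweighted by the product of the target-to-behavior probability ratios of the actions taken. The sketch below assumes memoryless (reactive) policies, given as hypothetical functions `policy(obs, action) -> probability`. For controllers with internal memory, the ratio would also have to cover the memory transitions. The sketch is not the specific estimator, nor the sample-complexity analysis, of the dissertation.

```python
from typing import Callable, List, Sequence, Tuple

Step = Tuple[int, int, float]          # (observation, action, reward)
Policy = Callable[[int, int], float]   # policy(observation, action) -> probability


def is_estimate(
    episodes: Sequence[Sequence[Step]],
    target: Policy,
    behavior: Policy,
    gamma: float = 1.0,
    weighted: bool = False,
) -> float:
    """Importance-sampling estimate of the target policy's expected return."""
    weights: List[float] = []
    returns: List[float] = []
    for episode in episodes:
        ratio, ret, discount = 1.0, 0.0, 1.0
        for obs, action, reward in episode:
            ratio *= target(obs, action) / behavior(obs, action)
            ret += discount * reward
            discount *= gamma
        weights.append(ratio)
        returns.append(ret)
    total = sum(w * g for w, g in zip(weights, returns))
    if weighted:  # weighted (self-normalized) importance sampling
        norm = sum(weights)
        return total / norm if norm > 0 else 0.0
    return total / len(episodes)  # ordinary importance sampling


def behavior(obs: int, action: int) -> float:
    return 0.5  # hypothetical behavior policy: uniform over actions {0, 1}


def target(obs: int, action: int) -> float:
    return 0.8 if action == 1 else 0.2  # hypothetical target policy


episodes = [[(0, 1, 1.0)], [(0, 0, 0.0)]]  # logged under `behavior`
estimate = is_estimate(episodes, target, behavior)
```

The weighted variant divides by the sum of the weights instead of the number of episodes. It trades a small bias for typically much lower variance.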

Abstract:

When modeling real-world decision-theoretic planning problems in the Markov Decision Process (MDP) framework, it is often impossible to obtain a completely accurate estimate of transition probabilities. For example, natural uncertainty arises in the transition specification due to elicitation of MDP transition models from an expert or estimation from data, or non-stationary transition distributions arising from insufficient state knowledge. In the interest of obtaining the most robust policy under transition uncertainty, the Markov Decision Process with Imprecise Transition Probabilities (MDP-IP) has been introduced to model such scenarios. Unfortunately, while various solution algorithms exist for MDP-IPs, they often require external calls to optimization routines and thus can be extremely time-consuming in practice. To address this deficiency, we introduce the factored MDP-IP and propose efficient dynamic programming methods to exploit its structure. Noting that the key computational bottleneck in the solution of factored MDP-IPs is the need to repeatedly solve nonlinear constrained optimization problems, we show how to target approximation techniques to drastically reduce the computational overhead of the nonlinear solver while producing bounded, approximately optimal solutions. Our results show up to two orders of magnitude speedup in comparison to traditional "flat" dynamic programming approaches and up to an order of magnitude speedup over the extension of factored MDP approximate value iteration techniques to MDP-IPs, while producing the lowest error of any approximation algorithm evaluated. (C) 2011 Elsevier B.V. All rights reserved.
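
The following is a minimal "flat" sketch of robust value iteration. It covers only the special case in which each imprecise transition distribution is given by independent lower and upper bounds on each next-state probability (an interval MDP). For such a box-shaped set, the inner worst-case expectation can be solved greedily. Start every probability at its lower bound, then hand the remaining mass to the lowest-valued next states first. General MDP-IPs, and the factored algorithms of the paper, need the nonlinear solvers mentioned above. The two-state example is hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (lower, upper) bound on one next-state probability


def worst_case_expectation(values: List[float], bounds: List[Interval]) -> float:
    """min_p sum_j p[j] * values[j]  s.t.  lower_j <= p[j] <= upper_j and sum_j p[j] = 1.

    Assumes the box is feasible (sum of lowers <= 1 <= sum of uppers).
    """
    p = [lo for lo, _ in bounds]
    slack = 1.0 - sum(p)
    for j in sorted(range(len(values)), key=lambda k: values[k]):  # lowest value first
        add = min(bounds[j][1] - bounds[j][0], slack)
        p[j] += add
        slack -= add
    return sum(pj * vj for pj, vj in zip(p, values))


def robust_value_iteration(
    rewards: List[List[float]],
    intervals: List[List[List[Interval]]],
    gamma: float = 0.9,
    tol: float = 1e-10,
) -> List[float]:
    """Agent maximizes over actions; nature picks the worst distribution in each box."""
    n = len(rewards)
    v = [0.0] * n
    while True:
        new_v = [
            max(
                rewards[s][a] + gamma * worst_case_expectation(v, intervals[s][a])
                for a in range(len(rewards[s]))
            )
            for s in range(n)
        ]
        if max(abs(x - y) for x, y in zip(new_v, v)) < tol:
            return new_v
        v = new_v


# Hypothetical two-state example: one action per state; state 1 is absorbing with reward 0.
rewards = [[1.0], [0.0]]
intervals = [[[(0.5, 1.0), (0.0, 0.5)]], [[(0.0, 0.0), (1.0, 1.0)]]]
values = robust_value_iteration(rewards, intervals)
```

Here nature pushes the full allowed 0.5 probability toward the zero-value state. This gives $V(0) = 1 + 0.9 \cdot 0.5\, V(0)$, so $V(0) = 1/0.55 \approx 1.818$.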