792 results for Decision Process
Abstract:
Modelling dialogue as a Partially Observable Markov Decision Process (POMDP) enables a dialogue policy robust to speech understanding errors to be learnt. However, a major challenge in POMDP policy learning is to maintain tractability, so the use of approximation is inevitable. We propose applying Gaussian processes in reinforcement learning of optimal POMDP dialogue policies, in order (1) to make the learning process faster and (2) to obtain an estimate of the uncertainty of the approximation. We first demonstrate the idea on a simple voice mail dialogue task and then apply this method to a real-world tourist information dialogue task. © 2010 Association for Computational Linguistics.
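As a rough illustration of point (2) in the abstract above, the sketch below shows how Gaussian process regression returns a predictive variance alongside its mean. It is a toy one-dimensional regression, not the paper's reinforcement learning algorithm: the RBF kernel, the hyperparameters, and the function names `rbf_kernel` and `gp_posterior` are all assumptions chosen for illustration.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential kernel between two sets of 1-D inputs (assumed kernel)."""
    a = np.asarray(a, dtype=float).reshape(-1, 1)
    b = np.asarray(b, dtype=float).reshape(-1, 1)
    sq_dist = (a - b.T) ** 2
    return signal_var * np.exp(-0.5 * sq_dist / length_scale ** 2)

def gp_posterior(x_train, y_train, x_test, noise_var=1e-2):
    """Return the GP posterior mean and variance at x_test.

    The variance is the uncertainty estimate: small near observed
    points, close to the prior variance far away from them.
    """
    y = np.asarray(y_train, dtype=float)
    K = rbf_kernel(x_train, x_train) + noise_var * np.eye(len(y))
    K_s = rbf_kernel(x_train, x_test)
    K_ss = rbf_kernel(x_test, x_test)

    # Cholesky factorisation for a numerically stable solve.
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mean = K_s.T @ alpha

    v = np.linalg.solve(L, K_s)
    var = np.diag(K_ss) - np.sum(v ** 2, axis=0)
    return mean, var

# Hypothetical data: three observations, two query points.
mean, var = gp_posterior([0.0, 1.0, 2.0], [0.0, 1.0, 0.5], [1.0, 10.0])
```

In this toy setup the query at 1.0 coincides with an observation and so has low variance, while the query at 10.0 is far from all data and its variance stays near the prior value. A learner can use that gap to decide where more data is needed.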
Abstract:
Most previous work on trainable language generation has focused on two paradigms: (a) using a statistical model to rank a set of generated utterances, or (b) using statistics to inform the generation decision process. Both approaches rely on the existence of a handcrafted generator, which limits their scalability to new domains. This paper presents BAGEL, a statistical language generator which uses dynamic Bayesian networks to learn from semantically-aligned data produced by 42 untrained annotators. A human evaluation shows that BAGEL can generate natural and informative utterances from unseen inputs in the information presentation domain. Additionally, generation performance on sparse datasets is improved significantly by using certainty-based active learning, yielding ratings close to the human gold standard with a fraction of the data. © 2010 Association for Computational Linguistics.
Abstract:
The partially observable Markov decision process (POMDP) has been proposed as a dialogue model that enables automatic improvement of the dialogue policy and robustness to speech understanding errors. It requires, however, a large number of dialogues to train the dialogue policy. Gaussian processes (GP) have recently been applied to POMDP dialogue management optimisation showing an ability to substantially increase the speed of learning. Here, we investigate this further using the Bayesian Update of Dialogue State dialogue manager. We show that it is possible to apply Gaussian processes directly to the belief state, removing the need for a parametric policy representation. In addition, the resulting policy learns significantly faster while maintaining operational performance. © 2012 IEEE.
Abstract:
A partially observable Markov decision process has been proposed as a dialogue model that enables robustness to speech recognition errors and automatic policy optimisation using reinforcement learning (RL). However, conventional RL algorithms require a very large number of dialogues, necessitating a user simulator. Recently, Gaussian processes have been shown to substantially speed up the optimisation, making it possible to learn directly from interaction with human users. However, early studies have been limited to very low dimensional spaces and the learning has exhibited convergence problems. Here we investigate learning from human interaction using the Bayesian Update of Dialogue State system. This dynamic Bayesian network based system has an optimisation space covering more than one hundred features, allowing a wide range of behaviours to be learned. Using an improved policy model and a more robust reward function, we show that stable learning can be achieved that significantly outperforms a simulator trained policy. © 2013 IEEE.
Abstract:
A partially observable Markov decision process (POMDP) has been proposed as a dialog model that enables automatic optimization of the dialog policy and provides robustness to speech understanding errors. Various approximations allow such a model to be used for building real-world dialog systems. However, they require a large number of dialogs to train the dialog policy and hence they typically rely on the availability of a user simulator. They also require significant designer effort to hand-craft the policy representation. We investigate the use of Gaussian processes (GPs) in policy modeling to overcome these problems. We show that GP policy optimization can be implemented for a real world POMDP dialog manager, and in particular: 1) we examine different formulations of a GP policy to minimize variability in the learning process; 2) we find that the use of GP increases the learning rate by an order of magnitude thereby allowing learning by direct interaction with human users; and 3) we demonstrate that designer effort can be substantially reduced by basing the policy directly on the full belief space thereby avoiding ad hoc feature space modeling. Overall, the GP approach represents an important step forward towards fully automatic dialog policy optimization in real world systems. © 2013 IEEE.
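Abstract:
The quality of reward function design plays an important role in the performance of a learning system. According to how reward values are distributed over the state-action space, the construction of reward functions is divided into two forms, dense functions and sparse functions, and the characteristics of each are analysed. A basic design approach for heuristic reward functions is proposed: an additional reward function, based on the difference form of a conservative potential function, gives the learning system more heuristic information, and the optimal-policy invariance and iterative convergence of the algorithm are proved. Heuristic reward functions can guide learning and speed up the learning process, making it possible to apply reinforcement learning to real-time control and scheduling in large, complex real-world systems.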
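The abstract above does not give the formula for its additional reward. A minimal sketch, assuming it follows the standard potential-based shaping form F(s, s') = γΦ(s') − Φ(s), might look like this. The function names, the one-dimensional state, and the distance-based potential are hypothetical choices for illustration.

```python
def shaped_reward(reward, potential, state, next_state, gamma=0.99):
    """Environment reward plus a potential-difference shaping term.

    Assumes the standard form F = gamma * potential(s') - potential(s).
    """
    return reward + gamma * potential(next_state) - potential(state)

def total_shaping(trajectory, potential, gamma=1.0):
    """Sum of the shaping terms alone along a list of visited states."""
    return sum(
        shaped_reward(0.0, potential, s, s_next, gamma)
        for s, s_next in zip(trajectory, trajectory[1:])
    )

# Hypothetical task: reach position GOAL on a line; the potential is
# the negative distance to the goal, so moving closer earns a bonus.
GOAL = 5

def distance_potential(state):
    return -abs(GOAL - state)

step_towards = shaped_reward(0.0, distance_potential, 2, 3, gamma=1.0)
step_away = shaped_reward(0.0, distance_potential, 3, 2, gamma=1.0)
```

With γ = 1 the shaping terms telescope along any trajectory to Φ(last state) − Φ(first state), so detours earn no net bonus. This is the intuition behind the policy-invariance property the abstract refers to.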
Abstract:
Hypothetical contingent valuation surveys used to elicit values for environmental and other public goods often employ variants of the referendum mechanism because of its cognitive simplicity and respondents' familiarity with this voting format. One variant, the double referendum mechanism, requires respondents to state twice how they would vote for a given policy proposal given their cost of the good. Data from these surveys often exhibit anomalies inconsistent with standard economic models of consumer preferences. There are a number of published explanations for these anomalies, mostly focusing on problems with the second vote. This article investigates which aspects of the hypothetical task affect the degree of non-demand revelation and takes an individual-based approach to identifying the people most likely to non-demand reveal. A clear profile emerges from our model: a person who faces a negative surplus (i.e. a net loss) in the second vote and invokes non-self-interested, non-financial motivations during the decision process.
Abstract:
In this paper, we investigate the remanufacturing problem of pricing single-class used products (cores) in the face of random price-dependent returns and random demand. Specifically, we propose a dynamic pricing policy for the cores and then model the problem as a continuous-time Markov decision process. Our models are designed to address three objectives: finite horizon total cost minimization, infinite horizon discounted cost, and average cost minimization. Besides proving optimal policy uniqueness and establishing monotonicity results for the infinite horizon problem, we also characterize the structures of the optimal policies, which can greatly simplify the computational procedure. Finally, we use computational examples to assess the impacts of specific parameters on optimal price and reveal the benefits of a dynamic pricing policy. © 2013 Elsevier B.V. All rights reserved.
Abstract:
In remanufacturing, the supply of used products and the demand for remanufactured products are usually mismatched because of the great uncertainties on both sides. In this paper, we propose a dynamic pricing policy to balance this uncertain supply and demand. Specifically, we study a remanufacturer’s problem of pricing a single class of cores with random price-dependent returns and random demand for the remanufactured products with backlogs. We model this pricing task as a continuous-time Markov decision process, which addresses both the finite and infinite horizon problems, and provide managerial insights by analyzing the structural properties of the optimal policy. We then use several computational examples to illustrate the impacts of particular system parameters on pricing policy.
Abstract:
Making a decision is often a matter of listing and comparing positive and negative arguments. In such cases, the evaluation scale for decisions should be considered bipolar, that is, negative and positive values should be explicitly distinguished. That is what is done, for example, in Cumulative Prospect Theory. However, contrary to the latter framework, which presupposes genuine numerical assessments, human agents often decide on the basis of an ordinal ranking of the pros and the cons, and by focusing on the most salient arguments. In other words, the decision process is qualitative as well as bipolar. In this article, based on a bipolar extension of possibility theory, we define and axiomatically characterize several decision rules tailored for the joint handling of positive and negative arguments in an ordinal setting. The simplest rules can be viewed as extensions of the maximin and maximax criteria to the bipolar case, and consequently suffer from poor decisive power. More decisive rules that refine the former are also proposed. These refinements agree both with principles of efficiency and with the spirit of order-of-magnitude reasoning that prevails in qualitative decision theory. The most refined decision rule uses leximin rankings of the pros and the cons, and the ideas of counting arguments of equal strength and cancelling pros by cons. It is shown to come down to a special case of Cumulative Prospect Theory, and to subsume the “Take the Best” heuristic studied by cognitive psychologists.
Abstract:
While the repeated nature of Discrete Choice Experiments is advantageous from a sampling efficiency perspective, patterns of choice may differ across the tasks, due, in part, to learning and fatigue. Using probabilistic decision process models, we find in a field study that learning and fatigue behavior may only be exhibited by a small subset of respondents. Most respondents in our sample show preference and variance stability consistent with rational, pre-existent and well-formed preferences. Nearly all of the remainder exhibit both learning and fatigue effects. An important aspect of our approach is that it enables learning and fatigue effects to be explored, even though they were not envisaged during survey design or data collection.
Abstract:
Based on a sample of 600 international tourists travelling in Portugal, Spain and Italy, this study identifies the key concepts related to terrorism, risk perception, involvement and safety motivation among international tourists. Different levels of concern about safety can influence tourists' decisions. In their decision process, tourists weigh several factors, notably the level of risk or safety they perceive in destinations (Sonmez, 1998). Tourists adopt a protective attitude, changing their behaviour during decision processes and replacing destinations they consider unsafe with others associated with greater safety (Gu & Martin, 1992; Mansfeld, 1996). Terrorism amplified by the media has serious effects on the revenues of tourist destinations (Taylor, 2006). Through negative publicity, a destination that experiences a terrorist incident can see its reputation damaged and its tourism activity severely compromised (Sonmez, 1998). Moreover, the negative image of a destination can be generalised and can also affect other countries or regions for indeterminate periods of time (Taylor, 2006). A structural equation model reveals that tourists are motivated to acquire information about terrorism in the media, in particular paying attention to and showing interest in such news, and that this directly influences their perceived risk. Risk perception directly influences tourists' involvement in trip planning, specifically information seeking before the trip and while at the destination. Risk perception and tourist involvement influence the perceived importance of safety. The discussion focuses on the implications of this model for theory and for tourism institutions and organisations. Recommendations are also presented for destination managers and promoters and for managers of tourism organisations, along with directions for future research.
Abstract:
This report presents a study of two surveillance camera assembly lines at Bosch Security Systems, S.A. in Ovar. In the first phase, task lists and their precedence relations were drawn up, followed by work measurement in order to update the existing standard times. The times obtained were compared with those in force to understand the differences and the reasons for them. In the second phase, line balancing was carried out for the two lines under two scenarios: keeping the two lines separate, and merging them into a single line. All results were analysed, and the investment required for each scenario was assessed. A feasibility analysis was thus carried out to support the decision. Finally, a Lean Line Design workshop was held, which resulted in the physical configuration of the final line. This project achieved attractive results, with gains at several levels. It constituted a further improvement action for the company, leading it to correct existing shortcomings and to comply with ergonomic procedures that had already been defined.
Abstract:
Tourism, by its very nature, has a high environmental impact, and its development should be shaped by the environmental characteristics of each region. Information systems, particularly those that represent geographical information, make it possible to develop a design that allows the decision-maker to consult, manage and present decision schemes based on the defined measures of a region's tourism planning. Because the information associated with this design should be accurate and up to date, the Internet should be used as a means of accessing information about the region. The design presents the schemes associated with each decision, offering competitive advantages to the decision-makers involved in the decision process, since it becomes possible to evaluate, foresee and control the future environmental impacts of tourism.
Abstract:
Doctoral thesis, Statistics and Operations Research (Systems Analysis), Universidade de Lisboa, Faculdade de Ciências, 2014