47 results for MARKOV DECISION PROCESSES
Abstract:
Although partially observable Markov decision processes (POMDPs) have shown great promise as a framework for dialog management in spoken dialog systems, important scalability issues remain. This paper tackles the problem of scaling slot-filling POMDP-based dialog managers to many slots with a novel technique called composite point-based value iteration (CSPBVI). CSPBVI creates a "local" POMDP policy for each slot; at runtime, each slot nominates an action and a heuristic chooses which action to take. Experiments in dialog simulation show that CSPBVI successfully scales POMDP-based dialog managers without compromising performance gains over baseline techniques, while preserving robustness to errors in user model estimation. Copyright © 2006, American Association for Artificial Intelligence (www.aaai.org). All rights reserved.
Abstract:
This article presents a novel algorithm for learning parameters in statistical dialogue systems which are modeled as Partially Observable Markov Decision Processes (POMDPs). The three main components of a POMDP dialogue manager are a dialogue model representing dialogue state information; a policy that selects the system's responses based on the inferred state; and a reward function that specifies the desired behavior of the system. Ideally both the model parameters and the policy would be designed to maximize the cumulative reward. However, while there are many techniques available for learning the optimal policy, no good ways of learning the optimal model parameters that scale to real-world dialogue systems have been found yet. The presented algorithm, called the Natural Actor and Belief Critic (NABC), is a policy gradient method that offers a solution to this problem. Based on observed rewards, the algorithm estimates the natural gradient of the expected cumulative reward. The resulting gradient is then used to adapt both the prior distribution of the dialogue model parameters and the policy parameters. In addition, the article presents a variant of the NABC algorithm, called the Natural Belief Critic (NBC), which assumes that the policy is fixed and only the model parameters need to be estimated. The algorithms are evaluated on a spoken dialogue system in the tourist information domain. The experiments show that model parameters estimated to maximize the expected cumulative reward result in significantly improved performance compared to the baseline hand-crafted model parameters. The algorithms are also compared to optimization techniques using plain gradients and state-of-the-art random search algorithms. In all cases, the algorithms based on the natural gradient work significantly better. © 2011 ACM.
Abstract:
Statistical dialog systems (SDSs) are motivated by the need for a data-driven framework that reduces the cost of laboriously handcrafting complex dialog managers and that provides robustness against the errors created by speech recognizers operating in noisy environments. By including an explicit Bayesian model of uncertainty and by optimizing the policy via a reward-driven process, partially observable Markov decision processes (POMDPs) provide such a framework. However, exact model representation and optimization is computationally intractable. Hence, the practical application of POMDP-based systems requires efficient algorithms and carefully constructed approximations. This review article provides an overview of the current state of the art in the development of POMDP-based spoken dialog systems. © 1963-2012 IEEE.
Abstract:
Modelling dialogue as a Partially Observable Markov Decision Process (POMDP) enables a dialogue policy robust to speech understanding errors to be learnt. However, a major challenge in POMDP policy learning is to maintain tractability, so the use of approximation is inevitable. We propose applying Gaussian Processes in reinforcement learning of optimal POMDP dialogue policies, in order (1) to make the learning process faster and (2) to obtain an estimate of the uncertainty of the approximation. We first demonstrate the idea on a simple voice mail dialogue task and then apply this method to a real-world tourist information dialogue task. © 2010 Association for Computational Linguistics.
Abstract:
A partially observable Markov decision process (POMDP) has been proposed as a dialog model that enables automatic optimization of the dialog policy and provides robustness to speech understanding errors. Various approximations allow such a model to be used for building real-world dialog systems. However, they require a large number of dialogs to train the dialog policy and hence they typically rely on the availability of a user simulator. They also require significant designer effort to hand-craft the policy representation. We investigate the use of Gaussian processes (GPs) in policy modeling to overcome these problems. We show that GP policy optimization can be implemented for a real world POMDP dialog manager, and in particular: 1) we examine different formulations of a GP policy to minimize variability in the learning process; 2) we find that the use of GP increases the learning rate by an order of magnitude thereby allowing learning by direct interaction with human users; and 3) we demonstrate that designer effort can be substantially reduced by basing the policy directly on the full belief space thereby avoiding ad hoc feature space modeling. Overall, the GP approach represents an important step forward towards fully automatic dialog policy optimization in real world systems. © 2013 IEEE.
Abstract:
The partially observable Markov decision process (POMDP) has been proposed as a dialogue model that enables automatic improvement of the dialogue policy and robustness to speech understanding errors. It requires, however, a large number of dialogues to train the dialogue policy. Gaussian processes (GP) have recently been applied to POMDP dialogue management optimisation showing an ability to substantially increase the speed of learning. Here, we investigate this further using the Bayesian Update of Dialogue State dialogue manager. We show that it is possible to apply Gaussian processes directly to the belief state, removing the need for a parametric policy representation. In addition, the resulting policy learns significantly faster while maintaining operational performance. © 2012 IEEE.
Abstract:
A partially observable Markov decision process has been proposed as a dialogue model that enables robustness to speech recognition errors and automatic policy optimisation using reinforcement learning (RL). However, conventional RL algorithms require a very large number of dialogues, necessitating a user simulator. Recently, Gaussian processes have been shown to substantially speed up the optimisation, making it possible to learn directly from interaction with human users. However, early studies have been limited to very low dimensional spaces and the learning has exhibited convergence problems. Here we investigate learning from human interaction using the Bayesian Update of Dialogue State system. This dynamic Bayesian network based system has an optimisation space covering more than one hundred features, allowing a wide range of behaviours to be learned. Using an improved policy model and a more robust reward function, we show that stable learning can be achieved that significantly outperforms a simulator trained policy. © 2013 IEEE.
Abstract:
Production responsiveness refers to the ability of a production system to achieve its operational goals in the presence of supplier, internal and customer disturbances, where disturbances are those sources of change which occur independently of the system's intentions. A set of audit tools for assessing the responsiveness of production operations is being prepared as part of an EPSRC funded investigation. These tools are based on the idea that the ability to respond is linked to: the nature of the disturbances or changes requiring a response; their impact on production goals; and the inherent response capabilities of the operation. These response capabilities include information gathering and processing (to detect disturbances and production conditions), decision processes (which initiate system responses to disturbances) and various types of process flexibilities and buffers (which provide the physical means of dealing with disturbances). The paper discusses concepts and issues associated with production responsiveness, describes the audit tools that have been developed and illustrates their use in the context of a steel manufacturing plant.
Abstract:
The partially observable Markov decision process (POMDP) provides a popular framework for modelling spoken dialogue. This paper describes how the expectation propagation algorithm (EP) can be used to learn the parameters of the POMDP user model. Various special probability factors applicable to this task are presented, which allow the parameters to be learned when the structure of the dialogue is complex. No annotations, neither the true dialogue state nor the true semantics of user utterances, are required. Parameters optimised using the proposed techniques are shown to improve performance in both offline transcription experiments and simulated dialogue management. ©2010 IEEE.
Abstract:
Effective dialogue management is critically dependent on the information that is encoded in the dialogue state. In order to deploy reinforcement learning for policy optimization, dialogue must be modeled as a Markov Decision Process. This requires that the dialogue state encode all relevant information obtained during the dialogue prior to that state. This can be achieved by combining the user goal, the dialogue history, and the last user action to form the dialogue state. In addition, to gain robustness to input errors, dialogue must be modeled as a Partially Observable Markov Decision Process (POMDP) and hence, a distribution over all possible states must be maintained at every dialogue turn. This poses a potential computational limitation since there can be a very large number of dialogue states. The Hidden Information State model provides a principled way of ensuring tractability in a POMDP-based dialogue model. The key feature of this model is the grouping of user goals into partitions that are dynamically built during the dialogue. In this article, we extend this model further to incorporate the notion of complements. This allows for a more complex user goal to be represented, and it enables an effective pruning technique to be implemented that preserves the overall system performance within a limited computational resource more effectively than existing approaches. © 2011 ACM.
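To illustrate the requirement that a distribution over states be maintained at every dialogue turn, the following is a minimal sketch of the standard exact POMDP belief update, b'(s') ∝ P(o | s', a) Σ_s P(s' | s, a) b(s). It is not the Hidden Information State model or its partition scheme. The two-goal state space, the 0.8 recognition accuracy, and all function names are assumptions chosen purely for illustration.

```python
def belief_update(belief, action, observation, transition, observation_model):
    """Exact POMDP belief update over a small, enumerable state set.

    belief: dict mapping state -> probability
    transition(s_next, s, a): P(s_next | s, a)
    observation_model(o, s_next, a): P(o | s_next, a)
    """
    states = list(belief)
    unnormalised = {}
    for s_next in states:
        # Prediction step: probability of reaching s_next from the old belief.
        predicted = sum(transition(s_next, s, action) * belief[s] for s in states)
        # Correction step: weight by how well s_next explains the observation.
        unnormalised[s_next] = observation_model(observation, s_next, action) * predicted
    total = sum(unnormalised.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return {s: p / total for s, p in unnormalised.items()}


# Hypothetical toy model: the user goal is fixed during the dialogue and the
# speech recogniser reports the correct goal with probability 0.8 (assumed).
def toy_transition(s_next, s, action):
    return 1.0 if s_next == s else 0.0


def toy_observation(observation, s_next, action):
    return 0.8 if observation == s_next else 0.2


initial_belief = {"hotel": 0.5, "restaurant": 0.5}
after_one_turn = belief_update(initial_belief, "ask_goal", "hotel",
                               toy_transition, toy_observation)
after_two_turns = belief_update(after_one_turn, "ask_goal", "hotel",
                                toy_transition, toy_observation)
print(after_one_turn)
print(after_two_turns)
```

The loop touches every pair of states on each turn, so its cost grows quadratically with the number of states. This is the computational limitation the abstract refers to when the number of dialogue states is very large.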
Abstract:
A set of audit tools is being prepared for assessing the response capability of a production operation, as part of an EPSRC funded investigation into improving the responsiveness of manufacturing production systems. These tools are based on the idea that the ability to respond is linked to i) the nature of the disturbances or changes requiring a response, ii) their impact on production goals and iii) the decision processes which initiate system responses to disturbances.
Abstract:
Reinforcement techniques have been successfully used to maximise the expected cumulative reward of statistical dialogue systems. Typically, reinforcement learning is used to estimate the parameters of a dialogue policy which selects the system's responses based on the inferred dialogue state. However, the inference of the dialogue state itself depends on a dialogue model which describes the expected behaviour of a user when interacting with the system. Ideally the parameters of this dialogue model should also be optimised to maximise the expected cumulative reward. This article presents two novel reinforcement algorithms for learning the parameters of a dialogue model. First, the Natural Belief Critic algorithm is designed to optimise the model parameters while the policy is kept fixed. This algorithm is suitable, for example, in systems using a handcrafted policy, perhaps prescribed by other design considerations. Second, the Natural Actor and Belief Critic algorithm jointly optimises both the model and the policy parameters. The algorithms are evaluated on a statistical dialogue system modelled as a Partially Observable Markov Decision Process in a tourist information domain. The evaluation is performed with a user simulator and with real users. The experiments indicate that model parameters estimated to maximise the expected reward function provide improved performance compared to the baseline handcrafted parameters. © 2011 Elsevier Ltd. All rights reserved.
Abstract:
Social and political concerns are frequently reflected in the design of school buildings, often in turn leading to the development of technical innovations. One example is a recurrent concern about the physical health of the nation, which has at several points over the last century prompted new design approaches to natural light and ventilation. The most critical concern of the current era is the global, rather than the indoor, environment. The resultant political focus on mitigating climate change has resulted in new regulations, and in turn considerable technical changes in building design and construction. The vanguard of this movement has again been in school buildings, for which the previous Government set the highest targets for reducing operational carbon. The current austerity measures have moved the focus to the refurbishment and retrofit of existing buildings, in order to bring them up to the exacting new standards. Meanwhile there is little doubt that climate change is happening already, and that the impacts will be considerable. Climate scientists have increasing confidence in their predictions for the future; if today’s buildings are to be resilient to these changes, building designers will need to understand and design for the predicted climates in order to continue to provide comfortable and healthy spaces through the lifetimes of the buildings. This paper describes the decision processes, and the planned design measures, for adapting an existing school for future climates. The project is at St Faith’s School in Cambridge, and focuses on three separate buildings: a large Victorian block built as a substantial domestic dwelling in 1885, a smaller single storey 1970s block with a new extension, and an as-yet unbuilt single storey block designed to Passivhaus principles and using environmentally friendly materials.
The implications of climate change have been considered for the three particular issues of comfort, construction, and water, as set out in the report on Design for Future Climate: opportunities for adaptation in the built environment (Gething, 2010). The adaptation designs aim to ensure each of the three very different buildings remains fit for purpose throughout the 21st century, continuing to provide a healthy environment for the children. A fourth issue, the reduction of carbon and the mitigation of other negative environmental impacts of the construction work, is also a fundamental aim for the school and the project team. Detailed modelling of both the operational and embodied energy and carbon of the design options is therefore being carried out, in order that the whole life carbon costs of the adaptation design options may be minimised. The project has been funded by the Technology Strategy Board as part of the Design for Future Climates programme; the interdisciplinary team includes the designers working on the current school building projects and the school bursar, supported by researchers from the University of Cambridge Centre for Sustainable Development. It is hoped that lessons from the design process, as well as the solutions themselves, will be transferable to other buildings in similar climatic regions.
Abstract:
Given a spectral density matrix or, equivalently, a real autocovariance sequence, the author seeks to determine a finite-dimensional linear time-invariant system which, when driven by white noise, will produce an output whose spectral density is approximately Φ(ω), and an approximate spectral factor of Φ(ω). The author employs the Anderson-Faurre theory in his analysis.