Abstract:
Modelling dialogue as a Partially Observable Markov Decision Process (POMDP) enables a dialogue policy that is robust to speech understanding errors to be learnt. However, a major challenge in POMDP policy learning is maintaining tractability, so the use of approximation is inevitable. We propose applying Gaussian processes to reinforcement learning of optimal POMDP dialogue policies, in order (1) to make the learning process faster and (2) to obtain an estimate of the uncertainty of the approximation. We first demonstrate the idea on a simple voice mail dialogue task and then apply the method to a real-world tourist information dialogue task. © 2010 Association for Computational Linguistics.
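To illustrate the core idea (a minimal sketch, not the paper's implementation), the snippet below fits a Gaussian process to returns observed at belief-action feature points and yields both a value estimate and its uncertainty. The squared-exponential kernel, lengthscale, and noise level are all illustrative assumptions.

```python
import numpy as np

def rbf_kernel(X, Y, lengthscale=0.5):
    """Squared-exponential kernel between the rows of X and Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def gp_q_posterior(X_train, q_train, X_query, noise=0.1):
    """Posterior mean and variance of a GP fit to observed returns.

    The mean is the value estimate; the variance quantifies how
    uncertain the approximation is at each query point.
    """
    K = rbf_kernel(X_train, X_train) + noise**2 * np.eye(len(X_train))
    K_s = rbf_kernel(X_query, X_train)
    K_inv = np.linalg.inv(K)
    mean = K_s @ K_inv @ q_train
    var = 1.0 - np.einsum("ij,jk,ik->i", K_s, K_inv, K_s)
    return mean, var

# Toy check: two observed points; query one seen and one unseen location.
X = np.array([[0.0], [1.0]])
q = np.array([0.0, 1.0])
mean, var = gp_q_posterior(X, q, np.array([[0.0], [0.5]]))
```

The posterior variance is what makes GP-based RL attractive here: it shrinks near already-visited points and stays large elsewhere, so it can guide exploration as well as provide value estimates.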
Abstract:
Statistical dialogue models have required a large number of dialogues to optimise the dialogue policy, relying on the use of a simulated user. This results in a mismatch between training and live conditions, and the significant development costs of the simulator undermine many of the claimed benefits of such models. Recent work on Gaussian process reinforcement learning has shown that learning can be substantially accelerated. This paper reports on an experiment to learn a policy for a real-world task directly from human interaction, using rewards provided by users. It shows that a usable policy can be learnt in just a few hundred dialogues without a user simulator, using a learning strategy that reduces the risk of taking bad actions. The paper also investigates adaptation behaviour when the system continues learning for several thousand dialogues and highlights the need for robustness to noisy rewards. © 2011 IEEE.
Abstract:
Innovation policies play an important role throughout the development of emerging industries in China. Existing policy and industry studies treat the emergence process as a black box and fail to capture how the impact of policy varies along that process. This paper develops a multi-dimensional roadmapping tool to better analyse the dynamics between policy and industrial growth for new industries in China. By reviewing the emergence of the Chinese wind turbine industry, the paper elaborates how policy and other factors influenced the emergence of this industry along its path, and generalises some China-specific features of the policy-industry dynamics. As a practical output, the study proposes a roadmapping framework that captures recurring patterns of policy-industry interaction during the emergence of new industries in China. This paper will be of interest to policy makers, strategists, investors and industrial experts. Copyright © 2013 Inderscience Enterprises Ltd.
Abstract:
The partially observable Markov decision process (POMDP) has been proposed as a dialogue model that enables automatic improvement of the dialogue policy and robustness to speech understanding errors. It requires, however, a large number of dialogues to train the dialogue policy. Gaussian processes (GP) have recently been applied to POMDP dialogue management optimisation showing an ability to substantially increase the speed of learning. Here, we investigate this further using the Bayesian Update of Dialogue State dialogue manager. We show that it is possible to apply Gaussian processes directly to the belief state, removing the need for a parametric policy representation. In addition, the resulting policy learns significantly faster while maintaining operational performance. © 2012 IEEE.
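A hypothetical sketch of what applying a GP "directly to the belief state" could look like: a kernel defined on the raw belief vectors, used for a non-parametric kernel-ridge estimate of value from past dialogues, with no parametric summary of the belief in between. The linear kernel and regulariser here are illustrative assumptions, not the actual choices made in the system.

```python
import numpy as np

def belief_kernel(b1, b2):
    """Linear kernel between two belief states (probability vectors).

    Operating on the belief itself means no hand-crafted summary
    features are needed -- the policy representation is non-parametric.
    """
    return float(np.dot(b1, b2))

def kernel_q_estimate(beliefs, returns, query, reg=1e-2):
    """Kernel ridge estimate of the value at a query belief."""
    n = len(beliefs)
    K = np.array([[belief_kernel(bi, bj) for bj in beliefs] for bi in beliefs])
    k_q = np.array([belief_kernel(query, b) for b in beliefs])
    alpha = np.linalg.solve(K + reg * np.eye(n), returns)
    return float(k_q @ alpha)

# Toy check: two fully-confident beliefs with opposite observed returns.
beliefs = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
returns = np.array([1.0, -1.0])
q_sure = kernel_q_estimate(beliefs, returns, np.array([1.0, 0.0]))
q_mixed = kernel_q_estimate(beliefs, returns, np.array([0.5, 0.5]))
```

A query belief matching a past dialogue recovers that dialogue's return, while a maximally uncertain belief interpolates between them, which is the behaviour a kernel on the belief itself buys over a fixed parametric feature map.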
Abstract:
Recent years have seen enormous demand among policy makers for new insights from the behavioural sciences, especially neuroscience. This demand is matched by an increasing willingness on the part of behavioural scientists to translate the policy implications of their work. But can neuroscience really help shape the governance of a nation? Or does this represent a growing misuse of neuroscience to lend scientific authority to policy, plus a clutch of neuroscientists trying to overstate their findings for a taste of power? © 2012.
Abstract:
Over the last decade, research in medical science has focused on knowledge translation and the diffusion of best practices to enable improved health outcomes. However, less attention has been given to the role of policy in influencing the translation of best practice across different national contexts. This paper argues that the underlying set of public discourses of healthcare policy significantly influences its development, with implications for the dissemination of best practices. Our research uses Critical Discourse Analysis to examine the policy discourses surrounding the treatment of stroke in Canada and the U.K. It focuses specifically on how concepts of knowledge translation, user empowerment, and service innovation construct different accounts of the health service in the two countries. These findings provide an important yet overlooked starting point for understanding the role of policy development in knowledge transfer and the translation of science into health practice. © 2011 Operational Research Society. All rights reserved.
Abstract:
This case study examines the interaction between domestic and foreign governmental policy on technology transfer, with the goal of understanding the long-term impacts of such transfer. Specifically, it reviews the impact of the successive licensing of fighter aircraft manufacturing and design to Japan on the development of Japan's aircraft industry. Results indicate that Japan has built a domestic aircraft industry through sequential learning from foreign technology transfers from the United States, combined with the design and production of domestic fighter aircraft. This process was facilitated by governmental policies in both Japan and the United States. Published by Elsevier B.V.
Abstract:
Although it is widely believed that reinforcement learning is a suitable tool for describing behavioral learning, the mechanisms by which it can be implemented in networks of spiking neurons are not fully understood. Here, we show that different learning rules emerge from a policy gradient approach depending on which features of the spike trains are assumed to influence the reward signals, i.e., depending on which neural code is in effect. We use the framework of Williams (1992) to derive learning rules for arbitrary neural codes. For illustration, we present policy-gradient rules for three different example codes - a spike count code, a spike timing code and the most general "full spike train" code - and test them on simple model problems. In addition to classical synaptic learning, we derive learning rules for intrinsic parameters that control the excitability of the neuron. The spike count learning rule has structural similarities with established Bienenstock-Cooper-Munro rules. If the distribution of the relevant spike train features belongs to the natural exponential family, the learning rules have a characteristic shape that raises interesting prediction problems.
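A minimal sketch of the spike-count case in the style of Williams (1992), as an assumption-laden toy rather than the paper's derivation: for a single neuron whose count n is Poisson with rate exp(w), the score function d/dw log p(n) reduces to (n - rate), so the policy-gradient weight update is proportional to the (baseline-subtracted) reward times (n - rate). The target count, learning rate, and running-average baseline are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# One Poisson neuron with expected spike count rate = exp(w).
# Reward penalises squared deviation of the count from a target count.
# REINFORCE update: dw = lr * (R - baseline) * d/dw log p(n),
# and for a Poisson count with rate = exp(w) the score is (n - rate).
target = 5.0
w = 0.0          # log-rate parameter
baseline = 0.0   # running average of the reward (variance reduction)
lr = 0.002

for _ in range(50000):
    rate = np.exp(w)
    n = rng.poisson(rate)                      # sample a spike count
    reward = -(n - target) ** 2                # graded reward signal
    w += lr * (reward - baseline) * (n - rate) # policy-gradient step
    baseline += 0.05 * (reward - baseline)     # slow baseline update

learned_rate = np.exp(w)
```

Note that the update only ever uses the spike count, never the individual spike times: choosing a timing code or the full spike-train code would change the score function, and hence the learning rule, which is exactly the paper's point.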
Abstract:
A partially observable Markov decision process has been proposed as a dialogue model that enables robustness to speech recognition errors and automatic policy optimisation using reinforcement learning (RL). However, conventional RL algorithms require a very large number of dialogues, necessitating a user simulator. Recently, Gaussian processes have been shown to substantially speed up the optimisation, making it possible to learn directly from interaction with human users. However, early studies have been limited to very low-dimensional spaces, and the learning has exhibited convergence problems. Here we investigate learning from human interaction using the Bayesian Update of Dialogue State system. This system, based on a dynamic Bayesian network, has an optimisation space covering more than one hundred features, allowing a wide range of behaviours to be learned. Using an improved policy model and a more robust reward function, we show that stable learning can be achieved that significantly outperforms a simulator-trained policy. © 2013 IEEE.