974 resultados para Markov Renewal Process


Relevância:

80.00% 80.00%

Publicador:

Resumo:

Effective dialogue management is critically dependent on the information that is encoded in the dialogue state. In order to deploy reinforcement learning for policy optimization, dialogue must be modeled as a Markov Decision Process. This requires that the dialogue statemust encode all relevent information obtained during the dialogue prior to that state. This can be achieved by combining the user goal, the dialogue history, and the last user action to form the dialogue state. In addition, to gain robustness to input errors, dialogue must be modeled as a Partially Observable Markov Decision Process (POMDP) and hence, a distribution over all possible states must be maintained at every dialogue turn. This poses a potential computational limitation since there can be a very large number of dialogue states. The Hidden Information State model provides a principled way of ensuring tractability in a POMDP-based dialogue model. The key feature of this model is the grouping of user goals into partitions that are dynamically built during the dialogue. In this article, we extend this model further to incorporate the notion of complements. This allows for a more complex user goal to be represented, and it enables an effective pruning technique to be implemented that preserves the overall system performance within a limited computational resource more effectively than existing approaches. © 2011 ACM.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Reinforcement techniques have been successfully used to maximise the expected cumulative reward of statistical dialogue systems. Typically, reinforcement learning is used to estimate the parameters of a dialogue policy which selects the system's responses based on the inferred dialogue state. However, the inference of the dialogue state itself depends on a dialogue model which describes the expected behaviour of a user when interacting with the system. Ideally the parameters of this dialogue model should be also optimised to maximise the expected cumulative reward. This article presents two novel reinforcement algorithms for learning the parameters of a dialogue model. First, the Natural Belief Critic algorithm is designed to optimise the model parameters while the policy is kept fixed. This algorithm is suitable, for example, in systems using a handcrafted policy, perhaps prescribed by other design considerations. Second, the Natural Actor and Belief Critic algorithm jointly optimises both the model and the policy parameters. The algorithms are evaluated on a statistical dialogue system modelled as a Partially Observable Markov Decision Process in a tourist information domain. The evaluation is performed with a user simulator and with real users. The experiments indicate that model parameters estimated to maximise the expected reward function provide improved performance compared to the baseline handcrafted parameters. © 2011 Elsevier Ltd. All rights reserved.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Modelling dialogue as a Partially Observable Markov Decision Process (POMDP) enables a dialogue policy robust to speech understanding errors to be learnt. However, a major challenge in POMDP policy learning is to maintain tractability, so the use of approximation is inevitable. We propose applying Gaussian Processes in Reinforcement learning of optimal POMDP dialogue policies, in order (1) to make the learning process faster and (2) to obtain an estimate of the uncertainty of the approximation. We first demonstrate the idea on a simple voice mail dialogue task and then apply this method to a real-world tourist information dialogue task. © 2010 Association for Computational Linguistics.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The partially observable Markov decision process (POMDP) has been proposed as a dialogue model that enables automatic improvement of the dialogue policy and robustness to speech understanding errors. It requires, however, a large number of dialogues to train the dialogue policy. Gaussian processes (GP) have recently been applied to POMDP dialogue management optimisation showing an ability to substantially increase the speed of learning. Here, we investigate this further using the Bayesian Update of Dialogue State dialogue manager. We show that it is possible to apply Gaussian processes directly to the belief state, removing the need for a parametric policy representation. In addition, the resulting policy learns significantly faster while maintaining operational performance. © 2012 IEEE.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

A partially observable Markov decision process has been proposed as a dialogue model that enables robustness to speech recognition errors and automatic policy optimisation using reinforcement learning (RL). However, conventional RL algorithms require a very large number of dialogues, necessitating a user simulator. Recently, Gaussian processes have been shown to substantially speed up the optimisation, making it possible to learn directly from interaction with human users. However, early studies have been limited to very low dimensional spaces and the learning has exhibited convergence problems. Here we investigate learning from human interaction using the Bayesian Update of Dialogue State system. This dynamic Bayesian network based system has an optimisation space covering more than one hundred features, allowing a wide range of behaviours to be learned. Using an improved policy model and a more robust reward function, we show that stable learning can be achieved that significantly outperforms a simulator trained policy. © 2013 IEEE.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

A partially observable Markov decision process (POMDP) has been proposed as a dialog model that enables automatic optimization of the dialog policy and provides robustness to speech understanding errors. Various approximations allow such a model to be used for building real-world dialog systems. However, they require a large number of dialogs to train the dialog policy and hence they typically rely on the availability of a user simulator. They also require significant designer effort to hand-craft the policy representation. We investigate the use of Gaussian processes (GPs) in policy modeling to overcome these problems. We show that GP policy optimization can be implemented for a real world POMDP dialog manager, and in particular: 1) we examine different formulations of a GP policy to minimize variability in the learning process; 2) we find that the use of GP increases the learning rate by an order of magnitude thereby allowing learning by direct interaction with human users; and 3) we demonstrate that designer effort can be substantially reduced by basing the policy directly on the full belief space thereby avoiding ad hoc feature space modeling. Overall, the GP approach represents an important step forward towards fully automatic dialog policy optimization in real world systems. © 2013 IEEE.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

In situ electrochemical scanning tunneling microscopy (ECSTM) has been employed to follow the renewal process of a graphite electrode accompanied by flavin adenine dinucleotide (FAD) electrochemical reaction which involves adsorption of the reduced form (FADH(2)) and desorption of the oxidized form (FAD). The renewal process initiates from steps or kinks on the electrode surface, which provide high active sites for adsorption. This renewal depends on the working electrode potential, especially in the range near the FAD redox potential. Our experiment suggests that delamination of the graphite surface is caused by interaction between the substrate and adsorbed molecules. A simple model is proposed to explain this phenomenon.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

In this paper, we investigate the remanufacturing problem of pricing single-class used products (cores) in the face of random price-dependent returns and random demand. Specifically, we propose a dynamic pricing policy for the cores and then model the problem as a continuous-time Markov decision process. Our models are designed to address three objectives: finite horizon total cost minimization, infinite horizon discounted cost, and average cost minimization. Besides proving optimal policy uniqueness and establishing monotonicity results for the infinite horizon problem, we also characterize the structures of the optimal policies, which can greatly simplify the computational procedure. Finally, we use computational examples to assess the impacts of specific parameters on optimal price and reveal the benefits of a dynamic pricing policy. © 2013 Elsevier B.V. All rights reserved.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

In remanufacturing, the supply of used products and the demand for remanufactured products are usually mismatched because of the great uncertainties on both sides. In this paper, we propose a dynamic pricing policy to balance this uncertain supply and demand. Specifically, we study a remanufacturer’s problem of pricing a single class of cores with random price-dependent returns and random demand for the remanufactured products with backlogs. We model this pricing task as a continuous-time Markov decision process, which addresses both the finite and infinite horizon problems, and provide managerial insights by analyzing the structural properties of the optimal policy. We then use several computational examples to illustrate the impacts of particular system parameters on pricing policy.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Cette thèse porte sur les questions d'évaluation et de couverture des options dans un modèle exponentiel-Lévy avec changements de régime. Un tel modèle est construit sur un processus additif markovien un peu comme le modèle de Black- Scholes est basé sur un mouvement Brownien. Du fait de l'existence de plusieurs sources d'aléa, nous sommes en présence d'un marché incomplet et ce fait rend inopérant les développements théoriques initiés par Black et Scholes et Merton dans le cadre d'un marché complet. Nous montrons dans cette thèse que l'utilisation de certains résultats de la théorie des processus additifs markoviens permet d'apporter des solutions aux problèmes d'évaluation et de couverture des options. Notamment, nous arrivons à caracté- riser la mesure martingale qui minimise l'entropie relative à la mesure de probabilit é historique ; aussi nous dérivons explicitement sous certaines conditions, le portefeuille optimal qui permet à un agent de minimiser localement le risque quadratique associé. Par ailleurs, dans une perspective plus pratique nous caract érisons le prix d'une option Européenne comme l'unique solution de viscosité d'un système d'équations intégro-di érentielles non-linéaires. Il s'agit là d'un premier pas pour la construction des schémas numériques pour approcher ledit prix.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The thesis entitled Analysis of Some Stochastic Models in Inventories and Queues. This thesis is devoted to the study of some stochastic models in Inventories and Queues which are physically realizable, though complex. It contains a detailed analysis of the basic stochastic processes underlying these models. In this thesis, (s,S) inventory systems with nonidentically distributed interarrival demand times and random lead times, state dependent demands, varying ordering levels and perishable commodities with exponential life times have been studied. The queueing system of the type Ek/Ga,b/l with server vacations, service systems with single and batch services, queueing system with phase type arrival and service processes and finite capacity M/G/l queue when server going for vacation after serving a random number of customers are also analysed. The analogy between the queueing systems and inventory systems could be exploited in solving certain models. In vacation models, one important result is the stochastic decomposition property of the system size or waiting time. One can think of extending this to the transient case. In inventory theory, one can extend the present study to the case of multi-item, multi-echelon problems. The study of perishable inventory problem when the commodities have a general life time distribution would be a quite interesting problem. The analogy between the queueing systems and inventory systems could be exploited in solving certain models.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Este artículo es un avance del proyecto de investigación “Transformaciones de las prácticas pedagógicas de los profesores en un proceso de renovación curricular”. Su marco conceptual se fundamenta en el análisis de cuatro ejes centrales: el primero hace referencia al concepto de currículo, que compromete varios aspectos relacionados con los diferentes actores del sistema educativo; el segundo afronta las renovaciones curriculares en programas de Ciencias de la Salud, que permite aclarar y determinar lo que la comunidad académica entiende al respecto, sus características, fines y efectos; el tercero analiza la información para identificar los factores que posibilitan la apropiación o no de los principios de las renovaciones curriculares reflejados en las prácticas pedagógicas de los profesores; el cuarto explora conceptualizaciones acerca de cómo el profesor interpreta y orienta su actuar educativo. Finalmente, se propone una discusión cuyo hilo conductor responde a las siguientes preguntas: ¿Cómo los procesos de renovación curricular en programas de formación profesional en Ciencias de la Salud impactan las prácticas pedagógicas de los profesores? ¿Cuáles son los factores que posibilitan la apropiación de los principios de las renovaciones curriculares en las prácticas pedagógicas de los profesores en programas de formación profesional en Ciencias de la Salud?

Relevância:

80.00% 80.00%

Publicador:

Resumo:

When modeling real-world decision-theoretic planning problems in the Markov Decision Process (MDP) framework, it is often impossible to obtain a completely accurate estimate of transition probabilities. For example, natural uncertainty arises in the transition specification due to elicitation of MOP transition models from an expert or estimation from data, or non-stationary transition distributions arising from insufficient state knowledge. In the interest of obtaining the most robust policy under transition uncertainty, the Markov Decision Process with Imprecise Transition Probabilities (MDP-IPs) has been introduced to model such scenarios. Unfortunately, while various solution algorithms exist for MDP-IPs, they often require external calls to optimization routines and thus can be extremely time-consuming in practice. To address this deficiency, we introduce the factored MDP-IP and propose efficient dynamic programming methods to exploit its structure. Noting that the key computational bottleneck in the solution of factored MDP-IPs is the need to repeatedly solve nonlinear constrained optimization problems, we show how to target approximation techniques to drastically reduce the computational overhead of the nonlinear solver while producing bounded, approximately optimal solutions. Our results show up to two orders of magnitude speedup in comparison to traditional ""flat"" dynamic programming approaches and up to an order of magnitude speedup over the extension of factored MDP approximate value iteration techniques to MDP-IPs while producing the lowest error of any approximation algorithm evaluated. (C) 2011 Elsevier B.V. All rights reserved.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This is an ethnographic and comparative study of the Maracatus Solar (2006) and Reis de Paus (1960), whose aim was to verify what is ancient and traditional in the new maracatu practiced by the guild Solar and conversely, what is new or modern in the old maracatu ritualized by guild Reis de Paus. It is worth noting that through this case study it is also intended to ethnographically observe and better understand the processes of ruptures and continuities between modernization and tradition, and the relationship between the global and the local. The communication system, the dancing, the music, the costumes and the loas (letters) were analyzed using the technique of participant observation as well as secondary materials such as newspapers, blogs and magazines. The interviews were open, non-directive, but recorded to facilitate understanding the speech of revelers. The research has shown that all the symbolic elements of aesthetic expression of the maracatu are permeated by clashes of historical contexts and of political representation, which, in another instance, also enunciates a fight of micro-community resistance regarding the renewal process and the social development that plague modern megalopolis. It is In this interim, between modernity and tradition that today it can be spoken about the existence of hybrid identity in the maracatu regarding a context mediated by the overall above mentioned values and customs specific of the new generations. However, one can not deny that the forms of negotiations with modernity also require the establishment of a link with the specific singularity of a popular culture that is not excluded, but also should not get invaded by the idea of authenticity. Therefore, performing this study was above all an opportunity to understand also the community life in the city outskirts, understanding society, culture and everyday social relations maintained between humans that produce and make it all happen. The Solar and Reis de Paus do not join in opposition between themselves nor by their similarity. What is most striking among them is the renewal of a tradition that reinvents itself in the form of popular representation across the street parade

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Techniques of optimization known as metaheuristics have achieved success in the resolution of many problems classified as NP-Hard. These methods use non deterministic approaches that reach very good solutions which, however, don t guarantee the determination of the global optimum. Beyond the inherent difficulties related to the complexity that characterizes the optimization problems, the metaheuristics still face the dilemma of xploration/exploitation, which consists of choosing between a greedy search and a wider exploration of the solution space. A way to guide such algorithms during the searching of better solutions is supplying them with more knowledge of the problem through the use of a intelligent agent, able to recognize promising regions and also identify when they should diversify the direction of the search. This way, this work proposes the use of Reinforcement Learning technique - Q-learning Algorithm - as exploration/exploitation strategy for the metaheuristics GRASP (Greedy Randomized Adaptive Search Procedure) and Genetic Algorithm. The GRASP metaheuristic uses Q-learning instead of the traditional greedy-random algorithm in the construction phase. This replacement has the purpose of improving the quality of the initial solutions that are used in the local search phase of the GRASP, and also provides for the metaheuristic an adaptive memory mechanism that allows the reuse of good previous decisions and also avoids the repetition of bad decisions. In the Genetic Algorithm, the Q-learning algorithm was used to generate an initial population of high fitness, and after a determined number of generations, where the rate of diversity of the population is less than a certain limit L, it also was applied to supply one of the parents to be used in the genetic crossover operator. Another significant change in the hybrid genetic algorithm is the proposal of a mutually interactive cooperation process between the genetic operators and the Q-learning algorithm. In this interactive/cooperative process, the Q-learning algorithm receives an additional update in the matrix of Q-values based on the current best solution of the Genetic Algorithm. The computational experiments presented in this thesis compares the results obtained with the implementation of traditional versions of GRASP metaheuristic and Genetic Algorithm, with those obtained using the proposed hybrid methods. Both algorithms had been applied successfully to the symmetrical Traveling Salesman Problem, which was modeled as a Markov decision process