914 results for Markov decision process (POMDP)
Abstract:
As levels of investment in advanced manufacturing systems increase, effective project management becomes ever more critical. This paper demonstrates how the model proposed by Mintzberg, Raisinghani and Theoret in 1976, which structures complicated strategic decision processes, can be applied to the design of new production systems for both descriptive and analytical research purposes. The paper situates a detailed case study of the design and development of an advanced manufacturing system within the Mintzberg decision model, thereby breaking the decision sequence down into its constituent parts. It thus shows how a structured model can provide a framework for the researcher who wishes to study decision episodes in the design of manufacturing facilities in greater depth.
Abstract:
Purpose – This paper describes research that has sought to create a formal and rational process that guides manufacturers through the strategic positioning decision. Design/methodology/approach – The methodology is based on a series of case studies to develop and test the decision process. Findings – A decision process that leads the practitioner through an analytical process to decide which manufacturing activities they should carry out themselves. Practical implications – Strategic positioning is concerned with choosing those production-related activities that an organisation should carry out internally, and those that should be external and under the ownership and control of suppliers, partners, distributors and customers. Originality/value – This concept extends traditional decision paradigms, such as those associated with “make versus buy” and “outsourcing”, by looking at the interactions between manufacturing operations and the wider supply chain networks associated with the organisation.
Abstract:
Infrastructure management agencies are facing multiple challenges, including aging infrastructure, reduction in the capacity of existing infrastructure, and limited available funds. Decision makers are therefore required to think innovatively and develop inventive ways of using the available funds. Maintenance investment decisions are generally made based on physical condition only. It is important to understand that spending money on public infrastructure is synonymous with spending money on people themselves, which calls for decision parameters beyond physical condition, such as strategic importance, socioeconomic contribution, and infrastructure utilization. Considering multiple decision parameters for infrastructure maintenance investments can be beneficial when funding is limited. Given this motivation, this dissertation presents a prototype decision support framework to evaluate trade-offs among competing infrastructure assets that are candidates for maintenance, repair, and rehabilitation investments. The performance of each decision parameter, measured through various factors, is combined to determine the integrated state of an infrastructure asset using Multi-Attribute Utility Theory (MAUT). The integrated state, together with cost and benefit estimates of probable maintenance actions and expert opinion, is used to develop transition probability and reward matrices for each probable maintenance action for a particular candidate asset. These matrices are then used as input to a finite-stage dynamic programming model of the Markov Decision Process (MDP) to perform project (candidate)-level analysis and determine optimized maintenance strategies based on reward maximization. The outcomes of the project (candidate)-level analysis are then used in a network-level analysis that takes a portfolio management approach to determine a suitable portfolio under budgetary constraints. The major decision support outcomes of the prototype framework include performance trend curves, decision logic maps, and a network-level maintenance investment plan for the upcoming years. The framework has been implemented on a set of bridges considered as a network, with the assistance of the Pima County DOT, AZ. It is expected that the concept of this prototype framework can help infrastructure management agencies better manage their available funds for maintenance.
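For readers unfamiliar with the project-level step described above, the sketch below illustrates finite-stage dynamic programming (backward induction) over transition-probability and reward matrices for competing maintenance actions. The state space, actions, horizon, and all numbers are invented for illustration; they are not taken from the dissertation or the Pima County case.

```python
import numpy as np

# Hypothetical integrated-state space (best -> worst) and maintenance actions.
STATES = ["good", "fair", "poor"]
ACTIONS = ["do_nothing", "rehabilitate"]

# P[a][s, s']: transition probabilities; R[a][s]: immediate reward (benefit minus cost).
# All numbers are illustrative placeholders, not values from the dissertation.
P = {
    "do_nothing":   np.array([[0.70, 0.20, 0.10],
                              [0.00, 0.60, 0.40],
                              [0.00, 0.00, 1.00]]),
    "rehabilitate": np.array([[0.95, 0.05, 0.00],
                              [0.80, 0.15, 0.05],
                              [0.60, 0.30, 0.10]]),
}
R = {
    "do_nothing":   np.array([5.0, 2.0, -4.0]),
    "rehabilitate": np.array([1.0, 3.0,  6.0]),
}

def backward_induction(horizon):
    """Finite-stage DP: returns stage-0 values and the per-stage optimal actions."""
    value = np.zeros(len(STATES))                # terminal value V_T(s) = 0
    policy = []
    for _ in range(horizon):
        q = np.array([R[a] + P[a] @ value for a in ACTIONS])   # Q(s, a) per action
        policy.insert(0, [ACTIONS[i] for i in q.argmax(axis=0)])
        value = q.max(axis=0)
    return value, policy

values, plan = backward_induction(horizon=5)
print("Stage-0 values:", dict(zip(STATES, values.round(2))))
print("Stage-0 policy:", dict(zip(STATES, plan[0])))
```

Here the finite horizon plays the role of the multi-year planning window: values are propagated backwards from the final year, and the stage-0 policy gives the recommended action for each integrated state.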
Abstract:
When modeling real-world decision-theoretic planning problems in the Markov Decision Process (MDP) framework, it is often impossible to obtain a completely accurate estimate of transition probabilities. For example, natural uncertainty arises in the transition specification due to elicitation of MDP transition models from an expert or estimation from data, or from non-stationary transition distributions arising from insufficient state knowledge. In the interest of obtaining the most robust policy under transition uncertainty, the Markov Decision Process with Imprecise Transition Probabilities (MDP-IP) has been introduced to model such scenarios. Unfortunately, while various solution algorithms exist for MDP-IPs, they often require external calls to optimization routines and thus can be extremely time-consuming in practice. To address this deficiency, we introduce the factored MDP-IP and propose efficient dynamic programming methods to exploit its structure. Noting that the key computational bottleneck in the solution of factored MDP-IPs is the need to repeatedly solve nonlinear constrained optimization problems, we show how to target approximation techniques to drastically reduce the computational overhead of the nonlinear solver while producing bounded, approximately optimal solutions. Our results show up to two orders of magnitude speedup in comparison to traditional "flat" dynamic programming approaches and up to an order of magnitude speedup over the extension of factored MDP approximate value iteration techniques to MDP-IPs, while producing the lowest error of any approximation algorithm evaluated. (C) 2011 Elsevier B.V. All rights reserved.
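The abstract gives no pseudocode; as background, the sketch below shows one common robust (pessimistic) value iteration for a flat MDP-IP in which each transition probability is only known to lie in an interval. The worst-case expectation is computed by greedily shifting probability mass toward low-value successors. The interfaces and bounds are illustrative assumptions, not the factored algorithm proposed in the paper.

```python
import numpy as np

def worst_case_expectation(values, lower, upper):
    """Minimize sum_j p_j * values[j] over probability vectors with lower <= p <= upper.

    Assumes lower.sum() <= 1 <= upper.sum(), so a feasible distribution exists.
    """
    p = lower.copy()
    slack = 1.0 - p.sum()                       # remaining mass to distribute
    for j in np.argsort(values):                # push mass toward low-value successors
        add = min(upper[j] - p[j], slack)
        p[j] += add
        slack -= add
        if slack <= 1e-12:
            break
    return float(p @ values)

def robust_value_iteration(R, lower, upper, gamma=0.95, tol=1e-6):
    """R[s, a]: reward; lower/upper[s, a, s']: interval bounds on P(s' | s, a)."""
    n_s, n_a = R.shape
    V = np.zeros(n_s)
    while True:
        Q = np.array([[R[s, a] + gamma * worst_case_expectation(V, lower[s, a], upper[s, a])
                       for a in range(n_a)] for s in range(n_s)])
        V_new = Q.max(axis=1)
        if np.abs(V_new - V).max() < tol:
            return V_new, Q.argmax(axis=1)      # robust values and greedy policy
        V = V_new
```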
Abstract:
Optimization techniques known as metaheuristics have achieved success in solving many problems classified as NP-hard. These methods use non-deterministic approaches that reach very good solutions but do not guarantee that the global optimum is found. Beyond the inherent difficulties related to the complexity of optimization problems, metaheuristics still face the exploration/exploitation dilemma, which consists of choosing between a greedy search and a wider exploration of the solution space. One way to guide such algorithms while searching for better solutions is to supply them with more knowledge of the problem through an intelligent agent capable of recognizing promising regions and of identifying when the direction of the search should be diversified. This work therefore proposes the use of a reinforcement learning technique, the Q-learning algorithm, as an exploration/exploitation strategy for the GRASP (Greedy Randomized Adaptive Search Procedure) and Genetic Algorithm metaheuristics. The GRASP metaheuristic uses Q-learning instead of the traditional greedy-random algorithm in its construction phase. This replacement aims to improve the quality of the initial solutions used in GRASP's local search phase, and it also provides the metaheuristic with an adaptive memory mechanism that allows good previous decisions to be reused and bad decisions to be avoided. In the Genetic Algorithm, the Q-learning algorithm was used to generate an initial population of high fitness and, after a given number of generations in which the diversity rate of the population falls below a certain limit L, also to supply one of the parents used in the genetic crossover operator. Another significant change in the hybrid genetic algorithm is the proposal of a mutually interactive cooperation process between the genetic operators and the Q-learning algorithm. In this interactive/cooperative process, the Q-learning algorithm receives an additional update to its matrix of Q-values based on the current best solution of the Genetic Algorithm. The computational experiments presented in this thesis compare the results obtained with traditional versions of the GRASP metaheuristic and the Genetic Algorithm against those obtained with the proposed hybrid methods. Both algorithms were applied successfully to the symmetric Traveling Salesman Problem, which was modeled as a Markov decision process.
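For context, a generic tabular Q-learning loop with epsilon-greedy action selection is sketched below; the mapping of GRASP construction steps or TSP tours onto states and actions is specific to the thesis and is not reproduced here, so the environment interface is a placeholder assumption.

```python
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1):
    """Tabular Q-learning with epsilon-greedy exploration.

    `env` is a placeholder assumed to expose reset() -> state,
    step(action) -> (next_state, reward, done), and actions(state) -> list.
    """
    Q = defaultdict(float)                          # Q[(state, action)] -> value
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            legal = env.actions(state)
            if random.random() < epsilon:           # explore
                action = random.choice(legal)
            else:                                   # exploit current knowledge
                action = max(legal, key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(action)
            best_next = max((Q[(next_state, a)] for a in env.actions(next_state)),
                            default=0.0)
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
    return Q
```

In the hybrid GRASP described above, learned Q-values of this kind take the place of the greedy-random ranking when building initial solutions for the local search phase.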
Abstract:
This work presents a solution to the problem of connection admission control and dynamic resource allocation in IEEE 802.16 networks through the modeling of a Markov Decision Process (MDP) using the concept of bandwidth degradation, which is based on the differentiated bandwidth requirements of the IEEE 802.16 service classes. For the MDP performance criterion, different rewards are assigned to each service class, so that each flow receives differentiated treatment. In this way, the optimal policy, obtained through a value iteration algorithm, can be evaluated with respect to aspects such as the average degradation level of the service classes, resource utilization, and the blocking probability of each service class as a function of the system load. The results obtained show that the proposed Markovian control method is able to prioritize the service classes considered most relevant to the system.
Abstract:
In Smart Grids, a variety of new applications are available to users of the electrical system (from consumers to the electric system operators and market operators). Some applications, such as SCADA systems that control generators or substations, are sensitive to communication delay: failure to deliver a control message within its time constraint can be catastrophic. On the other hand, applications such as smart metering of consumption have fewer restrictions. Since each type of application has different quality of service requirements (importance, delay, and amount of data to transmit) for its messages, the policy that controls and shares the resources of the data communication network must take them into account. In this paper, Markov Decision Process theory is employed to determine optimal policies that exploit the available throughput as much as possible in order to transmit all kinds of messages, while respecting the quality of service requirements defined for each kind of message. First a non-preemptive model is formulated, and then a preemptive model is derived. Numerical results are used to compare FIFO, non-preemptive, and preemptive policies.
Abstract:
This paper contributes a unified formulation that merges previous analyses of the prediction of the performance (value function) of a given sequence of actions (policy) when an agent operates a Markov decision process with a large state space. When the states are represented by features and the value function is linearly approximated, our analysis reveals a new relationship between two common cost functions used to obtain the optimal approximation. In addition, this analysis allows us to propose an efficient adaptive algorithm that provides an unbiased linear estimate. The performance of the proposed algorithm is illustrated by simulation, showing competitive results when compared with state-of-the-art solutions.
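The paper's specific adaptive estimator is not reproduced here; for orientation, the sketch below shows standard semi-gradient TD(0) policy evaluation with a linear value approximation V(s) ≈ w·φ(s), which is the setting the abstract refers to. The feature map and data format are assumptions made for illustration.

```python
import numpy as np

def td0_linear(transitions, phi, n_features, gamma=0.99, alpha=0.01):
    """Evaluate a fixed policy with semi-gradient TD(0) and linear features.

    `transitions` is an iterable of (state, reward, next_state, done) tuples
    collected under the policy; `phi(state)` returns a feature vector of
    length `n_features`. Returns the learned weight vector w.
    """
    w = np.zeros(n_features)
    for s, r, s_next, done in transitions:
        target = r + (0.0 if done else gamma * w @ phi(s_next))
        td_error = target - w @ phi(s)
        w += alpha * td_error * phi(s)          # gradient step on the features of s
    return w
```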
Abstract:
This paper introduces systems of exchange values as tools for the organization of multi-agent systems. Systems of exchange values are defined on the basis of the theory of social exchanges developed by Piaget and Homans. A model of social organization is proposed, where social relations are construed as social exchanges and exchange values are put into use to support the continuity of the performance of social exchanges. The dynamics of social organizations is formulated in terms of the regulation of exchanges of values, so that social equilibrium is connected to the continuity of the interactions. The concept of a supervisor of social equilibrium is introduced as a centralized mechanism for solving the problem of the equilibrium of the organization. The equilibrium supervisor solves this problem by making use of a qualitative Markov Decision Process that uses numerical intervals for the representation of exchange values.
Abstract:
The challenge of detecting a change in the distribution of data is a sequential decision problem that is relevant to many engineering solutions, including quality control and machine and process monitoring. This dissertation develops techniques for exact solution of change-detection problems with discrete time and discrete observations. Change-detection problems are classified as Bayes or minimax based on the availability of information on the change-time distribution. A Bayes optimal solution uses prior information about the distribution of the change time to minimize the expected cost, whereas a minimax optimal solution minimizes the cost under the worst-case change-time distribution. Both types of problems are addressed. The most important result of the dissertation is the development of a polynomial-time algorithm for the solution of important classes of Markov Bayes change-detection problems. Existing techniques for epsilon-exact solution of partially observable Markov decision processes have complexity exponential in the number of observation symbols. A new algorithm, called constellation induction, exploits the concavity and Lipschitz continuity of the value function, and has complexity polynomial in the number of observation symbols. It is shown that change-detection problems with a geometric change-time distribution and identically- and independently-distributed observations before and after the change are solvable in polynomial time. Also, change-detection problems on hidden Markov models with a fixed number of recurrent states are solvable in polynomial time. A detailed implementation and analysis of the constellation-induction algorithm are provided. Exact solution methods are also established for several types of minimax change-detection problems. Finite-horizon problems with arbitrary observation distributions are modeled as extensive-form games and solved using linear programs. Infinite-horizon problems with linear penalty for detection delay and identically- and independently-distributed observations can be solved in polynomial time via epsilon-optimal parameterization of a cumulative-sum procedure. Finally, the properties of policies for change-detection problems are described and analyzed. Simple classes of formal languages are shown to be sufficient for epsilon-exact solution of change-detection problems, and methods for finding minimally sized policy representations are described.
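The constellation-induction algorithm itself is not sketched here; as a concrete point of reference for the cumulative-sum procedure mentioned at the end of the abstract, below is a minimal one-sided CUSUM detector for independently and identically distributed observations before and after the change. The Gaussian mean-shift usage example and the threshold value are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

def cusum(observations, loglik_post, loglik_pre, threshold):
    """One-sided CUSUM: raise an alarm at the first index where the cumulative
    log-likelihood-ratio statistic exceeds `threshold`; return None otherwise."""
    stat = 0.0
    for t, x in enumerate(observations):
        stat = max(0.0, stat + loglik_post(x) - loglik_pre(x))
        if stat >= threshold:
            return t
    return None

# Illustrative usage: N(0, 1) observations switch to N(1, 1) at index 100.
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(0.0, 1.0, 100), rng.normal(1.0, 1.0, 50)])
pre, post = norm(0.0, 1.0), norm(1.0, 1.0)
print("alarm at index:", cusum(data, post.logpdf, pre.logpdf, threshold=5.0))
```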
Abstract:
Natural language processing has achieved great success in a wide range of applications, producing both commercial language services and open-source language tools. However, most methods take a static or batch approach, assuming that the model has all the information it needs and makes a one-time prediction. In this dissertation, we study dynamic problems where the input comes in a sequence instead of all at once, and the output must be produced while the input is arriving. In these problems, predictions are often made based only on partial information. We see this dynamic setting in many real-time, interactive applications. These problems usually involve a trade-off between the amount of input received (cost) and the quality of the output prediction (accuracy). Therefore, the evaluation considers both objectives (e.g., plotting a Pareto curve). Our goal is to develop a formal understanding of sequential prediction and decision-making problems in natural language processing and to propose efficient solutions. Toward this end, we present meta-algorithms that take an existing batch model and produce a dynamic model to handle sequential inputs and outputs. We build our framework upon theories of the Markov Decision Process (MDP), which allows learning to trade off competing objectives in a principled way. The main machine learning techniques we use are from imitation learning and reinforcement learning, and we advance current techniques to tackle problems arising in our settings. We evaluate our algorithm on a variety of applications, including dependency parsing, machine translation, and question answering. We show that our approach achieves a better cost-accuracy trade-off than the batch approach and heuristic-based decision-making approaches. We first propose a general framework for cost-sensitive prediction, where different parts of the input come at different costs. We formulate a decision-making process that selects pieces of the input sequentially, and the selection is adaptive to each instance. Our approach is evaluated on both standard classification tasks and a structured prediction task (dependency parsing). We show that it achieves similar prediction quality to methods that use all input, while incurring a much smaller cost. Next, we extend the framework to problems where the input is revealed incrementally in a fixed order. We study two applications: simultaneous machine translation and quiz bowl (incremental text classification). We discuss challenges in this setting and show that adding domain knowledge eases the decision-making problem. A central theme throughout the chapters is an MDP formulation of a challenging problem with sequential input/output and trade-off decisions, accompanied by a learning algorithm that solves the MDP.
Abstract:
The study of Information Technology (IT) outsourcing is relevant because companies are outsourcing their activities more than ever. An important IT outsourcing research area is the decision-making process; in other words, understanding how companies decide about outsourcing their IT operations is relevant from a research point of view. Therefore, the objective of this study is to understand the decision-making process used by Brazilian companies when outsourcing their IT operations. An analysis of the literature on this subject showed that six aspects are usually considered by companies in the evaluation of IT outsourcing service alternatives. This research verified how these six aspects are considered by Brazilian companies in IT outsourcing decisions. The survey showed that Brazilian companies consider all six aspects, but each of them has a different level of importance. The research also grouped the aspects according to their level of importance and interdependency, using factor analysis to understand the logic behind the IT outsourcing decision process. (C) 2009 Elsevier B.V. All rights reserved.
Abstract:
Among all the learning paradigms identified to date, Reinforcement Learning is of special interest and applicability in the countless processes that surround us: from the solitary probe exploring the most remote planet, to the expert program that learns to support medical decisions from acquired experience, to the toy dog that delights a child by interacting with her and adapting to her preferences, and a whole new world around us that increasingly calls on us to do more and better in this area. Since the concept of reinforcement learning first appeared, different methods have been proposed for putting it into practice, each addressing specific aspects. Two distinct but mutually complementary facets stand out as key characteristics of the reinforcement learning process: gaining experience through exploration of the state space and exploiting the knowledge obtained through that experience. This dissertation proposes to select some of the most promising methods from both the exploration and the exploitation sides, to implement each of them on a modular platform that allows the use of intelligent agents to be simulated and, through their application to different configurations of standard environments, to generate functional statistics from which conclusions can be drawn regarding, among other aspects, their comparative efficiency and effectiveness under specific conditions.
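The abstract does not name the specific exploration strategies evaluated; as one common example of the exploration/exploitation trade-off it describes, the sketch below shows Boltzmann (softmax) action selection, in which a temperature parameter moves the agent from near-random exploration toward greedy exploitation. All names and parameters are illustrative.

```python
import numpy as np

def softmax_action(q_values, temperature=1.0, rng=None):
    """Boltzmann exploration: sample an action with probability proportional to
    exp(Q / temperature). High temperature -> more exploration; low -> near-greedy."""
    rng = rng or np.random.default_rng()
    prefs = np.asarray(q_values, dtype=float) / temperature
    prefs -= prefs.max()                        # numerical stability
    probs = np.exp(prefs) / np.exp(prefs).sum()
    return rng.choice(len(probs), p=probs)

# Illustrative usage: three actions with estimated values 1.0, 2.0, and 0.5.
print(softmax_action([1.0, 2.0, 0.5], temperature=0.5))
```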
Abstract:
Dissertation submitted to obtain the degree of Doctor in Industrial Information Systems, Electrical Engineering, at the Universidade Nova de Lisboa, Faculdade de Ciências e Tecnologia.
Abstract:
Dissertation submitted to obtain the degree of Doctor in Electrical and Computer Engineering.