878 resultados para semi-Markov decision process
Resumo:
Purpose – This paper describes research that has sought to create a formal and rational process that guides manufacturers through the strategic positioning decision. Design/methodology/approach – The methodology is based on a series of case studies to develop and test the decision process. Findings – A decision process that leads the practitioner through an analytical process to decide which manufacturing activities they should carryout themselves. Practical implications – Strategic positioning is concerned with choosing those production related activities that an organisations should carry out internally, and those that should be external and under the ownership and control of suppliers, partners, distributors and customers. Originality/value – This concept extends traditional decision paradigms, such as those associated with “make versus buy” and “outsourcing”, by looking at the interactions between manufacturing operations and the wider supply chain networks associated with the organisation.
Resumo:
In machine learning, Gaussian process latent variable model (GP-LVM) has been extensively applied in the field of unsupervised dimensionality reduction. When some supervised information, e.g., pairwise constraints or labels of the data, is available, the traditional GP-LVM cannot directly utilize such supervised information to improve the performance of dimensionality reduction. In this case, it is necessary to modify the traditional GP-LVM to make it capable of handing the supervised or semi-supervised learning tasks. For this purpose, we propose a new semi-supervised GP-LVM framework under the pairwise constraints. Through transferring the pairwise constraints in the observed space to the latent space, the constrained priori information on the latent variables can be obtained. Under this constrained priori, the latent variables are optimized by the maximum a posteriori (MAP) algorithm. The effectiveness of the proposed algorithm is demonstrated with experiments on a variety of data sets. © 2010 Elsevier B.V.
Resumo:
Infrastructure management agencies are facing multiple challenges, including aging infrastructure, reduction in capacity of existing infrastructure, and availability of limited funds. Therefore, decision makers are required to think innovatively and develop inventive ways of using available funds. Maintenance investment decisions are generally made based on physical condition only. It is important to understand that spending money on public infrastructure is synonymous with spending money on people themselves. This also requires consideration of decision parameters, in addition to physical condition, such as strategic importance, socioeconomic contribution and infrastructure utilization. Consideration of multiple decision parameters for infrastructure maintenance investments can be beneficial in case of limited funding. Given this motivation, this dissertation presents a prototype decision support framework to evaluate trade-off, among competing infrastructures, that are candidates for infrastructure maintenance, repair and rehabilitation investments. Decision parameters' performances measured through various factors are combined to determine the integrated state of an infrastructure using Multi-Attribute Utility Theory (MAUT). The integrated state, cost and benefit estimates of probable maintenance actions are utilized alongside expert opinion to develop transition probability and reward matrices for each probable maintenance action for a particular candidate infrastructure. These matrices are then used as an input to the Markov Decision Process (MDP) for the finite-stage dynamic programming model to perform project (candidate)-level analysis to determine optimized maintenance strategies based on reward maximization. The outcomes of project (candidate)-level analysis are then utilized to perform network-level analysis taking the portfolio management approach to determine a suitable portfolio under budgetary constraints. The major decision support outcomes of the prototype framework include performance trend curves, decision logic maps, and a network-level maintenance investment plan for the upcoming years. The framework has been implemented with a set of bridges considered as a network with the assistance of the Pima County DOT, AZ. It is expected that the concept of this prototype framework can help infrastructure management agencies better manage their available funds for maintenance.
Resumo:
We present an algorithm called Optimistic Linear Programming (OLP) for learning to optimize average reward in an irreducible but otherwise unknown Markov decision process (MDP). OLP uses its experience so far to estimate the MDP. It chooses actions by optimistically maximizing estimated future rewards over a set of next-state transition probabilities that are close to the estimates, a computation that corresponds to solving linear programs. We show that the total expected reward obtained by OLP up to time T is within C(P) log T of the reward obtained by the optimal policy, where C(P) is an explicit, MDP-dependent constant. OLP is closely related to an algorithm proposed by Burnetas and Katehakis with four key differences: OLP is simpler, it does not require knowledge of the supports of transition probabilities, the proof of the regret bound is simpler, but our regret bound is a constant factor larger than the regret of their algorithm. OLP is also similar in flavor to an algorithm recently proposed by Auer and Ortner. But OLP is simpler and its regret bound has a better dependence on the size of the MDP.
Resumo:
We provide an algorithm that achieves the optimal regret rate in an unknown weakly communicating Markov Decision Process (MDP). The algorithm proceeds in episodes where, in each episode, it picks a policy using regularization based on the span of the optimal bias vector. For an MDP with S states and A actions whose optimal bias vector has span bounded by H, we show a regret bound of ~ O(HS p AT ). We also relate the span to various diameter-like quantities associated with the MDP, demonstrating how our results improve on previous regret bounds.
Resumo:
The IEEE Wireless LAN standard has been a true success story by enabling convenient, efficient and low-cost access to broadband networks for both private and professional use. However, the increasing density and uncoordinated operation of wireless access points, combined with constantly growing traffic demands have started hurting the users' quality of experience. On the other hand, the emerging ubiquity of wireless access has placed it at the center of attention for network attacks, which not only raises users' concerns on security but also indirectly affects connection quality due to proactive measures against security attacks. In this work, we introduce an integrated solution to congestion avoidance and attack mitigation problems through cooperation among wireless access points. The proposed solution implements a Partially Observable Markov Decision Process (POMDP) as an intelligent distributed control system. By successfully differentiating resource hampering attacks from overload cases, the control system takes an appropriate action in each detected anomaly case without disturbing the quality of service for end users. The proposed solution is fully implemented on a small-scale testbed, on which we present our observations and demonstrate the effectiveness of the system to detect and alleviate both attack and congestion situations.
Resumo:
Robots currently recognise and use objects through algorithms that are hand-coded or specifically trained. Such robots can operate in known, structured environments but cannot learn to recognise or use novel objects as they appear. This thesis demonstrates that a robot can develop meaningful object representations by learning the fundamental relationship between action and change in sensory state; the robot learns sensorimotor coordination. Methods based on Markov Decision Processes are experimentally validated on a mobile robot capable of gripping objects, and it is found that object recognition and manipulation can be learnt as an emergent property of sensorimotor coordination.
Learned stochastic mobility prediction for planning with control uncertainty on unstructured terrain
Resumo:
Motion planning for planetary rovers must consider control uncertainty in order to maintain the safety of the platform during navigation. Modelling such control uncertainty is difficult due to the complex interaction between the platform and its environment. In this paper, we propose a motion planning approach whereby the outcome of control actions is learned from experience and represented statistically using a Gaussian process regression model. This mobility prediction model is trained using sample executions of motion primitives on representative terrain, and predicts the future outcome of control actions on similar terrain. Using Gaussian process regression allows us to exploit its inherent measure of prediction uncertainty in planning. We integrate mobility prediction into a Markov decision process framework and use dynamic programming to construct a control policy for navigation to a goal region in a terrain map built using an on-board depth sensor. We consider both rigid terrain, consisting of uneven ground, small rocks, and non-traversable rocks, and also deformable terrain. We introduce two methods for training the mobility prediction model from either proprioceptive or exteroceptive observations, and report results from nearly 300 experimental trials using a planetary rover platform in a Mars-analogue environment. Our results validate the approach and demonstrate the value of planning under uncertainty for safe and reliable navigation.
Resumo:
Yao, Begg, and Livingston (1996, Biometrics 52, 992-1001) considered the optimal group size for testing a series of potentially therapeutic agents to identify a promising one as soon as possible for given error rates. The number of patients to be tested with each agent was fixed as the group size. We consider a sequential design that allows early acceptance and rejection, and we provide an optimal strategy to minimize the sample sizes (patients) required using Markov decision processes. The minimization is under the constraints of the two types (false positive and false negative) of error probabilities, with the Lagrangian multipliers corresponding to the cost parameters for the two types of errors. Numerical studies indicate that there can be a substantial reduction in the number of patients required.
Resumo:
There are some scenarios in which Unmmaned Aerial Vehicle (UAV) navigation becomes a challenge due to the occlusion of GPS systems signal, the presence of obstacles and constraints in the space in which a UAV operates. An additional challenge is presented when a target whose location is unknown must be found within a confined space. In this paper we present a UAV navigation and target finding mission, modelled as a Partially Observable Markov Decision Process (POMDP) using a state-of-the-art online solver in a real scenario using a low cost commercial multi rotor UAV and a modular system architecture running under the Robotic Operative System (ROS). Using POMDP has several advantages to conventional approaches as they take into account uncertainties in sensor information. We present a framework for testing the mission with simulation tests and real flight tests in which we model the system dynamics and motion and perception uncertainties. The system uses a quad-copter aircraft with an board downwards looking camera without the need of GPS systems while avoiding obstacles within a confined area. Results indicate that the system has 100% success rate in simulation and 80% rate during flight test for finding targets located at different locations.
Resumo:
Average-delay optimal scheduflng of messages arriving to the transmitter of a point-to-point channel is considered in this paper. We consider a discrete time batch-arrival batch-service queueing model for the communication scheme, with service time that may be a function of batch size. The question of delay optimality is addressed within the semi-Markov decision-theoretic framework. Approximations to the average-delay optimal policy are obtained.
Resumo:
We consider the problem of quickest detection of an intrusion using a sensor network, keeping only a minimal number of sensors active. By using a minimal number of sensor devices, we ensure that the energy expenditure for sensing, computation and communication is minimized (and the lifetime of the network is maximized). We model the intrusion detection (or change detection) problem as a Markov decision process (MDP). Based on the theory of MDP, we develop the following closed loop sleep/wake scheduling algorithms: (1) optimal control of Mk+1, the number of sensors in the wake state in time slot k + 1, (2) optimal control of qk+1, the probability of a sensor in the wake state in time slot k + 1, and an open loop sleep/wake scheduling algorithm which (3) computes q, the optimal probability of a sensor in the wake state (which does not vary with time), based on the sensor observations obtained until time slot k. Our results show that an optimum closed loop control on Mk+1 significantly decreases the cost compared to keeping any number of sensors active all the time. Also, among the three algorithms described, we observe that the total cost is minimum for the optimum control on Mk+1 and is maximum for the optimum open loop control on q.
Resumo:
We consider a wireless sensor network whose main function is to detect certain infrequent alarm events, and to forward alarm packets to a base station, using geographical forwarding. The nodes know their locations, and they sleep-wake cycle, waking up periodically but not synchronously. In this situation, when a node has a packet to forward to the sink, there is a trade-off between how long this node waits for a suitable neighbor to wake up and the progress the packet makes towards the sink once it is forwarded to this neighbor. Hence, in choosing a relay node, we consider the problem of minimizing average delay subject to a constraint on the average progress. By constraint relaxation, we formulate this next hop relay selection problem as a Markov decision process (MDP). The exact optimal solution (BF (Best Forward)) can be found, but is computationally intensive. Next, we consider a mathematically simplified model for which the optimal policy (SF (Simplified Forward)) turns out to be a simple one-step-look-ahead rule. Simulations show that SF is very close in performance to BF, even for reasonably small node density. We then study the end-to-end performance of SF in comparison with two extremal policies: Max Forward (MF) and First Forward (FF), and an end-to-end delay minimising policy proposed by Kim et al. 1]. We find that, with appropriate choice of one hop average progress constraint, SF can be tuned to provide a favorable trade-off between end-to-end packet delay and the number of hops in the forwarding path.
Resumo:
We consider the problem of quickest detection of an intrusion using a sensor network, keeping only a minimal number of sensors active. By using a minimal number of sensor devices,we ensure that the energy expenditure for sensing, computation and communication is minimized (and the lifetime of the network is maximized). We model the intrusion detection (or change detection) problem as a Markov decision process (MDP). Based on the theory of MDP, we develop the following closed loop sleep/wake scheduling algorithms: 1) optimal control of Mk+1, the number of sensors in the wake state in time slot k + 1, 2) optimal control of qk+1, the probability of a sensor in the wake state in time slot k + 1, and an open loop sleep/wake scheduling algorithm which 3) computes q, the optimal probability of a sensor in the wake state (which does not vary with time),based on the sensor observations obtained until time slot k.Our results show that an optimum closed loop control onMk+1 significantly decreases the cost compared to keeping any number of sensors active all the time. Also, among the three algorithms described, we observe that the total cost is minimum for the optimum control on Mk+1 and is maximum for the optimum open loop control on q.