6 results for Markov Decision Process

in CaltechTHESIS


Relevance:

100.00%

Publisher:

Abstract:

Modern robots are increasingly expected to function in uncertain and dynamically challenging environments, often in proximity to humans. In addition, wide-scale adoption of robots requires on-the-fly adaptability of software for diverse applications. These requirements strongly suggest the need to adopt formal representations of high-level goals and safety specifications, especially as temporal logic formulas. This approach allows for the use of formal verification techniques for controller synthesis that can give guarantees for safety and performance. Robots operating in unstructured environments also face limited sensing capability, so correctly inferring a robot's progress toward a high-level goal can be challenging.

This thesis develops new algorithms for synthesizing discrete controllers in partially known environments under specifications represented as linear temporal logic (LTL) formulas. It is inspired by recent developments in finite abstraction techniques for hybrid systems and motion planning problems. The robot and its environment are assumed to have a finite abstraction as a Partially Observable Markov Decision Process (POMDP), which is a powerful model class capable of representing a wide variety of problems. However, synthesizing controllers that satisfy LTL goals over POMDPs is a challenging problem which has received only limited attention.

This thesis proposes tractable, approximate algorithms for the control synthesis problem using Finite State Controllers (FSCs). The use of FSCs to control finite POMDPs allows the closed system to be analyzed as a finite global Markov chain. The thesis explicitly shows how the transient and steady-state behavior of the global Markov chain can be related to two different criteria with respect to satisfaction of LTL formulas. First, the maximization of the probability of LTL satisfaction is related to an optimization problem over a parametrization of the FSC. Analytic expressions for the gradients are derived, which allow the use of first-order optimization techniques.
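As a rough illustration of the closed-system construction described above, the sketch below builds the global Markov chain over (state, controller node) pairs for a small finite POMDP. The array layouts `T`, `Z`, and `omega`, the function name `global_markov_chain`, and the convention that the observation is emitted from the current state before the controller picks its next node and action are all assumptions made for this example, not details taken from the thesis.

```python
import numpy as np

def global_markov_chain(T, Z, omega):
    """Combine a finite POMDP and a finite state controller into one Markov chain.

    Assumed (hypothetical) layouts:
      T[a, s, t]        = P(next state t | state s, action a)
      Z[s, o]           = P(observation o | state s)
      omega[g, o, h, a] = P(next node h, action a | node g, observation o)

    Returns a matrix P of shape (S*G, S*G); the pair (s, g) has index s*G + g.
    """
    A, S, _ = T.shape
    G = omega.shape[0]
    # P[(s,g),(t,h)] = sum over o and a of Z[s,o] * omega[g,o,h,a] * T[a,s,t]
    P = np.einsum("so,goha,ast->sgth", Z, omega, T)
    return P.reshape(S * G, S * G)
```

Each row of the returned matrix sums to one whenever the inputs are proper conditional distributions, so standard transient and steady-state Markov chain analysis can be applied to it.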

The second criterion encourages rapid and frequent visits to a restricted set of states over infinite executions. It is formulated as a constrained optimization problem with a discounted long term reward objective by the novel utilization of a fundamental equation for Markov chains - the Poisson equation. A new constrained policy iteration technique is proposed to solve the resulting dynamic program, which also provides a way to escape local maxima.
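For readers unfamiliar with the Poisson equation for Markov chains, the following sketch solves its textbook form, (I - P) h = r - eta * 1, for an ergodic finite chain with transition matrix `P` and per-state reward `r`, where `eta` is the long-run average reward. This is only the standard equation, not the constrained formulation developed in the thesis; the function names and the normalization pi . h = 0 are choices made for this example.

```python
import numpy as np

def stationary_distribution(P):
    """Solve pi P = pi with sum(pi) = 1 (assumes a single recurrent class)."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

def solve_poisson(P, r):
    """Return (eta, h) with (I - P) h = r - eta and pi . h = 0."""
    n = P.shape[0]
    pi = stationary_distribution(P)
    eta = float(pi @ r)  # long-run average reward
    # I - P + 1 pi^T is invertible for an ergodic chain.
    F = np.eye(n) - P + np.outer(np.ones(n), pi)
    h = np.linalg.solve(F, r - eta)
    return eta, h
```

The vector `h` measures how much better or worse it is, in total over time, to start in each state relative to the average rate `eta`.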

The algorithms proposed in the thesis are applied to the task planning and execution challenges faced during the DARPA Autonomous Robotic Manipulation - Software challenge.

Relevance:

80.00%

Publisher:

Abstract:

A general framework for multi-criteria optimal design is presented which is well-suited for automated design of structural systems. A systematic computer-aided optimal design decision process is developed which allows the designer to rapidly evaluate and improve a proposed design by taking into account the major factors of interest related to different aspects such as design, construction, and operation.

The proposed optimal design process requires the selection of the most promising choice of design parameters taken from a large design space, based on an evaluation using specified criteria. The design parameters specify a particular design, and so they relate to member sizes, structural configuration, etc. The evaluation of the design uses performance parameters which may include structural response parameters, risks due to uncertain loads and modeling errors, construction and operating costs, etc. Preference functions are used to implement the design criteria in a "soft" form. These preference functions give a measure of the degree of satisfaction of each design criterion. The overall evaluation measure for a design is built up from the individual measures for each criterion through a preference combination rule. The goal of the optimal design process is to obtain a design that has the highest overall evaluation measure - an optimization problem.
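A minimal sketch of how soft preference functions and a combination rule might fit together is given below. The abstract does not state the functional forms, so the piecewise-linear preference, the weighted geometric mean used as the combination rule, and the names `linear_preference` and `combine` are all hypothetical choices for illustration.

```python
import math

def linear_preference(value, worst, best):
    """Degree of satisfaction in [0, 1]: 0 at `worst`, 1 at `best` (assumed form)."""
    t = (value - worst) / (best - worst)
    return min(1.0, max(0.0, t))

def combine(preferences, weights):
    """Weighted geometric mean of individual preferences (assumed combination rule).

    A design that completely fails any one criterion scores 0 overall.
    """
    if any(p == 0.0 for p in preferences):
        return 0.0
    total = sum(weights)
    log_sum = sum(w * math.log(p) for p, w in zip(preferences, weights))
    return math.exp(log_sum / total)
```

For example, a design whose cost preference is 0.25 and whose response preference is 1.0 receives an overall measure of 0.5 under equal weights; the optimizer would then search the design space for the parameters that maximize this measure.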

Genetic algorithms are stochastic optimization methods that are based on evolutionary theory. They provide the exploratory power needed to search high-dimensional design spaces for these optimal solutions. Two special genetic algorithms, hGA and vGA, are presented here for continuous and discrete optimization problems, respectively.
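To make the general idea concrete, here is a small generic genetic algorithm for continuous minimization. It is not hGA or vGA, whose details are not given in the abstract; the tournament selection, uniform crossover, Gaussian mutation, elitism, and all parameter values are assumptions chosen for this sketch.

```python
import random

def genetic_minimize(f, dim, bounds, pop_size=30, generations=60, seed=0):
    """Minimize f over a box using a plain elitist GA (illustrative only)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    history = []  # best objective value seen at the start of each generation
    for _ in range(generations):
        pop.sort(key=f)
        history.append(f(pop[0]))
        nxt = [pop[0][:]]  # elitism: carry the best individual over unchanged
        while len(nxt) < pop_size:
            # Tournament selection of two parents.
            p1 = min(rng.sample(pop, 3), key=f)
            p2 = min(rng.sample(pop, 3), key=f)
            # Uniform crossover.
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            # Gaussian mutation on some genes, clipped to the bounds.
            child = [
                min(hi, max(lo, x + rng.gauss(0.0, 0.1 * (hi - lo))))
                if rng.random() < 0.2 else x
                for x in child
            ]
            nxt.append(child)
        pop = nxt
    best = min(pop, key=f)
    history.append(f(best))
    return best, history
```

Because the best individual is always kept, the recorded best value never gets worse from one generation to the next.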

The methodology is demonstrated with several examples involving the design of truss and frame systems. These examples are solved by using the proposed hGA and vGA.

Relevance:

80.00%

Publisher:

Abstract:

Earthquake early warning (EEW) systems have been rapidly developing over the past decade. The Japan Meteorological Agency (JMA) has an EEW system that was operating during the 2011 M9 Tohoku earthquake in Japan, and this increased the awareness of EEW systems around the world. While longer-term earthquake prediction still faces many challenges before it becomes practical, the availability of shorter-term EEW opens a new door for earthquake loss mitigation. After an earthquake fault begins rupturing, an EEW system uses the first few seconds of recorded seismic waveform data to quickly predict the hypocenter location, magnitude, origin time and the expected shaking intensity level around the region. This early warning information is broadcast to different sites before the strong shaking arrives. The warning lead time of such a system is short, typically a few seconds to a minute or so, and the information is uncertain. These factors limit the ability of humans to intervene and activate mitigation actions, and this must be addressed for engineering applications of EEW. This study applies a Bayesian probabilistic approach along with machine learning techniques and decision theories from economics to improve different aspects of EEW operation, including extending it to engineering applications.

Existing EEW systems are often based on a deterministic approach. Often, they assume that only a single event occurs within a short period of time, which led to many false alarms after the Tohoku earthquake in Japan. This study develops a probability-based EEW algorithm based on an existing deterministic model to extend the EEW system to the case of concurrent events, which are often observed during the aftershock sequence after a large earthquake.

To overcome the challenge of uncertain information and short lead time of EEW, this study also develops an earthquake probability-based automated decision-making (ePAD) framework to make robust decisions for EEW mitigation applications. A cost-benefit model that can capture the uncertainties in EEW information and the decision process is used. This approach is called Performance-Based Earthquake Early Warning, which is based on the PEER Performance-Based Earthquake Engineering method. Use of surrogate models is suggested to improve computational efficiency. Also, new models are proposed to add the influence of lead time into the cost-benefit analysis. For example, a value of information model is used to quantify the potential value of delaying the activation of a mitigation action for a possible reduction of the uncertainty of EEW information in the next update. Two practical examples, evacuation alert and elevator control, are studied to illustrate the ePAD framework. Potential advanced EEW applications, such as the case of multiple-action decisions and the synergy of EEW and structural health monitoring systems, are also discussed.
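The core of a cost-benefit decision rule of this kind can be sketched as an expected-loss comparison. The discrete intensity distribution, the loss tables, and the names `expected_loss` and `should_activate` are hypothetical and stand in for whatever models the ePAD framework actually uses; lead time and value of information are not modeled here.

```python
def expected_loss(probs, losses):
    """Expected loss over a discrete set of predicted shaking-intensity levels."""
    return sum(p * l for p, l in zip(probs, losses))

def should_activate(probs, loss_no_action, loss_with_action, action_cost):
    """Activate the mitigation action if it lowers the total expected loss.

    probs[i]            : probability of intensity level i (from the EEW estimate)
    loss_no_action[i]   : loss at level i if nothing is done
    loss_with_action[i] : loss at level i if the action is taken
    action_cost         : cost of acting, paid even on a false alarm
    """
    acting = expected_loss(probs, loss_with_action) + action_cost
    waiting = expected_loss(probs, loss_no_action)
    return acting < waiting
```

With intensity probabilities (0.7, 0.2, 0.1), losses (0, 10, 100) without action and (0, 2, 20) with action, acting is worthwhile at a cost of 5 (7.4 versus 12) but not at a cost of 10 (12.4 versus 12).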

Relevance:

80.00%

Publisher:

Abstract:

The Hamilton-Jacobi-Bellman (HJB) equation is central to stochastic optimal control (SOC) theory, yielding the optimal solution to general problems specified by known dynamics and a specified cost functional. Given the assumption of quadratic cost on the control input, it is well known that the HJB reduces to a particular partial differential equation (PDE). While powerful, this reduction is not commonly used as the PDE is of second order, is nonlinear, and examples exist where the problem may not have a solution in a classical sense. Furthermore, each state of the system appears as another dimension of the PDE, giving rise to the curse of dimensionality. Since the number of degrees of freedom required to solve the optimal control problem grows exponentially with dimension, the problem becomes intractable for all but systems of modest dimension.

In the last decade researchers have found that under certain, fairly non-restrictive structural assumptions, the HJB may be transformed into a linear PDE, with an interesting analogue in the discretized domain of Markov Decision Processes (MDP). The work presented in this thesis uses the linearity of this particular form of the HJB PDE to push the computational boundaries of stochastic optimal control.
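The discrete analogue referred to here is presumably the linearly solvable MDP; that identification, and the first-exit formulation used below, are assumptions of this sketch. In that setting the desirability z = exp(-v) of the optimal cost-to-go v satisfies a linear equation, z = exp(-q) * (P z) at non-terminal states, where `P` is the passive (uncontrolled) transition matrix and `q` the state cost. The function names are hypothetical.

```python
import numpy as np

def solve_desirability(P, q, terminal):
    """Solve z = exp(-q) * (P @ z) on interior states, z = exp(-q) on terminal ones."""
    q = np.asarray(q, dtype=float)
    terminal = np.asarray(terminal, dtype=bool)
    interior = ~terminal
    z = np.zeros(len(q))
    z[terminal] = np.exp(-q[terminal])
    M = np.diag(np.exp(-q[interior]))
    # Linear system: (I - M P_NN) z_N = M P_NT z_T
    A = np.eye(int(interior.sum())) - M @ P[np.ix_(interior, interior)]
    b = M @ P[np.ix_(interior, terminal)] @ z[terminal]
    z[interior] = np.linalg.solve(A, b)
    return z

def optimal_transitions(P, z):
    """Optimal controlled dynamics: u(x'|x) proportional to P(x'|x) * z(x')."""
    U = P * z[None, :]
    return U / U.sum(axis=1, keepdims=True)
```

The point of the sketch is that the value function is recovered from a single linear solve, v = -log z, rather than from a nonlinear Bellman iteration.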

This is done by bringing together previously disjoint lines of research in computation. The first of these is the use of Sum of Squares (SOS) techniques for synthesis of control policies. A candidate polynomial with variable coefficients is proposed as the solution to the stochastic optimal control problem. An SOS relaxation is then applied to the partial differential constraints, leading to a hierarchy of semidefinite relaxations with improving sub-optimality gap. The resulting approximate solutions are shown to be guaranteed over- and under-approximations for the optimal value function. It is shown that these results extend to arbitrary parabolic and elliptic PDEs, yielding a novel method for Uncertainty Quantification (UQ) of systems governed by partial differential constraints. Domain decomposition techniques are also made available, allowing for such problems to be solved via parallelization and low-order polynomials.

The optimization-based SOS technique is then contrasted with the Separated Representation (SR) approach from the applied mathematics community. The technique allows for systems of equations to be solved through a low-rank decomposition that results in algorithms that scale linearly with dimensionality. Its application in stochastic optimal control allows for previously uncomputable problems to be solved quickly, scaling to such complex systems as the Quadcopter and VTOL aircraft. This technique may be combined with the SOS approach, yielding not only a numerical technique, but also an analytical one that allows for entirely new classes of systems to be studied and for stability properties to be guaranteed.

The analysis of the linear HJB is completed by the study of its implications in application. It is shown that the HJB and a popular technique in robotics, the use of navigation functions, sit on opposite ends of a spectrum of optimization problems, upon which tradeoffs may be made in problem complexity. Analytical solutions to the HJB in these settings are available in simplified domains, yielding guidance towards optimality for approximation schemes. Finally, the use of HJB equations in temporal multi-task planning problems is investigated. It is demonstrated that such problems are reducible to a sequence of SOC problems linked via boundary conditions. The linearity of the PDE allows us to pre-compute control policy primitives and then compose them, at essentially zero cost, to satisfy a complex temporal logic specification.

Relevance:

80.00%

Publisher:

Abstract:

The propagation of waves in an extended, irregular medium is studied under the "quasi-optics" and the "Markov random process" approximations. Under these assumptions, a Fokker-Planck equation satisfied by the characteristic functional of the random wave field is derived. A complete set of the moment equations with different transverse coordinates and different wavenumbers is then obtained from the characteristic functional. The derivation does not require Gaussian statistics of the random medium and the result can be applied to the time-dependent problem. We then solve the moment equations for the phase correlation function, angular broadening, temporal pulse smearing, intensity correlation function, and the probability distribution of the random waves. The necessary and sufficient conditions for strong scintillation are also given.

We also consider the problem of diffraction of waves by a random, phase-changing screen. The intensity correlation function is solved in the whole Fresnel diffraction region and the temporal pulse broadening function is derived rigorously from the wave equation.

The method of smooth perturbations is applied to interplanetary scintillations. We formulate and calculate the effects of the solar-wind velocity fluctuations on the observed intensity power spectrum and on the ratio of the observed "pattern" velocity and the true velocity of the solar wind in the three-dimensional spherical model. The r.m.s. solar-wind velocity fluctuations are found to be ~200 km/sec in the region about 20 solar radii from the Sun.

We then interpret the observed interstellar scintillation data using the theories derived under the Markov approximation, which are also valid for the strong scintillation. We find that the Kolmogorov power-law spectrum with an outer scale of 10 to 100 pc fits the scintillation data and that the ambient averaged electron density in the interstellar medium is about 0.025 cm⁻³. It is also found that there exists a region of strong electron density fluctuation with thickness ~10 pc and mean electron density ~7 cm⁻³ between the PSR 0833-45 pulsar and the earth.

Relevance:

30.00%

Publisher:

Abstract:

These studies explore how, where, and when variables critical to decision-making are represented in the brain. In order to produce a decision, humans must first determine the relevant stimuli, actions, and possible outcomes before applying an algorithm that will select an action from those available. When choosing amongst alternative stimuli, the framework of value-based decision-making proposes that values are assigned to the stimuli and that these values are then compared in an abstract “value space” in order to produce a decision. Despite much progress, in particular regarding the pinpointing of ventromedial prefrontal cortex (vmPFC) as a region that encodes value, many basic questions remain. In Chapter 2, I show that distributed BOLD signaling in vmPFC represents the value of stimuli under consideration in a manner that is independent of the type of stimulus. Thus the open question of whether value is represented in abstraction, a key tenet of value-based decision-making, is answered in the affirmative. However, I also show that stimulus-dependent value representations are present in the brain during decision-making and suggest a potential neural pathway for stimulus-to-value transformations that integrates these two results.

More broadly speaking, there is both neural and behavioral evidence that two distinct control systems are at work during action selection. These are the “goal-directed” system, which selects actions based on an internal model of the environment, and the “habitual” system, which generates responses based on antecedent stimuli only. Computational characterizations of these two systems imply that they have different informational requirements in terms of input stimuli, actions, and possible outcomes. Associative learning theory predicts that the habitual system should utilize stimulus and action information only, while goal-directed behavior requires that outcomes as well as stimuli and actions be processed. In Chapter 3, I test whether areas of the brain hypothesized to be involved in habitual versus goal-directed control represent the corresponding theorized variables.

The question of whether one or both of these neural systems drives Pavlovian conditioning is less well-studied. Chapter 4 describes an experiment in which subjects were scanned while engaged in a Pavlovian task with a simple non-trivial structure. After comparing a variety of model-based and model-free learning algorithms (thought to underpin goal-directed and habitual decision-making, respectively), it was found that subjects’ reaction times were better explained by a model-based system. In addition, neural signaling of precision, a variable based on a representation of a world model, was found in the amygdala. These data indicate that the influence of model-based representations of the environment can extend even to the most basic learning processes.

Knowledge of the state of hidden variables in an environment is required for optimal inference regarding the abstract decision structure of a given environment and therefore can be crucial to decision-making in a wide range of situations. Inferring the state of an abstract variable requires the generation and manipulation of an internal representation of beliefs over the values of the hidden variable. In Chapter 5, I describe behavioral and neural results regarding the learning strategies employed by human subjects in a hierarchical state-estimation task. In particular, a comprehensive model fit and comparison process pointed to the use of "belief thresholding". This implies that subjects tended to eliminate low-probability hypotheses regarding the state of the environment from their internal model and ceased to update the corresponding variables. Thus, in concert with incremental Bayesian learning, humans explicitly manipulate their internal model of the generative process during hierarchical inference consistent with a serial hypothesis testing strategy.
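One simple way to picture belief thresholding is a Bayesian update over a discrete set of hypotheses in which any hypothesis whose posterior falls below a cutoff is dropped for good. The sketch below is an assumed illustration of that idea, not the model fitted in the thesis; the function name `update_beliefs` and the threshold value are hypothetical.

```python
def update_beliefs(beliefs, likelihoods, threshold=0.05):
    """One Bayesian update followed by pruning of low-probability hypotheses.

    beliefs[i]     : current probability of hypothesis i
    likelihoods[i] : probability of the new observation under hypothesis i
    A pruned hypothesis gets probability 0 and therefore stays at 0 in all
    later updates, i.e. it is no longer updated.
    """
    posterior = [b * l for b, l in zip(beliefs, likelihoods)]
    total = sum(posterior)
    posterior = [p / total for p in posterior]
    top = max(posterior)  # always keep the most probable hypothesis
    pruned = [p if (p >= threshold or p == top) else 0.0 for p in posterior]
    total = sum(pruned)
    return [p / total for p in pruned]
```

Starting from beliefs (0.5, 0.3, 0.2) and likelihoods (0.9, 0.5, 0.01), the third hypothesis falls below the threshold and is eliminated, leaving (0.75, 0.25, 0); later evidence favoring the third hypothesis can no longer revive it.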