919 results for Optimal Stochastic Control
Abstract:
We present two efficient discrete parameter simulation optimization (DPSO) algorithms for the long-run average cost objective. One of these algorithms uses the smoothed functional approximation (SFA) procedure, while the other is based on simultaneous perturbation stochastic approximation (SPSA). The use of SFA for DPSO had not been proposed previously in the literature. Further, both algorithms adopt an interesting technique of random projections that we present here for the first time. We give a proof of convergence of our algorithms. Next, we present detailed numerical experiments on a problem of admission control with dependent service times. We consider two different settings involving parameter sets of moderate and large size, respectively. In the first setting, we also show performance comparisons with the well-studied optimal computing budget allocation (OCBA) algorithm and with the equal allocation algorithm. Note to Practitioners: Even though SPSA and SFA have been devised in the literature for continuous optimization problems, our results indicate that they can be powerful techniques even when they are adapted to discrete optimization settings. OCBA is widely recognized as one of the most powerful methods for discrete optimization when the parameter sets are of small or moderate size. In a setting involving a parameter set of size 100, we observe that when the computing budget is small, SPSA and OCBA show similar performance and both do better than SFA; however, as the computing budget is increased, SPSA and SFA show better performance than OCBA. Both our algorithms also show good performance when the parameter set has a size of $10^8$. SFA is seen to show the best overall performance. Unlike most other DPSO algorithms in the literature, an advantage of our algorithms is that they are easily implementable regardless of the size of the parameter sets and show good performance in both scenarios.
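As a rough illustration of the SPSA idea mentioned in this abstract, the sketch below estimates a gradient from just two noisy cost evaluations per iteration, using a random ±1 perturbation of all coordinates at once. It is not the authors' algorithm: it works on a continuous relaxation of an integer grid and simply rounds at the end, whereas the paper uses a random projection technique that is not reproduced here. The cost function `noisy_cost`, its target point, the bounds, and the gain constants are all hypothetical choices made for the example.

```python
import random

_noise = random.Random(1)  # hypothetical noise source standing in for simulation randomness


def noisy_cost(theta):
    """Toy stand-in for a simulated long-run average cost (assumed, not from the paper)."""
    target = (3.0, 7.0)
    return sum((t - g) ** 2 for t, g in zip(theta, target)) + _noise.gauss(0.0, 0.05)


def spsa_minimize(objective, theta0, lo, hi, iterations=2000, a=0.1, c=0.2, seed=0):
    """Minimize a noisy objective with two-sided SPSA, then round to the integer grid."""
    rng = random.Random(seed)
    theta = list(theta0)
    for k in range(iterations):
        a_k = a / (k + 1) ** 0.602   # decaying step size
        c_k = c / (k + 1) ** 0.101   # decaying perturbation size
        # Perturb every coordinate simultaneously with independent +/-1 signs.
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]
        plus = [t + c_k * d for t, d in zip(theta, delta)]
        minus = [t - c_k * d for t, d in zip(theta, delta)]
        diff = objective(plus) - objective(minus)
        # One difference of two evaluations yields an estimate for every coordinate.
        theta = [
            min(hi, max(lo, t - a_k * diff / (2.0 * c_k * d)))
            for t, d in zip(theta, delta)
        ]
    # Final rounding is an assumption of this sketch, not the paper's projection scheme.
    return [round(t) for t in theta]


best = spsa_minimize(noisy_cost, [0.0, 0.0], lo=0.0, hi=10.0)
print(best)
```

The point of the two-evaluation estimate is that the per-iteration simulation cost does not grow with the number of parameters, which is what makes the approach attractive for large parameter sets.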
Abstract:
We develop a simulation-based, two-timescale actor-critic algorithm for infinite horizon Markov decision processes with finite state and action spaces, with a discounted reward criterion. The algorithm is of the gradient ascent type and performs a search in the space of stationary randomized policies. The algorithm uses certain simultaneous deterministic perturbation stochastic approximation (SDPSA) gradient estimates for enhanced performance. We show an application of our algorithm to a problem of mortgage refinancing. Our algorithm obtains the optimal refinancing strategies in a computationally efficient manner.
Abstract:
In this paper, several known computational solutions are readily obtained in a very natural way for the linear regulator, fixed end-point and servo-mechanism problems using a certain framework from scattering theory. The relationships between the solutions to the linear regulator problem with different terminal costs and the interplay between the forward and backward equations have enabled a concise derivation of the partitioned equations, the forward-backward equations, and the Chandrasekhar equations for the problem. These methods have been extended to the fixed end-point, servo, and tracking problems.
Abstract:
Optimal control laws are obtained for the elevator and the ailerons for a modern fighter aircraft in a rolling pullout maneuver. The problem is solved for three flight conditions using the conjugate gradient method.
Abstract:
We study optimal control of Markov processes with age-dependent transition rates. The control policy is chosen continuously over time based on the state of the process and its age. We study infinite horizon discounted cost and infinite horizon average cost problems. Our approach is via the construction of an equivalent semi-Markov decision process. We characterise the value function and optimal controls for both discounted and average cost cases.
Abstract:
We consider a dense, ad hoc wireless network, confined to a small region. The wireless network is operated as a single cell, i.e., only one successful transmission is supported at a time. Data packets are sent between source-destination pairs by multihop relaying. We assume that nodes self-organize into a multihop network such that all hops are of length $d$ meters, where $d$ is a design parameter. There is a contention-based multiaccess scheme, and it is assumed that every node always has data to send, either originated from it or a transit packet (saturation assumption). In this scenario, we seek to maximize a measure of the transport capacity of the network (measured in bit-meters per second) over power controls (in a fading environment) and over the hop distance $d$, subject to an average power constraint. We first motivate that for a dense collection of nodes confined to a small region, single cell operation is efficient for single user decoding transceivers. Then, operating the dense ad hoc wireless network (described above) as a single cell, we study the hop length and power control that maximizes the transport capacity for a given network power constraint. More specifically, for a fading channel and for a fixed transmission time strategy (akin to the IEEE 802.11 TXOP), we find that there exists an intrinsic aggregate bit rate ($\Theta_{\mathrm{opt}}$ bits per second, depending on the contention mechanism and the channel fading characteristics) carried by the network, when operating at the optimal hop length and power control. The optimal transport capacity is of the form $d_{\mathrm{opt}}(\bar{P}_t) \times \Theta_{\mathrm{opt}}$ with $d_{\mathrm{opt}}$ scaling as $\bar{P}_t^{1/\eta}$, where $\bar{P}_t$ is the available time average transmit power and $\eta$ is the path loss exponent. Under certain conditions on the fading distribution, we then provide a simple characterization of the optimal operating point. Simulation results are provided comparing the performance of the optimal strategy derived here with some simple strategies for operating the network.
Abstract:
In this paper, we study the asymptotic behavior of an optimal control problem for the time-dependent Kirchhoff-Love plate whose middle surface has a very rough boundary. We identify the limit problem which is an optimal control problem for the limit equation with a different cost functional.
Abstract:
Information spreading in a population can be modeled as an epidemic. Campaigners (e.g., election campaign managers, companies marketing products or movies) are interested in spreading a message by a given deadline, using limited resources. In this paper, we formulate the above situation as an optimal control problem and the solution (using Pontryagin's Maximum Principle) prescribes an optimal resource allocation over the time of the campaign. We consider two different scenarios. In the first, the campaigner can adjust a direct control (over time) which allows her to recruit individuals from the population (at some cost) to act as spreaders for the Susceptible-Infected-Susceptible (SIS) epidemic model. In the second, we allow the campaigner to adjust the effective spreading rate by incentivizing the infected in the Susceptible-Infected-Recovered (SIR) model, in addition to the direct recruitment. We consider a time-varying information spreading rate in our formulation to model the changing interest level of individuals in the campaign as the deadline approaches. In both cases, we show the existence of a solution and its uniqueness for sufficiently small campaign deadlines. For the fixed spreading rate, we show the effectiveness of the optimal control strategy against the constant control strategy, a heuristic control strategy and no control. We show the sensitivity of the optimal control to the spreading rate profile when it is time varying. (C) 2014 Elsevier Inc. All rights reserved.
Abstract:
We model the spread of information in a homogeneously mixed population using the Maki-Thompson rumor model. We formulate an optimal control problem, from the perspective of a single campaigner, to maximize the spread of information when the campaign budget is fixed. Control signals, such as advertising in the mass media, attempt to convert ignorants and stiflers into spreaders. We show the existence of a solution to the optimal control problem when the campaigning incurs non-linear costs under the isoperimetric budget constraint. The solution employs Pontryagin's Minimum Principle and a modified version of the forward-backward sweep technique for numerical computation to accommodate the isoperimetric budget constraint. The techniques developed in this paper are general and can be applied to similar optimal control problems in other areas. We have allowed the spreading rate of the information epidemic to vary over the campaign duration to model practical situations in which the interest level of the population in the subject of the campaign changes with time. The shape of the optimal control signal is studied for different model parameters and spreading rate profiles. We have also studied the variation of the optimal campaigning costs with respect to various model parameters. Results indicate that, for some model parameters, significant improvements can be achieved by the optimal strategy compared to the static control strategy. The static strategy respects the same budget constraint as the optimal strategy and has a constant value throughout the campaign horizon. This work finds application in election and social awareness campaigns, product advertising, movie promotion and crowdfunding campaigns. (C) 2014 Elsevier B.V. All rights reserved.
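The forward-backward sweep named in this abstract can be sketched on a much simpler problem than the rumor model. The example below is an assumed toy problem, not the paper's: minimize the integral of (x² + u²)/2 over [0, T] with dynamics x' = u and x(0) = 1. The minimum principle gives u = -λ with costate equation λ' = -x and λ(T) = 0. The sketch uses the standard, unmodified sweep with explicit Euler steps and a relaxed control update; the isoperimetric budget constraint handled in the paper is not included, and the function name and parameters are illustrative.

```python
import math


def forward_backward_sweep(horizon=1.0, steps=1000, sweeps=100, relax=0.5):
    """Standard forward-backward sweep for a toy linear-quadratic problem (assumed example)."""
    h = horizon / steps
    u = [0.0] * (steps + 1)    # initial guess for the control
    x = [1.0] * (steps + 1)    # state trajectory
    lam = [0.0] * (steps + 1)  # costate trajectory
    for _ in range(sweeps):
        # Forward pass: integrate the state x' = u from x(0) = 1.
        x[0] = 1.0
        for i in range(steps):
            x[i + 1] = x[i] + h * u[i]
        # Backward pass: integrate the costate lam' = -x from lam(T) = 0.
        lam[steps] = 0.0
        for i in range(steps - 1, -1, -1):
            lam[i] = lam[i + 1] + h * x[i + 1]
        # Control update from the optimality condition u = -lam,
        # averaged with the previous iterate to aid convergence.
        u = [relax * u_old + (1.0 - relax) * (-l) for u_old, l in zip(u, lam)]
    return x, u


x, u = forward_backward_sweep()
print(u[0], -math.tanh(1.0))  # numerical and analytical initial control
```

For this toy problem the analytical solution is x(t) = cosh(T - t)/cosh(T), so the initial control should approach -tanh(T), which gives a convenient check on the numerical scheme.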
Abstract:
Adapting the power of secondary users (SUs) while adhering to constraints on the interference caused to primary receivers (PRxs) is a critical issue in underlay cognitive radio (CR). This adaptation is driven by the interference and transmit power constraints imposed on the secondary transmitter (STx). Its performance also depends on the quality of channel state information (CSI) available at the STx of the links from the STx to the secondary receiver and to the PRxs. For a system in which an STx is subject to an average interference constraint or an interference outage probability constraint at each of the PRxs, we derive novel symbol error probability (SEP)-optimal, practically motivated binary transmit power control policies. As a reference, we also present the corresponding SEP-optimal continuous transmit power control policies for one PRx. We then analyze the robustness of the optimal policies when the STx knows noisy channel estimates of the links between the SU and the PRxs. Altogether, our work develops a holistic understanding of the critical role played by different transmit and interference constraints in driving power control in underlay CR and the impact of CSI on its performance.
Abstract:
In this paper, a $C^0$ interior penalty method has been proposed and analyzed for distributed optimal control problems governed by the biharmonic operator. The state and adjoint variables are discretized using continuous piecewise quadratic finite elements while the control variable is discretized using piecewise constant approximations. A priori and a posteriori error estimates are derived for the state, adjoint and control variables under minimal regularity assumptions. Numerical results justify the theoretical results obtained. The a posteriori error estimators are useful in adaptive finite element approximation and the numerical results indicate that the sharp error estimators work efficiently in guiding the mesh refinement. (C) 2014 Elsevier Ltd. All rights reserved.
Abstract:
The recently developed reference-command tracking version of model predictive static programming (MPSP) is successfully applied to a single-stage closed grinding mill circuit. MPSP is an innovative optimal control technique that combines the philosophies of model predictive control (MPC) and approximate dynamic programming. The performance of the proposed MPSP control technique, which can be viewed as a 'new paradigm' under the nonlinear MPC philosophy, is compared to the performance of a standard nonlinear MPC technique applied to the same plant for the same conditions. Results show that the MPSP control technique is more than capable of tracking the desired set-point in the presence of model-plant mismatch, disturbances and measurement noise. The performance of MPSP and nonlinear MPC compare very well, with definite advantages offered by MPSP. The computational speed of MPSP is increased through a sequence of innovations such as the conversion of the dynamic optimization problem to a low-dimensional static optimization problem, the recursive computation of sensitivity matrices and the use of a closed-form expression to update the control. To alleviate the burden on the optimization procedure in standard MPC, the control horizon is normally restricted. However, in the MPSP technique the control horizon is extended to the prediction horizon with a minor increase in the computational time. Furthermore, the MPSP technique generally takes only a couple of iterations to converge, even when input constraints are applied. Therefore, MPSP can be regarded as a potential candidate for online applications of the nonlinear MPC philosophy to real-world industrial process plants. (C) 2014 Elsevier Ltd. All rights reserved.
Abstract:
Understanding the growth behavior of microorganisms using modeling and optimization techniques is an active area of research in the fields of biochemical engineering and systems biology. In this paper, we propose a general modeling framework, based on the Monod model, to model the growth of microorganisms. Utilizing the general framework, we formulate an optimal control problem with the objective of maximizing a long-term cellular goal and solve it analytically under various constraints for the growth of microorganisms in a two-substrate batch environment. We investigate the relation between long-term and short-term cellular goals and show that the objective of maximizing cellular concentration at a fixed final time is equivalent to maximization of the instantaneous growth rate. We then establish the mathematical connection between the generalized framework and optimal and cybernetic modeling frameworks and derive generalized governing dynamic equations for optimal and cybernetic models. We finally illustrate the influence of various constraints in the cybernetic modeling framework on the optimal growth behavior of microorganisms by solving several dynamic optimization problems using genetic algorithms. (C) 2014 Published by Elsevier Inc.
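For readers unfamiliar with the Monod growth law that underlies frameworks of this kind, the sketch below simulates single-substrate batch growth with specific growth rate μ = μ_max · S / (K_s + S). It is only a minimal illustration under assumed parameter values: the paper treats a two-substrate environment and an optimal control formulation, neither of which is reproduced here, and the function names and the explicit Euler scheme are choices made for this example.

```python
def monod_rate(s, mu_max, k_s):
    """Monod specific growth rate for substrate concentration s."""
    return mu_max * s / (k_s + s)


def simulate_batch(x0, s0, mu_max, k_s, yield_coeff, dt, steps):
    """Explicit Euler simulation of biomass x and substrate s in a batch culture."""
    x, s = x0, s0
    for _ in range(steps):
        growth = monod_rate(s, mu_max, k_s) * x * dt
        # Biomass formed in one step cannot exceed what the remaining substrate allows.
        growth = min(growth, yield_coeff * s)
        x += growth
        s -= growth / yield_coeff  # substrate consumed per unit of biomass formed
    return x, s


# Hypothetical parameter values chosen only for illustration.
x_end, s_end = simulate_batch(
    x0=0.1, s0=10.0, mu_max=1.0, k_s=2.0, yield_coeff=0.5, dt=0.01, steps=5000
)
print(x_end, s_end)
```

In this model the quantity x + yield_coeff · s is conserved, so the final biomass is bounded by the initial biomass plus the yield on the initial substrate.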
Abstract:
A neural-network-aided nonlinear dynamic inversion-based hybrid technique of model reference adaptive control flight-control system design is presented in this paper. Here, the gains of the nonlinear dynamic inversion-based flight-control system are dynamically selected in such a manner that the resulting controller mimics a single network, adaptive control, optimal nonlinear controller for state regulation. Traditional model reference adaptive control methods use a linearized reference model, and the presented control design method employs a nonlinear reference model to compute the nonlinear dynamic inversion gains. This innovation of designing the gain elements after synthesizing the single network adaptive controller maintains the advantages that an optimal controller offers, yet it retains a simple closed-form control expression in state feedback form, which can easily be modified for tracking problems without demanding any a priori knowledge of the reference signals. The strength of the technique is demonstrated by considering the longitudinal motion of a nonlinear aircraft system. An extended single network adaptive control/nonlinear dynamic inversion adaptive control design architecture is also presented, which adapts online to three failure conditions, namely, a thrust failure, an elevator failure, and an inaccuracy in the estimation of $C_{M\alpha}$. Simulation results demonstrate that the presented adaptive flight controller generates a near-optimal response when compared to a traditional nonlinear dynamic inversion controller.
Abstract:
The aim in this paper is to allocate the 'sleep time' of the individual sensors in an intrusion detection application so that the energy consumption from the sensors is reduced, while keeping the tracking error to a minimum. We propose two novel reinforcement learning (RL) based algorithms that attempt to minimize a certain long-run average cost objective. Both our algorithms incorporate feature-based representations to handle the curse of dimensionality associated with the underlying partially-observable Markov decision process (POMDP). Further, the feature selection scheme used in our algorithms intelligently manages the energy cost and tracking cost factors, which in turn assists the search for the optimal sleeping policy. We also extend these algorithms to a setting where the intruder's mobility model is not known by incorporating a stochastic iterative scheme for estimating the mobility model. The simulation results on a synthetic 2-D network setting are encouraging.