23 resultados para Cost Control
em Indian Institute of Science - Bangalore - Índia
Resumo:
The ergodic or long-run average cost control problem for a partially observed finite-state Markov chain is studied via the associated fully observed separated control problem for the nonlinear filter. Dynamic programming equations for the latter are derived, leading to existence and characterization of optimal stationary policies.
Resumo:
This work is a survey of the average cost control problem for discrete-time Markov processes. The authors have attempted to put together a comprehensive account of the considerable research on this problem over the past three decades. The exposition ranges from finite to Borel state and action spaces and includes a variety of methodologies to find and characterize optimal policies. The authors have included a brief historical perspective of the research efforts in this area and have compiled a substantial yet not exhaustive bibliography. The authors have also identified several important questions that are still open to investigation.
Resumo:
We present a novel multi-timescale Q-learning algorithm for average cost control in a Markov decision process subject to multiple inequality constraints. We formulate a relaxed version of this problem through the Lagrange multiplier method. Our algorithm is different from Q-learning in that it updates two parameters - a Q-value parameter and a policy parameter. The Q-value parameter is updated on a slower time scale as compared to the policy parameter. Whereas Q-learning with function approximation can diverge in some cases, our algorithm is seen to be convergent as a result of the aforementioned timescale separation. We show the results of experiments on a problem of constrained routing in a multistage queueing network. Our algorithm is seen to exhibit good performance and the various inequality constraints are seen to be satisfied upon convergence of the algorithm.
Resumo:
We address the optimal control problem of a very general stochastic hybrid system with both autonomous and impulsive jumps. The planning horizon is infinite and we use the discounted-cost criterion for performance evaluation. Under certain assumptions, we show the existence of an optimal control. We then derive the quasivariational inequalities satisfied by the value function and establish well-posedness. Finally, we prove the usual verification theorem of dynamic programming.
Resumo:
The existence of an optimal feedback law is established for the risk-sensitive optimal control problem with denumerable state space. The main assumptions imposed are irreducibility and a near monotonicity condition on the one-step cost function. A solution can be found constructively using either value iteration or policy iteration under suitable conditions on initial feedback law.
Resumo:
We propose for the first time two reinforcement learning algorithms with function approximation for average cost adaptive control of traffic lights. One of these algorithms is a version of Q-learning with function approximation while the other is a policy gradient actor-critic algorithm that incorporates multi-timescale stochastic approximation. We show performance comparisons on various network settings of these algorithms with a range of fixed timing algorithms, as well as a Q-learning algorithm with full state representation that we also implement. We observe that whereas (as expected) on a two-junction corridor, the full state representation algorithm shows the best results, this algorithm is not implementable on larger road networks. The algorithm PG-AC-TLC that we propose is seen to show the best overall performance.
Resumo:
As petrol prices are going up in developing countries in upcoming decades low cost electric cars will become more and more popular in developing world. One of the main deciding factors for success of electric cars specially in developing world in upcoming decades will be its cost. This paper shows a cost effective method to control the speed of low cost brushed D.C. motor by combining a IC 555 Timer with a High Boost Converter. The main purpose of using High Boost Converter since electric cars needs high voltage and current which a High Boost Converter can provide even with low battery supply.
Resumo:
The phenomena of nonlinear I-V behavior and electrical switching find extensive applications in power control, information storage, oscillators, etc. The study of I-V characteristics and switching parameters is necessary for the proper application of switching materials and devices. In the present work, a simple low-cost electrical switching analyzer has been developed for the measurement of the electrical characteristics of switching materials and devices. The system developed consists of a microcontroller-based excitation source and a high-speed data acquisition system. The design details of the excitation source, its interface with the high-speed data acquisition system and personal computer, and the details of the application software developed for automated measurements are described. Typical I-V characteristics and switching curves obtained with the system developed are also presented to illustrate the capability of the instrument developed.
Resumo:
We develop in this article the first actor-critic reinforcement learning algorithm with function approximation for a problem of control under multiple inequality constraints. We consider the infinite horizon discounted cost framework in which both the objective and the constraint functions are suitable expected policy-dependent discounted sums of certain sample path functions. We apply the Lagrange multiplier method to handle the inequality constraints. Our algorithm makes use of multi-timescale stochastic approximation and incorporates a temporal difference (TD) critic and an actor that makes a gradient search in the space of policy parameters using efficient simultaneous perturbation stochastic approximation (SPSA) gradient estimates. We prove the asymptotic almost sure convergence of our algorithm to a locally optimal policy. (C) 2010 Elsevier B.V. All rights reserved.
Resumo:
We consider discrete-time versions of two classical problems in the optimal control of admission to a queueing system: i) optimal routing of arrivals to two parallel queues and ii) optimal acceptance/rejection of arrivals to a single queue. We extend the formulation of these problems to permit a k step delay in the observation of the queue lengths by the controller. For geometric inter-arrival times and geometric service times the problems are formulated as controlled Markov chains with expected total discounted cost as the minimization objective. For problem i) we show that when k = 1, the optimal policy is to allocate an arrival to the queue with the smaller expected queue length (JSEQ: Join the Shortest Expected Queue). We also show that for this problem, for k greater than or equal to 2, JSEQ is not optimal. For problem ii) we show that when k = 1, the optimal policy is a threshold policy. There are, however, two thresholds m(0) greater than or equal to m(1) > 0, such that mo is used when the previous action was to reject, and mi is used when the previous action was to accept.
Resumo:
FACTS controllers are emerging as viable and economic solutions to the problems of large interconnected ne networks, which can endanger the system security. These devices are characterized by their fast response, absence of inertia, and minimum maintenance requirements. Thyristor controlled equipment like Thyristor Controlled Series Capacitor (TCSC), Static Var Compensator (SVC), Thyristor Controlled Phase angle Regulator (TCPR) etc. which involve passive elements result in devices of large sizes with substantial cost and significant labour for installation. An all solid-state device using GTOs leads to reduction in equipment size and has improved performance. The Unified Power Flow Controller (UPFC) is a versatile controller which can be used to control the active and reactive power in the Line independently. The concept of UPFC makes it possible to handle practically all power flow control and transmission line compensation problems, using solid-state controllers, which provide functional flexibility, generally not attainable by conventional thyristor controlled systems. In this paper, we present the development of a control scheme for the series injected voltage of the UPFC to damp the power oscillations and improve transient stability in a power system. (C) 1998 Elsevier Science Ltd. All rights reserved.
Resumo:
An optimal control law for a general nonlinear system can be obtained by solving Hamilton-Jacobi-Bellman equation. However, it is difficult to obtain an analytical solution of this equation even for a moderately complex system. In this paper, we propose a continuoustime single network adaptive critic scheme for nonlinear control affine systems where the optimal cost-to-go function is approximated using a parametric positive semi-definite function. Unlike earlier approaches, a continuous-time weight update law is derived from the HJB equation. The stability of the system is analysed during the evolution of weights using Lyapunov theory. The effectiveness of the scheme is demonstrated through simulation examples.
Resumo:
We study optimal control of Markov processes with age-dependent transition rates. The control policy is chosen continuously over time based on the state of the process and its age. We study infinite horizon discounted cost and infinite horizon average cost problems. Our approach is via the construction of an equivalent semi-Markov decision process. We characterise the value function and optimal controls for both discounted and average cost cases.
Resumo:
In this paper, we study the asymptotic behavior of an optimal control problem for the time-dependent Kirchhoff-Love plate whose middle surface has a very rough boundary. We identify the limit problem which is an optimal control problem for the limit equation with a different cost functional.
Resumo:
Wind power, as an alternative to fossil fuels, is plentiful, renewable, widely distributed, clean, produces no greenhouse gas emissions during operation, and uses little land. In operation, the overall cost per unit of energy produced is similar to the cost for new coal and natural gas installations. However, the stochastic behaviour of wind speeds leads to significant disharmony between wind energy production and electricity demand. Wind generation suffers from an intermittent characteristics due to the own diurnal and seasonal patterns of the wind behaviour. Both reactive power and voltage control are important under varying operating conditions of wind farm. To optimize reactive power flow and to keep voltages in limit, an optimization method is proposed in this paper. The objective proposed is minimization of the voltage deviations of the load buses (Vdesired). The approach considers the reactive power limits of wind generators and co-ordinates the transformer taps. This algorithm has been tested under practically varying conditions simulated on a test system. The results are obtained on a system of 50-bus real life equivalent power network. The result shows the efficiency of the proposed method.