178 results for STOCHASTIC OPTIMAL CONTROL
Abstract:
In this paper, we give a generalization of a result by Borkar and Meyn (2000) [1], on the stability and convergence of synchronous-update stochastic approximation algorithms, to the case of asynchronous stochastic approximations with delays. We then describe an interesting application of the result to asynchronous distributed temporal difference (TD) learning with function approximation and delays. (C) 2011 Elsevier B.V. All rights reserved.
Abstract:
The free chlorine residual in water distribution systems needs to be maintained within a specified range (between minimum and maximum) to avoid deterioration of the microbial quality of water, control taste and/or odor problems, and hinder formation of carcinogenic disinfection by-products. Multiple water quality sources for providing chlorine input are needed to maintain the chlorine residuals within a specified range throughout the distribution system. The determination of source dosage (i.e., chlorine concentrations/chlorine mass rates) at water quality sources to satisfy the above objective under dynamic conditions is a complex process. A nonlinear optimization problem is formulated to determine the chlorine dosage at the water quality sources subject to minimum and maximum constraints on chlorine concentrations at all monitoring nodes. A genetic algorithm (GA) approach in which decision variables (chlorine dosage) are coded as binary strings is used to solve this highly nonlinear optimization problem, with nonlinearities arising due to set-point sources and non-first-order reactions. Application of the model is illustrated using three sample water distribution systems, and it indicates that the GA is a useful tool for evaluating optimal water quality source chlorine schedules.
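The binary coding of decision variables mentioned above can be illustrated with a minimal sketch. It shows only how a bit string might be decoded into chlorine dosages within bounds; the function names, the 8-bit gene length, and the dosage bounds are assumptions made for illustration and are not taken from the paper.

```python
def decode_gene(bits, low, high):
    """Map a binary string linearly onto the interval [low, high]."""
    max_value = 2 ** len(bits) - 1
    return low + (high - low) * int(bits, 2) / max_value


def decode_chromosome(chromosome, gene_length, low, high):
    """Split a chromosome into fixed-length genes, one dosage per source."""
    if len(chromosome) % gene_length != 0:
        raise ValueError("chromosome length must be a multiple of gene_length")
    return [
        decode_gene(chromosome[i:i + gene_length], low, high)
        for i in range(0, len(chromosome), gene_length)
    ]


if __name__ == "__main__":
    # Two hypothetical sources, 8 bits each, dosage bounds in mg/L (assumed).
    print(decode_chromosome("0000000011111111", 8, 0.2, 4.0))
```

A GA would then evaluate each decoded dosage vector against the concentration constraints at the monitoring nodes; that simulation step is outside the scope of this sketch.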
Abstract:
For a homing interceptor to achieve maximum effectiveness, a suitable initial condition must be achieved by the midcourse guidance scheme. To achieve the desired end goal of any midcourse guidance scheme, a two-point boundary value problem must be solved online with all realistic constraints. A newly developed, computationally efficient technique called MPSP (Model Predictive Static Programming) is used in this paper to obtain a suboptimal solution to the optimal midcourse guidance problem. Time-to-go uncertainty is avoided in this formulation by making use of the desired position at which midcourse guidance terminates and terminal guidance takes over. A suitable approach angle towards the desired point can also be specified in this guidance law formulation. This feature makes the law particularly attractive because the warhead effectiveness issue can be indirectly addressed in the midcourse phase.
Abstract:
We consider a joint power control and transmission scheduling problem in wireless networks with average power constraints. While the capacity region of a wireless network is convex, a characterization of this region is a hard problem. We formulate a network utility optimization problem involving time-sharing across different "transmission modes," where each mode corresponds to the set of power levels used in the network. The structure of the optimal solution is a time-sharing across a small set of such modes. We use this structure to develop an efficient heuristic approach to finding a suboptimal solution through column generation iterations. This heuristic approach converges quite fast in simulations, and provides a tool for wireless network planning.
Abstract:
Wireless networks transmit information from a source to a destination via multiple hops in order to save energy and, thus, increase the lifetime of battery-operated nodes. The energy savings can be especially significant in cooperative transmission schemes, where several nodes cooperate during one hop to forward the information to the next node along a route to the destination. Finding the best multi-hop transmission policy in such a network, which determines the nodes that are involved in each hop, is a very important problem, but also a very difficult one, especially when the physical wireless channel behavior is to be accounted for and exploited. We model the above optimization problem for randomly fading channels as a decentralized control problem: the channel observations available at each node define the information structure, while the control policy is defined by the power and phase of the signal transmitted by each node. In particular, we consider the problem of computing an energy-optimal cooperative transmission scheme in a wireless network for two different channel fading models: (i) slow fading channels, where the channel gains of the links remain the same for a large number of transmissions, and (ii) fast fading channels, where the channel gains of the links change quickly from one transmission to another. For slow fading, we consider a factored class of policies (corresponding to local cooperation between nodes), and show that the computation of an optimal policy in this class is equivalent to a shortest path computation on an induced graph, whose edge costs can be computed in a decentralized manner using only locally available channel state information (CSI). For fast fading, both CSI acquisition and data transmission consume energy. Hence, we need to jointly optimize over both; we cast this optimization problem as a large stochastic optimization problem.
We then jointly optimize over a set of CSI functions of the local channel states, and a corresponding factored class of control policies corresponding to local cooperation between nodes with a local outage constraint. The resulting optimal scheme in this class can again be computed efficiently in a decentralized manner. We demonstrate significant energy savings for both slow and fast fading channels through numerical simulations of randomly distributed networks.
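The reduction to a shortest path computation for slow fading can be sketched with a standard Dijkstra search. This is an illustrative, centralized version only: the graph, its node names, and its edge energies are hypothetical, and the decentralized computation of edge costs from local CSI described in the abstract is not reproduced here.

```python
import heapq


def min_energy_route(graph, source, destination):
    """Dijkstra's algorithm on a dict-of-dicts graph of non-negative edge energies.

    Returns (total_energy, route); (inf, []) if the destination is unreachable.
    """
    best = {source: 0.0}
    previous = {}
    frontier = [(0.0, source)]
    while frontier:
        cost, node = heapq.heappop(frontier)
        if node == destination:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor, energy in graph.get(node, {}).items():
            new_cost = cost + energy
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                previous[neighbor] = node
                heapq.heappush(frontier, (new_cost, neighbor))
    if destination not in best:
        return float("inf"), []
    route = [destination]
    while route[-1] != source:
        route.append(previous[route[-1]])
    route.reverse()
    return best[destination], route


if __name__ == "__main__":
    # Hypothetical induced graph: edge weights stand in for per-hop energy costs.
    induced = {"s": {"a": 1.0, "b": 4.0}, "a": {"b": 1.5, "d": 5.0}, "b": {"d": 1.0}}
    print(min_energy_route(induced, "s", "d"))
```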
Abstract:
Pricing is an effective tool to control congestion and achieve quality of service (QoS) provisioning for multiple differentiated levels of service. In this paper, we consider the problem of pricing for congestion control in the case of a network of nodes under a single service class and multiple queues, and present a multi-layered pricing scheme. We propose an algorithm for finding the optimal state dependent price levels for individual queues, at each node. The pricing policy used depends on a weighted average queue length at each node. This helps in reducing frequent price variations and is in the spirit of the random early detection (RED) mechanism used in TCP/IP networks. We observe in our numerical results a considerable improvement in performance using our scheme over that of a recently proposed related scheme in terms of both throughput and delay performance. In particular, our approach exhibits a throughput improvement in the range of 34 to 69 percent in all cases studied (over all routes) over the above scheme.
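The idea of basing the price on a weighted average queue length, in the spirit of RED, can be sketched as follows. The exponentially weighted moving average is the averaging used by RED; the weight, the thresholds, and the price levels below are assumed values for illustration, and the paper's actual pricing algorithm is not reproduced.

```python
from bisect import bisect_right


def weighted_average_queue(samples, weight=0.2, initial=0.0):
    """Exponentially weighted moving average of queue-length samples."""
    average = initial
    history = []
    for queue_length in samples:
        average = (1.0 - weight) * average + weight * queue_length
        history.append(average)
    return history


def price_for_average(average, thresholds, prices):
    """Pick a price level from the averaged queue length.

    `thresholds` must be sorted; `prices` needs one more entry than `thresholds`.
    """
    if len(prices) != len(thresholds) + 1:
        raise ValueError("prices must have len(thresholds) + 1 entries")
    return prices[bisect_right(thresholds, average)]


if __name__ == "__main__":
    # A short burst in the raw queue length moves the average only gradually,
    # so the price changes less often than it would on raw samples.
    raw = [2, 3, 30, 2, 3, 2]
    for avg in weighted_average_queue(raw, weight=0.2):
        print(round(avg, 3), price_for_average(avg, [5.0, 10.0], [1.0, 2.0, 3.0]))
```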
Active Vibration Suppression of One-dimensional Nonlinear Structures Using Optimal Dynamic Inversion
Abstract:
A flexible robot arm can be modeled as an Euler-Bernoulli beam, which is a system with infinite degrees of freedom (DOF). Proper control is needed to track the desired motion of a robotic arm. For controller implementation, the infinite number of DOF of the beam is usually reduced to a finite number, which introduces error (due to the beam's distributed nature). Therefore, to represent reality better, distributed parameter systems (DPS) should be controlled using the system's partial differential equation (PDE) directly. In this paper, we propose to use a recently developed optimal dynamic inversion technique to design a controller to suppress nonlinear vibration of a beam. The method used in this paper determines control forces directly from the PDE model of the system. The formulation has better practical significance because it leads to a closed-form solution for the controller (and hence avoids computational issues).
Abstract:
We study the problem of optimal bandwidth allocation in communication networks. We consider a queueing model with two queues at which traffic from different competing flows arrives. The queue length at the buffers is observed every T instants of time, on the basis of which a decision on the amount of bandwidth to be allocated to each buffer for the next T instants is made. We consider a class of closed-loop feedback policies for the system and use a two-timescale simultaneous perturbation stochastic approximation (SPSA) algorithm to find an optimal policy within the prescribed class. We study the performance of the proposed algorithm in a numerical setting. Our algorithm is found to exhibit good performance.
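As a rough illustration of the simultaneous perturbation idea, the sketch below implements basic SPSA: all parameters are perturbed at once by random ±1 values, and the gradient is estimated from two loss evaluations. It is not the two-timescale variant used in the paper. The noise-free quadratic loss, the gain constants, and the function name are assumptions; in the bandwidth allocation setting the loss would instead come from simulating the queues.

```python
import random


def spsa_minimize(loss, theta0, iterations=2000, a=0.1, c=0.1, seed=0):
    """Basic SPSA with two loss evaluations per iteration."""
    rng = random.Random(seed)
    theta = list(theta0)
    for k in range(1, iterations + 1):
        step = a / k ** 0.602    # decaying step size (commonly used exponent)
        spread = c / k ** 0.101  # decaying perturbation size
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]
        plus = [t + spread * d for t, d in zip(theta, delta)]
        minus = [t - spread * d for t, d in zip(theta, delta)]
        difference = loss(plus) - loss(minus)
        # The same scalar difference is reused for every coordinate.
        theta = [
            t - step * difference / (2.0 * spread * d)
            for t, d in zip(theta, delta)
        ]
    return theta


if __name__ == "__main__":
    def quadratic(x):
        # Hypothetical stand-in objective with its minimum at (1, -2).
        return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2

    print(spsa_minimize(quadratic, [0.0, 0.0]))
```

The appeal of SPSA in a simulation setting is that the number of loss evaluations per iteration stays at two regardless of how many policy parameters are tuned.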
Abstract:
The problem of finding optimal parameterized feedback policies for dynamic bandwidth allocation in communication networks is studied. We consider a queueing model with two queues at which traffic from different competing flows arrives. The queue length at the buffers is observed every T instants of time, on the basis of which a decision on the amount of bandwidth to be allocated to each buffer for the next T instants is made. We consider two different classes of multilevel closed-loop feedback policies for the system and use a two-timescale simultaneous perturbation stochastic approximation (SPSA) algorithm to find optimal policies within each prescribed class. We study the performance of the proposed algorithm in a numerical setting and show performance comparisons of the two optimal multilevel closed-loop policies with optimal open-loop policies. We observe that closed-loop policies of Class B, which tune parameters for both queues and do not have the constraint that the entire bandwidth be used at each instant, exhibit the best results overall, as they offer greater flexibility in parameter tuning. Index Terms: Resource allocation, dynamic bandwidth allocation in communication networks, two-timescale SPSA algorithm, optimal parameterized policies.
Abstract:
In a dense multi-hop network of mobile nodes capable of applying adaptive power control, we consider the problem of finding the optimal hop distance that maximizes a certain throughput measure in bit-metres/sec, subject to average network power constraints. The mobility of nodes is restricted to a circular periphery area centered at the nominal location of nodes. We incorporate only randomly varying path-loss characteristics of channel gain due to the random motion of nodes, excluding any multi-path fading or shadowing effects. Computation of the throughput metric in such a scenario leads us to compute the probability density function of random distance between points in two circles. Using numerical analysis we discover that choosing the nearest node as next hop is not always optimal. Optimal throughput performance is also attained at non-trivial hop distances depending on the available average network power.
Abstract:
Purpose – To investigate the use of centre of gravity location for reducing cyclic pitch control in helicopter UAVs (unmanned air vehicles) and MAVs (micro air vehicles). Low cyclic pitch is a necessity to implement the swashplateless rotor concept using trailing edge flaps or active twist using current generation low authority piezoceramic actuators. Design/methodology/approach – An aeroelastic analysis of the helicopter rotor with elastic blades is used to perform parametric and sensitivity studies of the effects of longitudinal and lateral centre of gravity (cg) movements on the main rotor cyclic pitch. An optimization approach is then used to find cg locations which reduce the cyclic pitch at a given forward speed. Findings – It is found that the longitudinal cyclic pitch and lateral cyclic pitch can be driven to zero at a given forward speed by shifting the cg forward and to the port side, respectively. There also exist pairs of numbers for the longitudinal and lateral cg locations which drive both the cyclic pitch components to zero at a given forward speed. Based on these results, a compromise optimal cg location is obtained such that the cyclic pitch is bounded within ±5° for a BO105 helicopter rotor. Originality/value – The reduction in the cyclic pitch due to helicopter cg location is found to significantly reduce the maximum magnitudes of the control angles in flight, facilitating the swashplateless rotor concept. In addition, the existence of cg locations which drive the cyclic pitches to zero allows for the use of active cg movement as a way to replace the cyclic pitch control for helicopter MAVs.
Abstract:
We propose for the first time two reinforcement learning algorithms with function approximation for average cost adaptive control of traffic lights. One of these algorithms is a version of Q-learning with function approximation while the other is a policy gradient actor-critic algorithm that incorporates multi-timescale stochastic approximation. We show performance comparisons on various network settings of these algorithms with a range of fixed timing algorithms, as well as a Q-learning algorithm with full state representation that we also implement. We observe that whereas (as expected) on a two-junction corridor, the full state representation algorithm shows the best results, this algorithm is not implementable on larger road networks. The algorithm PG-AC-TLC that we propose is seen to show the best overall performance.
Abstract:
A torque control scheme, based on a direct torque control (DTC) algorithm using a 12-sided polygonal voltage space vector, is proposed for variable-speed control of an open-end induction motor drive. The conventional DTC scheme uses the stator flux vector for sector identification and then selects the switching vector to control stator flux and torque. However, the proposed DTC scheme selects switching vectors based on the sector information of the estimated fundamental stator voltage vector and its relative position with respect to the stator flux vector. The fundamental stator voltage estimation is based on the steady-state model of the induction motor (IM), and the synchronous frequency of operation is derived from the computed stator flux using a low-pass filter technique. The proposed DTC scheme utilizes the exact positions of the fundamental stator voltage vector and stator flux vector to select the optimal switching vector for fast control of torque with small variation of stator flux within the hysteresis band. The present DTC scheme allows full load torque control with fast transient response down to very low speeds of operation, with reduced switching frequency variation. Extensive experimental results are presented to show the fast torque control for speeds of operation from zero to rated.
Abstract:
Stochastic hybrid systems arise in numerous applications of systems with multiple models, e.g., air traffic management, flexible manufacturing systems, and fault-tolerant control systems. In a typical hybrid system, the state space is hybrid in the sense that some components take values in a Euclidean space, while some other components are discrete. In this paper we propose two stochastic hybrid models, both of which permit diffusion and hybrid jump. Such models are essential for studying air traffic management in a stochastic framework.
Abstract:
The throughput-optimal discrete-rate adaptation policy, when nodes are subject to constraints on the average power and bit error rate, is governed by a power control parameter, for which a closed-form characterization has remained an open problem. The parameter is essential in determining the rate adaptation thresholds and the transmit rate and power at any time, and ensuring adherence to the power constraint. We derive novel insightful bounds and approximations that characterize the power control parameter and the throughput in closed-form. The results are comprehensive as they apply to the general class of Nakagami-m (m >= 1) fading channels, which includes Rayleigh fading, uncoded and coded modulation, and single and multi-node systems with selection. The results are appealing as they are provably tight in the asymptotic large average power regime, and are designed and verified to be accurate even for smaller average powers.