998 results for M-term Approximation
Abstract:
We develop in this article the first actor-critic reinforcement learning algorithm with function approximation for a problem of control under multiple inequality constraints. We consider the infinite horizon discounted cost framework in which both the objective and the constraint functions are suitable expected policy-dependent discounted sums of certain sample path functions. We apply the Lagrange multiplier method to handle the inequality constraints. Our algorithm makes use of multi-timescale stochastic approximation and incorporates a temporal difference (TD) critic and an actor that makes a gradient search in the space of policy parameters using efficient simultaneous perturbation stochastic approximation (SPSA) gradient estimates. We prove the asymptotic almost sure convergence of our algorithm to a locally optimal policy. (C) 2010 Elsevier B.V. All rights reserved.
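The simultaneous perturbation (SPSA) gradient estimate mentioned in the abstract above can be illustrated with a minimal sketch. This is not the paper's actor-critic algorithm: the deterministic test function `quadratic`, the perturbation size `delta`, the constant step size, and all function names are assumptions made for illustration only.

```python
import random

def spsa_gradient(f, theta, delta=1e-3, rng=random):
    """Two-sided SPSA estimate of the gradient of f at theta."""
    # Perturb every coordinate at once with an independent random sign.
    signs = [rng.choice((-1.0, 1.0)) for _ in theta]
    plus = [t + delta * s for t, s in zip(theta, signs)]
    minus = [t - delta * s for t, s in zip(theta, signs)]
    diff = f(plus) - f(minus)
    # The same two evaluations of f are reused for every coordinate.
    return [diff / (2.0 * delta * s) for s in signs]

def quadratic(x):
    """Hypothetical cost with its minimum at (1, 1, ..., 1)."""
    return sum((xi - 1.0) ** 2 for xi in x)

def minimize(f, theta, steps=2000, step_size=0.05, seed=0):
    """Plain gradient descent driven by SPSA estimates."""
    rng = random.Random(seed)
    theta = list(theta)
    for _ in range(steps):
        grad = spsa_gradient(f, theta, rng=rng)
        theta = [t - step_size * g for t, g in zip(theta, grad)]
    return theta
```

The point of the construction is that each estimate needs only two evaluations of the cost regardless of the number of parameters, which is why it suits simulation-based settings where each evaluation is expensive.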
Abstract:
We analyze here the occurrence of antiferromagnetic (AFM) correlations in the half-filled Hubbard model in one and two space dimensions using a natural fermionic representation of the model and a newly proposed way of implementing the half-filling constraint. We find that our way of implementing the constraint is capable of enforcing it exactly already at the lowest levels of approximation. We discuss how to develop a systematic adiabatic expansion for the model and how Berry's phase contributions arise quite naturally from the adiabatic expansion. At low temperatures and in the continuum limit the model gets mapped onto an O(3) nonlinear sigma model (NLsigma). A topological Wess-Zumino term is present in the effective action of the 1D NLsigma as expected, while no topological terms are present in 2D. Some specific difficulties that arise in connection with the implementation of an adiabatic expansion scheme within a thermodynamic context are also discussed, and we hint at possible solutions.
Abstract:
Recently, it was found that a reduction in atmospheric CO2 concentration leads to a temporary increase in global precipitation. We use the Hadley Center coupled atmosphere-ocean model, HadCM3L, to demonstrate that this precipitation increase is a consequence of precipitation sensitivity to changes in atmospheric CO2 concentrations through fast tropospheric adjustment processes. Slow ocean cooling explains the longer-term decrease in precipitation. Increased CO2 tends to suppress evaporation/precipitation whereas increased temperatures tend to increase evaporation/precipitation. When the enhanced CO2 forcing is removed, global precipitation increases temporarily, but this increase is not observed when a similar negative radiative forcing is applied as a reduction of solar intensity. Therefore, transient precipitation increase following a reduction in CO2-radiative forcing is a consequence of the specific character of CO2 forcing and is not a general feature associated with decreases in radiative forcing. Citation: Cao, L., G. Bala, and K. Caldeira (2011), Why is there a short-term increase in global precipitation in response to diminished CO2 forcing?, Geophys. Res. Lett., 38, L06703, doi:10.1029/2011GL046713.
Abstract:
Multistress aging/weathering of outdoor composite polymeric insulators has been a topic of interest for the power transmission research community in the last few decades. This paper deals with the long-term accelerated weathering of full-scale distribution class silicone rubber composite insulators. To evaluate the long-term synergistic effect of electric stress, temperature and UV radiation on insulators, they were subjected to accelerated weathering in a specially designed multistress-aging chamber for 30,000 h. All the insulators were subjected to the same level of electrical and thermal stresses but different UV radiation levels. Chemical, physical and electrical changes due to degradation have been assessed using various techniques. It was found that there was a monotonic reduction of the content of low molecular weight (LMW) molecules with the duration of the weathering. Further, due to oxidation and weathering there is an appreciable increase in surface roughness and atomic percentage of oxygen. There is no change in the leakage current of new and aged insulators under both wet and dry conditions at the end of the aging. The results also indicate that there is no influence of UV radiation on the silicone rubber for the durations and conditions under which the studies were made.
Abstract:
As part of an international network of large plots to study tropical vegetation dynamics on a long-term basis, a 50-hectare permanent plot was set up during 1988-89 in the deciduous forests of Mudumalai, southern India. Within this plot 25,929 living woody plants (71 species) above 1 cm DBH (diameter at breast height) were identified, measured, tagged and mapped. Species abundances corresponded to the characteristic log-normal distribution. The four most abundant species (Kydia calycina, Lagerstroemia microcarpa, Terminalia crenulata and Helicteres isora) constituted nearly 56% of total stems, while seven species were represented by only one individual each in the plot. Variance/mean ratios of density showed most species to have clumped distributions. The population declined overall by 14% during the first two years, largely due to elephant and fire-mediated damage to Kydia calycina and Helicteres isora. In this article we discuss the need for large plots to study vegetation dynamics.
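The variance/mean ratio used above to detect clumping can be sketched as follows. The quadrat counts in the usage note are invented for illustration and are not data from the Mudumalai plot; the function name is likewise hypothetical.

```python
from statistics import mean, variance

def variance_mean_ratio(counts):
    """Index of dispersion for stem counts per equal-sized quadrat.

    Roughly 1 suggests a random (Poisson-like) pattern, values well
    above 1 suggest clumping, and values below 1 suggest regular spacing.
    """
    m = mean(counts)
    if m == 0:
        raise ValueError("no individuals recorded in any quadrat")
    # Sample variance (n - 1 denominator) divided by the sample mean.
    return variance(counts) / m
```

For example, `variance_mean_ratio([0, 0, 10, 0, 0, 12])` is well above 1, consistent with a clumped pattern, whereas perfectly even counts such as `[3, 3, 3, 3]` give 0.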
Abstract:
Nonlinear static and dynamic response analyses of a clamped, rectangular composite plate resting on a two-parameter elastic foundation have been studied using von Karman's relations. Incorporating the material damping, the governing coupled, nonlinear partial differential equations are obtained for the plate under step pressure pulse load excitation. These equations have been solved by a one-term solution and by applying Galerkin's technique to the deflection equation. This yields an ordinary nonlinear differential equation in time. The nonlinear static solution is obtained by neglecting the time-dependent variables. The nonlinear dynamic damped response is obtained by applying the ultraspherical polynomial approximation (UPA) technique. The influences of foundation modulus, shear modulus, orthotropy, etc. upon the nonlinear static and dynamic responses have been presented.
Abstract:
A two-timescale stochastic approximation scheme that uses coupled iterations is used for simulation-based parametric optimization as an alternative to traditional "infinitesimal perturbation analysis" schemes. It avoids the aggregation of data present in many other schemes. Its convergence is analyzed, and a queueing example is presented.
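The idea of coupled iterations on two timescales can be illustrated with a toy sketch. It is not the scheme analyzed in the paper: the cost J(theta) = (theta - 2)^2, the Gaussian noise, and the step-size sequences are all assumptions chosen so the example is self-contained. The fast iterate `y` averages noisy gradient samples while the slow iterate `theta` moves along the current average, so no batch of samples is ever aggregated explicitly.

```python
import random

def two_timescale(steps=20000, seed=0):
    """Toy coupled iteration: returns the final slow iterate theta."""
    rng = random.Random(seed)
    theta = 0.0  # slow iterate: the parameter being optimized
    y = 0.0      # fast iterate: running estimate of dJ/dtheta
    for n in range(steps):
        a_n = 1.0 / (n + 10)         # slow step size
        b_n = 1.0 / (n + 10) ** 0.6  # fast step size, so a_n / b_n -> 0
        # One noisy sample of the gradient of J(theta) = (theta - 2)^2.
        sample = 2.0 * (theta - 2.0) + rng.gauss(0.0, 1.0)
        y_next = y + b_n * (sample - y)  # fast: track the mean gradient
        theta = theta - a_n * y          # slow: descend along the estimate
        y = y_next
    return theta
```

Because `a_n / b_n` tends to zero, the fast iterate sees `theta` as nearly constant, while the slow iterate sees `y` as nearly converged. In this toy problem `theta` settles near the minimizer 2.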
Abstract:
A two-timescale stochastic approximation algorithm is proposed for simulation-based parametric optimization of hidden Markov models, as an alternative to the traditional approaches to "infinitesimal perturbation analysis." Its convergence is analyzed, and a queueing example is presented.
Abstract:
We propose, for the first time, a reinforcement learning (RL) algorithm with function approximation for traffic signal control. Our algorithm incorporates state-action features and is easily implementable in high-dimensional settings. Prior work, e.g., the work of Abdulhai et al., on the application of RL to traffic signal control requires full-state representations and cannot be implemented, even in moderate-sized road networks, because the computational complexity grows exponentially in the numbers of lanes and junctions. We tackle this problem of the curse of dimensionality by effectively using feature-based state representations that use a broad characterization of the level of congestion as low, medium, or high. One advantage of our algorithm is that, unlike prior work based on RL, it does not require precise information on queue lengths and elapsed times at each lane but instead works with the aforementioned features. The number of features that our algorithm requires is linear in the number of signaled lanes, thereby leading to several orders of magnitude reduction in the computational complexity. We perform implementations of our algorithm on various settings and show performance comparisons with other algorithms in the literature, including the works of Abdulhai et al. and Cools et al., as well as the fixed-timing and the longest queue algorithms. For comparison, we also develop an RL algorithm that uses full-state representation and incorporates prioritization of traffic, unlike the work of Abdulhai et al. We observe that our algorithm outperforms all the other algorithms on all the road network settings that we consider.
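A coarse, feature-based representation of congestion of the kind described above might look like the following sketch. The thresholds, the numeric encoding of low/medium/high, and the linear value function are assumptions for illustration, not the paper's actual feature definitions.

```python
def congestion_level(queue_length, low_threshold=5, high_threshold=15):
    """Map a queue length to a coarse level: 0.0 low, 0.5 medium, 1.0 high.

    Both thresholds are hypothetical; only the coarse level is needed,
    not the exact queue length.
    """
    if queue_length < low_threshold:
        return 0.0
    if queue_length < high_threshold:
        return 0.5
    return 1.0

def features(queue_lengths):
    """One feature per signaled lane, so the size is linear in the lanes."""
    return [congestion_level(q) for q in queue_lengths]

def q_value(weights, feats):
    """Linear function approximation: Q is the dot product w . phi."""
    if len(weights) != len(feats):
        raise ValueError("weights and features must have the same length")
    return sum(w * f for w, f in zip(weights, feats))
```

With three lanes, `features([2, 9, 30])` yields `[0.0, 0.5, 1.0]`. A full-state table over the same lanes would need one entry per combination of exact queue lengths, which is the exponential growth the abstract refers to.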
Abstract:
This paper investigates the propagation of a strong shock into an inhomogeneous medium using the new theory of shock dynamics. The equations are simple to solve and involve no trial-and-error method commonly used in this case. The results compare favourably with earlier results obtained in the case of self-similar flows, which arise as a special case of this theory.
Abstract:
The actor-critic algorithm of Barto and others for simulation-based optimization of Markov decision processes is cast as a two-timescale stochastic approximation. Convergence analysis, approximation issues and an example are studied.
Abstract:
We consider the problem of wireless channel allocation to multiple users. A slot is given to the user with the highest metric (e.g., channel gain) in that slot. The scheduler may not know the channel states of all the users at the beginning of each slot. In this scenario opportunistic splitting is an attractive solution. However, this algorithm requires that the metrics of different users form independent, identically distributed (iid) sequences with the same distribution and that their distribution and number be known to the scheduler. This limits the usefulness of opportunistic splitting. In this paper we develop a parametric version of this algorithm. The optimal parameters of the algorithm are learnt online through a stochastic approximation scheme. Our algorithm does not require the metrics of different users to have the same distribution. The statistics of these metrics and the number of users can be unknown and also vary with time. Each metric sequence can be Markov. We prove the convergence of the algorithm and show its utility by scheduling the channel to maximize its throughput while satisfying some fairness and/or quality of service constraints.
Abstract:
We consider the problem of scheduling a wireless channel among multiple users. A slot is given to the user with the highest metric (e.g., channel gain) in that slot. The scheduler may not know the channel states of all the users at the beginning of each slot. In this scenario opportunistic splitting is an attractive solution. However, this algorithm requires that the metrics of different users form independent, identically distributed (iid) sequences with the same distribution and that their distribution and number be known to the scheduler. This limits the usefulness of opportunistic splitting. In this paper we develop a parametric version of this algorithm. The optimal parameters of the algorithm are learnt online through a stochastic approximation scheme. Our algorithm does not require the metrics of different users to have the same distribution. The statistics of these metrics and the number of users can be unknown and also vary with time. We prove the convergence of the algorithm and show its utility by scheduling the channel to maximize its throughput while satisfying some fairness and/or quality of service constraints.