84 results for Approximation en probabilité
Abstract:
In this article we develop the first actor-critic reinforcement learning algorithm with function approximation for a control problem with multiple inequality constraints. We consider the infinite-horizon discounted-cost framework, in which both the objective and the constraint functions are suitable expected, policy-dependent discounted sums of certain sample-path functions. We apply the Lagrange multiplier method to handle the inequality constraints. Our algorithm uses multi-timescale stochastic approximation and incorporates a temporal difference (TD) critic and an actor that performs a gradient search in the space of policy parameters using efficient simultaneous perturbation stochastic approximation (SPSA) gradient estimates. We prove the asymptotic almost sure convergence of our algorithm to a locally optimal policy. © 2010 Elsevier B.V. All rights reserved.
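Two ingredients named above, an SPSA gradient estimate and a Lagrange multiplier updated on a slower timescale, can be illustrated on a static problem. The Python sketch below is not the authors' actor-critic algorithm: it drops the TD critic and the MDP entirely, and the objective `cost`, the constraint `constraint` (requiring `theta[0] <= 0.5`), the perturbation size and the constant step sizes are all hypothetical choices made for illustration.

```python
import random

def spsa_gradient(func, theta, delta, rng):
    """Two-sided SPSA estimate of grad func(theta) from only two evaluations."""
    # Rademacher (+1/-1) perturbation, one entry per parameter.
    pert = [rng.choice((-1.0, 1.0)) for _ in theta]
    plus = [t + delta * p for t, p in zip(theta, pert)]
    minus = [t - delta * p for t, p in zip(theta, pert)]
    diff = func(plus) - func(minus)
    # Every component reuses the same pair of evaluations.
    return [diff / (2.0 * delta * p) for p in pert]

def cost(theta):
    # Hypothetical objective: squared distance to (1, 1).
    return (theta[0] - 1.0) ** 2 + (theta[1] - 1.0) ** 2

def constraint(theta):
    # Hypothetical inequality constraint g(theta) <= 0, i.e. theta[0] <= 0.5.
    return theta[0] - 0.5

def lagrangian_spsa(steps=5000, a=0.05, b=0.005, delta=0.1, seed=0):
    rng = random.Random(seed)
    theta = [0.0, 0.0]
    lam = 0.0
    for _ in range(steps):
        # The lambda is evaluated immediately, so it sees the current multiplier.
        grad = spsa_gradient(lambda th: cost(th) + lam * constraint(th),
                             theta, delta, rng)
        # Faster timescale: descend the Lagrangian in theta.
        theta = [t - a * g for t, g in zip(theta, grad)]
        # Slower timescale (b < a): projected ascent keeps the multiplier >= 0.
        lam = max(0.0, lam + b * constraint(theta))
    return theta, lam

theta, lam = lagrangian_spsa()
```

With these choices the constrained optimum is `theta = (0.5, 1.0)` with multiplier 1.0, which the iterates should approach. SPSA needs two Lagrangian evaluations per step regardless of the number of parameters, which is what makes it attractive in a high-dimensional policy space.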
Abstract:
Birch reduction of 8,9-didehydroestradiol-17β 3-methyl ether 1 or 9(11)-didehydroestradiol-17β 3-methyl ether 2, followed by acid hydrolysis, gives a mixture of 19-nortestosterone 8 and 19-nor-9β,10α-testosterone 9 in varying amounts. However, reduction of their acetates with sodium or lithium and tert-butyl alcohol in liquid ammonia, in the presence of aniline, affords exclusively 19-nortestosterone. Similarly, 18a-homo-19-nortestosterone 12 is prepared from the acetate of 18a-homoestradiol-17β 3-methyl ether 10.
Abstract:
Generation of the thermodynamic dienolate of the 9-bromocarvone derivatives 5, 7 and 11 furnished, via an intramolecular alkylation reaction, the chiral bicyclo[2.2.2]octenones 6, 8 and 9, and 12 and 13, which contain a bridgehead methyl group. In an analogous manner, intramolecular alkylation of the bromo enones 15a-e, obtained from carvone 2 by a 1,3-alkylative enone transposition (→ 14) followed by a regiospecific bromoetherification, furnished the bicyclo[2.2.2]oct-5-en-2-ones 16a-e and 17a-e.
Abstract:
The acid-catalysed thermal rearrangement of 4-aryl-4-methylhex-5-en-2-ones (products of the Claisen rearrangement of β-methylcinnamyl alcohols and 2-methoxypropene) to the isomeric 5-aryl-4-methylhex-5-en-2-ones is described. The rearrangement proceeds via an intramolecular ene reaction of the enol tautomer followed by a retro-ene reaction of the resulting acetylcyclopropane. Formation of the known diketone 13 by ozonolysis of the rearrangement product 10 confirmed the structures of the rearranged enones, whereas formation of the enone 15, which carries an extra methyl group on the styrene double bond, confirmed the proposed mechanism. Finally, the rearrangement has been extended to a formal synthesis of β-cuparenone 20 via the enones 22 and 23.
Abstract:
A two-timescale stochastic approximation scheme that uses coupled iterations is used for simulation-based parametric optimization as an alternative to traditional "infinitesimal perturbation analysis" schemes. It avoids the aggregation of data present in many other schemes. Its convergence is analyzed, and a queueing example is presented.
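A minimal sketch of the coupled two-timescale idea, assuming a hypothetical scalar parameter whose simulated cost is `(theta - 2)^2` plus Gaussian noise. The fast iterates average one fresh simulation sample per step at `theta + delta` and `theta - delta`, so no batch of samples has to be aggregated before each parameter update, while the slow iterate takes finite-difference gradient steps using the current fast estimates. The cost model, the perturbation size `delta` and the step-size exponents are illustrative assumptions, not the scheme analyzed in the paper.

```python
import random

def noisy_cost(theta, rng):
    # Hypothetical simulation output: true cost (theta - 2)^2 plus measurement noise.
    return (theta - 2.0) ** 2 + rng.gauss(0.0, 0.1)

def two_timescale(steps=20000, delta=0.5, seed=1):
    rng = random.Random(seed)
    theta = 0.0
    z_plus = z_minus = 0.0  # fast iterates: running cost estimates at theta +/- delta
    for n in range(steps):
        b = 1.0 / (n + 1) ** 0.6  # fast step size
        a = 1.0 / (n + 1)         # slow step size; a / b -> 0
        # Fast timescale: one new sample per iteration, no batching of data.
        z_plus += b * (noisy_cost(theta + delta, rng) - z_plus)
        z_minus += b * (noisy_cost(theta - delta, rng) - z_minus)
        # Slow timescale: gradient step using the current fast estimates.
        theta -= a * (z_plus - z_minus) / (2.0 * delta)
    return theta

theta = two_timescale()
```

Because the slow step size decays faster than the fast one, the parameter sees the fast iterates as nearly converged, which is the separation that two-timescale convergence arguments rely on.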
Abstract:
A two-timescale stochastic approximation algorithm is proposed for simulation-based parametric optimization of hidden Markov models as an alternative to the traditional approaches to "infinitesimal perturbation analysis." Its convergence is analyzed, and a queueing example is presented.
Abstract:
We propose, for the first time, a reinforcement learning (RL) algorithm with function approximation for traffic signal control. Our algorithm incorporates state-action features and is easily implementable in high-dimensional settings. Prior work on the application of RL to traffic signal control, e.g., that of Abdulhai et al., requires full-state representations and cannot be implemented even in moderate-sized road networks, because its computational complexity grows exponentially with the number of lanes and junctions. We tackle this curse of dimensionality by using feature-based state representations that broadly characterize the level of congestion as low, medium, or high. One advantage of our algorithm is that, unlike prior RL-based work, it does not require precise information on queue lengths and elapsed times at each lane; instead, it works with the features described above. The number of features that our algorithm requires is linear in the number of signalized lanes, which reduces the computational complexity by several orders of magnitude. We implement our algorithm in various settings and compare its performance with other algorithms in the literature, including those of Abdulhai et al. and Cools et al., as well as the fixed-timing and longest-queue algorithms. For comparison, we also develop an RL algorithm that uses a full-state representation and, unlike the work of Abdulhai et al., incorporates prioritization of traffic. We observe that our algorithm outperforms all the other algorithms in all the road network settings that we consider.
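The feature-based representation can be sketched as a linear Q-function over one-hot congestion levels. In this hypothetical Python sketch, each lane is mapped to low, medium or high using made-up thresholds (`low=5`, `high=15` vehicles), so a detector that reports only the coarse level would suffice; the feature vector has one block of `3 * num_lanes` entries per signal phase, so its length is linear in the number of lanes; and the weights are updated with a standard Q-learning rule. The thresholds, the reward and the update rule are assumptions, not the authors' exact algorithm.

```python
def congestion_level(queue_len, low=5, high=15):
    """Map a queue length to 0 (low), 1 (medium) or 2 (high); thresholds are hypothetical."""
    if queue_len < low:
        return 0
    if queue_len < high:
        return 1
    return 2

def features(queues, action, num_actions):
    """One-hot congestion level per lane, placed in the block of the chosen phase."""
    block = 3 * len(queues)
    phi = [0.0] * (block * num_actions)
    for lane, q in enumerate(queues):
        phi[action * block + 3 * lane + congestion_level(q)] = 1.0
    return phi

def q_value(weights, phi):
    return sum(w * f for w, f in zip(weights, phi))

def q_learning_update(weights, phi, reward, next_phis, alpha=0.1, gamma=0.9):
    """One Q-learning step with linear function approximation; returns new weights."""
    target = reward + gamma * max(q_value(weights, p) for p in next_phis)
    td_error = target - q_value(weights, phi)
    return [w + alpha * td_error * f for w, f in zip(weights, phi)]

# Usage with hypothetical data: three lanes, two signal phases.
queues = [2, 10, 20]
phi = features(queues, action=1, num_actions=2)
weights = [0.0] * len(phi)
next_queues = [3, 12, 18]
next_phis = [features(next_queues, a, 2) for a in range(2)]
# Hypothetical reward: minus the summed congestion levels of the next state.
reward = -float(sum(congestion_level(q) for q in next_queues))
weights = q_learning_update(weights, phi, reward, next_phis)
```

Only the three entries that are active in `phi` change after the update, which is why the per-step cost scales with the number of lanes rather than with the size of the joint state space.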
Abstract:
This paper investigates the propagation of a strong shock into an inhomogeneous medium using the new theory of shock dynamics. The equations are simple to solve and do not require the trial-and-error methods commonly used in this case. The results compare favourably with earlier results obtained for self-similar flows, which arise as a special case of this theory.
Abstract:
The actor-critic algorithm of Barto and others for simulation-based optimization of Markov decision processes is cast as a two-timescale stochastic approximation. Convergence analysis, approximation issues and an example are studied.
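A minimal actor-critic on a hypothetical one-state, two-action problem shows the two-timescale structure: the critic's TD(0) update uses a larger step size than the actor's softmax preference update, so that, roughly, the critic tracks the value of the slowly changing policy. The costs (1 for action 0, 0 for action 1), the discount factor and the constant step sizes are illustrative assumptions; convergence analyses typically use diminishing step sizes, and this textbook-style variant is not Barto et al.'s original update.

```python
import math
import random

COSTS = (1.0, 0.0)  # hypothetical per-step cost of actions 0 and 1

def softmax(prefs):
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def actor_critic(steps=20000, gamma=0.5, critic_step=0.1, actor_step=0.01, seed=2):
    rng = random.Random(seed)
    value = 0.0         # critic: estimated discounted cost of the single state
    prefs = [0.0, 0.0]  # actor: softmax action preferences
    for _ in range(steps):
        probs = softmax(prefs)
        action = 0 if rng.random() < probs[0] else 1
        cost = COSTS[action]
        # TD(0) error; with one state, the next state is the same state.
        td_error = cost + gamma * value - value
        # Faster timescale: critic moves toward the TD target.
        value += critic_step * td_error
        # Slower timescale: a positive TD error (worse than expected cost)
        # lowers the preference of the action just taken.
        for a in range(len(prefs)):
            indicator = 1.0 if a == action else 0.0
            prefs[a] -= actor_step * td_error * (indicator - probs[a])
    return softmax(prefs), value

probs, value = actor_critic()
```

The policy should end up choosing the zero-cost action almost always, while the critic's value estimate stays nonnegative because all costs are nonnegative.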
Abstract:
A methodology for the synthesis of spiro[4.n](n+5)alk-2-en-1-ones from cyclic ketones, based on a Claisen rearrangement, Wacker oxidation and intramolecular aldol condensation strategy, has been developed. Thus, a one-pot Claisen rearrangement of the allyl alcohols 6a-c furnished the aldehydes 8a-c, which on regiospecific Wacker oxidation generated the keto-aldehydes 9a-c. Finally, intramolecular aldol condensation transformed the keto-aldehydes 9a-c into the spiroannulated products 10a-c.
Abstract:
We consider the problem of wireless channel allocation to multiple users. Each slot is given to the user with the highest metric (e.g., channel gain) in that slot. The scheduler may not know the channel states of all the users at the beginning of each slot. In this scenario, opportunistic splitting is an attractive solution. However, this algorithm requires that the metrics of different users form independent, identically distributed (iid) sequences with the same distribution, and that this distribution and the number of users be known to the scheduler. This limits the usefulness of opportunistic splitting. In this paper we develop a parametric version of this algorithm whose optimal parameters are learnt online through a stochastic approximation scheme. Our algorithm does not require the metrics of different users to have the same distribution. The statistics of these metrics and the number of users can be unknown and can also vary with time, and each metric sequence can be Markov. We prove the convergence of the algorithm and show its utility by scheduling the channel to maximize its throughput while satisfying some fairness and/or quality of service constraints.
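One way a stochastic approximation scheme can learn a splitting threshold without knowing the number of users or their distributions is a Robbins-Monro iteration on the count of users whose metric exceeds the threshold in a slot: the threshold rises when more than one user exceeds it, falls when none does, and so is driven toward the point where the expected number of users above it is one. The sketch below is a hypothetical illustration of this idea, not the paper's parametric algorithm; the target of one expected contender, the exponential fading model, the user means and the step-size exponent are all assumptions.

```python
import math
import random

def learn_threshold(means, slots=20000, seed=3):
    """Robbins-Monro update driving the expected number of users above theta to 1."""
    rng = random.Random(seed)
    theta = 0.0
    for n in range(slots):
        # Hypothetical fading model: exponential channel gains with per-user means.
        metrics = [rng.expovariate(1.0 / mu) for mu in means]
        # Only this count is observed; neither the number of users nor
        # their distributions enter the update.
        above = sum(1 for x in metrics if x > theta)
        step = 1.0 / (n + 1) ** 0.7
        # Too many contenders -> raise the threshold; none -> lower it.
        theta = max(0.0, theta + step * (above - 1))
    return theta

means = [1.0, 2.0, 3.0, 4.0]  # hypothetical, non-identical mean channel gains
theta = learn_threshold(means)
# For exponential gains, P(X > theta) = exp(-theta / mean).
expected_above = sum(math.exp(-theta / mu) for mu in means)
```

The users here deliberately have different means, which mirrors the abstract's point that the metrics need not share a common distribution.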
Abstract:
We consider the problem of scheduling a wireless channel among multiple users. Each slot is given to the user with the highest metric (e.g., channel gain) in that slot. The scheduler may not know the channel states of all the users at the beginning of each slot. In this scenario, opportunistic splitting is an attractive solution. However, this algorithm requires that the metrics of different users form independent, identically distributed (iid) sequences with the same distribution, and that this distribution and the number of users be known to the scheduler. This limits the usefulness of opportunistic splitting. In this paper we develop a parametric version of this algorithm whose optimal parameters are learnt online through a stochastic approximation scheme. Our algorithm does not require the metrics of different users to have the same distribution. The statistics of these metrics and the number of users can be unknown and can also vary with time. We prove the convergence of the algorithm and show its utility by scheduling the channel to maximize its throughput while satisfying some fairness and/or quality of service constraints.
Abstract:
Methyl 5,6-bis(2-methoxyphenyl)-1,4-dimethyl-7-oxobicyclo[2.2.1]hept-5-ene-2-endo-carboxylate, a moderately crowded norbornenone ester, exhibits complex VT-DNMR behaviour. Similar behaviour is not seen in its 7-oxa analogue, showing that conformational transmission from position 7 has a crucial influence on the distance parameters that govern the dynamic processes involving the substituents on the bicycloheptene framework.