Biblioteca Digital

We propose two variants of the Q-learning algorithm, both of which use two timescales. One updates the Q-values of all feasible state-action pairs at each instant, while the other updates the Q-values of states with actions chosen according to the ‘current’ randomized policy updates. A sketch of the convergence of the algorithms is given. Finally, numerical experiments using the proposed algorithms for routing on different network topologies are presented, along with performance comparisons against the regular Q-learning algorithm.


Effect of X-ray Irradiation on Fibre Reinforced Polymer Composites used as HV Insulation in X-ray Generators


246 results for diplomatic negotiations in international disputes


Experimental studies of the effect of cryogenic treatment on residual stresses in Integral Diaphragm Pressure Transducers for space applications

CFD simulation of fluid flow and heat transfer in high frequency pulse tube refrigerators

Photo-induced effects in Sb/As2S3 Nano-multilayered Film

High precision 16-bit readout gas sensor interface in 0.13 µm CMOS

Two-timescale Q-learning Algorithms with an Application to Routing in Networks
