Biblioteca Digital

Solving MDPs using two-timescale simulated annealing with multiplicative weights

**Autoria(s):** Abdulla, Mohammed Shahid; Bhatnagar, Shalabh
Data(s)	2007
Resumo	We develop extensions of the Simulated Annealing with Multiplicative Weights (SAMW) algorithm that proposed a method of solution of Finite-Horizon Markov Decision Processes (FH-MDPs). The extensions developed are in three directions: a) Use of the dynamic programming principle in the policy update step of SAMW b) A two-timescale actor-critic algorithm that uses simulated transitions alone, and c) Extending the algorithm to the infinite-horizon discounted-reward scenario. In particular, a) reduces the storage required from exponential to linear in the number of actions per stage-state pair. On the faster timescale, a 'critic' recursion performs policy evaluation while on the slower timescale an 'actor' recursion performs policy improvement using SAMW. We give a proof outlining convergence w.p. 1 and show experimental results on two settings: semiconductor fabrication and flow control in communication networks.
Formato	application/pdf
Identificador	http://eprints.iisc.ernet.in/27417/1/solving.pdf Abdulla, Mohammed Shahid and Bhatnagar, Shalabh (2007) Solving MDPs using two-timescale simulated annealing with multiplicative weights. In: American Control Conference 2007, JUL 09-13, 2007, New York, NY.
Publicador	IEEE
Relação	http://ieeexplore.ieee.org/search/srchabstract.jsp?tp=&arnumber=4282586&queryText%3DSolving+MDPs+using+two-timescale+simulated+annealing+with+++multiplicative+weights%26openedRefinements%3D*%26searchField%3DSearch+All http://eprints.iisc.ernet.in/27417/
Palavras-Chave	#Computer Science & Automation (Formerly, School of Automation)
Tipo	Conference Paper PeerReviewed

Acesso ao item digital