Local Goals Driven Hierarchical Reinforcement Learning


Author(s): Pchelkin, Arthur
Date(s)

21/12/2009

2004

Abstract

* This research was partially supported by the Latvian Science Foundation under grant No. 02-86d.

Efficient exploration is of fundamental importance for autonomous agents that learn to act. Previous approaches to exploration in reinforcement learning usually address the case where the environment is fully observable. In contrast, the current paper, like the previous paper [Pch2003], studies the case where the environment is only partially observable. One additional difficulty is considered: complex temporal dependencies. To overcome this difficulty, a new hierarchical reinforcement learning algorithm is proposed. The learning algorithm exploits a very simple learning principle, similar to Q-learning, except that the lookup table has one additional variable: the currently selected goal. Additionally, the algorithm uses the idea of an internal reward for achieving hard-to-reach states [Pch2003]. The proposed learning algorithm is investigated experimentally in partially observable maze problems, where it shows a robust ability to learn a good policy.
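
To make the tabular structure concrete, the following is a minimal Python sketch, assuming an epsilon-greedy policy and a standard one-step Q-learning update: the lookup table is indexed by the current observation, the currently selected local goal, and the action, and the internal reward for reaching hard-to-reach states enters the update as an extra bonus term. All names, hyperparameters, and the handling of the internal reward are illustrative assumptions, not the exact algorithm from the paper.

```python
# Sketch of goal-conditioned tabular Q-learning (illustrative only;
# the paper's precise update and internal-reward scheme may differ).
from collections import defaultdict
import random

ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1          # assumed hyperparameters
ACTIONS = ["up", "down", "left", "right"]         # assumed maze actions

# Lookup table keyed by (observation, currently selected goal, action).
Q = defaultdict(float)

def select_action(obs, goal):
    """Epsilon-greedy action choice, conditioned on the active local goal."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(obs, goal, a)])

def update(obs, goal, action, reward, internal_reward, next_obs):
    """One Q-learning step; internal_reward is a bonus for hard-to-reach states."""
    target = reward + internal_reward + GAMMA * max(
        Q[(next_obs, goal, a)] for a in ACTIONS
    )
    Q[(obs, goal, action)] += ALPHA * (target - Q[(obs, goal, action)])
```

Keying the table on the active goal is what makes the behaviour hierarchical: the same observation can map to different actions depending on which local goal is currently selected.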

Identifier

1313-0463

http://hdl.handle.net/10525/851

Language(s)

en

Publisher

Institute of Information Theories and Applications FOI ITHEA

Keywords

Reinforcement Learning, Hierarchical Behaviour, Efficient Exploration, POMDPs, Non-Markov, Local Goals, Internal Reward, Subgoal Learning

Type

Article