Diffusion Gradient Temporal Difference for Cooperative Reinforcement Learning with Linear Function Approximation


Author(s): Valcarcel Macua, Sergio; Belanovic, Pavle; Zazo Bello, Santiago
Date(s)

01/05/2012

Abstract

We introduce a diffusion-based algorithm in which multiple agents cooperate to predict a common, global state-value function by sharing local estimates and local gradient information among neighbors. Our algorithm is a fully distributed implementation of gradient temporal difference learning with linear function approximation, making it applicable to multi-agent settings. Simulations illustrate the benefit of cooperation in learning, as made possible by the proposed algorithm.
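As a rough illustration of the idea described in the abstract, the sketch below combines a GTD2-style local update (linear features, auxiliary weight vector) with a diffusion "combine" step that averages intermediate estimates over neighbors. All names, constants, the ring topology, and the synthetic data are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

n_agents, d = 4, 3          # number of agents, feature dimension (assumed)
gamma = 0.9                 # discount factor
alpha, beta = 0.05, 0.05    # step sizes for theta and the auxiliary weights

# Doubly stochastic combination matrix over an assumed ring of neighbors
C = np.zeros((n_agents, n_agents))
for i in range(n_agents):
    for j in (i - 1, i, i + 1):
        C[i, j % n_agents] = 1.0 / 3.0

theta = np.zeros((n_agents, d))   # value-function parameters, one row per agent
w = np.zeros((n_agents, d))       # auxiliary GTD weights, one row per agent

for step in range(2000):
    # Each agent observes its own transition (phi, phi_next, reward);
    # here the transitions are synthetic random data.
    phi = rng.standard_normal((n_agents, d))
    phi_next = rng.standard_normal((n_agents, d))
    reward = phi @ np.ones(d)

    psi = np.zeros_like(theta)    # intermediate (pre-combination) estimates
    for i in range(n_agents):
        # TD error for agent i's local sample
        delta = reward[i] + gamma * phi_next[i] @ theta[i] - phi[i] @ theta[i]
        # GTD2-style local gradient steps
        psi[i] = theta[i] + alpha * (phi[i] - gamma * phi_next[i]) * (phi[i] @ w[i])
        w[i] += beta * (delta - phi[i] @ w[i]) * phi[i]

    # Diffusion step: each agent combines its neighbors' intermediate estimates
    theta = C @ psi
```

The combine step is what distinguishes the diffusion strategy from running independent learners: neighboring estimates are mixed at every iteration, so local information propagates through the network.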

Format

application/pdf

Identifier

http://oa.upm.es/20234/

Language(s)

eng

Publisher

E.T.S.I. Telecomunicación (UPM)

Relation

http://oa.upm.es/20234/1/INVE_MEM_2012_137146.pdf

http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6232901


Rights

http://creativecommons.org/licenses/by-nc-nd/3.0/es/

info:eu-repo/semantics/openAccess

Source

3rd International Workshop on Cognitive Information Processing (CIP) | 28/05/2012 - 30/05/2012 | Baiona

Keywords #Telecommunications #Robotics and Industrial Informatics
Type

info:eu-repo/semantics/conferenceObject

Conference or Workshop Paper

PeerReviewed