Cooperative off-policy prediction of Markov decision processes in adaptive networks


Author(s): Valcarcel Macua, Sergio; Chen, Jianshu; Zazo Bello, Santiago; Sayed, Ali H.
Date(s)

2013

Abstract

We apply diffusion strategies to propose a cooperative reinforcement learning algorithm, in which agents in a network communicate with their neighbors to improve predictions about their environment. The algorithm is suitable for off-policy learning even in large state spaces. We provide a mean-square-error performance analysis under constant step-sizes. The benefit of cooperation, in the form of improved stability and reduced bias and variance in the prediction error, is illustrated in the context of a classical model. We show that the improvement in performance is especially significant when the behavior policy of the agents differs from the target policy under evaluation.
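The general scheme described in the abstract can be illustrated with a minimal sketch (not the paper's exact algorithm): each agent performs a local importance-weighted TD(0) update under its behavior policy, then combines its parameter vector with those of its neighbors through a doubly stochastic combination matrix (an adapt-then-combine diffusion step). The MDP, features, policies, and step-size below are all hypothetical choices for illustration.

```python
# Sketch of diffusion-based cooperative off-policy TD(0) prediction with
# linear features. Hypothetical setup: ring network of N agents, small
# chain MDP, behavior policy different from the target policy.
import numpy as np

rng = np.random.default_rng(0)

S, F, N, gamma, mu = 5, 3, 4, 0.9, 0.05   # states, features, agents, discount, step-size
phi = rng.standard_normal((S, F))          # feature matrix (hypothetical)

pi_target = np.full(2, 0.5)                # target policy over 2 actions
pi_behav = np.array([0.8, 0.2])            # behavior policy actually followed

def step(s, a):
    """Move right on a=1, left on a=0 (wrap around); reward +1 at last state."""
    s2 = (s + 1) % S if a == 1 else (s - 1) % S
    return s2, float(s2 == S - 1)

# Doubly stochastic combination matrix over a ring of N agents.
A = np.zeros((N, N))
for k in range(N):
    A[k, k] = 0.5
    A[k, (k - 1) % N] = 0.25
    A[k, (k + 1) % N] = 0.25

W = np.zeros((N, F))                       # one parameter vector per agent
s = np.zeros(N, dtype=int)                 # each agent's current state
for _ in range(2000):
    # Adapt: each agent takes a local importance-weighted TD(0) step.
    W_half = W.copy()
    for k in range(N):
        a = rng.choice(2, p=pi_behav)
        rho = pi_target[a] / pi_behav[a]   # importance sampling ratio
        s2, r = step(s[k], a)
        delta = r + gamma * phi[s2] @ W[k] - phi[s[k]] @ W[k]
        W_half[k] = W[k] + mu * rho * delta * phi[s[k]]
        s[k] = s2
    # Combine: average parameters with neighbors (diffusion step).
    W = A @ W_half

# Repeated combining drives the agents' estimates toward agreement.
disagreement = np.max(np.abs(W - W.mean(axis=0)))
```

The combine step is what distinguishes the cooperative scheme from independent learners: each agent's estimate blends information gathered by its neighbors, which is what yields the reduced bias and variance reported in the abstract.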

Format

application/pdf

Identifier

http://oa.upm.es/28941/

Language(s)

eng

Publisher

E.T.S.I. Telecomunicación (UPM)

Relation

http://oa.upm.es/28941/1/INVE_MEM_2013_166647.pdf

http://dx.doi.org/10.1109/ICASSP.2013.6638519


Rights

http://creativecommons.org/licenses/by-nc-nd/3.0/es/

info:eu-repo/semantics/openAccess

Source

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 26/05/2013 - 31/05/2013 | Vancouver, Canada

Keywords #Telecomunicaciones
Type

info:eu-repo/semantics/conferenceObject

Conference or Workshop Paper

PeerReviewed