1 result for Importance sampling
at Massachusetts Institute of Technology
Abstract:
We present a new method for estimating the expected return of a POMDP from experience. The estimator does not assume any knowledge of the POMDP and allows the experience to be gathered with an arbitrary set of policies. The return can then be estimated for any new policy of the POMDP. We motivate the estimator from function-approximation and importance-sampling points of view and derive its theoretical properties. Although the estimator is biased, it has low variance, and the bias is often irrelevant when the estimator is used for pairwise comparisons. We conclude by extending the estimator to policies with memory and comparing its performance in a greedy search algorithm to the REINFORCE algorithm, showing an order-of-magnitude reduction in the number of trials required.
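
To make the idea in the abstract concrete, the following is a minimal sketch of a generic normalized (weighted) importance-sampling estimate of expected return. It is not taken from the paper and should not be read as the authors' estimator. It assumes memoryless policies stored as nested dictionaries, and it weights each trial by the likelihood ratio under the policy that generated it. The function names `action_likelihood` and `normalized_is_return` and the trial format are hypothetical. Because both policies act in the same POMDP, the unknown transition and observation probabilities cancel in the ratio, so only action probabilities are needed.

```python
def action_likelihood(policy, trajectory):
    """Probability that `policy` picks the recorded actions.

    policy: dict mapping observation -> {action: probability}  (assumed format)
    trajectory: list of (observation, action) pairs
    """
    likelihood = 1.0
    for observation, action in trajectory:
        likelihood *= policy[observation].get(action, 0.0)
    return likelihood


def normalized_is_return(target_policy, trials):
    """Normalized importance-sampling estimate of the target policy's return.

    trials: list of (behavior_policy, trajectory, total_return) tuples, where
    behavior_policy is the policy that actually generated the trajectory.
    Dividing by the sum of the weights (instead of the number of trials)
    introduces bias but typically lowers variance.
    """
    weights = []
    for behavior_policy, trajectory, _ in trials:
        # Environment dynamics cancel, leaving a ratio of action probabilities.
        # The denominator is positive because the actions were sampled from
        # the behavior policy.
        ratio = (action_likelihood(target_policy, trajectory)
                 / action_likelihood(behavior_policy, trajectory))
        weights.append(ratio)

    total_weight = sum(weights)
    if total_weight == 0.0:
        raise ValueError("target policy assigns zero probability to all trials")

    weighted_sum = sum(w * ret for w, (_, _, ret) in zip(weights, trials))
    return weighted_sum / total_weight


# Toy data (made up for illustration): one observation "o", two actions.
uniform = {"o": {"a": 0.5, "b": 0.5}}
trials = [
    (uniform, [("o", "a")], 1.0),  # action "a" earned return 1
    (uniform, [("o", "b")], 0.0),  # action "b" earned return 0
]
candidate = {"o": {"a": 0.9, "b": 0.1}}
estimate = normalized_is_return(candidate, trials)
```

Since the estimate for any candidate policy is computed from the same stored trials, two candidates can be compared without gathering new experience, which is the pairwise-comparison setting the abstract mentions.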