1 result for successive improvement
at Massachusetts Institute of Technology
Abstract:
We present a new method for estimating the expected return of a POMDP from experience. The estimator does not assume any knowledge of the POMDP and allows the experience to be gathered with an arbitrary set of policies. The return can then be estimated for any new policy of the POMDP. We motivate the estimator from function-approximation and importance-sampling points of view and derive its theoretical properties. Although the estimator is biased, it has low variance, and the bias is often irrelevant when the estimator is used for pairwise comparisons. We conclude by extending the estimator to policies with memory and compare its performance in a greedy search algorithm to the REINFORCE algorithm, showing an order-of-magnitude reduction in the number of trials required.
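
As a rough illustration of the importance-sampling view mentioned in the abstract, the sketch below estimates the return of a new policy from trials gathered under other policies. It uses normalized (weighted) importance sampling, a standard estimator that is biased but typically low in variance; whether it matches the paper's exact estimator is an assumption. The environment's transition and observation probabilities cancel in each likelihood ratio, so only the policies' action probabilities are needed. The names `trajectory_prob` and `normalized_is_estimate`, the dict-based memoryless policy representation, and the choice to weight each trial against the single policy that generated it are all hypothetical simplifications.

```python
from math import prod


def trajectory_prob(policy, steps):
    """Probability that `policy` chooses the recorded actions.

    `policy` maps observation -> {action: probability} (a memoryless policy).
    `steps` is a list of (observation, action) pairs from one trial.
    """
    return prod(policy[obs][act] for obs, act in steps)


def normalized_is_estimate(trials, new_policy):
    """Estimate the expected return of `new_policy` from recorded trials.

    `trials` is a list of (steps, behavior_policy, total_return) tuples.
    Each trial is weighted by the ratio of its action likelihood under the
    new policy to that under the policy that generated it (an assumption of
    this sketch); the weights are then normalized to sum to one.
    """
    weights = []
    returns = []
    for steps, behavior_policy, total_return in trials:
        ratio = trajectory_prob(new_policy, steps) / trajectory_prob(behavior_policy, steps)
        weights.append(ratio)
        returns.append(total_return)

    total_weight = sum(weights)
    if total_weight == 0.0:
        raise ValueError("new policy assigns zero probability to every recorded trial")
    return sum(w * r for w, r in zip(weights, returns)) / total_weight


# Usage: two one-step trials gathered with a uniform policy.
uniform = {"o": {"a": 0.5, "b": 0.5}}
candidate = {"o": {"a": 0.8, "b": 0.2}}
trials = [
    ([("o", "a")], uniform, 1.0),  # action "a" earned return 1
    ([("o", "b")], uniform, 0.0),  # action "b" earned return 0
]
print(normalized_is_estimate(trials, candidate))  # approximately 0.8
```

Because the same trials can be reweighted for any candidate policy, two candidates can be compared without gathering new experience, which is the setting in which the abstract notes that the bias often does not matter.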