Abstract:
With tweet volumes reaching 500 million a day, sampling is inevitable for any application that uses Twitter data. Recognizing this, data providers such as Twitter, Gnip and Boardreader license sampled data streams priced according to sample size. Big Data applications that work with sampled data need a sample large enough to be representative of the full dataset. Previous work on representativeness has sought to ensure that the global occurrence rates of key terms can be reliably estimated from the sample. Existing techniques estimate the required sample size from probabilistic bounds on occurrence rates, but only for uniform random sampling. In this paper, we consider the problem of improving these sample size estimates further by exploiting stratification in Twitter data. We analyze our estimates through an extensive study using simulations and real-world data, and establish the superiority of our method over uniform random sampling. Our work gives data providers the technical know-how to expand their portfolios to include stratified sampled datasets, while applications benefit from being able to monitor more topics and events at the same data and computing cost.
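
To make the contrast between uniform and stratified sampling concrete, the sketch below compares the sample sizes needed to estimate a term's occurrence rate to within a fixed margin. It uses the textbook normal-approximation formulas for a proportion (simple random sampling versus stratified sampling with proportional allocation, ignoring the finite population correction). It is not the estimator developed in the paper, and the strata, weights, and rates are hypothetical values chosen for illustration.

```python
import math
from statistics import NormalDist


def srs_sample_size(rate, margin, confidence=0.95):
    """Sample size for uniform random sampling (normal approximation)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil(z ** 2 * rate * (1 - rate) / margin ** 2)


def stratified_sample_size(weights, rates, margin, confidence=0.95):
    """Total sample size for stratified sampling with proportional allocation.

    weights: share of the population in each stratum (must sum to 1)
    rates:   occurrence rate of the term within each stratum
    """
    if len(weights) != len(rates):
        raise ValueError("weights and rates must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    # Only the within-stratum variance contributes under proportional allocation.
    within = sum(w * p * (1 - p) for w, p in zip(weights, rates))
    return math.ceil(z ** 2 * within / margin ** 2)


# Hypothetical example: a term is common in a small stratum and rare elsewhere.
weights = [0.2, 0.8]   # assumed stratum shares
rates = [0.10, 0.01]   # assumed per-stratum occurrence rates
overall_rate = sum(w * p for w, p in zip(weights, rates))

n_uniform = srs_sample_size(overall_rate, margin=0.005)
n_stratified = stratified_sample_size(weights, rates, margin=0.005)
print(f"uniform: {n_uniform}, stratified: {n_stratified}")
```

Under these assumptions the stratified total never exceeds the uniform one, because the between-stratum part of the variance is removed; the saving grows as the per-stratum rates differ more.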