1 result for Incremental Information-content
in CORA - Cork Open Research Archive - University College Cork - Ireland
Abstract:
A substantial amount of information on the Internet is present in the form of text. The value of this semi-structured and unstructured data has been widely acknowledged, with consequent scientific and commercial exploitation. The ever-increasing data production, however, pushes data analytics platforms to their limits. This thesis proposes techniques for more efficient textual big data analysis suitable for the Hadoop analytic platform. This research explores the direct processing of compressed textual data. The focus is on developing novel compression methods with a number of desirable properties to support text-based big data analysis in distributed environments. The novel contributions of this work include the following. Firstly, a Content-aware Partial Compression (CaPC) scheme is developed. CaPC makes a distinction between informational and functional content, in which only the informational content is compressed. Thus, the compressed data is made transparent to existing software libraries, which often rely on functional content to work. Secondly, a context-free bit-oriented compression scheme (Approximated Huffman Compression) based on the Huffman algorithm is developed. This uses a hybrid data structure that allows pattern searching in compressed data in linear time. Thirdly, several modern compression schemes have been extended so that the compressed data can be safely split with respect to logical data records in distributed file systems. Furthermore, an innovative two-layer compression architecture is used, in which each compression layer is appropriate for the corresponding stage of data processing. Peripheral libraries are developed that seamlessly link the proposed compression schemes to existing analytic platforms and computational frameworks, and also make the use of the compressed data transparent to developers. The compression schemes have been evaluated for a number of standard MapReduce analysis tasks using a collection of real-world datasets. In comparison with existing solutions, they have shown substantial improvement in performance and significant reduction in system resource requirements.
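The distinction between informational and functional content can be illustrated with a toy sketch. This is not the CaPC scheme from the thesis, whose details the abstract does not give; it is a minimal illustration under stated assumptions. Alphabetic words of three or more letters are treated as informational content, while delimiters, digits, and whitespace are treated as functional content. The input is assumed to contain no `~` character. The names `build_codebook`, `compress`, and `decompress` are hypothetical.

```python
import re
from collections import Counter

# Assumption: "informational" content is any standalone alphabetic word of
# three or more letters. Delimiters, digits and whitespace are "functional"
# and are never touched.
WORD = re.compile(r"\b[A-Za-z]{3,}\b")
CODE = re.compile(r"~[0-9a-f]+")


def build_codebook(text, max_codes=255):
    """Map the most frequent words to short codes: '~' plus a hex index."""
    counts = Counter(WORD.findall(text))
    frequent = [word for word, _ in counts.most_common(max_codes)]
    return {word: "~%x" % i for i, word in enumerate(frequent)}


def compress(text, codebook):
    """Replace known words with their codes; leave everything else as-is."""
    if "~" in text:
        raise ValueError("this toy scheme reserves '~' as the code marker")
    return WORD.sub(lambda m: codebook.get(m.group(0), m.group(0)), text)


def decompress(text, codebook):
    """Invert compress() using the same codebook."""
    reverse = {code: word for word, code in codebook.items()}
    return CODE.sub(lambda m: reverse.get(m.group(0), m.group(0)), text)


if __name__ == "__main__":
    log = ("error,disk full,node1\n"
           "error,disk full,node2\n"
           "warning,disk slow,node1\n")
    book = build_codebook(log)
    packed = compress(log, book)
    # Record and field delimiters survive, so ordinary line and CSV
    # splitting still works on the compressed text.
    for line in packed.splitlines():
        print(line.split(","))
    print(decompress(packed, book) == log)
```

Because commas and newlines are left in place, code that splits records on newlines and fields on commas works unchanged on the compressed text, which is the kind of transparency to existing libraries that the abstract describes.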