1 result for Lead Analysis Data processing
in CORA - Cork Open Research Archive - University College Cork - Ireland
Abstract:
A substantial amount of information on the Internet is present in the form of text. The value of this semi-structured and unstructured data has been widely acknowledged, with consequent scientific and commercial exploitation. The ever-increasing production of data, however, pushes data analytics platforms to their limits. This thesis proposes techniques for more efficient textual big data analysis suitable for the Hadoop analytics platform. The research explores the direct processing of compressed textual data. The focus is on developing novel compression methods with a number of desirable properties to support text-based big data analysis in distributed environments. The novel contributions of this work include the following. Firstly, a Content-aware Partial Compression (CaPC) scheme is developed. CaPC distinguishes between informational and functional content and compresses only the informational content. The compressed data is therefore transparent to existing software libraries, which often rely on functional content to work. Secondly, a context-free, bit-oriented compression scheme (Approximated Huffman Compression) based on the Huffman algorithm is developed. It uses a hybrid data structure that allows pattern searching in compressed data in linear time. Thirdly, several modern compression schemes have been extended so that the compressed data can be safely split with respect to logical data records in distributed file systems. Furthermore, an innovative two-layer compression architecture is used, in which each compression layer is appropriate for the corresponding stage of data processing. Peripheral libraries are developed that seamlessly link the proposed compression schemes to existing analytics platforms and computational frameworks, and also make the use of the compressed data transparent to developers. The compression schemes have been evaluated on a number of standard MapReduce analysis tasks using a collection of real-world datasets. In comparison with existing solutions, they have shown substantial improvement in performance and significant reduction in system resource requirements.
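The idea behind content-aware partial compression can be illustrated with a small Python sketch. This is not the CaPC scheme from the thesis, whose encoding is not described in the abstract. It is a toy stand-in that assumes "informational content" means runs of letters and "functional content" means everything else (delimiters, digits, newlines). The names `build_codebook`, `compress`, `decompress`, and the marker `ESC` are hypothetical.

```python
import re
from collections import Counter

# Assumption: a token is either a run of letters (informational content)
# or a run of anything else (functional content such as "," and "\n").
TOKEN = re.compile(r"[A-Za-z]+|[^A-Za-z]+")
ESC = "\x01"  # assumed marker character that never occurs in the input


def build_codebook(text, size=64, min_len=4):
    """Map the most frequent words to two-character codes (ESC + index char)."""
    words = Counter(w for w in re.findall(r"[A-Za-z]+", text) if len(w) >= min_len)
    return {w: ESC + chr(0x80 + i) for i, (w, _) in enumerate(words.most_common(size))}


def compress(text, codebook):
    """Replace coded words only; functional tokens pass through unchanged."""
    return "".join(codebook.get(tok, tok) for tok in TOKEN.findall(text))


def decompress(data, codebook):
    """Expand every ESC-prefixed code back to its word."""
    reverse = {code: word for word, code in codebook.items()}
    out, i = [], 0
    while i < len(data):
        if data[i] == ESC:
            out.append(reverse[data[i:i + 2]])
            i += 2
        else:
            out.append(data[i])
            i += 1
    return "".join(out)
```

Because commas and newlines are left untouched, the compressed text can still be split into records and fields by ordinary code, which is the transparency property the abstract attributes to CaPC. A real scheme would also need to handle inputs that already contain the marker character.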