1 result for "text and data mining"
in the Illinois Digital Environment for Access to Learning and Scholarship Repository
Abstract:
Discovery Driven Analysis (DDA) is a common feature of OLAP technology for analyzing structured data. In essence, DDA helps analysts discover anomalous data by highlighting "unexpected" values in the OLAP cube. By indicating which dimensions to explore, DDA speeds up the discovery of anomalies and their causes. However, DDA (and OLAP in general) is applicable only to structured data, such as database records. We propose a system that extends DDA to semi-structured text documents, that is, text documents accompanied by a small amount of structured data. The system pipeline has two stages: first, the text of each document is structured around user-specified dimensions using the semi-PLSA algorithm; then, DDA is adapted to these fully structured documents, which enables DDA on text documents. We present some applications of this system in OLAP analysis and show how scalability issues are solved. Results show that the system can handle reasonably sized document datasets in real time, without any pre-computation.
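The idea of highlighting "unexpected" cube values can be illustrated with a minimal sketch. It is not the system described in the abstract: it assumes a two-dimensional cube stored as a list of rows, uses a simple additive row-plus-column model as the expected value, and omits the semi-PLSA structuring stage entirely. The function name `surprising_cells` and the `threshold` parameter are hypothetical.

```python
from statistics import mean, pstdev


def surprising_cells(cube, threshold=2.0):
    """Flag cells that deviate strongly from an additive expectation.

    Assumption for illustration only: the expected value of cell (i, j)
    is row_mean[i] + col_mean[j] - grand_mean. A real DDA implementation
    may use a different model.
    """
    n_rows, n_cols = len(cube), len(cube[0])
    grand = mean(v for row in cube for v in row)
    row_means = [mean(row) for row in cube]
    col_means = [mean(cube[i][j] for i in range(n_rows)) for j in range(n_cols)]

    # Residual = observed value minus the value the additive model expects.
    residuals = [
        [cube[i][j] - (row_means[i] + col_means[j] - grand) for j in range(n_cols)]
        for i in range(n_rows)
    ]

    # Scale residuals by their overall spread so the threshold is unit-free.
    spread = pstdev(r for row in residuals for r in row)
    if spread == 0:
        return []  # every cell matches the model exactly

    return [
        (i, j, residuals[i][j] / spread)
        for i in range(n_rows)
        for j in range(n_cols)
        if abs(residuals[i][j]) / spread > threshold
    ]


# Hypothetical cube: rows and columns are two dimensions, cells are counts.
cube = [
    [10, 10, 10, 10],
    [10, 10, 50, 10],
    [10, 10, 10, 10],
    [10, 10, 10, 10],
]
flagged = surprising_cells(cube)
```

Here the cell at row 1, column 2 is the only one whose scaled residual exceeds the threshold, so an analyst would be pointed to that row and column as the dimensions worth exploring.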