Effective 20 newsgroups dataset cleaning


Author(s): Albishre, K.; Albathan, M.; Li, Y.
Date(s)

01/12/2015

Abstract

The rapid increase in the number of text documents available on the Internet has created pressure to use effective cleaning techniques. Cleaning techniques are needed for converting these documents to structured documents. Text cleaning techniques are one of the key mechanisms in typical text mining application frameworks. In this paper, we explore the role of text cleaning in the 20 newsgroups dataset, and report on experimental results.

Format

application/pdf

Identifier

http://eprints.qut.edu.au/94139/

Publisher

IEEE

Relation

http://eprints.qut.edu.au/94139/7/94139.pdf

DOI:10.1109/WI-IAT.2015.90

Albishre, K., Albathan, M., & Li, Y. (2015) Effective 20 newsgroups dataset cleaning. In 2015 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT), IEEE, Singapore, pp. 98-101.

Rights

Copyright 2015 IEEE

Source

School of Electrical Engineering & Computer Science; Science & Engineering Faculty

Keywords

Internet; information resources; text analysis; Internet; effective 20 Newsgroups dataset cleaning; structured documents; text cleaning technique; text documents; text mining application; Cleaning; Electronic mail; Feature extraction; Natural language processing; Nois

Type

Conference Paper