Text clustering with important words using normalization


Author(s): Wu, Shunyao; Wang, Jinlong; Vu, Huy Quan; Li, Gang
Date(s)

01/01/2010

Abstract

Important words, which usually appear in the Title, Subject and Keywords fields, briefly reflect the main topic of a document. In recent years it has become common practice to exploit the semantic topic of documents and to use important words for document clustering, especially for short texts such as news articles. This paper proposes a novel method that extracts important words from the Subject and Keywords of articles and then partitions documents using only those important words. Because the frequencies of important words are usually low and the scale matrix dataset built from important words is small, a normalization method is then proposed to normalize the scale dataset so that more accurate results can be achieved by making full use of the limited information. Experiments validate the effectiveness of the method.
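
The abstract does not specify the normalization itself, so the following is only a minimal sketch of the general idea, under stated assumptions: each document is represented by its Subject/Keywords terms, the matrix holds raw counts, and plain L2 row normalization stands in for the paper's method. The helper names (build_matrix, l2_normalize, cosine) and the sample documents are hypothetical.

```python
import math

def build_matrix(docs):
    """Count matrix over the vocabulary of important words (rows = documents)."""
    vocab = sorted({w for words in docs for w in words})
    index = {w: j for j, w in enumerate(vocab)}
    matrix = [[0.0] * len(vocab) for _ in docs]
    for i, words in enumerate(docs):
        for w in words:
            matrix[i][index[w]] += 1.0
    return vocab, matrix

def l2_normalize(matrix):
    """Scale each row to unit length (assumed stand-in for the paper's method)."""
    out = []
    for row in matrix:
        norm = math.sqrt(sum(x * x for x in row))
        out.append([x / norm for x in row] if norm > 0 else list(row))
    return out

def cosine(u, v):
    """Dot product; equals cosine similarity when both rows are unit length."""
    return sum(a * b for a, b in zip(u, v))

# Hypothetical documents, each reduced to its important words only.
docs = [
    ["clustering", "normalization"],
    ["clustering", "normalization", "keywords"],
    ["tourism", "hotel"],
]
vocab, counts = build_matrix(docs)
unit = l2_normalize(counts)
sim_01 = cosine(unit[0], unit[1])  # documents sharing two important words
sim_02 = cosine(unit[0], unit[2])  # documents sharing none
```

With so few terms per document, normalizing rows keeps a document with more keywords from dominating the similarity. The normalized rows could then be passed to any standard clustering routine.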

Identifier

http://hdl.handle.net/10536/DRO/DU:30034432

Language(s)

eng

Publisher

Association for Computing Machinery

Relation

http://dro.deakin.edu.au/eserv/DU:30034432/vu-jcdlconference-2010.pdf

http://dro.deakin.edu.au/eserv/DU:30034432/vu-textclustering-2010.pdf

Rights

2010, by the Association for Computing Machinery, Inc.

Keywords

#document clustering #important words #normalization

Type

Conference Paper