Biblioteca Digital

**Author(s):** Wu, Shunyao; Wang, Jinlong; Vu, Huy Quan; Li, Gang
Date(s)	01/01/2010
Abstract	Important words, which usually appear in the Title, Subject and Keywords fields, can briefly reflect the main topic of a document. In recent years, it has become common practice to exploit the semantic topic of documents and to use important words for document clustering, especially for short texts such as news articles. This paper proposes a novel method that extracts important words from the Subject and Keywords of articles and then partitions documents using only those important words. Because the frequencies of important words are usually low and the scale matrix dataset for important words is small, a normalization method is proposed to normalize the scale dataset so that more accurate results can be achieved by fully exploiting the limited information. Experiments validate the effectiveness of the proposed method.
Identifier	http://hdl.handle.net/10536/DRO/DU:30034432
Language(s)	eng
Publisher	Association for Computing Machinery
Relation	http://dro.deakin.edu.au/eserv/DU:30034432/vu-jcdlconference-2010.pdf http://dro.deakin.edu.au/eserv/DU:30034432/vu-textclustering-2010.pdf
Rights	2010, by the Association for Computing Machinery, Inc.
Keywords	#document clustering #important words #normalization
Type	Conference Paper
