A two-stage approach for generating topic models


Author(s): Xu, Yue; Gao, Yang; Li, Yuefeng; Liu, Bin
Contributor(s)

Pei, Jian

Tseng, Vincent S.

Cao, Longbing

Motoda, Hiroshi

Xu, Guandong

Date(s)

01/04/2013

Abstract

Topic modeling has been widely used in information retrieval, text mining, text classification, and related fields. Most existing statistical topic modeling methods, such as LDA and pLSA, generate a term-based representation of a topic by selecting single words from the multinomial word distribution over that topic. This has two main shortcomings: first, popular or common words occur frequently across different topics, which makes the topics ambiguous; second, single words lack the coherent semantic meaning needed to represent topics accurately. To overcome these problems, we propose a two-stage model that combines text mining and pattern mining with statistical modeling to generate more discriminative and semantically rich topic representations. Experiments show that the optimized topic representations generated by the proposed methods outperform the typical statistical topic modeling method LDA in terms of accuracy and certainty.

Format

application/pdf

Identifier

http://eprints.qut.edu.au/60325/

Publisher

Springer Berlin Heidelberg

Relation

http://eprints.qut.edu.au/60325/1/A_Two-stage_Approach_for_Generating_Topic_Models.pdf

http://link.springer.com/chapter/10.1007%2F978-3-642-37456-2_19

DOI:10.1007/978-3-642-37456-2_19

Xu, Yue, Gao, Yang, Li, Yuefeng, & Liu, Bin (2013) A two-stage approach for generating topic models. In Pei, Jian, Tseng, Vincent S., Cao, Longbing, Motoda, Hiroshi, & Xu, Guandong (Eds.) Lecture Notes in Computer Science: Advances in Knowledge Discovery and Data Mining, Springer Berlin Heidelberg, Gold Coast Convention and Exhibition Centre, Gold Coast, QLD, pp. 221-232.

Rights

Copyright 2013 Springer-Verlag Berlin Heidelberg

Source

School of Electrical Engineering & Computer Science; Science & Engineering Faculty

Keywords

#080201 Analysis of Algorithms and Complexity #080603 Conceptual Modelling #Topic modeling #Topic representation #Tf-idf #Frequent pattern mining #Entropy

Type

Conference Paper