Digital Library

**Author(s):** Li Wenbo; Le Sun; Yuanyong Feng; Dakun Zhang
Date(s)	2008
Abstract	Latent Dirichlet Allocation (LDA) is a document-level language model. In general, LDA employs a symmetric Dirichlet distribution as the prior of the topic-word distributions to implement model smoothing. In this paper, we propose a data-driven smoothing strategy in which probability mass is allocated from smoothing data to latent variables by the intrinsic inference procedure of LDA. In this way, the arbitrariness of choosing the latent variables' priors for the multi-level graphical model is overcome. Following this data-driven strategy, two concrete methods, Laplacian smoothing and Jelinek-Mercer smoothing, are applied to the LDA model. Evaluations on different text categorization collections show that data-driven smoothing can significantly improve performance on both balanced and unbalanced corpora.
Identifier	http://ir.iscas.ac.cn/handle/311060/808 http://www.irgrid.ac.cn/handle/1471x/67259
Language(s)	English
Publisher	Science Press, Beijing
Source	Li Wenbo; Le Sun; Yuanyong Feng; Dakun Zhang. Smoothing LDA Model for Text Categorization. In: Science Press. Lecture Notes in Computer Science, Beijing, 2008, 83-94
Keywords	#Solid Mechanics #Text Categorization #Latent Dirichlet Allocation #Smoothing #Graphical Model
Type	Conference paper
