Efficient classification using phrases generated by topic models


Author(s): Gujraniya, Deepak; Murty, Narsimha M
Date(s)

2012

Abstract

Many popular models are available for document classification, such as the Naïve Bayes classifier, k-Nearest Neighbors, and Support Vector Machines. In all these cases, the representation is based on the “Bag of Words” model, which does not capture the actual semantic meaning of a word in a particular document. Semantics are better captured by the proximity of words and their occurrence in the document. We propose a new “Bag of Phrases” model to capture this discriminative power of phrases for text classification. We present a novel algorithm that extracts phrases from the corpus using the well-known topic model Latent Dirichlet Allocation (LDA) and integrates them into the vector space model for classification. Experiments show that classifiers perform better with the new Bag of Phrases model than with related representation models.

Format

application/pdf

Identifier

http://eprints.iisc.ernet.in/46624/1/Int_Con_Pat_Rec_1051_2013.pdf

Gujraniya, Deepak and Murty, Narsimha M (2012) Efficient classification using phrases generated by topic models. In: 21st International Conference on Pattern Recognition (ICPR 2012), 11-15 Nov. 2012, Tsukuba, Japan.

Publisher

IEEE

Relation

http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6460632

http://eprints.iisc.ernet.in/46624/

Keywords

Computer Science & Automation (Formerly, School of Automation)

Type

Conference Paper

PeerReviewed