A pattern based two-stage text classifier
Date(s) | 2013 |
---|---|
Abstract | A classification problem typically poses two challenges: negative documents have diverse characteristics, and many negative documents can be close to positive documents. It is therefore hard for a single classifier to assign incoming documents to classes cleanly. This paper proposes a novel gradual problem-solving approach that builds a two-stage classifier. The first stage identifies reliable negatives (negative documents with weak positive characteristics). It is recall-oriented, concentrating on minimizing the number of false negative documents. We use Rocchio, an existing recall-based classifier, for this stage. The second stage is a precision-oriented “fine tuning” that concentrates on minimizing the number of false positive documents by applying pattern (statistical phrase) mining techniques. In this stage, pattern-based scoring is followed by threshold setting (thresholding). Experiments show that our statistical-phrase-based two-stage classifier is promising. |
Identifier | |
Publisher | Springer |
Relation | DOI:10.1007/978-3-642-39712-7_13 Bijaksana, Moch Arif, Li, Yuefeng, & Algarni, Abdulmohsen (2013) A pattern based two-stage text classifier. Lecture Notes in Computer Science: Machine Learning and Data Mining in Pattern Recognition, 7988, pp. 169-182. |
Rights | Copyright 2013 Springer-Verlag Berlin Heidelberg |
Source | School of Electrical Engineering & Computer Science; Science & Engineering Faculty |
Keywords | #Two-stage classification #Text classification #Pattern mining #Scoring #Thresholding |
Type | Journal Article |
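
The two-stage idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes raw term frequencies for the Rocchio-style prototype, and it swaps in a toy pattern miner (term pairs that co-occur in enough positive documents) for the paper's statistical phrase mining. All function names, the weights `alpha` and `beta`, the thresholds `t1` and `t2`, and the sample documents are hypothetical.

```python
from collections import Counter
from itertools import combinations


def rocchio_prototype(pos_docs, neg_docs, alpha=1.0, beta=0.25):
    """Rocchio-style prototype: alpha * mean(pos tf) - beta * mean(neg tf)."""
    proto = Counter()
    for docs, weight in ((pos_docs, alpha), (neg_docs, -beta)):
        for doc in docs:
            for term, tf in Counter(doc).items():
                proto[term] += weight * tf / len(docs)
    return proto


def stage1_score(doc, proto):
    """Dot product of the document's term counts with the prototype."""
    return sum(proto.get(term, 0.0) for term in doc)


def mine_patterns(pos_docs, min_support=2):
    """Toy miner (assumption): term pairs seen in >= min_support positive docs.

    Returns a dict mapping each pair to its relative support.
    """
    counts = Counter()
    for doc in pos_docs:
        for pair in combinations(sorted(set(doc)), 2):
            counts[pair] += 1
    return {p: c / len(pos_docs) for p, c in counts.items() if c >= min_support}


def stage2_score(doc, patterns):
    """Sum the support of every mined pattern fully contained in the document."""
    terms = set(doc)
    return sum(sup for pat, sup in patterns.items() if terms.issuperset(pat))


def classify(doc, proto, patterns, t1=0.0, t2=0.5):
    """Stage 1 discards reliable negatives; stage 2 thresholds the pattern score."""
    if stage1_score(doc, proto) <= t1:
        return False  # recall-oriented filter: clearly negative
    return stage2_score(doc, patterns) >= t2  # precision-oriented thresholding


# Toy training data, invented for illustration only.
pos_docs = [
    ["pattern", "mining", "text"],
    ["pattern", "mining", "classifier"],
    ["text", "mining", "pattern"],
]
neg_docs = [
    ["football", "match", "score"],
    ["text", "football", "news"],
]

proto = rocchio_prototype(pos_docs, neg_docs)
patterns = mine_patterns(pos_docs)

for doc in (
    ["football", "match"],
    ["text", "news", "football"],
    ["pattern", "mining", "survey"],
):
    print(doc, classify(doc, proto, patterns))
```

On this toy data, the second test document shares the term "text" with the positive class and so passes the stage 1 filter, but it contains no mined pattern and is rejected at stage 2. That is the kind of near-positive negative the second stage is meant to remove.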