A pattern based two-stage text classifier


Author(s): Bijaksana, Moch Arif; Li, Yuefeng; Algarni, Abdulmohsen
Date(s)

2013

Abstract

In a classification problem, we typically face two challenging issues: the diverse characteristics of negative documents, and, sometimes, a large number of negative documents that are close to positive documents. It is therefore hard for a single classifier to clearly separate incoming documents into classes. This paper proposes a novel gradual problem-solving approach to create a two-stage classifier. The first stage identifies reliable negatives (negative documents with weak positive characteristics). It concentrates on minimizing the number of false negative documents (recall-oriented). We use Rocchio, an existing recall-based classifier, for this stage. The second stage is a precision-oriented “fine tuning” that concentrates on minimizing the number of false positive documents by applying pattern (statistical phrase) mining techniques. In this stage, pattern-based scoring is followed by threshold setting (thresholding). Experiments show that our statistical-phrase-based two-stage classifier is promising.
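The two-stage pipeline described in the abstract can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' implementation. Documents are lowercased and whitespace-tokenized. Stage one uses a plain Rocchio prototype (β = 16, γ = 4, a common choice in the text-categorization literature) with a hypothetical cosine cut-off `rocchio_threshold`. Stage two treats "patterns" as frequent term pairs mined from the positive training documents, scores a document by the summed support of the patterns it contains, and applies a hypothetical cut-off `pattern_threshold`. The paper's actual pattern mining, scoring, and thresholding methods may differ. The toy training data is invented for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def term_freqs(doc: str) -> Counter:
    """Bag-of-words term frequencies (assumption: lowercase whitespace tokens)."""
    return Counter(doc.lower().split())


def cosine(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rocchio_prototype(pos, neg, beta=16.0, gamma=4.0) -> dict:
    """Rocchio prototype: beta * mean(positive) - gamma * mean(negative).
    Only terms with a positive final weight are kept."""
    weights = Counter()
    for d in pos:
        for t, v in term_freqs(d).items():
            weights[t] += beta * v / len(pos)
    for d in neg:
        for t, v in term_freqs(d).items():
            weights[t] -= gamma * v / len(neg)
    return {t: w for t, w in weights.items() if w > 0}


def mine_patterns(pos, min_support=2) -> dict:
    """Hypothetical pattern miner: term pairs that co-occur in at least
    `min_support` positive documents, mapped to their support count."""
    counts = Counter()
    for d in pos:
        terms = sorted(set(d.lower().split()))
        counts.update(combinations(terms, 2))
    return {p: c for p, c in counts.items() if c >= min_support}


def pattern_score(doc: str, patterns: dict) -> int:
    """Sum of supports of all mined patterns fully contained in the document."""
    terms = set(doc.lower().split())
    return sum(sup for p, sup in patterns.items() if set(p) <= terms)


def two_stage_classify(doc, prototype, patterns, rocchio_threshold, pattern_threshold) -> bool:
    # Stage 1 (recall-oriented): a low Rocchio score marks a reliable negative.
    if cosine(term_freqs(doc), prototype) < rocchio_threshold:
        return False
    # Stage 2 (precision-oriented): pattern-based scoring followed by thresholding.
    return pattern_score(doc, patterns) >= pattern_threshold


pos_train = ["stock market prices rise", "stock market prices fall",
             "market prices and stock trading"]
neg_train = ["football match score", "rain and weather forecast",
             "football league match results"]
prototype = rocchio_prototype(pos_train, neg_train)
patterns = mine_patterns(pos_train, min_support=2)

for doc in ["football match today",            # -> False (rejected at stage 1)
            "stock market prices today",       # -> True
            "stock trading news and weather"]:  # -> False (rejected at stage 2)
    print(doc, "->", two_stage_classify(doc, prototype, patterns,
                                        rocchio_threshold=0.1, pattern_threshold=6))
```

In this toy setting, "stock trading news and weather" shares terms with the positive prototype and so survives the recall-oriented first stage, but it contains none of the mined term pairs and is rejected by the precision-oriented second stage, which mirrors the division of labour the abstract describes.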

Identifier

http://eprints.qut.edu.au/61989/

Publisher

Springer

Relation

DOI:10.1007/978-3-642-39712-7_13

Bijaksana, Moch Arif, Li, Yuefeng, & Algarni, Abdulmohsen (2013) A pattern based two-stage text classifier. Lecture Notes in Computer Science : Machine Learning and Data Mining in Pattern Recognition, 7988, pp. 169-182.

Rights

Copyright 2013 Springer-Verlag Berlin Heidelberg

Source

School of Electrical Engineering & Computer Science; Science & Engineering Faculty

Keywords #Two-stage classification #Text classification #Pattern mining #Scoring #Thresholding

Type

Journal Article