Semi-supervised SVMs for classification with unknown class proportions and a small labeled dataset


Author(s): Selvaraj, Sathiya Keerthi; Bhar, Bigyan; Sellamanickam, Sundararajan; Shevade, Shirish
Date(s)

2011

Abstract

In the design of practical web page classification systems, one often encounters a situation in which the labeled training set is created by choosing some examples from each class, but the class proportions in this set are not the same as those in the test distribution to which the classifier will actually be applied. The problem is made worse when the amount of training data is also small. In this paper we explore and adapt binary SVM methods that make use of unlabeled data from the test distribution, viz., Transductive SVMs (TSVMs) and expectation regularization/constraint (ER/EC) methods, to deal with this situation. We empirically show that when the labeled training data is small, a TSVM designed using the class ratio tuned by minimizing the loss on the labeled set yields the best performance; its performance is good even when the deviation between the class ratios of the labeled training set and the test set is quite large. When the labeled training data is sufficiently large, an unsupervised Gaussian mixture model can be used to get a very good estimate of the class ratio in the test set; when this estimate is used, both TSVM and ER/EC give their best possible performance, with TSVM coming out superior. The ideas in the paper can be easily extended to multi-class SVMs and MaxEnt models.
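
The class-ratio estimation step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say what the Gaussian mixture is fit to, so the sketch assumes it is fit to one-dimensional classifier scores on unlabeled test examples, with two components trained by EM. The function name estimate_positive_ratio and its parameters are hypothetical and chosen only for this illustration.

```python
import math
import random


def estimate_positive_ratio(scores, n_iter=200, var_floor=1e-6):
    """Fit a two-component 1-D Gaussian mixture by EM (illustrative sketch).

    Assumption: `scores` are real-valued classifier outputs on unlabeled
    test examples. Returns the mixing weight of the higher-mean component,
    read here as the estimated fraction of positive examples.
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores")
    lo, hi = min(scores), max(scores)
    mean = sum(scores) / n
    var0 = max(sum((s - mean) ** 2 for s in scores) / n, var_floor)
    # Initialise the two means at the quarter points of the score range.
    mu = [lo + 0.25 * (hi - lo), lo + 0.75 * (hi - lo)]
    var = [var0, var0]
    weight = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of component 1 for each score.
        resp = []
        for s in scores:
            dens = [
                weight[k]
                * math.exp(-((s - mu[k]) ** 2) / (2.0 * var[k]))
                / math.sqrt(2.0 * math.pi * var[k])
                for k in (0, 1)
            ]
            total = dens[0] + dens[1]
            resp.append(dens[1] / total if total > 0.0 else 0.5)
        # M-step: update means, variances and mixing weights.
        n1 = sum(resp)
        n0 = n - n1
        for k, nk in ((0, n0), (1, n1)):
            if nk <= 1e-12:
                continue  # component is empty; keep its old parameters
            r = resp if k == 1 else [1.0 - x for x in resp]
            mu[k] = sum(rk * s for rk, s in zip(r, scores)) / nk
            var[k] = max(
                sum(rk * (s - mu[k]) ** 2 for rk, s in zip(r, scores)) / nk,
                var_floor,
            )
        weight = [n0 / n, n1 / n]
    return weight[1] if mu[1] >= mu[0] else weight[0]


if __name__ == "__main__":
    # Synthetic scores: 30% "positive" (mean +2), 70% "negative" (mean -2).
    random.seed(0)
    demo = [random.gauss(2.0, 0.5) for _ in range(300)]
    demo += [random.gauss(-2.0, 0.5) for _ in range(700)]
    print(round(estimate_positive_ratio(demo), 3))
```

Under these assumptions, the returned ratio could then be supplied as the class-proportion setting of a TSVM or ER/EC method; how the paper actually couples the two steps is not stated in this record.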

Format

application/pdf

Identifier

http://eprints.iisc.ernet.in/46034/1/In_Know_Man_653_2011.pdf

Selvaraj, Sathiya Keerthi and Bhar, Bigyan and Sellamanickam, Sundararajan and Shevade, Shirish (2011) Semi-supervised SVMs for classification with unknown class proportions and a small labeled dataset. In: Proceedings of the 20th ACM International Conference on Information and Knowledge Management, 2011, New York, NY, USA.

Publisher

Association for Computing Machinery

Relation

http://dx.doi.org/10.1145/2063576.2063674

http://eprints.iisc.ernet.in/46034/

Keywords #Computer Science & Automation (Formerly, School of Automation)
Type

Conference Paper

PeerReviewed