On the optimal usage of labelled examples in semi-supervised multi-class classification problems
Date(s) |
23/04/2015
|
---|---|
Abstract |
In recent years, the performance of semi-supervised learning has been investigated theoretically. However, most of this theoretical development has focussed on binary classification problems. In this paper, we take it a step further by extending the work of Castelli and Cover [1] [2] to the multi-class paradigm. In particular, we consider the key problem in semi-supervised learning of classifying an unseen instance x into one of K different classes, using a training dataset sampled from a mixture density distribution and composed of l labelled records and u unlabelled examples. Even when the mixture is identifiable and infinitely many unlabelled examples are available, labelled records are needed to determine the K decision regions. Therefore, we first investigate the minimum number of labelled examples needed to accomplish that task. Then, we propose an optimal multi-class learning algorithm that generalises the optimal procedure proposed in the literature for binary problems. Finally, we make use of this generalisation to study the probability of error when the binary class constraint is relaxed. |
Identifier | |
Language(s) |
eng |
Relation |
EHU-KZAA-TR;2015-01 |
Rights |
info:eu-repo/semantics/openAccess |
Keywords | #semi-supervised learning #probability of error #labelled and unlabelled samples #multi-class classification |
Type |
info:eu-repo/semantics/report |