Exploiting the bin-class histograms for feature selection on discrete data


Author(s): Ferreira, Artur J.; Figueiredo, Mário A. T.
Date(s)

21/04/2016


2015

Abstract

In machine learning and pattern recognition tasks, feature discretization techniques may have several advantages. The discretized features may hold enough information for the learning task at hand while ignoring minor fluctuations that are irrelevant or harmful for that task. Discretized features also have more compact representations, which may yield both better accuracy and lower training time than the original features. However, in many cases, especially with medium- and high-dimensional data, the large number of features usually implies some redundancy among them. Thus, we may further apply feature selection (FS) techniques to the discrete data, keeping the most relevant features while discarding the irrelevant and redundant ones. In this paper, we propose relevance and redundancy criteria for supervised feature selection on discrete data. These criteria are applied to the bin-class histograms of the discrete features. Experimental results on public benchmark data show that the proposed criteria can achieve better accuracy than widely used relevance and redundancy criteria, such as mutual information and the Fisher ratio.
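The record does not describe the proposed criteria themselves, so the sketch below only illustrates the setting the abstract refers to: a bin-class histogram (a count matrix of discrete feature value vs. class label) and the mutual-information baseline used for relevance and redundancy. It assumes features are already discretized to integers in `[0, n_bins)` and labels are integers in `[0, n_classes)`. The greedy relevance-minus-redundancy loop is a standard mRMR-style scheme swapped in for illustration, not the authors' method, and all function names are hypothetical.

```python
import numpy as np

def bin_class_histogram(x, y, n_bins, n_classes):
    """Hypothetical helper: H[b, c] counts samples whose feature value is b and class is c."""
    H = np.zeros((n_bins, n_classes), dtype=np.int64)
    np.add.at(H, (x, y), 1)  # unbuffered accumulation, so repeated (b, c) pairs all count
    return H

def mutual_information(H):
    """Mutual information in bits, estimated from a joint count matrix H."""
    P = H / H.sum()
    px = P.sum(axis=1, keepdims=True)  # marginal over rows (feature bins)
    py = P.sum(axis=0, keepdims=True)  # marginal over columns (classes)
    nz = P > 0                          # skip empty cells: 0 * log(0) is taken as 0
    return float(np.sum(P[nz] * np.log2(P[nz] / (px * py)[nz])))

def mrmr_select(X, y, k, n_bins, n_classes):
    """Greedy selection (mRMR-style baseline, an assumption here):
    maximize I(X_j; Y) minus the mean I(X_j; X_s) over already selected features s."""
    d = X.shape[1]
    relevance = np.array([
        mutual_information(bin_class_histogram(X[:, j], y, n_bins, n_classes))
        for j in range(d)
    ])
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        best, best_score = None, -np.inf
        for j in range(d):
            if j in selected:
                continue
            # Redundancy: a feature-feature histogram reuses the same counting helper
            red = np.mean([
                mutual_information(bin_class_histogram(X[:, j], X[:, s], n_bins, n_bins))
                for s in selected
            ])
            score = relevance[j] - red
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected
```

For example, with two independent bits `a` and `b`, class `y = 2a + b`, and features `[a, a, b]`, both `a` and `b` carry 1 bit about `y`, but the duplicate of `a` is fully redundant, so the greedy loop picks features 0 and 2.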

Identifier

FERREIRA, Artur J.; FIGUEIREDO, Mário A. T. - Exploiting the Bin-Class Histograms for Feature Selection on Discrete Data. th Iberian Conference on Pattern Recognition and Image Analysis (IbPRIA). Santiago de Compostela: Springer-Verlag Berlin, 2015. ISBN 978-3-319-19390-8. Vol. 9117, pp. 345-353.

ISBN: 978-3-319-19390-8

ISSN: 0302-9743

http://hdl.handle.net/10400.21/6075

DOI: 10.1007/978-3-319-19390-8_39

Language(s)

eng

Publisher

Springer-Verlag Berlin

Relation

http://link.springer.com/chapter/10.1007%2F978-3-319-19390-8_39

Rights

closedAccess

Keywords #Feature selection #Feature discretization #Discrete features #Bin-class histogram #Matrix norm #Supervised learning #Classification
Type

conferenceObject