Towards improving cluster-based feature selection with a simplified silhouette filter


Author(s): COVOES, Thiago F.; HRUSCHKA, Eduardo R.
Contributor(s)

UNIVERSIDADE DE SÃO PAULO

Date(s)

20/10/2012

20/10/2012

2011

Abstract

This paper proposes a filter-based algorithm for feature selection. The filter is based on partitioning the set of features into clusters. The number of clusters, and consequently the cardinality of the subset of selected features, is automatically estimated from data. The computational complexity of the proposed algorithm is also investigated. For classification problems, a variant of this filter that considers feature-class correlations is proposed as well. Empirical results involving ten datasets illustrate the performance of the developed algorithm, which in general obtained competitive results in terms of classification accuracy when compared to state-of-the-art algorithms that find clusters of features. We show that, if computational efficiency is an important issue, the proposed filter may be preferred over its counterparts, thus becoming eligible to join a pool of feature selection algorithms to be used in practice. As an additional contribution of this work, a theoretical framework is used to formally analyze some properties of feature selection methods that rely on finding clusters of features. (C) 2011 Elsevier Inc. All rights reserved.
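The general idea of a cluster-based filter can be illustrated with a minimal sketch. It assumes, without confirmation from this record, a dissimilarity of 1 - |Pearson correlation| between features, a basic alternating k-medoids, and a choice of k in [2, k_max] that maximizes the simplified silhouette (the criterion named in the title). The medoid of each cluster is kept as the selected feature. All helper names (`pearson`, `feature_dissimilarity`, `assign`, `k_medoids`, `simplified_silhouette`, `select_features`) and the sqrt(m) default for `k_max` are hypothetical. The sketch is not the authors' algorithm and omits the feature-class correlation variant.

```python
import math
import random


def pearson(x, y):
    # Pearson correlation of two equal-length sequences; 0.0 if either is constant.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def feature_dissimilarity(columns):
    # Assumed dissimilarity: d(i, j) = 1 - |rho(i, j)|, with d(i, i) = 0.
    m = len(columns)
    return [[0.0 if i == j else 1.0 - abs(pearson(columns[i], columns[j]))
             for j in range(m)] for i in range(m)]


def assign(d, medoids):
    # Label each feature with its nearest medoid; a medoid always keeps its own cluster.
    labels = []
    for i in range(len(d)):
        if i in medoids:
            labels.append(medoids.index(i))
        else:
            labels.append(min(range(len(medoids)), key=lambda c: d[i][medoids[c]]))
    return labels


def k_medoids(d, k, seed=0, max_iter=100):
    # Basic alternating k-medoids on a precomputed dissimilarity matrix.
    medoids = random.Random(seed).sample(range(len(d)), k)
    for _ in range(max_iter):
        labels = assign(d, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            # New medoid: the member with the smallest total dissimilarity to its cluster.
            new_medoids.append(min(members, key=lambda p: sum(d[p][q] for q in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(d, medoids)


def simplified_silhouette(d, medoids, labels):
    # Mean of s(i) = (b - a) / max(a, b): a is the dissimilarity to the own medoid,
    # b to the nearest other medoid; singleton clusters contribute 0.
    sizes = [labels.count(c) for c in range(len(medoids))]
    total = 0.0
    for i, c in enumerate(labels):
        if sizes[c] == 1:
            continue
        a = d[i][medoids[c]]
        b = min(d[i][medoids[o]] for o in range(len(medoids)) if o != c)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(labels)


def select_features(columns, k_max=None, seed=0):
    # Keep the medoid features of the partition with the highest simplified silhouette.
    m = len(columns)
    if m <= 2:
        return list(range(m))
    d = feature_dissimilarity(columns)
    if k_max is None:
        k_max = max(2, math.isqrt(m))  # hypothetical default upper bound for k
    best_score, best = -math.inf, None
    for k in range(2, min(k_max, m - 1) + 1):
        medoids, labels = k_medoids(d, k, seed)
        score = simplified_silhouette(d, medoids, labels)
        if score > best_score:
            best_score, best = score, sorted(medoids)
    return best
```

For example, when given six feature columns that form two groups of mutually correlated features, `select_features` should return one representative index from each group. The number of selected features equals the number of clusters found, which mirrors the abstract's point that the subset cardinality is estimated from data rather than fixed in advance.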

CNPq

Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)

FAPESP

Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)

Identifier

INFORMATION SCIENCES, v.181, n.18, p.3766-3782, 2011

0020-0255

http://producao.usp.br/handle/BDPI/28761

10.1016/j.ins.2011.04.050

http://dx.doi.org/10.1016/j.ins.2011.04.050

Language(s)

eng

Publisher

ELSEVIER SCIENCE INC

Relation

Information Sciences

Rights

restrictedAccess

Copyright ELSEVIER SCIENCE INC

Keywords

#Feature selection #Filters #Clustering #Classification #GENE-EXPRESSION DATA #SUPPORT VECTOR MACHINES #MUTUAL INFORMATION #CLASSIFICATION #PATTERNS #Computer Science, Information Systems

Type

article

original article

publishedVersion