Partitions selection strategy for set of clustering solutions


Author(s): FACELI, Katti; SAKATA, Tiemi C.; SOUTO, Marcilio C. P. de; CARVALHO, Andre C. P. L. F. de
Contributor(s)

UNIVERSIDADE DE SÃO PAULO

Date(s)

20/10/2012

2010

Abstract

Clustering is a difficult task: there is no single cluster definition, and the data can have more than one underlying structure. Pareto-based multi-objective genetic algorithms, such as MOCK (Multi-Objective Clustering with automatic K-determination) and MOCLE (Multi-Objective Clustering Ensemble), were proposed to tackle these problems. However, the output of such algorithms often contains a large number of partitions, making it difficult for an expert to analyze all of them manually. To deal with this problem, we present two selection strategies, based on the corrected Rand index, for choosing a subset of solutions. To test them, we apply them to the sets of solutions produced by MOCK and MOCLE on several datasets. The study was also extended to select a reduced set of partitions from the initial population of MOCLE. These analyses show that both proposed versions of the selection strategy are very effective: they can significantly reduce the number of solutions while preserving the quality and the diversity of the partitions in the original set of solutions. (C) 2010 Elsevier B.V. All rights reserved.
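The abstract names the corrected Rand index as the basis of both selection strategies but does not spell out the procedure. As a minimal sketch, assuming the standard Hubert and Arabie form of the corrected (adjusted) Rand index, the code below computes the index between two partitions and then applies a hypothetical greedy redundancy filter. The function `select_diverse` and its `threshold` of 0.8 are illustrative assumptions, not the authors' two strategies.

```python
from collections import Counter
from math import comb


def corrected_rand(labels_a, labels_b):
    """Corrected (adjusted) Rand index of Hubert and Arabie.

    Each partition is a sequence of cluster labels, one per object.
    Returns 1.0 for identical groupings and values near 0 for chance agreement.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("partitions must label the same objects")
    total = comb(len(labels_a), 2)  # number of object pairs
    if total == 0:
        return 1.0  # fewer than two objects: nothing to compare
    # Contingency-table cells n_ij, row sums a_i and column sums b_j.
    sum_ij = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / total
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Only happens when both partitions are all-singletons or one cluster.
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def select_diverse(partitions, threshold=0.8):
    """HYPOTHETICAL filter (not the paper's method): keep a partition only if
    its corrected Rand to every already-kept partition is below `threshold`.
    Returns the indices of the kept partitions."""
    kept = []
    for i, p in enumerate(partitions):
        if all(corrected_rand(p, partitions[j]) < threshold for j in kept):
            kept.append(i)
    return kept


solutions = [
    [0, 0, 1, 1, 2, 2],
    [1, 1, 0, 0, 2, 2],  # same grouping as the first, with relabeled clusters
    [0, 0, 0, 1, 1, 1],
]
print(select_diverse(solutions))  # → [0, 2]
```

Because the corrected Rand index ignores label names, the second solution scores 1.0 against the first and is dropped as redundant, while the third (index about 0.24 against the first) is kept as a structurally different partition.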

Identifier

NEUROCOMPUTING, v.73, n.16-18, Special Issue, p.2809-2819, 2010

0925-2312

http://producao.usp.br/handle/BDPI/28751

10.1016/j.neucom.2010.03.028

http://dx.doi.org/10.1016/j.neucom.2010.03.028

Language(s)

eng

Publisher

ELSEVIER SCIENCE BV

Relation

Neurocomputing

Rights

restrictedAccess

Copyright ELSEVIER SCIENCE BV

Keywords #Clustering #Model selection #GENE-EXPRESSION SIGNATURES #MOLECULAR CLASSIFICATION #MICROARRAY DATA #CLASS DISCOVERY #CANCER #PREDICTION #VALIDATION #CARCINOMAS #LEUKEMIA #Computer Science, Artificial Intelligence

Type

article

proceedings paper

publishedVersion