Efficiency issues of evolutionary k-means


Autoria(s): NALDI, M. C.; CAMPELLO, R. J. G. B.; HRUSCHKA, E. R.; CARVALHO, A. C. P. L. F.
Contribuinte(s)

UNIVERSIDADE DE SÃO PAULO

Data(s)

20/10/2012

20/10/2012

2011

Resumo

One of the top ten most influential data mining algorithms, k-means, is known for being simple and scalable. However, it is sensitive to initialization of prototypes and requires that the number of clusters be specified in advance. This paper shows that evolutionary techniques conceived to guide the application of k-means can be more computationally efficient than systematic (i.e., repetitive) approaches that try to get around the above-mentioned drawbacks by repeatedly running the algorithm from different configurations for the number of clusters and initial positions of prototypes. To do so, a modified version of a (k-means based) fast evolutionary algorithm for clustering is employed. Theoretical complexity analyses for the systematic and evolutionary algorithms under interest are provided. Computational experiments and statistical analyses of the results are presented for artificial and text mining data sets. (C) 2010 Elsevier B.V. All rights reserved.

CAPES

Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)

Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)

CNPq

FAPESP

Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)

Identificador

APPLIED SOFT COMPUTING, v.11, n.2, p.1938-1952, 2011

1568-4946

http://producao.usp.br/handle/BDPI/28740

10.1016/j.asoc.2010.06.010

http://dx.doi.org/10.1016/j.asoc.2010.06.010

Idioma(s)

eng

Publicador

ELSEVIER SCIENCE BV

Relação

Applied Soft Computing

Direitos

restrictedAccess

Copyright ELSEVIER SCIENCE BV

Palavras-Chave #k-means #Evolutionary clustering #Data mining #GENE-EXPRESSION DATA #MEANS ALGORITHM #CLUSTER VALIDITY #CLASSIFICATION #Computer Science, Artificial Intelligence #Computer Science, Interdisciplinary Applications
Tipo

article

original article

publishedVersion