Generalized external indexes for comparing data partitions with overlapping categories


Author(s): CAMPELLO, R. J. G. B.
Contributor(s)

UNIVERSIDADE DE SÃO PAULO

Date(s)

20/10/2012

2010

Abstract

There is a family of well-known external clustering validity indexes that measure the degree of compatibility or similarity between two hard partitions of a given data set, including partitions with different numbers of categories. A unified, fully equivalent set-theoretic formulation for an important class of such indexes was derived and extended to the fuzzy domain in a previous work by the author [Campello, R.J.G.B., 2007. A fuzzy extension of the Rand index and other related indexes for clustering and classification assessment. Pattern Recognition Lett., 28, 833-841]. However, the proposed fuzzy set-theoretic formulation is not valid as a general approach for comparing two fuzzy partitions of data. Instead, it is an approach for comparing a fuzzy partition against a hard referential partition of the data into mutually disjoint categories. In this paper, generalized external indexes for comparing two data partitions with overlapping categories are introduced. These indexes can be used as general measures for comparing two partitions of the same data set into overlapping categories. An important issue that is rarely discussed in the literature is also addressed in the paper, namely, how to compare two partitions of different subsamples of data. A number of pedagogical examples and three simulation experiments are presented and analyzed in detail. A review of recent related work compiled from the literature is also provided. (c) 2010 Elsevier B.V. All rights reserved.
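The indexes discussed above build on the classic pair-counting view of external validity for hard partitions, as in the Rand index. As a minimal sketch, assuming two hard partitions given as label lists over the same objects (the function names `pair_counts`, `rand_index`, and `jaccard_index` are hypothetical, not taken from the paper), each pair of objects is classified by whether the two partitions agree on grouping it together. This is only the standard hard-partition form, not the fuzzy or overlapping generalization introduced in the paper.

```python
from itertools import combinations


def pair_counts(labels_u, labels_v):
    """Classify every object pair by how two hard partitions treat it.

    a: pair in the same category in both U and V
    b: pair together in U but separated in V
    c: pair separated in U but together in V
    d: pair separated in both U and V
    """
    if len(labels_u) != len(labels_v):
        raise ValueError("both partitions must label the same objects")
    a = b = c = d = 0
    for i, j in combinations(range(len(labels_u)), 2):
        same_u = labels_u[i] == labels_u[j]
        same_v = labels_v[i] == labels_v[j]
        if same_u and same_v:
            a += 1
        elif same_u:
            b += 1
        elif same_v:
            c += 1
        else:
            d += 1
    return a, b, c, d


def rand_index(labels_u, labels_v):
    """Fraction of object pairs on which U and V agree."""
    a, b, c, d = pair_counts(labels_u, labels_v)
    return (a + d) / (a + b + c + d)


def jaccard_index(labels_u, labels_v):
    """Like Rand, but ignores pairs separated in both partitions."""
    a, b, c, _ = pair_counts(labels_u, labels_v)
    # Assumption: return 1.0 when no pair is grouped together in either
    # partition (all singletons), since the ratio is otherwise undefined.
    return a / (a + b + c) if (a + b + c) else 1.0


if __name__ == "__main__":
    # Partitions with different numbers of categories (2 vs. 3).
    u = [0, 0, 0, 1, 1, 1]
    v = [0, 0, 1, 1, 2, 2]
    print(pair_counts(u, v))    # → (2, 4, 1, 8)
    print(rand_index(u, v))     # → 10/15 ≈ 0.667
    print(jaccard_index(u, v))  # → 2/7 ≈ 0.286
```

Because only co-membership of pairs is compared, the category labels themselves never need to be matched, which is why such indexes can compare partitions with different numbers of categories.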

Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)

Brazilian National Research Council (CNPq)

Research Foundation of the State of Sao Paulo (Fapesp)[06/50231-5]

Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)

Research Foundation of the State of Sao Paulo (Fapesp)[301063/2007-9]

Identifier

PATTERN RECOGNITION LETTERS, v.31, n.9, p.966-975, 2010

0167-8655

http://producao.usp.br/handle/BDPI/28755

10.1016/j.patrec.2010.01.002

http://dx.doi.org/10.1016/j.patrec.2010.01.002

Language(s)

eng

Publisher

ELSEVIER SCIENCE BV

Relation

Pattern Recognition Letters

Rights

restrictedAccess

Copyright ELSEVIER SCIENCE BV

Keywords

#Clustering #Overlapping #Partitions #Validity #Indexes #2 HIERARCHICAL CLUSTERINGS #FUZZY C-MEANS #VALIDATION #EXTENSION #CRITERION #VALIDITY #Computer Science, Artificial Intelligence

Type

article

original article

publishedVersion