On comparing two sequences of numbers and its applications to clustering analysis


Author(s): CAMPELLO, R. J. G. B.; HRUSCHKA, E. R.
Contributor(s)

UNIVERSIDADE DE SÃO PAULO

Date(s)

20/10/2012

2009

Abstract

A conceptual problem that appears in different contexts of clustering analysis is that of measuring the degree of compatibility between two sequences of numbers. This problem is usually addressed by means of numerical indexes referred to as sequence correlation indexes. This paper elaborates on why some specific sequence correlation indexes may not be good choices, depending on the application scenario at hand. It derives a variant of the Product-Moment correlation coefficient and a weighted formulation of the Goodman-Kruskal and Kendall's indexes that may be more appropriate for some particular application scenarios. The proposed and existing indexes are analyzed from different perspectives, such as their sensitivity to the ranks and magnitudes of the sequences under evaluation, among other relevant aspects of the problem. The results help suggest the scenarios within clustering analysis for which each index is possibly more appropriate. © 2008 Elsevier Inc. All rights reserved.
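To make the contrast between rank-based and magnitude-based indexes concrete, the sketch below computes the standard textbook forms of the Pearson product-moment coefficient, Spearman's rank correlation, the Goodman-Kruskal gamma and Kendall's tau (the tau-a variant) for two sequences. It does not implement the Product-Moment variant or the weighted Goodman-Kruskal and Kendall formulations proposed in the paper, whose exact definitions are not given in this record; the function names are hypothetical and chosen only for illustration. Which Kendall variant the paper uses is also an assumption here.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Standard Pearson product-moment correlation (uses magnitudes).
    Undefined (ZeroDivisionError) if either sequence is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def concordance_counts(x, y):
    """Count concordant (S+) and discordant (S-) pairs of positions.
    Pairs tied in x or in y are counted as neither."""
    conc = disc = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            conc += 1
        elif s < 0:
            disc += 1
    return conc, disc


def goodman_kruskal_gamma(x, y):
    """Gamma = (S+ - S-) / (S+ + S-); tied pairs are simply ignored."""
    conc, disc = concordance_counts(x, y)
    return (conc - disc) / (conc + disc)


def kendall_tau_a(x, y):
    """Tau-a = (S+ - S-) / (n(n-1)/2); tied pairs still count in the denominator."""
    n = len(x)
    conc, disc = concordance_counts(x, y)
    return (conc - disc) / (n * (n - 1) / 2)


def ranks(v):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(v)), key=lambda k: v[k])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Spearman's index: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


# Same ranks, very different magnitude in the last element.
x = [1, 2, 3, 4, 5]
y = [1, 2, 3, 4, 50]
print(round(pearson(x, y), 3))  # 0.743
print(spearman(x, y), goodman_kruskal_gamma(x, y), kendall_tau_a(x, y))  # 1.0 1.0 1.0

# A tie in x: gamma ignores the tied pair, tau-a is penalized by it.
x2, y2 = [1, 2, 2, 3], [1, 2, 3, 4]
print(goodman_kruskal_gamma(x2, y2), round(kendall_tau_a(x2, y2), 3))  # 1.0 0.833
```

The first pair of sequences shows the sensitivity the abstract refers to: the three rank-based indexes report perfect agreement because the orderings coincide, whereas Pearson's coefficient drops to about 0.743 because it also responds to magnitudes. The second pair shows that even rank-based indexes differ in how they treat ties.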

Identifier

INFORMATION SCIENCES, v.179, n.8, p.1025-1039, 2009

0020-0255

http://producao.usp.br/handle/BDPI/28778

10.1016/j.ins.2008.11.028

http://dx.doi.org/10.1016/j.ins.2008.11.028

Language(s)

eng

Publisher

ELSEVIER SCIENCE INC

Relation

Information Sciences

Rights

restrictedAccess

Copyright ELSEVIER SCIENCE INC

Keywords #Clustering analysis #Goodman-Kruskal index #Kendall's index #Pearson Product-Moment index #Spearman's index #Sensitivity analysis #GENE-EXPRESSION DATA #Computer Science, Information Systems

Type

article

original article

publishedVersion