A Bimodality Test in High Dimensions


Author(s): Palejev, Dean
Date(s)

29/03/2013

29/03/2013

2012

Abstract

We present a test for identifying clusters in high-dimensional data based on the k-means algorithm when the null hypothesis is spherical normal. We show that projection techniques used for evaluating the validity of clusters may be misleading for such data. In particular, we demonstrate that increasingly well-separated clusters are identified as the dimensionality increases, even when no such clusters exist. Furthermore, in the case of true bimodality, increasing the dimensionality makes identifying the correct clusters more difficult. In addition to the original conservative test, we propose a practical test with the same asymptotic behavior that performs well for a moderate number of points and moderate dimensionality. ACM Computing Classification System (1998): I.5.3.
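The projection effect described in the abstract can be illustrated with a small simulation. The sketch below is not the test proposed in the paper: it only draws points from a spherical normal distribution (so no true clusters exist), runs a plain 2-means (Lloyd's algorithm), projects the points onto the line through the two centroids, and reports how far apart the projected groups appear. The function names, the sample size of 60, and the separation measure (gap between projected group means divided by the pooled within-group standard deviation) are all assumptions made for illustration.

```python
import math
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def two_means(points, rng, iters=50):
    """Plain Lloyd's algorithm with k = 2; returns a list of 0/1 labels."""
    # Start from two distinct data points chosen at random.
    centers = [list(p) for p in rng.sample(points, 2)]
    labels = None
    for _ in range(iters):
        new_labels = [
            0 if sq_dist(p, centers[0]) <= sq_dist(p, centers[1]) else 1
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old center if a cluster empties
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def projected_separation(points, labels):
    """Gap between projected cluster means, in pooled standard deviations.

    Illustrative measure only; assumes both clusters are non-empty.
    """
    groups = [[p for p, lab in zip(points, labels) if lab == k] for k in (0, 1)]
    means = [[sum(col) / len(g) for col in zip(*g)] for g in groups]
    direction = [a - b for a, b in zip(means[0], means[1])]
    norm = math.sqrt(sum(x * x for x in direction))
    unit = [x / norm for x in direction]
    # One-dimensional projections onto the line through the two centroids.
    proj = [[sum(x * u for x, u in zip(p, unit)) for p in g] for g in groups]
    centers = [sum(v) / len(v) for v in proj]
    ss = sum((v - c) ** 2 for vals, c in zip(proj, centers) for v in vals)
    pooled_sd = math.sqrt(ss / (len(points) - 2))
    return abs(centers[0] - centers[1]) / pooled_sd


def separation_for_dimension(d, n=60, seed=0):
    """Apparent separation found by 2-means on n spherical normal points."""
    rng = random.Random(seed)
    points = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    labels = two_means(points, rng)
    return projected_separation(points, labels)


if __name__ == "__main__":
    for d in (2, 20, 200):
        print(d, round(separation_for_dimension(d), 2))
```

With the number of points held fixed, the reported separation should tend to grow with the dimension even though the data contain no clusters, which is why a projection plot alone is weak evidence of bimodality in this setting.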

Identifier

Serdica Journal of Computing, Vol. 6, No. 4 (2012), pp. 437-450

1312-6555

http://hdl.handle.net/10525/1977

Language(s)

en

Publisher

Institute of Mathematics and Informatics, Bulgarian Academy of Sciences

Keywords

#Clustering #Bimodality #Multidimensional Space #Asymptotic Test

Type

Article