Clustering, Bimodality, Multidimensional Space, Asymptotic Test
Institute of Mathematics and Informatics, Bulgarian Academy of Sciences
Serdica Journal of Computing, Vol. 6, No. 4 (2012), pp. 437-450
We present a test for identifying clusters in high-dimensional
data based on the k-means algorithm under a spherical normal
null hypothesis. We show that projection techniques used for evaluating the validity of
clusters may be misleading for such data. In particular, we demonstrate
that increasingly well-separated clusters are identified as the dimensionality
increases, even when no such clusters exist. Furthermore, in the case of true
bimodality, increasing the dimensionality makes identifying the correct clusters more difficult.
In addition to the original conservative test, we propose a practical test with the same asymptotic behavior that performs well for a
moderate number of points and moderate dimensionality.
ACM Computing Classification System (1998): I.5.3.
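
The claim that k-means finds increasingly well-separated clusters in spherical normal data as the dimensionality grows can be illustrated with a small simulation. The sketch below is not the test proposed in the paper; it is a minimal illustration under stated assumptions: a hand-written 2-means (Lloyd) loop, a fixed sample of 100 points, and a separation score defined here as the distance between the projected cluster means divided by the pooled within-cluster standard deviation along the line joining the two centroids. The function names and the score are hypothetical choices made for this example.

```python
import numpy as np


def two_means(x, rng, n_iter=100):
    """Plain Lloyd iterations for k = 2, started from two distinct data points."""
    centers = x[rng.choice(len(x), size=2, replace=False)]
    labels = np.zeros(len(x), dtype=int)
    for it in range(n_iter):
        # Squared Euclidean distance of every point to both centers.
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if it > 0 and np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        for k in range(2):
            if np.any(labels == k):  # keep the old center if a cluster empties
                centers[k] = x[labels == k].mean(axis=0)
    return labels, centers


def projected_separation(x, labels, centers):
    """Illustrative score: gap between projected cluster means in pooled-SD units."""
    direction = centers[1] - centers[0]
    direction /= np.linalg.norm(direction)
    proj = x @ direction  # 1-D projection onto the line joining the centroids
    p0, p1 = proj[labels == 0], proj[labels == 1]
    pooled_sd = np.sqrt((p0.var() + p1.var()) / 2)
    return abs(p1.mean() - p0.mean()) / pooled_sd


def separation_for_dimension(d, n=100, seed=0):
    """Draw n points from a d-dimensional spherical normal (no true clusters)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, d))
    labels, centers = two_means(x, rng)
    return projected_separation(x, labels, centers)


for d in (2, 10, 100, 1000):
    print(d, round(separation_for_dimension(d), 2))
```

Because the data are drawn from a single spherical normal distribution, any growth of the score with the dimension reflects the projection artifact rather than real structure, which is why a visual check of the projected histogram alone can be misleading.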