Enhancing the effectiveness of clustering with spectra analysis

**Author(s):** Li, Wenyuan; Ng, Wee-Keong; Liu, Ying; Ong, Kok-Leong
Date(s)	01/07/2007
Abstract	For many clustering algorithms, such as K-Means, EM, and CLOPE, there is usually a requirement to set some parameters. Often, these parameters directly or indirectly control the number of clusters, that is, *k*, to return. In the presence of different data characteristics and analysis contexts, it is often difficult for the user to estimate the number of clusters in the data set. This is especially true in *text* collections such as Web documents, images, or biological data. In an effort to improve the effectiveness of clustering, we seek the answer to a fundamental question: *How can we effectively estimate the number of clusters in a given data set?* We propose an efficient method based on spectra analysis of eigenvalues (*not* eigenvectors) of the data set as the solution to the above. We first present the relationship between a data set and its underlying spectra with theoretical and experimental results. We then show how our method is capable of suggesting a range of *k* that is well suited to different analysis contexts. Finally, we conclude with further empirical results to show how the answer to this fundamental question enhances the clustering process for large text collections.
Identifier	http://hdl.handle.net/10536/DRO/DU:30007065
Language(s)	eng
Publisher	Institute of Electrical and Electronics Engineers
Relation	http://dro.deakin.edu.au/eserv/DU:30007065/ong-enhancingthe-2007.pdf http://dx.doi.org/10.1109/TKDE.2007.1066
Rights	2007, IEEE
Keywords	#clustering #spectral methods #eigenvalues #eigenvectors
Type	Journal Article
