Semi-supervised document clustering via loci
Contributor(s)
Wang, Jianyong; Cellary, Wojciech; Wang, Dingding; Wang, Hua; Chen, Shu-Ching; Li, Tao; Zhang, Yanchun
Date(s)
01/11/2015
Abstract
Document clustering is one of the prominent methods for mining important information from the vast amount of data available on the web. However, document clustering generally suffers from the curse of dimensionality. Fortunately, in high-dimensional space, data points tend to be more concentrated in certain areas of clusters. We take advantage of this phenomenon by introducing a novel concept of dynamic cluster representation named loci. Clusters' loci are efficiently calculated using documents' ranking scores generated by a search engine. We propose a fast loci-based semi-supervised document clustering algorithm that uses clusters' loci instead of conventional centroids to assign documents to clusters. Empirical analysis on real-world datasets shows that the proposed method produces cluster solutions of promising quality and is substantially faster than several benchmarked centroid-based semi-supervised document clustering methods.
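For contrast, the conventional centroid-based assignment that the abstract says loci replace can be sketched as below. This is not the authors' loci method, whose details are only in the full paper; it is an illustrative sketch, assuming documents are sparse term-weight vectors (`Counter` objects), each cluster is seeded with a few labeled documents, and similarity is cosine. The function names `cosine`, `centroid`, and `seeded_assign` are hypothetical, and the sketch performs a single assignment pass (the first step of a seeded k-means style method) rather than iterating to convergence.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(docs):
    """Mean term-weight vector of a non-empty list of documents."""
    total = Counter()
    for d in docs:
        total.update(d)  # adds term weights element-wise
    return Counter({t: w / len(docs) for t, w in total.items()})


def seeded_assign(docs, seeds):
    """Assign each document to the cluster whose seed centroid is most similar.

    docs:  list of Counter term vectors to cluster
    seeds: dict mapping cluster label -> list of labeled seed documents
    """
    # Hypothetical baseline: one centroid per cluster, built from its seeds.
    centroids = {label: centroid(s) for label, s in seeds.items()}
    return [max(centroids, key=lambda c: cosine(d, centroids[c])) for d in docs]


if __name__ == "__main__":
    docs = [Counter("web search engine ranking".split()),
            Counter("gene protein cell".split())]
    seeds = {"ir": [Counter("search engine query".split())],
             "bio": [Counter("protein gene expression".split())]}
    print(seeded_assign(docs, seeds))  # ['ir', 'bio']
```

Every document is compared against every centroid in each pass; the abstract's speed claim concerns replacing this centroid representation with loci derived from search-engine ranking scores.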
Format
application/pdf |
Publisher
Springer International Publishing |
Relation
http://eprints.qut.edu.au/89750/1/WISE2015052.pdf
DOI:10.1007/978-3-319-26187-4_16
Sutanto, Taufik & Nayak, Richi (2015) Semi-supervised document clustering via loci. Lecture Notes in Computer Science, 9419, pp. 208-215.
Rights
Copyright 2015 Springer International Publishing Switzerland. The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-26187-4_16
Source
School of Electrical Engineering & Computer Science; Institute for Creative Industries and Innovation; Science & Engineering Faculty
Keywords
#080109 Pattern Recognition and Data Mining #Loci #Ranking #Semi-supervised clustering
Type
Journal Article |