K-tree : large scale document clustering


Author(s): De Vries, Christopher Michael; Geva, Shlomo
Date(s)

24/07/2009

Abstract

We introduce K-tree in an information retrieval context. It is an efficient approximation of the k-means clustering algorithm. Unlike k-means, it forms a hierarchy of clusters. It has been extended to address issues with sparse representations. We compare its performance and clustering quality to CLUTO using document collections. The K-tree has a low time complexity that makes it suitable for large document collections. Its tree structure allows for efficient disk-based implementations when space requirements exceed main memory.

Format

application/pdf

Identifier

http://eprints.qut.edu.au/26491/

Publisher

ACM

Relation

http://eprints.qut.edu.au/26491/2/26491.pdf

DOI:10.1145/1571941.1572094

De Vries, Christopher Michael & Geva, Shlomo (2009) K-tree : large scale document clustering. In 32nd international ACM SIGIR Conference on Research and Development in Information Retrieval, 19-23 July 2009, Boston, MA.

Rights

Copyright 2009 The Authors

(c) 2009 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Source

Faculty of Science and Technology

Keywords

#080109 Pattern Recognition and Data Mining #080201 Analysis of Algorithms and Complexity #080704 Information Retrieval and Web Search #K-tree #k-means #clustering #document clustering #search tree #performance #algorithm

Type

Conference Paper