The ranking based constrained document clustering method and its application to social event detection


Author(s): Sutanto, Taufik; Nayak, Richi
Contributor(s)

Bhowmick, Sourav

Dyreson, Curtis

Jensen, Christian

Lee, Mong

Muliantara, Agus

Thalheim, Bernhard

Date(s)

21/04/2014

Abstract

With the growing size and variety of social media files on the web, it is becoming critical to organize them efficiently into clusters for further processing. This paper presents a novel, scalable constrained document clustering method that harnesses search engines capable of handling large text data. Instead of calculating the distance between each document and every cluster centroid, the method chooses a neighborhood of best cluster candidates using a document ranking scheme. To make the method faster and less memory dependent, in-memory and in-database processing are combined in a semi-incremental manner. The method has been extensively tested on a social event detection application. Empirical analysis shows that the proposed method is efficient in both computation and memory usage while producing notable accuracy.
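
The candidate-selection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the inverted index stands in for a search engine, the shared-term count stands in for the document ranking scheme, and every name here (`vectorize`, `build_index`, `candidate_clusters`, `assign`, the parameter `k`, and the sample centroids) is a hypothetical choice made for illustration. Constraints and the in-database, semi-incremental processing are not modelled.

```python
import math
from collections import Counter, defaultdict


def vectorize(text):
    """Turn text into a term-frequency vector (lowercased whitespace tokens)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(centroids):
    """Inverted index: term -> set of cluster ids whose centroid contains it."""
    index = defaultdict(set)
    for cluster_id, centroid in centroids.items():
        for term in centroid:
            index[term].add(cluster_id)
    return index


def candidate_clusters(doc, index, k):
    """Rank clusters by the number of terms shared with the document; keep the top k.

    Assumption: a shared-term count is a stand-in for a real ranking function.
    """
    hits = Counter()
    for term in doc:
        for cluster_id in index.get(term, ()):
            hits[cluster_id] += 1
    return [cluster_id for cluster_id, _ in hits.most_common(k)]


def assign(doc, centroids, index, k=2):
    """Compare the document only with its top-k candidates, not every centroid."""
    candidates = candidate_clusters(doc, index, k)
    if not candidates:
        return None  # no cluster shares a term with the document
    return max(candidates, key=lambda cid: cosine(doc, centroids[cid]))


# Hypothetical sample data, for illustration only.
centroids = {
    "concert": vectorize("music concert stage band live"),
    "football": vectorize("football match stadium goal"),
    "protest": vectorize("protest march city square"),
}
index = build_index(centroids)
print(assign(vectorize("live band on stage tonight"), centroids, index))
```

Under these assumptions, the similarity computation touches at most `k` centroids per document, regardless of how many clusters exist; the cost of finding candidates is moved to the index lookup.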

Identifier

http://eprints.qut.edu.au/71606/

Publisher

Springer International Publishing

Relation

DOI:10.1007/978-3-319-05813-9_4

Sutanto, Taufik & Nayak, Richi (2014) The ranking based constrained document clustering method and its application to social event detection. Lecture Notes in Computer Science, 8422, pp. 47-60.

Rights

Copyright 2014 Springer International Publishing Switzerland

Source

School of Electrical Engineering & Computer Science; Institute for Creative Industries and Innovation; Science & Engineering Faculty

Keywords

#080109 Pattern Recognition and Data Mining #080604 Database Management #080704 Information Retrieval and Web Search #constrained clustering #ranking #social event detection

Type

Journal Article