XML documents clustering using Tensor Space Model


Author(s): Kutty, Sangeetha; Nayak, Richi; Li, Yuefeng
Date(s)

28/05/2011

Abstract

The traditional Vector Space Model (VSM) cannot represent both the structure and the content of XML documents. This paper introduces a novel method of representing XML documents in a Tensor Space Model (TSM) and then using that representation for clustering. Empirical analysis shows that the proposed method scales to large datasets, and that the factorized matrices it produces help to improve cluster quality through an enriched document representation of both structure and content information.
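
To make the contrast between the VSM and a tensor representation concrete, the following is a minimal sketch and not the authors' implementation. It assumes a third-order tensor indexed by (document, element path, term), where each document's slice is stored sparsely as a `Counter`; flattening a slice over its (path, term) pairs gives that document's row in the mode-1 unfolding. The helper names (`tensor_entries`, `content_only`, `cosine`) and the two toy documents are hypothetical, and the factorization step mentioned in the abstract is omitted.

```python
import math
from collections import Counter
from xml.etree import ElementTree as ET


def tensor_entries(xml_text):
    """Return one document's tensor slice as a Counter: (path, term) -> count."""
    root = ET.fromstring(xml_text)
    entries = Counter()

    def walk(node, prefix):
        # The structural index is the root-to-element tag path (an assumption).
        path = prefix + "/" + node.tag
        for term in (node.text or "").lower().split():
            entries[(path, term)] += 1
        for child in node:
            walk(child, path)

    walk(root, "")
    return entries


def content_only(entries):
    """Collapse the path mode, leaving a plain bag of words (VSM view)."""
    bag = Counter()
    for (_, term), count in entries.items():
        bag[term] += count
    return bag


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(value * b.get(key, 0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Two toy documents with identical words placed under different elements.
doc_a = "<article><title>tensor clustering</title><body>xml data</body></article>"
doc_b = "<article><title>xml data</title><body>tensor clustering</body></article>"

slice_a = tensor_entries(doc_a)
slice_b = tensor_entries(doc_b)

# Content-only comparison cannot tell the documents apart.
vsm_similarity = cosine(content_only(slice_a), content_only(slice_b))
# Comparison over (path, term) pairs also reflects where each term occurs.
tsm_similarity = cosine(slice_a, slice_b)

print("VSM similarity:", vsm_similarity)
print("Tensor-slice similarity:", tsm_similarity)
```

In this sketch the two documents look identical under the content-only view but share no (path, term) entries in the tensor view, which is the kind of structural distinction a clustering step could then exploit.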

Format

application/pdf

Identifier

http://eprints.qut.edu.au/41717/

Publisher

Springer

Relation

http://eprints.qut.edu.au/41717/1/PAKDD.pdf

http://pakdd2011.pakdd.org/

Kutty, Sangeetha, Nayak, Richi, & Li, Yuefeng (2011) XML documents clustering using Tensor Space Model. In Proceedings of the 15th Pacific-Asia Conference on Knowledge Discovery and Data Mining, Springer, InterContinental Shenzhen, Shenzhen.

Rights

Copyright 2011 Springer

The original publication is available at SpringerLink http://www.springerlink.com

Source

Computer Science; Faculty of Science and Technology

Keywords

#080600 INFORMATION SYSTEMS #Tensor #XML #Clustering #Decomposition #Wikipedia

Type

Conference Paper