Utilising semantic tags in XML clustering


Author(s): Kutty, Sangeetha; Nayak, Richi; Li, Yuefeng
Date(s)

20/06/2010

Abstract

This paper presents an overview of the experiments conducted using the Hybrid Clustering of XML documents using Constraints (HCXC) method for the clustering task in the INEX 2009 XML Mining track. The technique uses frequent subtrees generated from the document structure to extract the content used for clustering the XML documents. The paper also presents an experimental study using several data representations: structure-only, content-only, and combined structure and content. Unlike in previous years, this year's XML documents were marked up using Wiki tags and contain categories derived using the YAGO ontology. The paper also presents the results of studying the effect of these tags on XML clustering using the HCXC method.

Format

application/pdf

Identifier

http://eprints.qut.edu.au/40644/

Publisher

Springerlink

Relation

http://eprints.qut.edu.au/40644/1/c40644.pdf

DOI:10.1007/978-3-642-14556-8_41

Kutty, Sangeetha, Nayak, Richi, & Li, Yuefeng (2010) Utilising semantic tags in XML clustering. In Focused Retrieval and Evaluation : Proceedings of 8th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2009, Springerlink, Brisbane, Queensland, pp. 1167-1173.

Rights

Copyright 2010 Springerlink

This is the author version of the work. The conference proceedings, published by Springer Verlag, will be available via SpringerLink. http://www.springerlink.com

Source

Computer Science; Faculty of Science and Technology

Keywords

#080109 Pattern Recognition and Data Mining #XML documents #Clustering #INEX #Structure and Content #Semantic

Type

Conference Paper