Overview of the INEX 2010 XML mining track : clustering and classification of XML documents


Author(s): De Vries, Christopher Michael; Nayak, Richi; Kutty, Sangeetha; Geva, Shlomo; Tagarelli, Andrea
Date(s)

2011

Abstract

The XML Document Mining track was launched to explore two main ideas: (1) identifying key problems and new challenges in the emerging field of mining semi-structured documents, and (2) studying and assessing the potential of Machine Learning (ML) techniques for generic ML tasks in the structured domain, i.e., classification and clustering of semi-structured documents. The track has run for six editions, during INEX 2005, 2006, 2007, 2008, 2009 and 2010. The first five editions have been summarized previously, so we focus here on the 2010 edition. INEX 2010 included two tasks in the XML Mining track: (1) an unsupervised clustering task and (2) a semi-supervised classification task in which documents are organized in a graph. The clustering task requires participants to group the documents into clusters, without any knowledge of category labels, using an unsupervised learning algorithm. The classification task, by contrast, requires participants to label the documents in the dataset with known categories using a supervised learning algorithm and a training set. This report gives the details of the clustering and classification tasks.

Format

application/pdf

Identifier

http://eprints.qut.edu.au/41223/

Publisher

Springer

Relation

http://eprints.qut.edu.au/41223/1/INEX_2010_XML_Mining_Overview.pdf

http://www.inex.otago.ac.nz/

De Vries, Christopher Michael, Nayak, Richi, Kutty, Sangeetha, Geva, Shlomo, & Tagarelli, Andrea (2011) Overview of the INEX 2010 XML mining track : clustering and classification of XML documents. In Lecture Notes in Computer Science, Springer, Amsterdam.

Rights

Copyright 2010 [Please consult the authors]

Source

Computer Science; Faculty of Science and Technology

Keywords

#080109 Pattern Recognition and Data Mining #080704 Information Retrieval and Web Search #XML document mining #INEX #Wikipedia #Structure #Content #Clustering #Classification

Type

Conference Paper