Overview of the INEX 2010 XML mining track : clustering and classification of XML documents
Date(s) |
2011
|
---|---|
Abstract |
The XML Document Mining track was launched to explore two main ideas: (1) identifying key problems and new challenges in the emerging field of mining semi-structured documents, and (2) studying and assessing the potential of Machine Learning (ML) techniques for dealing with generic ML tasks in the structured domain, i.e., classification and clustering of semi-structured documents. The track has run for six editions, at INEX 2005, 2006, 2007, 2008, 2009 and 2010. The first five editions were summarized in earlier overviews; here we focus on the 2010 edition. INEX 2010 included two tasks in the XML Mining track: (1) an unsupervised clustering task and (2) a semi-supervised classification task where documents are organized in a graph. The clustering task requires the participants to group the documents into clusters without any knowledge of category labels, using an unsupervised learning algorithm. The classification task, on the other hand, requires the participants to assign the documents in the dataset to known categories using a supervised learning algorithm and a training set. This report gives the details of the clustering and classification tasks. |
Format |
application/pdf |
Identifier | |
Publisher |
Springer |
Relation |
http://eprints.qut.edu.au/41223/1/INEX_2010_XML_Mining_Overview.pdf http://www.inex.otago.ac.nz/ De Vries, Christopher Michael, Nayak, Richi, Kutty, Sangeetha, Geva, Shlomo, & Tagarelli, Andrea (2011) Overview of the INEX 2010 XML mining track : clustering and classification of XML documents. In Lecture Notes in Computer Science, Springer, Amsterdam. |
Rights |
Copyright 2010 [Please consult the authors] |
Source |
Computer Science; Faculty of Science and Technology |
Keywords | #080109 Pattern Recognition and Data Mining #080704 Information Retrieval and Web Search #XML document mining #INEX #Wikipedia #Structure #Content #Clustering #Classification |
Type |
Conference Paper |