36 resultados para INEX


Relevância:

20.00% 20.00%

Publicador:

Resumo:

Traditional information retrieval (IR) systems respond to user queries with ranked lists of relevant documents. The separation of content and structure in XML documents allows individual XML elements to be selected in isolation. Thus, users expect XML-IR systems to return highly relevant results that are more precise than entire documents. In this paper we describe the implementation of a search engine for XML document collections. The system is keyword based and is built upon an XML inverted file system. We describe the approach that was adopted to meet the requirements of Content Only (CO) and Vague Content and Structure (VCAS) queries in INEX 2004.

Relevância:

20.00% 20.00%

Publicador:

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This report explains the objectives, datasets and evaluation criteria of both the clustering and classification tasks set in the INEX 2009 XML Mining track. The report also describes the approaches and results obtained by the different participants.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The INEX 2010 Focused Relevance Feedback track offered a refined approach to the evaluation of Focused Relevance Feedback algorithms through simulated exhaustive user feedback. As in traditional approaches we simulated a user-in-the loop by re-using the assessments of ad-hoc retrieval obtained from real users who assess focused ad-hoc retrieval submissions. The evaluation was extended in several ways: the use of exhaustive relevance feedback over entire runs; the evaluation of focused retrieval where both the retrieval results and the feedback are focused; the evaluation was performed over a closed set of documents and complete focused assessments; the evaluation was performed over executable implementations of relevance feedback algorithms; and �finally, the entire evaluation platform is reusable. We present the evaluation methodology, its implementation, and experimental results obtained for nine submissions from three participating organisations.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The XML Document Mining track was launched for exploring two main ideas: (1) identifying key problems and new challenges of the emerging field of mining semi-structured documents, and (2) studying and assessing the potential of Machine Learning (ML) techniques for dealing with generic ML tasks in the structured domain, i.e., classification and clustering of semi-structured documents. This track has run for six editions during INEX 2005, 2006, 2007, 2008, 2009 and 2010. The first five editions have been summarized in previous editions and we focus here on the 2010 edition. INEX 2010 included two tasks in the XML Mining track: (1) unsupervised clustering task and (2) semi-supervised classification task where documents are organized in a graph. The clustering task requires the participants to group the documents into clusters without any knowledge of category labels using an unsupervised learning algorithm. On the other hand, the classification task requires the participants to label the documents in the dataset into known categories using a supervised learning algorithm and a training set. This report gives the details of clustering and classification tasks.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In the third year of the Link the Wiki track, the focus has been shifted to anchor-to-bep link discovery. The participants were encouraged to utilize different technologies to resolve the issue of focused link discovery. Apart from the 2009 Wikipedia collection, the Te Ara collection was introduced for the first time in INEX. For the link the wiki tasks, 5000 file-to-file topics were randomly selected and 33 anchor-to-bep topics were nominated by the participants. The Te Ara collection does not contain hyperlinks and the task was to cross link the entire collection. A GUI tool for self-verification of the linking results was distributed. This helps participants verify the location of the anchor and bep. The assessment tool and the evaluation tool were revised to improve efficiency. Submission runs were evaluated against Wikipedia ground-truth and manual result set respectively. Focus-based evaluation was undertaken using a new metric. Evaluation results are presented and link discovery approaches are described

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper gives an overview of the INEX 2009 Ad Hoc Track. The main goals of the Ad Hoc Track were three-fold. The first goal was to investigate the impact of the collection scale and markup, by using a new collection that is again based on a the Wikipedia but is over 4 times larger, with longer articles and additional semantic annotations. For this reason the Ad Hoc track tasks stayed unchanged, and the Thorough Task of INEX 2002–2006 returns. The second goal was to study the impact of more verbose queries on retrieval effectiveness, by using the available markup as structural constraints—now using both the Wikipedia’s layout-based markup, as well as the enriched semantic markup—and by the use of phrases. The third goal was to compare different result granularities by allowing systems to retrieve XML elements, ranges of XML elements, or arbitrary passages of text. This investigates the value of the internal document structure (as provided by the XML mark-up) for retrieving relevant information. The INEX 2009 Ad Hoc Track featured four tasks: For the Thorough Task a ranked-list of results (elements or passages) by estimated relevance was needed. For the Focused Task a ranked-list of non-overlapping results (elements or passages) was needed. For the Relevant in Context Task non-overlapping results (elements or passages) were returned grouped by the article from which they came. For the Best in Context Task a single starting point (element start tag or passage start) for each article was needed. We discuss the setup of the track, and the results for the four tasks.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The Link the Wiki track at INEX 2008 offered two tasks, file-to-file link discovery and anchor-to-BEP link discovery. In the former 6600 topics were used and in the latter 50 were used. Manual assessment of the anchor-to-BEP runs was performed using a tool developed for the purpose. Runs were evaluated using standard precision & recall measures such as MAP and precision / recall graphs. 10 groups participated and the approaches they took are discussed. Final evaluation results for all runs are presented.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

INEX investigates focused retrieval from structured documents by providing large test collections of structured documents, uniform evaluation measures, and a forum for organizations to compare their results. This paper reports on the INEX 2011 evaluation campaign, which consisted of a five active tracks: Books and Social Search, Data Centric, Question Answering, Relevance Feedback, and Snippet Retrieval. INEX 2011 saw a range of new tasks and tracks, such as Social Book Search, Faceted Search, Snippet Retrieval, and Tweet Contextualization.

Relevância:

20.00% 20.00%

Publicador:

Relevância:

20.00% 20.00%

Publicador:

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The INEX 2011 Relevance Feedback track offered a refined approach to the evaluation of Focused Relevance Feedback algorithms through simulated exhaustive user feedback. Run in largely identical fashion to the Relevance Feedback track in INEX 2010[2], we simulated a user-in-the loop by re-using the assessments of ad-hoc retrieval obtained from real users who assess focused ad-hoc retrieval submissions. We present the evaluation methodology, its implementation, and experimental results obtained for four submissions from two participating organisations. As the task and evaluation methods did not change between INEX 2010 and now, explanations of these details from the INEX 2010 version of the track have been repeated verbatim where appropriate.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper gives an overview of the INEX 2011 Snippet Retrieval Track. The goal of the Snippet Retrieval Track is to provide a common forum for the evaluation of the effectiveness of snippets, and to investigate how best to generate snippets for search results, which should provide the user with sufficient information to determine whether the underlying document is relevant. We discuss the setup of the track, and the evaluation results.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The INEX workshop is concerned with evaluating the effectiveness of XML retrieval systems. In 2004 a natural language query task was added to the INEX Ad hoc track. Standard INEX Ad hoc topic titles are specified in NEXI -- a simplified and restricted subset of XPath, with a similar feel, and yet with a distinct IR flavour and interpretation. The syntax of NEXI is rigid and it imposes some limitations on the kind of information need that it can faithfully capture. At INEX 2004 the NLP question to be answered was simple -- is it practical to use a natural language query that is the equivalent of the formal NEXI title? The results of this experiment are reported and some information on the future direction of the NLP task is presented.