61 resultados para XML, Information, Retrieval, Query, Language

em University of Queensland eSpace - Australia


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Document ranking is an important process in information retrieval (IR). It presents retrieved documents in an order of their estimated degrees of relevance to query. Traditional document ranking methods are mostly based on the similarity computations between documents and query. In this paper we argue that the similarity-based document ranking is insufficient in some cases. There are two reasons. Firstly it is about the increased information variety. There are far too many different types documents available now for user to search. The second is about the users variety. In many cases user may want to retrieve documents that are not only similar but also general or broad regarding a certain topic. This is particularly the case in some domains such as bio-medical IR. In this paper we propose a novel approach to re-rank the retrieved documents by incorporating the similarity with their generality. By an ontology-based analysis on the semantic cohesion of text, document generality can be quantified. The retrieved documents are then re-ranked by their combined scores of similarity and the closeness of documents’ generality to the query’s. Our experiments have shown an encouraging performance on a large bio-medical document collection, OHSUMED, containing 348,566 medical journal references and 101 test queries.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Even when data repositories exhibit near perfect data quality, users may formulate queries that do not correspond to the information requested. Users’ poor information retrieval performance may arise from either problems understanding of the data models that represent the real world systems, or their query skills. This research focuses on users’ understanding of the data structures, i.e., their ability to map the information request and the data model. The Bunge-Wand-Weber ontology was used to formulate three sets of hypotheses. Two laboratory experiments (one using a small data model and one using a larger data model) tested the effect of ontological clarity on users’ performance when undertaking component, record, and aggregate level tasks. The results indicate for the hypotheses associated with different representations but equivalent semantics that parsimonious data model participants performed better for component level tasks but that ontologically clearer data model participants performed better for record and aggregate level tasks.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Domain specific information retrieval has become in demand. Not only domain experts, but also average non-expert users are interested in searching domain specific (e.g., medical and health) information from online resources. However, a typical problem to average users is that the search results are always a mixture of documents with different levels of readability. Non-expert users may want to see documents with higher readability on the top of the list. Consequently the search results need to be re-ranked in a descending order of readability. It is often not practical for domain experts to manually label the readability of documents for large databases. Computational models of readability needs to be investigated. However, traditional readability formulas are designed for general purpose text and insufficient to deal with technical materials for domain specific information retrieval. More advanced algorithms such as textual coherence model are computationally expensive for re-ranking a large number of retrieved documents. In this paper, we propose an effective and computationally tractable concept-based model of text readability. In addition to textual genres of a document, our model also takes into account domain specific knowledge, i.e., how the domain-specific concepts contained in the document affect the document’s readability. Three major readability formulas are proposed and applied to health and medical information retrieval. Experimental results show that our proposed readability formulas lead to remarkable improvements in terms of correlation with users’ readability ratings over four traditional readability measures.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper discusses an document discovery tool based on formal concept analysis. The program allows users to navigate email using a visual lattice metaphor rather than a tree. It implements a virtual file structure over email where files and entire directories can appear in multiple positions. The content and shape of the lattice formed by the conceptual ontology can assist in email discovery. The system described provides more flexibility in retrieving stored emails than what is normally available in email clients. The paper discusses how conceptual ontologies can leverage traditional document retrieval systems.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper examines the effects of information request ambiguity and construct incongruence on end user's ability to develop SQL queries with an interactive relational database query language. In this experiment, ambiguity in information requests adversely affected accuracy and efficiency. Incongruities among the information request, the query syntax, and the data representation adversely affected accuracy, efficiency, and confidence. The results for ambiguity suggest that organizations might elicit better query development if end users were sensitized to the nature of ambiguities that could arise in their business contexts. End users could translate natural language queries into pseudo-SQL that could be examined for precision before the queries were developed. The results for incongruence suggest that better query development might ensue if semantic distances could be reduced by giving users data representations and database views that maximize construct congruence for the kinds of queries in typical domains. (C) 2001 Elsevier Science B.V. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Many emerging applications benefit from the extraction of geospatial data specified at different resolutions for viewing purposes. Data must also be topologically accurate and up-to-date as it often represents real-world changing phenomena. Current multiresolution schemes use complex opaque data types, which limit the capacity for in-database object manipulation. By using z-values and B+trees to support multiresolution retrieval, objects are fragmented in such a way that updates to objects or object parts are executed using standard SQL (Structured Query Language) statements as opposed to procedural functions. Our approach is compared to a current model, using complex data types indexed under a 3D (three-dimensional) R-tree, and shows better performance for retrieval over realistic window sizes and data loads. Updates with the R-tree are slower and preclude the feasibility of its use in time-critical applications whereas, predictably, projecting the issue to a one-dimensional index allows constant updates using z-values to be implemented more efficiently.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Semantic data models provide a map of the components of an information system. The characteristics of these models affect their usefulness for various tasks (e.g., information retrieval). The quality of information retrieval has obvious important consequences, both economic and otherwise. Traditionally, data base designers have produced parsimonious logical data models. In spite of their increased size, ontologically clearer conceptual models have been shown to facilitate better performance for both problem solving and information retrieval tasks in experimental settings. The experiments producing evidence of enhanced performance for ontologically clearer models have, however, used application domains of modest size. Data models in organizational settings are likely to be substantially larger than those used in these experiments. This research used an experiment to investigate whether the benefits of improved information retrieval performance associated with ontologically clearer models are robust as the size of the application domains increase. The experiment used an application domain of approximately twice the size as tested in prior experiments. The results indicate that, relative to the users of the parsimonious implementation, end users of the ontologically clearer implementation made significantly more semantic errors, took significantly more time to compose their queries, and were significantly less confident in the accuracy of their queries.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In 1996, I worked with what appear to have been the last fluent speakers of Ngarnka, a language of central northern Australia. To the best of my knowledge, the last fluent speaker passed away in 1997 or 1998. In 2000, I began to collect all available information on the language. This article describes some of the challenges that have arisen in working with a language during and after the final stages of its death, and examines some of the possible reasons for, and impact of, this kind of work.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The Leximancer system is a relatively new method for transforming lexical co-occurrence information from natural language into semantic patterns in an unsupervised manner. It employs two stages of co-occurrence information extraction-semantic and relational-using a different algorithm for each stage. The algorithms used are statistical, but they employ nonlinear dynamics and machine learning. This article is an attempt to validate the output of Leximancer, using a set of evaluation criteria taken from content analysis that are appropriate for knowledge discovery tasks.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

SQL (Structured Query Language) is one of the essential topics in foundation databases courses in higher education. Due to its apparent simple syntax, learning to use the full power of SQL can be a very difficult activity. In this paper, we introduce SQLator, which is a web-based interactive tool for learning SQL. SQLator's key function is the evaluate function, which allows a user to evaluate the correctness of his/her query formulation. The evaluate engine is based on complex heuristic algorithms. The tool also provides instructors the facility to create and populate database schemas with an associated pool of SQL queries. Currently it hosts two databases with a query pool of 300+ across the two databases. The pool is divided into 3 categories according to query complexity. The SQLator user can perform unlimited executions and evaluations on query formulations and/or view the solutions. The SQLator evaluate function has a high rate of success in evaluating the user's statement as correct (or incorrect) corresponding to the question. We will present in this paper, the basic architecture and functions of SQLator. We will further discuss the value of SQLator as an educational technology and report on educational outcomes based on studies conducted at the School of Information Technology and Electrical Engineering, The University of Queensland.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In multimedia retrieval, a query is typically interactively refined towards the ‘optimal’ answers by exploiting user feedback. However, in existing work, in each iteration, the refined query is re-evaluated. This is not only inefficient but fails to exploit the answers that may be common between iterations. In this paper, we introduce a new approach called SaveRF (Save random accesses in Relevance Feedback) for iterative relevance feedback search. SaveRF predicts the potential candidates for the next iteration and maintains this small set for efficient sequential scan. By doing so, repeated candidate accesses can be saved, hence reducing the number of random accesses. In addition, efficient scan on the overlap before the search starts also tightens the search space with smaller pruning radius. We implemented SaveRF and our experimental study on real life data sets show that it can reduce the I/O cost significantly.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The Meta-Object Facility (MOF) provides a standardized framework for object-oriented models. An instance of a MOF model contains objects and links whose interfaces are entirely derived from that model. Information contained in these objects can be accessed directly, however, in order to realize the Model-Driven Architecture@trade; (MDA), we must have a mechanism for representing and evaluating structured queries on these instances. The MOF Query Language (MQL) is a language that extends the UML's Object Constraint Language (OCL) to provide more expressive power, such as higher-order queries, parametric polymorphism and argument polymorphism. Not only do these features allow more powerful queries, but they also encourage a greater degree of modularization and re-use, resulting in faster prototyping and facilitating automated integrity analysis. This paper presents an overview of the motivations for developing MQL and also discusses its abstract syntax, presented as a MOF model, and its semantics