41 results for cross-language information retrieval
in University of Queensland eSpace - Australia
Abstract:
Document ranking is an important process in information retrieval (IR). It presents retrieved documents in order of their estimated degree of relevance to the query. Traditional document ranking methods are mostly based on similarity computations between documents and the query. In this paper we argue that similarity-based document ranking is insufficient in some cases, for two reasons. The first is the increased variety of information: there are now far too many different types of documents available for users to search. The second is the variety of users: in many cases a user may want to retrieve documents that are not only similar but also general or broad regarding a certain topic. This is particularly the case in some domains such as bio-medical IR. In this paper we propose a novel approach that re-ranks the retrieved documents by combining their similarity with their generality. Document generality can be quantified by an ontology-based analysis of the semantic cohesion of the text. The retrieved documents are then re-ranked by a combined score of their similarity and the closeness of each document’s generality to the query’s. Our experiments have shown encouraging performance on a large bio-medical document collection, OHSUMED, containing 348,566 medical journal references and 101 test queries.
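The re-ranking step described in this abstract can be illustrated with a minimal sketch. The abstract does not say how similarity and generality closeness are combined, so the linear weighting, the `weight` parameter, the `Doc` type, and the assumption that both scores lie in [0, 1] are all hypothetical choices made here for illustration, not the authors' method.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str
    similarity: float  # similarity to the query, assumed to lie in [0, 1]
    generality: float  # generality score, assumed to lie in [0, 1]


def rerank(docs, query_generality, weight=0.5):
    """Sort documents by a weighted sum of similarity and generality closeness.

    The linear combination is an assumption; the abstract only states that
    the two scores are combined.
    """
    def combined(doc):
        # Closeness is 1.0 when the document is exactly as general as the query.
        closeness = 1.0 - abs(doc.generality - query_generality)
        return weight * doc.similarity + (1.0 - weight) * closeness

    return sorted(docs, key=combined, reverse=True)


docs = [Doc("a", similarity=0.9, generality=0.1),
        Doc("b", similarity=0.8, generality=0.7)]
for doc in rerank(docs, query_generality=0.7):
    print(doc.doc_id)
```

With `weight=1.0` the sketch reduces to plain similarity ranking, which makes it easy to compare against a similarity-only baseline.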
Abstract:
This study examined the discrimination of word-final stop contrasts (/p/-/t/, /p/-/k/, /t/-/k/) in English and Thai by 12 listeners who speak Vietnamese as their first language (L1). Vietnamese shares a specific phonetic realization of stops with Thai, i.e., unreleased final stops, and differs from English, which allows both released and unreleased final stops. The discrimination accuracy of these 12 native Vietnamese (NV) listeners was compared to that of the two listener groups (Australian English (AE) and native Thai (NT)) tested in previous studies. The NV group was less accurate than the native group in discriminating both English and Thai stop contrasts. In particular, for the Thai /t/-/k/ contrast, they were significantly less accurate than the AE listeners. The present findings suggest that experience with the specific (i.e., unreleased) and native phonetic realization of sounds may be essential for accurate discrimination of final stop contrasts. The effect of L1 dialect on cross-language speech perception is discussed.
Abstract:
Domain-specific information retrieval is increasingly in demand. Not only domain experts but also average non-expert users are interested in searching for domain-specific (e.g., medical and health) information in online resources. However, a typical problem for average users is that the search results are a mixture of documents with different levels of readability. Non-expert users may want to see documents with higher readability at the top of the list. Consequently, the search results need to be re-ranked in descending order of readability. It is often not practical for domain experts to manually label the readability of documents in large databases, so computational models of readability need to be investigated. However, traditional readability formulas are designed for general-purpose text and are insufficient for the technical materials found in domain-specific information retrieval. More advanced algorithms, such as textual coherence models, are computationally expensive when re-ranking a large number of retrieved documents. In this paper, we propose an effective and computationally tractable concept-based model of text readability. In addition to the textual genre of a document, our model also takes into account domain-specific knowledge, i.e., how the domain-specific concepts contained in the document affect the document’s readability. Three major readability formulas are proposed and applied to health and medical information retrieval. Experimental results show that our proposed readability formulas lead to remarkable improvements, in terms of correlation with users’ readability ratings, over four traditional readability measures.
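To make the idea of re-ranking in descending order of readability concrete, here is a minimal sketch using a traditional general-purpose formula. The abstract does not name the four traditional measures it compares against; Flesch Reading Ease is assumed here purely as a well-known example of one, and it is not the concept-based model the paper proposes. The syllable counter is a rough vowel-group heuristic, and the function names are hypothetical.

```python
import re


def count_syllables(word):
    # Rough heuristic (an assumption): each run of vowels counts as one syllable.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_reading_ease(text):
    """Flesch Reading Ease: higher scores indicate easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))


def rerank_by_readability(texts):
    # Most readable documents first, as the abstract suggests for non-experts.
    return sorted(texts, key=flesch_reading_ease, reverse=True)


results = [
    "Percutaneous transluminal coronary angioplasty necessitates anticoagulation.",
    "The cat sat. The dog ran.",
]
for text in rerank_by_readability(results):
    print(text)
```

A formula of this kind only looks at sentence and word length, which is the limitation the abstract points to: it has no notion of how difficult a domain-specific concept is for a lay reader.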
Abstract:
This paper discusses a document discovery tool based on formal concept analysis. The program allows users to navigate email using a visual lattice metaphor rather than a tree. It implements a virtual file structure over email in which files and entire directories can appear in multiple positions. The content and shape of the lattice formed by the conceptual ontology can assist in email discovery. The system described provides more flexibility in retrieving stored emails than is normally available in email clients. The paper discusses how conceptual ontologies can leverage traditional document retrieval systems.
Abstract:
Some forty interviews with Australian university students returning from study in France indicate that problems in accessing crucial information are a common experience and a source of frustration, and frequently lead students to reproduce stereotypes of French administrative inefficiency. Our paper argues that the issue is not one of information per se but of cultural differences in the dissemination of information. It analyses the ways in which students interpret their information-gathering difficulties, and the appropriateness of the strategies they devise for overcoming them. It then examines the pedagogical implications for preparing students for study abroad, suggesting means of both equipping students with alternative ways of understanding 'information skills' and intervening in the perpetuation of stereotypes.
Abstract:
Document classification is a supervised machine learning process in which predefined category labels are assigned to documents based on a hypothesis derived from a training set of labelled documents. Documents cannot be directly interpreted by a computer system unless they have been modelled as a collection of computable features. Rogati and Yang [M. Rogati and Y. Yang, Resource selection for domain-specific cross-lingual IR, in SIGIR 2004: Proceedings of the 27th annual international conference on Research and Development in Information Retrieval, ACM Press, Sheffield, United Kingdom, pp. 154-161.] pointed out that the effectiveness of a document classification system may vary across domains. This implies that the quality of the document model contributes to the effectiveness of document classification. Conventionally, model evaluation is accomplished by comparing the effectiveness scores of classifiers on candidate models. However, this kind of evaluation method may encounter either under-fitting or over-fitting problems, because the effectiveness scores are restricted by the learning capacities of the classifiers. We propose a model fitness evaluation method to determine whether a model is sufficient to distinguish positive and negative instances while still able to provide satisfactory effectiveness with a small feature subset. Our experiments demonstrate how the fitness of models is assessed. The results of our work contribute to research on feature selection, dimensionality reduction and document classification.
Abstract:
The main aim of the proposed approach presented in this paper is to improve Web information retrieval effectiveness by overcoming the problems associated with a typical keyword matching retrieval system, through the use of concepts and an intelligent fusion of confidence values. By exploiting the conceptual hierarchy of the WordNet (G. Miller, 1995) knowledge base, we show how to effectively encode the conceptual information in a document using the semantic information implied by the words that appear within it. Rather than treating a word as a string made up of a sequence of characters, we consider a word to represent a concept.
Abstract:
Formal Concept Analysis is an unsupervised machine learning technique that has been successfully applied to document organisation by considering documents as objects and keywords as attributes. The basic algorithms of Formal Concept Analysis then allow an intelligent information retrieval system to cluster documents according to keyword views. This paper investigates the scalability of this idea. In particular, we present the results of applying spatial data structures to large datasets in Formal Concept Analysis. Our experiments are motivated by the application of the Formal Concept Analysis idea of a virtual filesystem [11,17,15], in particular the libferris [1] Semantic File System. This paper presents customizations to an RD-Tree Generalized Index Search Tree based index structure to better support the application of Formal Concept Analysis to large data sources.
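The documents-as-objects, keywords-as-attributes view can be illustrated with a small sketch that enumerates the formal concepts of a toy context. This is a naive enumeration written for illustration only: it is exponential in the number of documents and is not the indexed approach the paper investigates. The function names and the sample documents are hypothetical.

```python
from itertools import combinations


def common_attributes(context, objects):
    """Intent: the keywords shared by every document in `objects`."""
    shared = set().union(*context.values())
    for obj in objects:
        shared &= context[obj]
    return frozenset(shared)


def objects_with(context, attributes):
    """Extent: the documents that carry every keyword in `attributes`."""
    return frozenset(obj for obj, attrs in context.items() if attributes <= attrs)


def formal_concepts(context):
    """Return all (extent, intent) pairs by closing every subset of documents."""
    concepts = set()
    objects = list(context)
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            intent = common_attributes(context, subset)
            extent = objects_with(context, intent)
            concepts.add((extent, intent))
    return concepts


# Toy context: each document maps to its set of keywords.
context = {
    "d1": {"ir", "ranking"},
    "d2": {"ir", "ontology"},
    "d3": {"ir", "ranking", "ontology"},
}
for extent, intent in sorted(formal_concepts(context), key=lambda c: len(c[0])):
    print(sorted(extent), sorted(intent))
```

Each printed pair is one node of the concept lattice: a maximal group of documents together with the keywords they all share. The cost of enumerating subsets this way is what makes index support attractive for large data sources.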
Abstract:
This paper reports the introduction of an evidence-based medicine fellowship in a children’s teaching hospital. The results are presented of a self-reported ‘evidence-based medicine’ questionnaire, the clinical questions requested through the information retrieval service are outlined and the results of an information retrieval service user questionnaire are reported. It was confirmed that clinicians have frequent clinical questions that mostly remain unanswered. The responses to four questions with ‘good quality’ evidence-based answers were reviewed and suggest that at least one-quarter of doctors were not aware of the current best available evidence. There was a high level of satisfaction with the information retrieval service; 19% of users indicated that the information changed their clinical practice and 73% indicated that the information confirmed their clinical practice. The introduction of an evidence-based medicine fellowship is one method of disseminating the practice of evidence-based medicine in a tertiary children’s hospital.
Abstract:
Semantic data models provide a map of the components of an information system. The characteristics of these models affect their usefulness for various tasks (e.g., information retrieval). The quality of information retrieval has obvious important consequences, both economic and otherwise. Traditionally, database designers have produced parsimonious logical data models. In spite of their increased size, ontologically clearer conceptual models have been shown to facilitate better performance for both problem solving and information retrieval tasks in experimental settings. The experiments producing evidence of enhanced performance for ontologically clearer models have, however, used application domains of modest size. Data models in organizational settings are likely to be substantially larger than those used in these experiments. This research used an experiment to investigate whether the benefits of improved information retrieval performance associated with ontologically clearer models are robust as the size of the application domain increases. The experiment used an application domain approximately twice the size of those tested in prior experiments. The results indicate that, relative to the users of the parsimonious implementation, end users of the ontologically clearer implementation made significantly more semantic errors, took significantly more time to compose their queries, and were significantly less confident in the accuracy of their queries.
Abstract:
Even when data repositories exhibit near-perfect data quality, users may formulate queries that do not correspond to the information requested. Users’ poor information retrieval performance may arise from problems either in understanding the data models that represent the real-world systems or in their query skills. This research focuses on users’ understanding of the data structures, i.e., their ability to map between the information request and the data model. The Bunge-Wand-Weber ontology was used to formulate three sets of hypotheses. Two laboratory experiments (one using a small data model and one using a larger data model) tested the effect of ontological clarity on users’ performance when undertaking component, record, and aggregate level tasks. For the hypotheses associated with different representations but equivalent semantics, the results indicate that participants using the parsimonious data model performed better on component level tasks, whereas participants using the ontologically clearer data model performed better on record and aggregate level tasks.