163 resultados para Guido, Tomás


Relevância:

10.00% 10.00%

Publicador:

Resumo:

The assumptions underlying the Probability Ranking Principle (PRP) have led to a number of alternative approaches that cater or compensate for the PRP’s limitations. All alternatives deviate from the PRP by incorporating dependencies. This results in a re-ranking that promotes or demotes documents depending upon their relationship with the documents that have been already ranked. In this paper, we compare and contrast the behaviour of state-of-the-art ranking strategies and principles. To do so, we tease out analytical relationships between the ranking approaches and we investigate the document kinematics to visualise the effects of the different approaches on document ranking.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The aim of this paper is to investigate the role of emotion features in diversifying document rankings to improve the effectiveness of Information Retrieval (IR) systems. For this purpose, two approaches are proposed to consider emotion features for diversification, and they are empirically tested on the TREC 678 Interactive Track collection. The results show that emotion features are capable of enhancing retrieval effectiveness.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Recently, Portfolio Theory (PT) has been proposed for Information Retrieval. However, under non-trivial conditions PT violates the original Probability Ranking Principle (PRP). In this poster, we shall explore whether PT upholds a different ranking principle based on Quantum Theory, i.e. the Quantum Probability Ranking Principle (QPRP), and examine the relationship between this new model and the new ranking principle. We make a significant contribution to the theoretical development of PT and show that under certain circumstances PT upholds the QPRP, and thus guarantees an optimal ranking according to the QPRP. A practical implication of this finding is that the parameters of PT can be automatically estimated via the QPRP, instead of resorting to extensive parameter tuning.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Quantum-inspired models have recently attracted increasing attention in Information Retrieval. An intriguing characteristic of the mathematical framework of quantum theory is the presence of complex numbers. However, it is unclear what such numbers could or would actually represent or mean in Information Retrieval. The goal of this paper is to discuss the role of complex numbers within the context of Information Retrieval. First, we introduce how complex numbers are used in quantum probability theory. Then, we examine van Rijsbergen’s proposal of evoking complex valued representations of informations objects. We empirically show that such a representation is unlikely to be effective in practice (confuting its usefulness in Information Retrieval). We then explore alternative proposals which may be more successful at realising the power of complex numbers.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

For TREC Crowdsourcing 2011 (Stage 2) we propose a networkbased approach for assigning an indicative measure of worker trustworthiness in crowdsourced labelling tasks. Workers, the gold standard and worker/gold standard agreements are modelled as a network. For the purpose of worker trustworthiness assignment, a variant of the PageRank algorithm, named TurkRank, is used to adaptively combine evidence that suggests worker trustworthiness, i.e., agreement with other trustworthy co-workers and agreement with the gold standard. A single parameter controls the importance of co-worker agreement versus gold standard agreement. The TurkRank score calculated for each worker is incorporated with a worker-weighted mean label aggregation.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The presence of spam in a document ranking is a major issue for Web search engines. Common approaches that cope with spam remove from the document rankings those pages that are likely to contain spam. These approaches are implemented as post-retrieval processes, that filter out spam pages only after documents have been retrieved with respect to a user’s query. In this paper we suggest to remove spam pages at indexing time, therefore obtaining a pruned index that is virtually “spam-free”. We investigate the benefits of this approach from three points of view: indexing time, index size, and retrieval performances. Not surprisingly, we found that the strategy decreases both the time required by the indexing process and the space required for storing the index. Surprisingly instead, we found that by considering a spam-pruned version of a collection’s index, no difference in retrieval performance is found when compared to that obtained by traditional post-retrieval spam filtering approaches.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

In this thesis we investigate the use of quantum probability theory for ranking documents. Quantum probability theory is used to estimate the probability of relevance of a document given a user's query. We posit that quantum probability theory can lead to a better estimation of the probability of a document being relevant to a user's query than the common approach, i. e. the Probability Ranking Principle (PRP), which is based upon Kolmogorovian probability theory. Following our hypothesis, we formulate an analogy between the document retrieval scenario and a physical scenario, that of the double slit experiment. Through the analogy, we propose a novel ranking approach, the quantum probability ranking principle (qPRP). Key to our proposal is the presence of quantum interference. Mathematically, this is the statistical deviation between empirical observations and expected values predicted by the Kolmogorovian rule of additivity of probabilities of disjoint events in configurations such that of the double slit experiment. We propose an interpretation of quantum interference in the document ranking scenario, and examine how quantum interference can be effectively estimated for document retrieval. To validate our proposal and to gain more insights about approaches for document ranking, we (1) analyse PRP, qPRP and other ranking approaches, exposing the assumptions underlying their ranking criteria and formulating the conditions for the optimality of the two ranking principles, (2) empirically compare three ranking principles (i. e. PRP, interactive PRP, and qPRP) and two state-of-the-art ranking strategies in two retrieval scenarios, those of ad-hoc retrieval and diversity retrieval, (3) analytically contrast the ranking criteria of the examined approaches, exposing similarities and differences, (4) study the ranking behaviours of approaches alternative to PRP in terms of the kinematics they impose on relevant documents, i. e. by considering the extent and direction of the movements of relevant documents across the ranking recorded when comparing PRP against its alternatives. Our findings show that the effectiveness of the examined ranking approaches strongly depends upon the evaluation context. In the traditional evaluation context of ad-hoc retrieval, PRP is empirically shown to be better or comparable to alternative ranking approaches. However, when we turn to examine evaluation contexts that account for interdependent document relevance (i. e. when the relevance of a document is assessed also with respect to other retrieved documents, as it is the case in the diversity retrieval scenario) then the use of quantum probability theory and thus of qPRP is shown to improve retrieval and ranking effectiveness over the traditional PRP and alternative ranking strategies, such as Maximal Marginal Relevance, Portfolio theory, and Interactive PRP. This work represents a significant step forward regarding the use of quantum theory in information retrieval. It demonstrates in fact that the application of quantum theory to problems within information retrieval can lead to improvements both in modelling power and retrieval effectiveness, allowing the constructions of models that capture the complexity of information retrieval situations. Furthermore, the thesis opens up a number of lines for future research. These include: (1) investigating estimations and approximations of quantum interference in qPRP; (2) exploiting complex numbers for the representation of documents and queries, and; (3) applying the concepts underlying qPRP to tasks other than document ranking.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Objective To evaluate the effects of Optical Character Recognition (OCR) on the automatic cancer classification of pathology reports. Method Scanned images of pathology reports were converted to electronic free-text using a commercial OCR system. A state-of-the-art cancer classification system, the Medical Text Extraction (MEDTEX) system, was used to automatically classify the OCR reports. Classifications produced by MEDTEX on the OCR versions of the reports were compared with the classification from a human amended version of the OCR reports. Results The employed OCR system was found to recognise scanned pathology reports with up to 99.12% character accuracy and up to 98.95% word accuracy. Errors in the OCR processing were found to minimally impact on the automatic classification of scanned pathology reports into notifiable groups. However, the impact of OCR errors is not negligible when considering the extraction of cancer notification items, such as primary site, histological type, etc. Conclusions The automatic cancer classification system used in this work, MEDTEX, has proven to be robust to errors produced by the acquisition of freetext pathology reports from scanned images through OCR software. However, issues emerge when considering the extraction of cancer notification items.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Objective: To develop a system for the automatic classification of pathology reports for Cancer Registry notifications. Method: A two pass approach is proposed to classify whether pathology reports are cancer notifiable or not. The first pass queries pathology HL7 messages for known report types that are received by the Queensland Cancer Registry (QCR), while the second pass aims to analyse the free text reports and identify those that are cancer notifiable. Cancer Registry business rules, natural language processing and symbolic reasoning using the SNOMED CT ontology were adopted in the system. Results: The system was developed on a corpus of 500 histology and cytology reports (with 47% notifiable reports) and evaluated on an independent set of 479 reports (with 52% notifiable reports). Results show that the system can reliably classify cancer notifiable reports with a sensitivity, specificity, and positive predicted value (PPV) of 0.99, 0.95, and 0.95, respectively for the development set, and 0.98, 0.96, and 0.96 for the evaluation set. High sensitivity can be achieved at a slight expense in specificity and PPV. Conclusion: The system demonstrates how medical free-text processing enables the classification of cancer notifiable pathology reports with high reliability for potential use by Cancer Registries and pathology laboratories.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The aim of this research is to report initial experimental results and evaluation of a clinician-driven automated method that can address the issue of misdiagnosis from unstructured radiology reports. Timely diagnosis and reporting of patient symptoms in hospital emergency departments (ED) is a critical component of health services delivery. However, due to disperse information resources and vast amounts of manual processing of unstructured information, a point-of-care accurate diagnosis is often difficult. A rule-based method that considers the occurrence of clinician specified keywords related to radiological findings was developed to identify limb abnormalities, such as fractures. A dataset containing 99 narrative reports of radiological findings was sourced from a tertiary hospital. The rule-based method achieved an F-measure of 0.80 and an accuracy of 0.80. While our method achieves promising performance, a number of avenues for improvement were identified using advanced natural language processing (NLP) techniques.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

In this paper we define two models of users that require diversity in search results; these models are theoretically grounded in the notion of intrinsic and extrinsic diversity. We then examine Intent-Aware Expected Reciprocal Rank (ERR-IA), one of the official measures used to assess diversity in TREC 2011-12, with respect to the proposed user models. By analyzing ranking preferences as expressed by the user models and those estimated by ERR-IA, we investigate whether ERR-IA assesses document rankings according to the requirements of the diversity retrieval task expressed by the two models. Empirical results demonstrate that ERR-IA neglects query-intents coverage by attributing excessive importance to redundant relevant documents. ERR-IA behavior is contrary to the user models that require measures to first assess diversity through the coverage of intents, and then assess the redundancy of relevant intents. Furthermore, diversity should be considered separately from document relevance and the documents positions in the ranking.