645 resultados para Focused retrieval, Result aggregation, Metrics, Users
Resumo:
In this paper, we consider the problem of document ranking in a non-traditional retrieval task, called subtopic retrieval. This task involves promoting relevant documents that cover many subtopics of a query at early ranks, providing thus diversity within the ranking. In the past years, several approaches have been proposed to diversify retrieval results. These approaches can be classified into two main paradigms, depending upon how the ranks of documents are revised for promoting diversity. In the first approach subtopic diversification is achieved implicitly, by choosing documents that are different from each other, while in the second approach this is done explicitly, by estimating the subtopics covered by documents. Within this context, we compare methods belonging to the two paradigms. Furthermore, we investigate possible strategies for integrating the two paradigms with the aim of formulating a new ranking method for subtopic retrieval. We conduct a number of experiments to empirically validate and contrast the state-of-the-art approaches as well as instantiations of our integration approach. The results show that the integration approach outperforms state-of-the-art strategies with respect to a number of measures.
Resumo:
Ranking documents according to the Probability Ranking Principle has been theoretically shown to guarantee optimal retrieval effectiveness in tasks such as ad hoc document retrieval. This ranking strategy assumes independence among document relevance assessments. This assumption, however, often does not hold, for example in the scenarios where redundancy in retrieved documents is of major concern, as it is the case in the sub–topic retrieval task. In this chapter, we propose a new ranking strategy for sub–topic retrieval that builds upon the interdependent document relevance and topic–oriented models. With respect to the topic– oriented model, we investigate both static and dynamic clustering techniques, aiming to group topically similar documents. Evidence from clusters is then combined with information about document dependencies to form a new document ranking. We compare and contrast the proposed method against state–of–the–art approaches, such as Maximal Marginal Relevance, Portfolio Theory for Information Retrieval, and standard cluster–based diversification strategies. The empirical investigation is performed on the ImageCLEF 2009 Photo Retrieval collection, where images are assessed with respect to sub–topics of a more general query topic. The experimental results show that our approaches outperform the state–of–the–art strategies with respect to a number of diversity measures.
Resumo:
The assumptions underlying the Probability Ranking Principle (PRP) have led to a number of alternative approaches that cater or compensate for the PRP’s limitations. All alternatives deviate from the PRP by incorporating dependencies. This results in a re-ranking that promotes or demotes documents depending upon their relationship with the documents that have been already ranked. In this paper, we compare and contrast the behaviour of state-of-the-art ranking strategies and principles. To do so, we tease out analytical relationships between the ranking approaches and we investigate the document kinematics to visualise the effects of the different approaches on document ranking.
Resumo:
Quantum-inspired models have recently attracted increasing attention in Information Retrieval. An intriguing characteristic of the mathematical framework of quantum theory is the presence of complex numbers. However, it is unclear what such numbers could or would actually represent or mean in Information Retrieval. The goal of this paper is to discuss the role of complex numbers within the context of Information Retrieval. First, we introduce how complex numbers are used in quantum probability theory. Then, we examine van Rijsbergen’s proposal of evoking complex valued representations of informations objects. We empirically show that such a representation is unlikely to be effective in practice (confuting its usefulness in Information Retrieval). We then explore alternative proposals which may be more successful at realising the power of complex numbers.
Resumo:
For TREC Crowdsourcing 2011 (Stage 2) we propose a networkbased approach for assigning an indicative measure of worker trustworthiness in crowdsourced labelling tasks. Workers, the gold standard and worker/gold standard agreements are modelled as a network. For the purpose of worker trustworthiness assignment, a variant of the PageRank algorithm, named TurkRank, is used to adaptively combine evidence that suggests worker trustworthiness, i.e., agreement with other trustworthy co-workers and agreement with the gold standard. A single parameter controls the importance of co-worker agreement versus gold standard agreement. The TurkRank score calculated for each worker is incorporated with a worker-weighted mean label aggregation.
Resumo:
The presence of spam in a document ranking is a major issue for Web search engines. Common approaches that cope with spam remove from the document rankings those pages that are likely to contain spam. These approaches are implemented as post-retrieval processes, that filter out spam pages only after documents have been retrieved with respect to a user’s query. In this paper we suggest to remove spam pages at indexing time, therefore obtaining a pruned index that is virtually “spam-free”. We investigate the benefits of this approach from three points of view: indexing time, index size, and retrieval performances. Not surprisingly, we found that the strategy decreases both the time required by the indexing process and the space required for storing the index. Surprisingly instead, we found that by considering a spam-pruned version of a collection’s index, no difference in retrieval performance is found when compared to that obtained by traditional post-retrieval spam filtering approaches.
Resumo:
In this paper we define two models of users that require diversity in search results; these models are theoretically grounded in the notion of intrinsic and extrinsic diversity. We then examine Intent-Aware Expected Reciprocal Rank (ERR-IA), one of the official measures used to assess diversity in TREC 2011-12, with respect to the proposed user models. By analyzing ranking preferences as expressed by the user models and those estimated by ERR-IA, we investigate whether ERR-IA assesses document rankings according to the requirements of the diversity retrieval task expressed by the two models. Empirical results demonstrate that ERR-IA neglects query-intents coverage by attributing excessive importance to redundant relevant documents. ERR-IA behavior is contrary to the user models that require measures to first assess diversity through the coverage of intents, and then assess the redundancy of relevant intents. Furthermore, diversity should be considered separately from document relevance and the documents positions in the ranking.
Creation of a new evaluation benchmark for information retrieval targeting patient information needs
Resumo:
Searching for health advice on the web is becoming increasingly common. Because of the great importance of this activity for patients and clinicians and the effect that incorrect information may have on health outcomes, it is critical to present relevant and valuable information to a searcher. Previous evaluation campaigns on health information retrieval (IR) have provided benchmarks that have been widely used to improve health IR and record these improvements. However, in general these benchmarks have targeted the specialised information needs of physicians and other healthcare workers. In this paper, we describe the development of a new collection for evaluation of effectiveness in IR seeking to satisfy the health information needs of patients. Our methodology features a novel way to create statements of patients’ information needs using realistic short queries associated with patient discharge summaries, which provide details of patient disorders. We adopt a scenario where the patient then creates a query to seek information relating to these disorders. Thus, discharge summaries provide us with a means to create contextually driven search statements, since they may include details on the stage of the disease, family history etc. The collection will be used for the first time as part of the ShARe/-CLEF 2013 eHealth Evaluation Lab, which focuses on natural language processing and IR for clinical care.
Resumo:
Complex numbers are a fundamental aspect of the mathematical formalism of quantum physics. Quantum-like models developed outside physics often overlooked the role of complex numbers. Specifically, previous models in Information Retrieval (IR) ignored complex numbers. We argue that to advance the use of quantum models of IR, one has to lift the constraint of real-valued representations of the information space, and package more information within the representation by means of complex numbers. As a first attempt, we propose a complex-valued representation for IR, which explicitly uses complex valued Hilbert spaces, and thus where terms, documents and queries are represented as complex-valued vectors. The proposal consists of integrating distributional semantics evidence within the real component of a term vector; whereas, ontological information is encoded in the imaginary component. Our proposal has the merit of lifting the role of complex numbers from a computational byproduct of the model to the very mathematical texture that unifies different levels of semantic information. An empirical instantiation of our proposal is tested in the TREC Medical Record task of retrieving cohorts for clinical studies.
Resumo:
This paper presents the results of task 3 of the ShARe/CLEF eHealth Evaluation Lab 2013. This evaluation lab focuses on improving access to medical information on the web. The task objective was to investigate the effect of using additional information such as the discharge summaries and external resources such as medical ontologies on the IR effectiveness. The participants were allowed to submit up to seven runs, one mandatory run using no additional information or external resources, and three each using or not using discharge summaries.
Resumo:
Twitter is the focus of much research attention, both in traditional academic circles and in commercial market and media research, as analytics give increasing insight into the performance of the platform in areas as diverse as political communication, crisis management, television audiencing and other industries. While methods for tracking Twitter keywords and hashtags have developed apace and are well documented, the make-up of the Twitter user base and its evolution over time have been less understood to date. Recent research efforts have taken advantage of functionality provided by Twitter's Application Programming Interface to develop methodologies to extract information that allows us to understand the growth of Twitter, its geographic spread and the processes by which particular Twitter users have attracted followers. From politicians to sporting teams, and from YouTube personalities to reality television stars, this technique enables us to gain an understanding of what prompts users to follow others on Twitter. This article outlines how we came upon this approach, describes the method we adopted to produce accession graphs and discusses their use in Twitter research. It also addresses the wider ethical implications of social network analytics, particularly in the context of a detailed study of the Twitter user base.
Resumo:
Most previous work on unconditionally secure multiparty computation has focused on computing over a finite field (or ring). Multiparty computation over other algebraic structures has not received much attention, but is an interesting topic whose study may provide new and improved tools for certain applications. At CRYPTO 2007, Desmedt et al introduced a construction for a passive-secure multiparty multiplication protocol for black-box groups, reducing it to a certain graph coloring problem, leaving as an open problem to achieve security against active attacks. We present the first n-party protocol for unconditionally secure multiparty computation over a black-box group which is secure under an active attack model, tolerating any adversary structure Δ satisfying the Q 3 property (in which no union of three subsets from Δ covers the whole player set), which is known to be necessary for achieving security in the active setting. Our protocol uses Maurer’s Verifiable Secret Sharing (VSS) but preserves the essential simplicity of the graph-based approach of Desmedt et al, which avoids each shareholder having to rerun the full VSS protocol after each local computation. A corollary of our result is a new active-secure protocol for general multiparty computation of an arbitrary Boolean circuit.
Resumo:
This paper reports on a four year Australian Research Council funded Linkage Project titled Skilling Indigenous Queensland, conducted in regional areas of Queensland, Australia from 2009 to 2013. The project sought to investigate Vocational Education and Training (VET) and teaching, Indigenous learners’ needs, employer culture and expectations and community culture and expectations to identify best practice in numeracy teaching for Indigenous VET learners. Specifically it focused on ways to enhance the teaching and learning of courses and the associated mathematics in such courses to benefit learners and increase their future opportunities of employment. To date thirty - nine teachers/trainers/teacher aides and two hundred and thirty - one students consented to participate in the project. Nine VET courses offered in schools and Technical and Further Education Institutes (TAFE) were nominated to be the focus on the study. This paper focuses on student questionnaire responses and interview responses from teachers/trainers one high school principal and five students as a result of these processes, the findings indicated that VET course teachers work hard to adopt contextualising strategies to their teaching; however this process is not always straight forward because of the perceptions of how mathematics has been taught and learned by trainers and teachers. Further teachers, trainers and students have high expectations of one another with the view to successful outcomes from the courses.
Resumo:
Within Human-Computer Interaction (HCI) and Computer Supported Cooperative Work (CSCW) research, the notion of technologically-mediated awareness is often used for allowing relevant people to maintain a mental model of activities, behaviors and status information about each other so that they can organize and coordinate work or other joint activities. The initial conceptions of awareness focused largely on improving productivity and efficiency within work environments. With new social, cultural and commercial needs and the emergence of novel computing technologies, the focus of technologically-mediated awareness has extended from work environments to people’s everyday interactions. Hence, the scope of awareness has extended from conveying work related activities to people’s emotions, love, social status and other broad range of aspects. This trend of conceptualizing HCI design is termed as experience-focused HCI. In my PhD dissertation, designing for awareness, I have reported on how we, as HCI researchers, can design awareness systems from experience-focused HCI perspective that follow the trend of conveying awareness beyond the task-based, instrumental and productive needs. Within the overall aim to design for awareness, my research advocates ethnomethodologically-informed approaches for conceptualizing and designing for awareness. In this sense, awareness is not a predefined phenomenon but something that is situated and particular to a given environment. I have used this approach in two design cases of developing interactive systems that support awareness beyond task-based aspects in work environments. In both the cases, I have followed a complete design cycle: collecting an in-situ understanding of an environment, developing implications for a new technology, implementing a prototype technology to studying the use of the technology in its natural settings.