515 resultados para NUDIST (Information retrieval system)
Resumo:
In the field of information retrieval (IR), researchers and practitioners are often faced with a demand for valid approaches to evaluate the performance of retrieval systems. The Cranfield experiment paradigm has been dominant for the in-vitro evaluation of IR systems. Alternative to this paradigm, laboratory-based user studies have been widely used to evaluate interactive information retrieval (IIR) systems, and at the same time investigate users’ information searching behaviours. Major drawbacks of laboratory-based user studies for evaluating IIR systems include the high monetary and temporal costs involved in setting up and running those experiments, the lack of heterogeneity amongst the user population and the limited scale of the experiments, which usually involve a relatively restricted set of users. In this paper, we propose an alternative experimental methodology to laboratory-based user studies. Our novel experimental methodology uses a crowdsourcing platform as a means of engaging study participants. Through crowdsourcing, our experimental methodology can capture user interactions and searching behaviours at a lower cost, with more data, and within a shorter period than traditional laboratory-based user studies, and therefore can be used to assess the performances of IIR systems. In this article, we show the characteristic differences of our approach with respect to traditional IIR experimental and evaluation procedures. We also perform a use case study comparing crowdsourcing-based evaluation with laboratory-based evaluation of IIR systems, which can serve as a tutorial for setting up crowdsourcing-based IIR evaluations.
Resumo:
We present a study to understand the effect that negated terms (e.g., "no fever") and family history (e.g., "family history of diabetes") have on searching clinical records. Our analysis is aimed at devising the most effective means of handling negation and family history. In doing so, we explicitly represent a clinical record according to its different content types: negated, family history and normal content; the retrieval model weights each of these separately. Empirical evaluation shows that overall the presence of negation harms retrieval effectiveness while family history has little effect. We show negation is best handled by weighting negated content (rather than the common practise of removing or replacing it). However, we also show that many queries benefit from the inclusion of negated content and that negation is optimally handled on a per-query basis. Additional evaluation shows that adaptive handing of negated and family history content can have significant benefits.
Resumo:
While the Probability Ranking Principle for Information Retrieval provides the basis for formal models, it makes a very strong assumption regarding the dependence between documents. However, it has been observed that in real situations this assumption does not always hold. In this paper we propose a reformulation of the Probability Ranking Principle based on quantum theory. Quantum probability theory naturally includes interference effects between events. We posit that this interference captures the dependency between the judgement of document relevance. The outcome is a more sophisticated principle, the Quantum Probability Ranking Principle, that provides a more sensitive ranking which caters for interference/dependence between documents’ relevance.
Resumo:
This paper estimates the benefit of a plan for information providing system on road administration by WebGIS. The system will reduce travel costs of visitors from their business establishments to a road administration section of a city office. The authors had individual interviews with the visitors at the section of the Ichikawa City Office. Annual total sum of travel costs was estimated at 37 million yen at most. This paper also proposes formulas which expect the frequency of visits or the total sum of travel costs from the spatial distribution of the business establishments without questionnaires.
Resumo:
This thesis presents new methods for classification and thematic grouping of billions of web pages, at scales previously not achievable. This process is also known as document clustering, where similar documents are automatically associated with clusters that represent various distinct topic. These automatically discovered topics are in turn used to improve search engine performance by only searching the topics that are deemed relevant to particular user queries.
Resumo:
Many techniques in information retrieval produce counts from a sample, and it is common to analyse these counts as proportions of the whole - term frequencies are a familiar example. Proportions carry only relative information and are not free to vary independently of one another: for the proportion of one term to increase, one or more others must decrease. These constraints are hallmarks of compositional data. While there has long been discussion in other fields of how such data should be analysed, to our knowledge, Compositional Data Analysis (CoDA) has not been considered in IR. In this work we explore compositional data in IR through the lens of distance measures, and demonstrate that common measures, naïve to compositions, have some undesirable properties which can be avoided with composition-aware measures. As a practical example, these measures are shown to improve clustering. Copyright 2014 ACM.
Resumo:
This thesis targets on a challenging issue that is to enhance users' experience over massive and overloaded web information. The novel pattern-based topic model proposed in this thesis can generate high-quality multi-topic user interest models technically by incorporating statistical topic modelling and pattern mining. We have successfully applied the pattern-based topic model to both fields of information filtering and information retrieval. The success of the proposed model in finding the most relevant information to users mainly comes from its precisely semantic representations to represent documents and also accurate classification of the topics at both document level and collection level.
Resumo:
Typing 2 or 3 keywords into a browser has become an easy and efficient way to find information. Yet, typing even short queries becomes tedious on ever shrinking (virtual) keyboards. Meanwhile, speech processing is maturing rapidly, facilitating everyday language input. Also, wearable technology can inform users proactively by listening in on their conversations or processing their social media interactions. Given these developments, everyday language may soon become the new input of choice. We present an information retrieval (IR) algorithm specifically designed to accept everyday language. It integrates two paradigms of information retrieval, previously studied in isolation; one directed mainly at the surface structure of language, the other primarily at the underlying meaning. The integration was achieved by a Markov machine that encodes meaning by its transition graph, and surface structure by the language it generates. A rigorous evaluation of the approach showed, first, that it can compete with the quality of existing language models, second, that it is more effective the more verbose the input, and third, as a consequence, that it is promising for an imminent transition from keyword input, where the onus is on the user to formulate concise queries, to a modality where users can express more freely, more informal, and more natural their need for information in everyday language.
Resumo:
The increasing amount of information that is annotated against standardised semantic resources offers opportunities to incorporate sophisticated levels of reasoning, or inference, into the retrieval process. In this position paper, we reflect on the need to incorporate semantic inference into retrieval (in particular for medical information retrieval) as well as previous attempts that have been made so far with mixed success. Medical information retrieval is a fertile ground for testing inference mechanisms to augment retrieval. The medical domain offers a plethora of carefully curated, structured, semantic resources, along with well established entity extraction and linking tools, and search topics that intuitively require a number of different inferential processes (e.g., conceptual similarity, conceptual implication, etc.). We argue that integrating semantic inference in information retrieval has the potential to uncover a large amount of information that otherwise would be inaccessible; but inference is also risky and, if not used cautiously, can harm retrieval.