22 resultados para Wikipedia, information retrieval, focused link discovery
Resumo:
Web document cluster analysis plays an important role in information retrieval by organizing large amounts of documents into a small number of meaningful clusters. Traditional web document clustering is based on the Vector Space Model (VSM), which takes into account only two-level (document and term) knowledge granularity but ignores the bridging paragraph granularity. However, this two-level granularity may lead to unsatisfactory clustering results with “false correlation”. In order to deal with the problem, a Hierarchical Representation Model with Multi-granularity (HRMM), which consists of five-layer representation of data and a twophase clustering process is proposed based on granular computing and article structure theory. To deal with the zero-valued similarity problemresulted from the sparse term-paragraphmatrix, an ontology based strategy and a tolerance-rough-set based strategy are introduced into HRMM. By using granular computing, structural knowledge hidden in documents can be more efficiently and effectively captured in HRMM and thus web document clusters with higher quality can be generated. Extensive experiments show that HRMM, HRMM with tolerancerough-set strategy, and HRMM with ontology all outperform VSM and a representative non VSM-based algorithm, WFP, significantly in terms of the F-Score.
Resumo:
The practice of evidence-based medicine involves consulting documents from repositories such as Scopus, PubMed, or the Cochrane Library. The most common approach for presenting retrieved documents is in the form of a list, with the assumption that the higher a document is on a list, the more relevant it is. Despite this list-based presentation, it is seldom studied how physicians perceive the importance of the order of documents presented in a list. This paper describes an empirical study that elicited and modeled physicians' preferences with regard to list-based results. Preferences were analyzed using a GRIP method that relies on pairwise comparisons of selected subsets of possible rank-ordered lists composed of 3 documents. The results allow us to draw conclusions regarding physicians' attitudes towards the importance of having documents ranked correctly on a result list, versus the importance of retrieving relevant but misplaced documents. Our findings should help developers of clinical information retrieval applications when deciding how retrieved documents should be presented and how performance of the application should be assessed. © 2012 Springer-Verlag Berlin Heidelberg.
Resumo:
Term dependence is a natural consequence of language use. Its successful representation has been a long standing goal for Information Retrieval research. We present a methodology for the construction of a concept hierarchy that takes into account the three basic dimensions of term dependence. We also introduce a document evaluation function that allows the use of the concept hierarchy as a user profile for Information Filtering. Initial experimental results indicate that this is a promising approach for incorporating term dependence in the way documents are filtered.
Resumo:
Timeline generation is an important research task which can help users to have a quick understanding of the overall evolution of any given topic. It thus attracts much attention from research communities in recent years. Nevertheless, existing work on timeline generation often ignores an important factor, the attention attracted to topics of interest (hereafter termed "social attention"). Without taking into consideration social attention, the generated timelines may not reflect users' collective interests. In this paper, we study how to incorporate social attention in the generation of timeline summaries. In particular, for a given topic, we capture social attention by learning users' collective interests in the form of word distributions from Twitter, which are subsequently incorporated into a unified framework for timeline summary generation. We construct four evaluation sets over six diverse topics. We demonstrate that our proposed approach is able to generate both informative and interesting timelines. Our work sheds light on the feasibility of incorporating social attention into traditional text mining tasks. Copyright © 2013 ACM.
Resumo:
Two studies aiming to identify the nature and extent of problems that people have when completing theory of planned behaviour (TPB) questionnaires, using a cognitive interviewing approach are reported. Both studies required participants to 'think aloud' as they completed TPB questionnaires about: (a) increasing physical activity (six general public participants); and (b) binge drinking (13 students). Most people had no identifiable problems with the majority of questions. However, there were problems common to both studies, relating to information retrieval and to participants answering different questions from those intended by researchers. Questions about normative influence were particularly problematic. The standard procedure for developing TPB questionnaires may systematically produce problematic questions. Suggestions are made for improving this procedure. Copyright © 2007 SAGE Publications.
Resumo:
In recent years, learning word vector representations has attracted much interest in Natural Language Processing. Word representations or embeddings learned using unsupervised methods help addressing the problem of traditional bag-of-word approaches which fail to capture contextual semantics. In this paper we go beyond the vector representations at the word level and propose a novel framework that learns higher-level feature representations of n-grams, phrases and sentences using a deep neural network built from stacked Convolutional Restricted Boltzmann Machines (CRBMs). These representations have been shown to map syntactically and semantically related n-grams to closeby locations in the hidden feature space. We have experimented to additionally incorporate these higher-level features into supervised classifier training for two sentiment analysis tasks: subjectivity classification and sentiment classification. Our results have demonstrated the success of our proposed framework with 4% improvement in accuracy observed for subjectivity classification and improved the results achieved for sentiment classification over models trained without our higher level features.
Resumo:
This paper presents the design and results of a task-based user study, based on Information Foraging Theory, on a novel user interaction framework - uInteract - for content-based image retrieval (CBIR). The framework includes a four-factor user interaction model and an interactive interface. The user study involves three focused evaluations, 12 simulated real life search tasks with different complexity levels, 12 comparative systems and 50 subjects. Information Foraging Theory is applied to the user study design and the quantitative data analysis. The systematic findings have not only shown how effective and easy to use the uInteract framework is, but also illustrate the value of Information Foraging Theory for interpreting user interaction with CBIR. © 2011 Springer-Verlag Berlin Heidelberg.