321 resultados para Information retrieval interfaces
Resumo:
It is a big challenge to clearly identify the boundary between positive and negative streams for information filtering systems. Several attempts have used negative feedback to solve this challenge; however, there are two issues for using negative relevance feedback to improve the effectiveness of information filtering. The first one is how to select constructive negative samples in order to reduce the space of negative documents. The second issue is how to decide noisy extracted features that should be updated based on the selected negative samples. This paper proposes a pattern mining based approach to select some offenders from the negative documents, where an offender can be used to reduce the side effects of noisy features. It also classifies extracted features (i.e., terms) into three categories: positive specific terms, general terms, and negative specific terms. In this way, multiple revising strategies can be used to update extracted features. An iterative learning algorithm is also proposed to implement this approach on the RCV1 data collection, and substantial experiments show that the proposed approach achieves encouraging performance and the performance is also consistent for adaptive filtering as well.
Resumo:
As a model for knowledge description and formalization, ontologies are widely used to represent user profiles in personalized web information gathering. However, when representing user profiles, many models have utilized only knowledge from either a global knowledge base or a user local information. In this paper, a personalized ontology model is proposed for knowledge representation and reasoning over user profiles. This model learns ontological user profiles from both a world knowledge base and user local instance repositories. The ontology model is evaluated by comparing it against benchmark models in web information gathering. The results show that this ontology model is successful.
Resumo:
Mobile phones are now powerful and pervasive making them ideal information browsers. The Internet has revolutionized our lives and is a major knowledge sharing media. However, many mobile phone users cannot access the Internet (for financial or technical reasons) and so the mobile Internet has not been fully realized. We propose a novel content delivery network based on both a factual and speculative analysis of today’s technology and analyze its feasibility. If adopted people living in remote regions without Internet will be able to access essential (static) information with periodic updates.
Resumo:
Purpose – The work presented in this paper aims to provide an approach to classifying web logs by personal properties of users. Design/methodology/approach – The authors describe an iterative system that begins with a small set of manually labeled terms, which are used to label queries from the log. A set of background knowledge related to these labeled queries is acquired by combining web search results on these queries. This background set is used to obtain many terms that are related to the classification task. The system then ranks each of the related terms, choosing those that most fit the personal properties of the users. These terms are then used to begin the next iteration. Findings – The authors identify the difficulties of classifying web logs, by approaching this problem from a machine learning perspective. By applying the approach developed, the authors are able to show that many queries in a large query log can be classified. Research limitations/implications – Testing results in this type of classification work is difficult, as the true personal properties of web users are unknown. Evaluation of the classification results in terms of the comparison of classified queries to well known age-related sites is a direction that is currently being exploring. Practical implications – This research is background work that can be incorporated in search engines or other web-based applications, to help marketing companies and advertisers. Originality/value – This research enhances the current state of knowledge in short-text classification and query log learning. Classification schemes, Computer networks, Information retrieval, Man-machine systems, User interfaces
Resumo:
Collaborative question answering (cQA) portals such as Yahoo! Answers allow users as askers or answer authors to communicate, and exchange information through the asking and answering of questions in the network. In their current set-up, answers to a question are arranged in chronological order. For effective information retrieval, it will be advantageous to have the users’ answers ranked according to their quality. This paper proposes a novel approach of evaluating and ranking the users’answers and recommending the top-n quality answers to information seekers. The proposed approach is based on a user-reputation method which assigns a score to an answer reflecting its answer author’s reputation level in the network. The proposed approach is evaluated on a dataset collected from a live cQA, namely, Yahoo! Answers. To compare the results obtained by the non-content-based user-reputation method, experiments were also conducted with several content-based methods that assign a score to an answer reflecting its content quality. Various combinations of non-content and content-based scores were also used in comparing results. Empirical analysis shows that the proposed method is able to rank the users’ answers and recommend the top-n answers with good accuracy. Results of the proposed method outperform the content-based methods, various combinations, and the results obtained by the popular link analysis method, HITS.
Resumo:
Purpose - The purpose of this paper is to examine post-graduate health promotion students’ self-perceptions of information literacy skills prior to, and after completing PILOT, an online information literacy tutorial. Design/methodology/approach – Post graduate students at Queensland University of Technology enrolled in PUP038 New Developments in Health Promotion completed a pre- and post- self-assessment questionnaire. From 2008-2011 students were required to rate their academic writing and research skills before and after completing the PILOT online information literacy tutorial. Quantitative trends and qualitative themes were analysed to establish students’ self-assessment and the effectiveness of the PILOT tutorial. Findings – The results from four years of post-graduate students’ self-assessment questionnaires provide evidence of perceived improvements in information literacy skills after completing PILOT. Some students continued to have trouble with locating quality information and analysis as well as issues surrounding referencing and plagiarism. Feedback was generally positive and students’ responses indicated they found the tutorial highly beneficial in improving their research skills. Originality/value - This paper is original because it describes post-graduate health promotion students’ self-assessment of information literacy skills over a period of four years. The literature is limited in the health promotion domain and self-assessment of post-graduate students’ information literacy skills. Keywords – Self-assessment, Post-graduate, Information literacy, Library instruction, Higher education, Health promotion, Evidence-based practice Paper Type - Research paper
Resumo:
This paper addresses the issue of analogical inference, and its potential role as the mediator of new therapeutic discoveries, by using disjunction operators based on quantum connectives to combine many potential reasoning pathways into a single search expression. In it, we extend our previous work in which we developed an approach to analogical retrieval using the Predication-based Semantic Indexing (PSI) model, which encodes both concepts and the relationships between them in high-dimensional vector space. As in our previous work, we leverage the ability of PSI to infer predicate pathways connecting two example concepts, in this case comprising of known therapeutic relationships. For example, given that drug x TREATS disease z, we might infer the predicate pathway drug x INTERACTS WITH gene y ASSOCIATED WITH disease z, and use this pathway to search for drugs related to another disease in similar ways. As biological systems tend to be characterized by networks of relationships, we evaluate the ability of quantum-inspired operators to mediate inference and retrieval across multiple relations, by testing the ability of different approaches to recover known therapeutic relationships. In addition, we introduce a novel complex vector based implementation of PSI, based on Plate’s Circular Holographic Reduced Representations, which we utilize for all experiments in addition to the binary vector based approach we have applied in our previous research.
Resumo:
This paper outlines a novel approach for modelling semantic relationships within medical documents. Medical terminologies contain a rich source of semantic information critical to a number of techniques in medical informatics, including medical information retrieval. Recent research suggests that corpus-driven approaches are effective at automatically capturing semantic similarities between medical concepts, thus making them an attractive option for accessing semantic information. Most previous corpus-driven methods only considered syntagmatic associations. In this paper, we adapt a recent approach that explicitly models both syntagmatic and paradigmatic associations. We show that the implicit similarity between certain medical concepts can only be modelled using paradigmatic associations. In addition, the inclusion of both types of associations overcomes the sensitivity to the training corpus experienced by previous approaches, making our method both more effective and more robust. This finding may have implications for researchers in the area of medical information retrieval.
Resumo:
The cross-sections of the Social Web and the Semantic Web has put folksonomy in the spot light for its potential in overcoming knowledge acquisition bottleneck and providing insight for "wisdom of the crowds". Folksonomy which comes as the results of collaborative tagging activities has provided insight into user's understanding about Web resources which might be useful for searching and organizing purposes. However, collaborative tagging vocabulary poses some challenges since tags are freely chosen by users and may exhibit synonymy and polysemy problem. In order to overcome these challenges and boost the potential of folksonomy as emergence semantics we propose to consolidate the diverse vocabulary into a consolidated entities and concepts. We propose to extract a tag ontology by ontology learning process to represent the semantics of a tagging community. This paper presents a novel approach to learn the ontology based on the widely used lexical database WordNet. We present personalization strategies to disambiguate the semantics of tags by combining the opinion of WordNet lexicographers and users’ tagging behavior together. We provide empirical evaluations by using the semantic information contained in the ontology in a tag recommendation experiment. The results show that by using the semantic relationships on the ontology the accuracy of the tag recommender has been improved.
Resumo:
Retrieving information from Twitter is always challenging due to its large volume, inconsistent writing and noise. Most existing information retrieval (IR) and text mining methods focus on term-based approach, but suffers from the problems of terms variation such as polysemy and synonymy. This problem deteriorates when such methods are applied on Twitter due to the length limit. Over the years, people have held the hypothesis that pattern-based methods should perform better than term-based methods as it provides more context, but limited studies have been conducted to support such hypothesis especially in Twitter. This paper presents an innovative framework to address the issue of performing IR in microblog. The proposed framework discover patterns in tweets as higher level feature to assign weight for low-level features (i.e. terms) based on their distributions in higher level features. We present the experiment results based on TREC11 microblog dataset and shows that our proposed approach significantly outperforms term-based methods Okapi BM25, TF-IDF and pattern based methods, using precision, recall and F measures.
Resumo:
With the explosive growth of resources available through the Internet, information mismatching and overload have become a severe concern to users. Web users are commonly overwhelmed by huge volume of information and are faced with the challenge of finding the most relevant and reliable information in a timely manner. Personalised information gathering and recommender systems represent state-of-the-art tools for efficient selection of the most relevant and reliable information resources, and the interest in such systems has increased dramatically over the last few years. However, web personalization has not yet been well-exploited; difficulties arise while selecting resources through recommender systems from a technological and social perspective. Aiming to promote high quality research in order to overcome these challenges, this paper provides a comprehensive survey on the recent work and achievements in the areas of personalised web information gathering and recommender systems. The report covers concept-based techniques exploited in personalised information gathering and recommender systems.
Resumo:
The article focuses on how the information seeker makes decisions about relevance. It will employ a novel decision theory based on quantum probabilities. This direction derives from mounting research within the field of cognitive science showing that decision theory based on quantum probabilities is superior to modelling human judgements than standard probability models [2, 1]. By quantum probabilities, we mean decision event space is modelled as vector space rather than the usual Boolean algebra of sets. In this way,incompatible perspectives around a decision can be modelled leading to an interference term which modifies the law of total probability. The interference term is crucial in modifying the probability judgements made by current probabilistic systems so they align better with human judgement. The goal of this article is thus to model the information seeker user as a decision maker. For this purpose, signal detection models will be sketched which are in principle applicable in a wide variety of information seeking scenarios.
Resumo:
In response to current developments In the tertiary education sector, the Queensland University of Technology Library has mounted an Intensive course - Advanced Information Retrieval Skills - for higher degree students. In determining need for such a course, a survey of postgraduate students and their supervisors was conducted. Results of this survey are discussed and details of the four credit point subjects are outlined.