328 resultados para Information Retrieval, Document Databases, Digital Libraries


Relevância:

100.00% 100.00%

Publicador:

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper describes the approach taken to the clustering task at INEX 2009 by a group at the Queensland University of Technology. The Random Indexing (RI) K-tree has been used with a representation that is based on the semantic markup available in the INEX 2009 Wikipedia collection. The RI K-tree is a scalable approach to clustering large document collections. This approach has produced quality clustering when evaluated using two different methodologies.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Consider a person searching electronic health records, a search for the term ‘cracked skull’ should return documents that contain the term ‘cranium fracture’. A information retrieval systems is required that matches concepts, not just keywords. Further more, determining relevance of a query to a document requires inference – its not simply matching concepts. For example a document containing ‘dialysis machine’ should align with a query for ‘kidney disease’. Collectively we describe this problem as the ‘semantic gap’ – the difference between the raw medical data and the way a human interprets it. This paper presents an approach to semantic search of health records by combining two previous approaches: an ontological approach using the SNOMED CT medical ontology; and a distributional approach using semantic space vector space models. Our approach will be applied to a specific problem in health informatics: the matching of electronic patient records to clinical trials.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

User-Web interactions have emerged as an important area of research in the field of information science. In this study, we investigate the effects of users’ cognitive styles on their Web navigational styles and information processing strategies. We report results from the analyses of 594 minutes recorded Web search sessions of 18 participants engaged in 54 scenario-based search tasks. We use questionnaires, cognitive style test, Web session logs and think-aloud as the data collection instruments. We classify users’ cognitive styles as verbalisers and imagers based on Riding’s (1991) Cognitive Style Analysis test. Two classifications of navigational styles and three categories of information processing strategies are identified. Our study findings show that there exist relationships between users’ cognitive style, and their navigational styles and information processing strategies. Verbal users seem to display sporadic navigational styles, and adopt a scanning strategy to understand the content of the search result page, while imagery users follow a structured navigational style and reading approach. We develop a matrix and a model that depicts the relationships between users’ cognitive styles, and their navigational style and information processing strategies. We discuss how the findings from this study could help search engine designers to provide an adaptive navigation support to users.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

INTRODUCTION: Since the introduction of its QUT ePrints institutional repository of published research outputs, together with the world’s first mandate for author contributions to an institutional repository, Queensland University of Technology (QUT) has been a leader in support of green road open access. With QUT ePrints providing our mechanism for supporting the green road to open access, QUT has since then also continued to expand its secondary open access strategy supporting gold road open access, which is also designed to assist QUT researchers to maximise the accessibility and so impact of their research. ---------- METHODS: QUT Library has adopted the position of selectively supporting true gold road open access publishing by using the Library Resource Allocation budget to pay the author publication fees for QUT authors wishing to publish in the open access journals of a range of publishers including BioMed Central, Public Library of Science and Hindawi. QUT Library has been careful to support only true open access publishers and not those open access publishers with hybrid models which “double dip” by charging authors publication fees and libraries subscription fees for the same journal content. QUT Library has maintained a watch on the growing number of open access journals available from gold road open access publishers and their increased rate of success as measured by publication impact. ---------- RESULTS: This paper reports on the successes and challenges of QUT’s efforts to support true gold road open access publishers and promote these publishing strategy options to researchers at QUT. The number and spread of QUT papers submitted and published in the journals of each publisher is provided. Citation counts for papers and authors are also presented and analysed, with the intention of identifying the benefits to accessibility and research impact for early career and established researchers.---------- CONCLUSIONS: QUT Library is eager to continue and further develop support for this publishing strategy, and makes a number of recommendations to other research institutions, on how they can best achieve success with this strategy.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The XML Document Mining track was launched for exploring two main ideas: (1) identifying key problems and new challenges of the emerging field of mining semi-structured documents, and (2) studying and assessing the potential of Machine Learning (ML) techniques for dealing with generic ML tasks in the structured domain, i.e., classification and clustering of semi-structured documents. This track has run for six editions during INEX 2005, 2006, 2007, 2008, 2009 and 2010. The first five editions have been summarized in previous editions and we focus here on the 2010 edition. INEX 2010 included two tasks in the XML Mining track: (1) unsupervised clustering task and (2) semi-supervised classification task where documents are organized in a graph. The clustering task requires the participants to group the documents into clusters without any knowledge of category labels using an unsupervised learning algorithm. On the other hand, the classification task requires the participants to label the documents in the dataset into known categories using a supervised learning algorithm and a training set. This report gives the details of clustering and classification tasks.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

It is a big challenge to clearly identify the boundary between positive and negative streams for information filtering systems. Several attempts have used negative feedback to solve this challenge; however, there are two issues for using negative relevance feedback to improve the effectiveness of information filtering. The first one is how to select constructive negative samples in order to reduce the space of negative documents. The second issue is how to decide noisy extracted features that should be updated based on the selected negative samples. This paper proposes a pattern mining based approach to select some offenders from the negative documents, where an offender can be used to reduce the side effects of noisy features. It also classifies extracted features (i.e., terms) into three categories: positive specific terms, general terms, and negative specific terms. In this way, multiple revising strategies can be used to update extracted features. An iterative learning algorithm is also proposed to implement this approach on the RCV1 data collection, and substantial experiments show that the proposed approach achieves encouraging performance and the performance is also consistent for adaptive filtering as well.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

As a model for knowledge description and formalization, ontologies are widely used to represent user profiles in personalized web information gathering. However, when representing user profiles, many models have utilized only knowledge from either a global knowledge base or a user local information. In this paper, a personalized ontology model is proposed for knowledge representation and reasoning over user profiles. This model learns ontological user profiles from both a world knowledge base and user local instance repositories. The ontology model is evaluated by comparing it against benchmark models in web information gathering. The results show that this ontology model is successful.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Since manually constructing domain-specific sentiment lexicons is extremely time consuming and it may not even be feasible for domains where linguistic expertise is not available. Research on the automatic construction of domain-specific sentiment lexicons has become a hot topic in recent years. The main contribution of this paper is the illustration of a novel semi-supervised learning method which exploits both term-to-term and document-to-term relations hidden in a corpus for the construction of domain specific sentiment lexicons. More specifically, the proposed two-pass pseudo labeling method combines shallow linguistic parsing and corpusbase statistical learning to make domain-specific sentiment extraction scalable with respect to the sheer volume of opinionated documents archived on the Internet these days. Another novelty of the proposed method is that it can utilize the readily available user-contributed labels of opinionated documents (e.g., the user ratings of product reviews) to bootstrap the performance of sentiment lexicon construction. Our experiments show that the proposed method can generate high quality domain-specific sentiment lexicons as directly assessed by human experts. Moreover, the system generated domain-specific sentiment lexicons can improve polarity prediction tasks at the document level by 2:18% when compared to other well-known baseline methods. Our research opens the door to the development of practical and scalable methods for domain-specific sentiment analysis.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Mobile phones are now powerful and pervasive making them ideal information browsers. The Internet has revolutionized our lives and is a major knowledge sharing media. However, many mobile phone users cannot access the Internet (for financial or technical reasons) and so the mobile Internet has not been fully realized. We propose a novel content delivery network based on both a factual and speculative analysis of today’s technology and analyze its feasibility. If adopted people living in remote regions without Internet will be able to access essential (static) information with periodic updates.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In information retrieval, a user's query is often not a complete representation of their real information need. The user's information need is a cognitive construction, however the use of cognitive models to perform query expansion have had little study. In this paper, we present a cognitively motivated query expansion technique that uses semantic features for use in ad hoc retrieval. This model is evaluated against a state-of-the-art query expansion technique. The results show our approach provides significant improvements in retrieval effectiveness for the TREC data sets tested.