914 results for pacs: information retrieval techniques


Relevance:

100.00%

Publisher:

Abstract:

With the growing number of XML documents on the Web it becomes essential to organise these documents effectively in order to retrieve useful information from them. A possible solution is to apply clustering to the XML documents to discover knowledge that promotes effective data management, information retrieval and query processing. However, many issues arise in discovering knowledge from these types of semi-structured documents due to their heterogeneity and structural irregularity. Most existing research on clustering techniques focuses on only one feature of the XML documents, either their structure or their content, because of scalability and complexity problems. The knowledge gained in the form of clusters based on the structure or the content alone is not suitable for real-life datasets. It therefore becomes essential to include both the structure and content of XML documents in order to improve the accuracy and meaning of the clustering solution. However, the inclusion of both these kinds of information in the clustering process results in a huge overhead for the underlying clustering algorithm because of the high dimensionality of the data. The overall objective of this thesis is to address these issues by: (1) proposing methods that utilise frequent pattern mining techniques to reduce the dimensionality; (2) developing models to effectively combine the structure and content of XML documents; and (3) utilising the proposed models in clustering. This research first determines the structural similarity in the form of frequent subtrees and then uses these frequent subtrees to represent the constrained content of the XML documents in order to determine the content similarity. A clustering framework with two types of models, implicit and explicit, is developed. The implicit model uses a Vector Space Model (VSM) to combine the structure and the content information.
The explicit model uses a higher-order model, namely a 3-order Tensor Space Model (TSM), to explicitly combine the structure and the content information. This thesis also proposes a novel incremental technique to decompose large-sized tensor models and to utilise the decomposed solution for clustering the XML documents. The proposed framework and its components were extensively evaluated on several real-life datasets exhibiting extreme characteristics to understand the usefulness of the proposed framework in real-life situations. Additionally, this research evaluates the outcome of the clustering process on the collection selection problem in information retrieval, using the Wikipedia dataset. The experimental results demonstrate that the proposed frequent pattern mining and clustering methods outperform related state-of-the-art approaches. In particular, the proposed framework of utilising frequent structures to constrain the content shows an improvement in accuracy over content-only and structure-only clustering results. The scalability experiments conducted on large-scale datasets clearly show the strengths of the proposed methods over state-of-the-art methods. In particular, this thesis contributes to effectively combining the structure and the content of XML documents for clustering, in order to improve the accuracy of the clustering solution. In addition, it addresses the research gaps in frequent pattern mining by generating efficient and concise frequent subtrees with various node relationships that can be used in clustering.
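The implicit (VSM) combination described above can be illustrated with a small sketch. This is not the thesis's implementation: the subtree paths, term frequencies, and the single weighting parameter `alpha` are illustrative assumptions.

```python
import math

def vsm_vector(subtree_freqs, term_freqs, alpha=0.5):
    """Concatenate structure (frequent-subtree) and content (term) features
    into one sparse vector, weighting the two feature sub-spaces by alpha."""
    vec = {}
    for k, v in subtree_freqs.items():
        vec[("s", k)] = alpha * v
    for k, v in term_freqs.items():
        vec[("c", k)] = (1.0 - alpha) * v
    return vec

def cosine(a, b):
    """Cosine similarity between two sparse vectors held as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Invented example documents: frequent subtree paths plus term frequencies.
doc1 = vsm_vector({"article/title": 1, "article/author": 2},
                  {"retrieval": 3, "xml": 1})
doc2 = vsm_vector({"article/title": 1}, {"retrieval": 2})
```

Once documents live in this one combined space, any standard VSM clustering algorithm (e.g. k-means over the cosine similarities) can be applied unchanged.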

Relevance:

100.00%

Publisher:

Abstract:

In the last 10 years, the third sector has seen an eruption of texts, websites, discussion forums, conferences, new journals, new research centres and sector-specific degrees. This growing abundance of information allows for hitherto impossible networking, collaboration and general awareness of what is happening in the sector. At the same time, however, like staff in many industries, nonprofit professionals can suffer from an increasingly common 21st century malaise known as ‘information anxiety’. It is worth examining the sector through the lens of Information Studies theory, to question what the information technology needs of nonprofits are and how their information management techniques may differ from those in the public and private sectors. There are implications of this both for those within the industry (in terms of governance, training and public relations) and those external to it (who may form relationships with nonprofits on the basis of access to information).

Relevance:

100.00%

Publisher:

Abstract:

Purpose – The purpose of this paper is to examine postgraduate health promotion students' self-perceptions of information literacy skills before and after completing PILOT, an online information literacy tutorial.
Design/methodology/approach – Postgraduate students at Queensland University of Technology enrolled in PUP038 New Developments in Health Promotion completed a pre- and post-self-assessment questionnaire. From 2008 to 2011, students were required to rate their academic writing and research skills before and after completing the PILOT online information literacy tutorial. Quantitative trends and qualitative themes were analysed to establish students' self-assessment and the effectiveness of the PILOT tutorial.
Findings – The results from four years of postgraduate students' self-assessment questionnaires provide evidence of perceived improvements in information literacy skills after completing PILOT. Some students continued to have trouble with locating and analysing quality information, as well as with referencing and plagiarism. Feedback was generally positive, and students' responses indicated they found the tutorial highly beneficial in improving their research skills.
Originality/value – This paper is original because it describes postgraduate health promotion students' self-assessment of information literacy skills over a period of four years. The literature on the health promotion domain and on self-assessment of postgraduate students' information literacy skills is limited.
Keywords – Self-assessment, Postgraduate, Information literacy, Library instruction, Higher education, Health promotion, Evidence-based practice
Paper type – Research paper

Relevance:

100.00%

Publisher:

Abstract:

This paper addresses the issue of analogical inference, and its potential role as a mediator of new therapeutic discoveries, by using disjunction operators based on quantum connectives to combine many potential reasoning pathways into a single search expression. We extend our previous work, in which we developed an approach to analogical retrieval using the Predication-based Semantic Indexing (PSI) model, which encodes both concepts and the relationships between them in a high-dimensional vector space. As in our previous work, we leverage the ability of PSI to infer predicate pathways connecting two example concepts, in this case comprising known therapeutic relationships. For example, given that drug x TREATS disease z, we might infer the predicate pathway drug x INTERACTS WITH gene y ASSOCIATED WITH disease z, and use this pathway to search for drugs related to another disease in similar ways. As biological systems tend to be characterized by networks of relationships, we evaluate the ability of quantum-inspired operators to mediate inference and retrieval across multiple relations by testing the ability of different approaches to recover known therapeutic relationships. In addition, we introduce a novel complex-vector implementation of PSI, based on Plate's Circular Holographic Reduced Representations, which we utilize for all experiments alongside the binary-vector approach applied in our previous research.
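The complex-vector (CHRR) machinery mentioned above can be sketched as follows. In Plate's Circular Holographic Reduced Representations, vectors of unit-magnitude complex phasors are bound by elementwise multiplication and unbound with the complex conjugate; the relation and concept vectors below are random stand-ins, not PSI's trained semantic vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 256  # dimensionality of the semantic space

def rand_phasor():
    # a CHRR vector: d unit-magnitude complex phasors with random phases
    return np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, d))

def bind(a, b):
    return a * b                 # binding: elementwise phasor product

def unbind(c, a):
    return c * np.conj(a)        # exact inverse, since every |a_k| = 1

def sim(a, b):
    # normalised real inner product; 1.0 for identical vectors
    return float(np.real(np.vdot(a, b)) / d)

# Encode "INTERACTS_WITH gene_y" and recover gene_y from the bound trace.
interacts_with, gene_y = rand_phasor(), rand_phasor()
trace = bind(interacts_with, gene_y)
recovered = unbind(trace, interacts_with)
```

Because the phasors have unit magnitude, unbinding is exact here, which is one practical attraction of the complex representation over approximate circular-convolution decoding.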

Relevance:

100.00%

Publisher:

Abstract:

The intersection of the Social Web and the Semantic Web has put folksonomy in the spotlight for its potential to overcome the knowledge acquisition bottleneck and to provide insight into the "wisdom of the crowds". Folksonomy, which emerges from collaborative tagging activities, provides insight into users' understanding of Web resources, which can be useful for searching and organizing purposes. However, the collaborative tagging vocabulary poses some challenges, since tags are freely chosen by users and may exhibit synonymy and polysemy problems. To overcome these challenges and boost the potential of folksonomy as emergent semantics, we propose to consolidate the diverse vocabulary into consolidated entities and concepts. We propose to extract a tag ontology through an ontology learning process to represent the semantics of a tagging community. This paper presents a novel approach to learning the ontology based on the widely used lexical database WordNet. We present personalization strategies to disambiguate the semantics of tags by combining the opinions of WordNet lexicographers with users' tagging behavior. We provide empirical evaluations by using the semantic information contained in the ontology in a tag recommendation experiment. The results show that by using the semantic relationships in the ontology, the accuracy of the tag recommender is improved.
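A toy sketch of the consolidation and disambiguation steps might look like the following. The synonym map, sense priors, and context words are invented placeholders; the paper derives these from WordNet lexicographer files and from observed tagging behaviour.

```python
# Invented placeholder data; a real system derives these from WordNet
# and from the user's tagging history.
SYNONYMS = {"js": "javascript", "ecmascript": "javascript"}
SENSE_PRIORS = {"python": {"language": 0.6, "snake": 0.4}}  # lexicographer weight
SENSE_CONTEXT = {"language": {"code", "web", "django"},
                 "snake": {"animal", "pet"}}

def canonical(tag):
    """Consolidate synonymous tags onto one entity."""
    return SYNONYMS.get(tag, tag)

def disambiguate(tag, cotags):
    """Pick a sense by combining the lexicographer prior with the overlap
    between each sense's context words and the user's co-occurring tags."""
    scores = {}
    for sense, prior in SENSE_PRIORS.get(tag, {tag: 1.0}).items():
        overlap = len(SENSE_CONTEXT.get(sense, set()) & set(cotags))
        scores[sense] = prior * (1 + overlap)
    return max(scores, key=scores.get)
```

The multiplicative score is just one way to blend the two evidence sources; the point is that the same ambiguous tag can resolve to different concepts for users with different tagging behaviour.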

Relevance:

100.00%

Publisher:

Abstract:

Retrieving information from Twitter is always challenging due to its large volume, inconsistent writing and noise. Most existing information retrieval (IR) and text mining methods focus on term-based approaches, but these suffer from problems of term variation such as polysemy and synonymy. The problem is worse when such methods are applied to Twitter because of the tweet length limit. Over the years, people have held the hypothesis that pattern-based methods should perform better than term-based methods because they provide more context, but limited studies have been conducted to support this hypothesis, especially on Twitter. This paper presents an innovative framework to address the issue of performing IR on microblogs. The proposed framework discovers patterns in tweets as higher-level features and assigns weights to low-level features (i.e., terms) based on their distributions in the higher-level features. We present experimental results based on the TREC11 microblog dataset, which show that our proposed approach significantly outperforms term-based methods (Okapi BM25 and TF-IDF) and pattern-based methods in terms of precision, recall and F-measure.
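As a rough illustration of weighting terms by their distribution in higher-level patterns, the sketch below mines frequent term pairs as a stand-in for the paper's patterns; the tweets and the minimum-support threshold are invented.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(tweets, min_support=2):
    """Mine frequent term pairs (a simple stand-in for richer patterns)."""
    counts = Counter()
    for tweet in tweets:
        terms = sorted(set(tweet.lower().split()))
        counts.update(combinations(terms, 2))
    return {p: c for p, c in counts.items() if c >= min_support}

def term_weights(patterns):
    """Weight each low-level term by its support across the mined patterns."""
    w = Counter()
    for (a, b), support in patterns.items():
        w[a] += support
        w[b] += support
    total = sum(w.values()) or 1
    return {t: v / total for t, v in w.items()}

tweets = ["open source search", "open source code", "search engine code"]
patterns = frequent_pairs(tweets)   # only ("open", "source") survives support 2
weights = term_weights(patterns)
```

Terms that never appear in a frequent pattern receive no weight, which is how pattern context, rather than raw frequency, drives the ranking.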

Relevance:

100.00%

Publisher:

Abstract:

In this paper we examine automated Chinese-to-English link discovery in Wikipedia and the effects of Chinese segmentation and Chinese-to-English translation on hyperlink recommendation. Our experimental results show that the implemented link discovery framework can effectively recommend Chinese-to-English cross-lingual links. The techniques described here can assist bilingual users where a particular topic is not covered in Chinese, is not equally covered in both languages, or is biased in one language, as well as assisting language learning.

Relevance:

100.00%

Publisher:

Abstract:

A user’s query is considered to be an imprecise description of their information need. Automatic query expansion is the process of reformulating the original query with the goal of improving retrieval effectiveness. Many successful query expansion techniques ignore information about the dependencies that exist between words in natural language. However, more recent approaches have demonstrated that, by explicitly modeling associations between terms, significant improvements in retrieval effectiveness can be achieved over approaches that ignore these dependencies. State-of-the-art dependency-based approaches have been shown to primarily model syntagmatic associations. Syntagmatic associations infer a likelihood that two terms co-occur more often than by chance. However, structural linguistics relies on both syntagmatic and paradigmatic associations to deduce the meaning of a word. Given the success of dependency-based approaches and the reliance on word meanings in the query formulation process, we argue that modeling both syntagmatic and paradigmatic information in the query expansion process will improve retrieval effectiveness. This article develops and evaluates a new query expansion technique based on a formal, corpus-based model of word meaning that captures both syntagmatic and paradigmatic associations. We demonstrate that when sufficient statistical information exists, as in the case of longer queries, including paradigmatic information alone provides significant improvements in retrieval effectiveness across a wide variety of data sets. More generally, when our new query expansion approach is applied to large-scale web retrieval, it demonstrates significant improvements in retrieval effectiveness over a strong baseline system based on a commercial search engine.
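The two association types can be contrasted on a toy corpus. The sentences and the sentence-window co-occurrence are invented for illustration; the article's model is a formal corpus-based semantic space, not this sketch. Syntagmatic association is direct co-occurrence, while paradigmatic association is similarity of co-occurrence profiles, so two words that never appear together can still be strongly related.

```python
import math
from collections import Counter, defaultdict

corpus = [
    "doctors treat patients",
    "nurses treat patients",
    "doctors treat illness",
]

# Syntagmatic evidence: same-sentence co-occurrence counts.
cooc = defaultdict(Counter)
for sent in corpus:
    words = sent.split()
    for w in words:
        for v in words:
            if w != v:
                cooc[w][v] += 1

def syntagmatic(a, b):
    """How often a and b actually co-occur."""
    return cooc[a][b]

def paradigmatic(a, b):
    """Cosine over co-occurrence profiles: high when a and b share contexts."""
    va, vb = cooc[a], cooc[b]
    dot = sum(c * vb.get(w, 0) for w, c in va.items())
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0
```

Here "doctors" and "nurses" never co-occur, yet their shared contexts ("treat", "patients") give them a high paradigmatic score; an expansion model using only syntagmatic evidence would miss that substitutable relationship.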

Relevance:

100.00%

Publisher:

Abstract:

Cross-Lingual Link Discovery (CLLD) is a new problem in Information Retrieval. The aim is to automatically identify meaningful and relevant hypertext links between documents in different languages. This is particularly helpful in knowledge discovery when a multi-lingual knowledge base is sparse in one language or another, or when the topical coverage in each language differs; such is the case with Wikipedia. Techniques for identifying new and topically relevant cross-lingual links are a current topic of interest at NTCIR, where the CrossLink task has been running since NTCIR-9 in 2011. This paper presents the framework for benchmarking cross-lingual link discovery algorithms in the context of NTCIR-9. The framework includes topics, document collections, assessments, metrics, and a toolkit for pooling, assessment, and evaluation. The assessments are divided into two separate sets: manual assessments performed by human assessors, and automatic assessments based on links extracted from Wikipedia itself. Using this framework we show that manual assessment is more robust than automatic assessment in the context of cross-lingual link discovery.
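To make the dual-assessment idea concrete, here is a minimal sketch of scoring one ranked link list against both assessment sets. The link identifiers are invented, and the actual toolkit computes a fuller set of metrics than plain precision at k.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k recommended links judged relevant."""
    return sum(1 for link in ranked[:k] if link in relevant) / k

# A hypothetical ranked list of Chinese target articles for one topic.
ranked = ["zh:悉尼", "zh:布里斯班", "zh:墨尔本", "zh:珀斯"]
manual = {"zh:悉尼", "zh:墨尔本"}   # human assessor judgements
automatic = {"zh:悉尼"}             # links mined from Wikipedia itself

manual_p = precision_at_k(ranked, manual, 4)       # 0.5
automatic_p = precision_at_k(ranked, automatic, 4)  # 0.25
```

The gap between the two scores for the same run is exactly the kind of discrepancy the paper analyses when it argues that manual assessment is the more robust ground truth.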

Relevance:

100.00%

Publisher:

Abstract:

The article focuses on how the information seeker makes decisions about relevance. It employs a novel decision theory based on quantum probabilities. This direction derives from mounting research within cognitive science showing that decision theory based on quantum probabilities is superior to standard probability models in modelling human judgements [2, 1]. By quantum probabilities, we mean that the decision event space is modelled as a vector space rather than the usual Boolean algebra of sets. In this way, incompatible perspectives around a decision can be modelled, leading to an interference term which modifies the law of total probability. The interference term is crucial in modifying the probability judgements made by current probabilistic systems so that they align better with human judgement. The goal of this article is thus to model the information seeker as a decision maker. For this purpose, signal detection models are sketched which are in principle applicable in a wide variety of information seeking scenarios.
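The interference term can be shown numerically. In the quantum model, the relevance probability is the squared magnitude of a sum of amplitudes, which equals the classical law of total probability plus a cross term; the probabilities and the phase below are arbitrary illustrative values, not from the article.

```python
import cmath
import math

# Amplitudes for judging relevance under two incompatible perspectives A and B.
# Classically, P(R) = P(A)P(R|A) + P(B)P(R|B).
psi_A = cmath.rect(math.sqrt(0.5 * 0.6), 0.0)          # sqrt(P(A)P(R|A)), phase 0
psi_B = cmath.rect(math.sqrt(0.5 * 0.2), math.pi / 3)  # sqrt(P(B)P(R|B)), phase pi/3

classical = abs(psi_A) ** 2 + abs(psi_B) ** 2          # law of total probability
interference = 2 * (psi_A * psi_B.conjugate()).real    # cross term
quantum = abs(psi_A + psi_B) ** 2                      # modified probability
```

When the phase difference is zero the cross term is maximal; at a phase difference of pi/2 it vanishes and the quantum prediction collapses back to the classical one, which is why the interference term can push judgements either above or below the law of total probability.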

Relevance:

100.00%

Publisher:

Abstract:

In response to current developments in the tertiary education sector, the Queensland University of Technology Library has mounted an intensive course, Advanced Information Retrieval Skills, for higher degree students. In determining the need for such a course, a survey of postgraduate students and their supervisors was conducted. Results of this survey are discussed and details of the four-credit-point subjects are outlined.

Relevance:

100.00%

Publisher:

Abstract:

This paper details the participation of the Australian e-Health Research Centre (AEHRC) in the ShARe/CLEF 2013 eHealth Evaluation Lab – Task 3. This task aims to evaluate the use of information retrieval (IR) systems to aid consumers (e.g. patients and their relatives) in seeking health advice on the Web. Our submissions to the ShARe/CLEF challenge are based on language models generated from the web corpus provided by the organisers. Our baseline system is a standard Dirichlet-smoothed language model. We enhance the baseline by identifying and correcting spelling mistakes in queries, as well as expanding acronyms using AEHRC's Medtex medical text analysis platform. We then consider the readability and the authoritativeness of web pages to further enhance the quality of the document ranking. Measures of readability are integrated in the language models used for retrieval via prior probabilities. Prior probabilities are also used to encode authoritativeness information derived from a list of top-100 consumer health websites. Empirical results show that correcting spelling mistakes and expanding acronyms found in queries significantly improves the effectiveness of the language model baseline. Readability priors seem to increase retrieval effectiveness for graded relevance at early ranks (nDCG@5, but not precision), but no improvements are found at later ranks and when considering binary relevance. The authoritativeness prior does not appear to provide retrieval gains over the baseline: this is likely because of the small overlap between websites in the corpus and those in the top-100 consumer-health websites we acquired.
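A minimal sketch of the baseline scoring with a document prior follows. The toy documents and the small mu are invented for illustration; the actual system derives its priors from readability scores and the top-100 site list rather than a hand-set constant.

```python
import math
from collections import Counter

def lm_score(query, doc, collection, mu=10.0, log_prior=0.0):
    """Dirichlet-smoothed query likelihood plus a log document prior
    (the prior stands in for the readability/authoritativeness evidence)."""
    dtf, ctf = Counter(doc), Counter(collection)
    score = log_prior
    for q in query:
        # smoothed term probability: document evidence backed off to collection
        p = (dtf[q] + mu * ctf[q] / len(collection)) / (len(doc) + mu)
        score += math.log(p) if p > 0 else -1e9  # term unseen in collection
    return score

# Toy corpus of term lists (a real system would tokenise crawled pages).
doc_flu = ["flu", "symptoms", "treatment"]
doc_heart = ["heart", "attack", "signs"]
collection = doc_flu + doc_heart
query = ["flu", "symptoms"]
```

Because the prior enters as an additive log term, a sufficiently strong readability or authoritativeness prior can reorder documents whose query likelihoods are close, which matches how the paper integrates that evidence.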

Relevance:

100.00%

Publisher:

Abstract:

Entity-oriented retrieval aims to return a list of relevant entities rather than documents, to provide exact answers for user queries. The nature of entity-oriented retrieval requires identifying the semantic intent of user queries, i.e., understanding the semantic role of query terms and determining the semantic categories which indicate the class of target entities. Existing methods are not able to exploit the semantic intent by capturing the semantic relationship between terms in a query and in a document that contains entity-related information. To improve the understanding of the semantic intent of user queries, we propose a concept-based retrieval method that not only automatically identifies the semantic intent of user queries, i.e., the Intent Type and Intent Modifier, but also introduces concepts represented by Wikipedia articles into user queries. We evaluate the proposed method on entity profile documents annotated with concepts from Wikipedia's category and list structure. Empirical analysis reveals that the proposed method outperforms several state-of-the-art approaches.
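A crude illustration of splitting a query into Intent Type and Intent Modifier is sketched below. The cue phrases and the partition heuristic are invented; the paper's identification is automatic, not this rule list.

```python
# Hypothetical relation cue phrases separating the target class from its filter.
CUES = ("directed by", "written by", "located in", "born in")

def parse_intent(query):
    """Split an entity query into the class of target entities (Intent Type)
    and the constraint on them (Intent Modifier), using cue phrases."""
    for cue in CUES:
        if cue in query:
            head, _, tail = query.partition(cue)
            return {"type": head.strip(), "modifier": (cue + " " + tail.strip())}
    return {"type": query.strip(), "modifier": ""}
```

For "films directed by Spielberg", the sketch yields type "films" (the class to be mapped to a Wikipedia category) and modifier "directed by Spielberg" (the property relevant entities must satisfy).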

Relevance:

100.00%

Publisher:

Abstract:

With the rapid growth of information on the Web, the study of information searching has attracted increased interest. Information behaviour (IB) researchers and information systems (IS) developers continuously explore user-Web search interactions in order to understand users and to assist them with their information searching. In attempting to develop models of IB, several studies have identified various factors that govern users' information searching and information retrieval (IR), such as age, gender, prior knowledge and task complexity. However, how users' contextual factors, such as cognitive styles, affect Web search interactions has not been clearly explained by current models of Web searching and IR. This study explores the influence of users' cognitive styles on their Web search behaviour. The main goal of the study is to enhance Web search models with a better understanding of how cognitive styles affect Web searching. Modelling Web search behaviour with a greater understanding of users' cognitive styles can help information science researchers and IS designers to bridge the semantic gap between the user and the IS. To achieve the aims of the study, a user study with 50 participants was conducted. The study adopted a mixed-method approach incorporating several data collection strategies to gather a range of qualitative and quantitative data. The study utilised pre-search and post-search questionnaires to collect the participants' demographic information and their level of satisfaction with the search interactions. Riding's (1991) Cognitive Style Analysis (CSA) test was used to assess the participants' cognitive styles. Participants completed three predesigned search tasks, and the whole user-Web search interaction, including think-aloud, was captured using a monitoring program.
Data analysis involved several qualitative and quantitative techniques: the quantitative data gave rise to detailed findings about users' Web searching and cognitive styles, while the qualitative data enriched the findings with illustrative examples. The study results provide valuable insights into Web searching behaviour among users with different cognitive styles. The findings of the study extend our understanding of Web search behaviour and how users search for information on the Web. Three key findings emerged:
• Users' Web search behaviour was demonstrated through information searching strategies, Web navigation styles, query reformulation behaviour and information processing approaches while performing Web searches. The manner in which these Web search patterns were demonstrated varied among users in different cognitive style groups.
• Users' cognitive styles influenced their information searching strategies, query reformulation behaviour, Web navigation styles and information processing approaches. Users with particular cognitive styles followed certain Web search patterns.
• Fundamental relationships were evident between users' cognitive styles and their Web search behaviours, and these relationships can be illustrated through modelling Web search behaviour.
Two models that depict the associations between Web search interactions, user characteristics and users' cognitive styles were developed. These models provide a greater understanding of Web search behaviour from the user perspective, particularly how users' cognitive styles influence their Web search behaviour. The significance of this research is twofold: it provides insights for information science researchers, information system designers, academics, educators, trainers and librarians who want to better understand how users with different cognitive styles perform information searching on the Web; at the same time, it provides assistance and support to users.
The major outcomes of this study are: 1) a comprehensive analysis of how users search the Web; 2) an extensive discussion of the implications of the models developed in this study for future work; and 3) a theoretical framework to bridge high-level search models and cognitive models.

Relevance:

100.00%

Publisher:

Abstract:

The continuous growth of XML data poses a great concern in the area of XML data management. The need to process large amounts of XML data brings complications to many applications, such as information retrieval, data integration and many others. One way of simplifying this problem is to break the massive amount of data into smaller groups by applying clustering techniques. However, XML clustering is an intricate task that may involve processing both the structure and the content of XML data in order to identify similar XML data. This research presents four clustering methods: two utilizing only the structure of XML documents, and two utilizing both the structure and the content. The two structural clustering methods have different data models: one is based on a path model and the other on a tree model. These methods employ rigid similarity measures which aim to identify corresponding elements between documents with different or similar underlying structures. The two clustering methods that utilize both the structural and content information vary in terms of how the structure and content similarity are combined. One clustering method calculates the document similarity by using a linear weighting combination strategy over the structure and content similarities; the content similarity in this method is based on a semantic kernel. The other method calculates the distance between documents by a non-linear combination of the structure and content of XML documents using a semantic kernel. Empirical analysis shows that the structure-only clustering method based on the tree model is more scalable than the structure-only clustering method based on the path model, as the tree similarity measure does not need to visit the parents of an element many times. Experimental results also show that the clustering methods perform better with the inclusion of the content information on most test document collections.
To further the research, the structural clustering method based on the tree model is extended and employed in XML transformation. The results from the experiments show that the proposed transformation process is faster than the traditional transformation system, which translates and converts the source XML documents sequentially. Also, the schema matching step of the XML transformation produces a better matching result in a shorter time.
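The linear weighting combination strategy described above can be sketched as follows, with Jaccard overlap standing in for both the structural (path) and content (term) similarity measures; the thesis's actual measures (tree similarity, semantic kernels) are richer than this, and the document features shown are invented.

```python
def jaccard(a, b):
    """Overlap of two feature sets, in [0, 1]."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def combined_similarity(paths1, paths2, terms1, terms2, alpha=0.5):
    """Linear weighting combination: alpha weights structure (path overlap)
    against content (term overlap)."""
    return alpha * jaccard(paths1, paths2) + (1 - alpha) * jaccard(terms1, terms2)

# Two toy documents sharing one path and one term out of three each.
s = combined_similarity(
    ["book/title", "book/author"], ["book/title", "book/year"],
    ["xml", "clustering"], ["xml", "retrieval"],
)
```

Sweeping `alpha` from 0 to 1 moves the clustering smoothly from content-only to structure-only behaviour, which is the knob the linear-combination method exposes.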