880 resultados para Information retrieval


Relevância:

60.00% 60.00%

Publicador:

Resumo:

Many existing information retrieval models do not explicitly take into account in- formation about word associations. Our approach makes use of rst and second order relationships found in natural language, known as syntagmatic and paradigmatic associ- ations, respectively. This is achieved by using a formal model of word meaning within the query expansion process. On ad hoc retrieval, our approach achieves statistically sig- ni cant improvements in MAP (0.158) and P@20 (0.396) over our baseline model. The ERR@20 and nDCG@20 of our system was 0.249 and 0.192 respectively. Our results and discussion suggest that information about both syntagamtic and paradigmatic associa- tions can assist with improving retrieval eectiveness on ad hoc retrieval.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Due to the explosive growth of the Web, the domain of Web personalization has gained great momentum both in the research and commercial areas. One of the most popular web personalization systems is recommender systems. In recommender systems choosing user information that can be used to profile users is very crucial for user profiling. In Web 2.0, one facility that can help users organize Web resources of their interest is user tagging systems. Exploring user tagging behavior provides a promising way for understanding users’ information needs since tags are given directly by users. However, free and relatively uncontrolled vocabulary makes the user self-defined tags lack of standardization and semantic ambiguity. Also, the relationships among tags need to be explored since there are rich relationships among tags which could provide valuable information for us to better understand users. In this paper, we propose a novel approach for learning tag ontology based on the widely used lexical database WordNet for capturing the semantics and the structural relationships of tags. We present personalization strategies to disambiguate the semantics of tags by combining the opinion of WordNet lexicographers and users’ tagging behavior together. To personalize further, clustering of users is performed to generate a more accurate ontology for a particular group of users. In order to evaluate the usefulness of the tag ontology, we use the tag ontology in a pilot tag recommendation experiment for improving the recommendation performance by exploiting the semantic information in the tag ontology. The initial result shows that the personalized information has improved the accuracy of the tag recommendation.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Many existing information retrieval models do not explicitly take into account in- formation about word associations. Our approach makes use of rst and second order relationships found in natural language, known as syntagmatic and paradigmatic associ- ations, respectively. This is achieved by using a formal model of word meaning within the query expansion process. On ad hoc retrieval, our approach achieves statistically sig- ni cant improvements in MAP (0.158) and P@20 (0.396) over our baseline model. The ERR@20 and nDCG@20 of our system was 0.249 and 0.192 respectively. Our results and discussion suggest that information about both syntagamtic and paradigmatic associa- tions can assist with improving retrieval eectiveness on ad hoc retrieval.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In this paper, we describe a machine-translated parallel English corpus for the NTCIR Chinese, Japanese and Korean (CJK) Wikipedia collections. This document collection is named CJK2E Wikipedia XML corpus. The corpus could be used by the information retrieval research community and knowledge sharing in Wikipedia in many ways; for example, this corpus could be used for experimentations in cross-lingual information retrieval, cross-lingual link discovery, or omni-lingual information retrieval research. Furthermore, the translated CJK articles could be used to further expand the current coverage of the English Wikipedia.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In this paper we examine automated Chinese to English link discovery in Wikipedia and the effects of Chinese segmentation and Chinese to English translation on the hyperlink recommendation. Our experimental results show that the implemented link discovery framework can effectively recommend Chinese-to-English cross-lingual links. The techniques described here can assist bi-lingual users where a particular topic is not covered in Chinese, is not equally covered in both languages, or is biased in one language; as well as for language learning.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Free association norms indicate that words are organized into semantic/associative neighborhoods within a larger network of words and links that bind the net together. We present evidence indicating that memory for a recent word event can depend on implicitly and simultaneously activating related words in its neighborhood. Processing a word during encoding primes its network representation as a function of the density of the links in its neighborhood. Such priming increases recall and recognition and can have long lasting effects when the word is processed in working memory. Evidence for this phenomenon is reviewed in extralist cuing, primed free association, intralist cuing, and single-item recognition tasks. The findings also show that when a related word is presented to cue the recall of a studied word, the cue activates it in an array of related words that distract and reduce the probability of its selection. The activation of the semantic network produces priming benefits during encoding and search costs during retrieval. In extralist cuing recall is a negative function of cue-to-distracter strength and a positive function of neighborhood density, cue-to-target strength, and target-to cue strength. We show how four measures derived from the network can be combined and used to predict memory performance. These measures play different roles in different tasks indicating that the contribution of the semantic network varies with the context provided by the task. We evaluate spreading activation and quantum-like entanglement explanations for the priming effect produced by neighborhood density.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Information retrieval (IR) by clinicians in the healthcare setting is critical for informing clinical decision-making. However, a large part of this information is in the form of free-text and inhibits clinical decision support and effective healthcare services. This makes meaningful use of clinical free-­text in electronic health records (EHRs) for patient care a difficult task. Within the context of IR, given a repository of free-­text clinical reports, one might want to retrieve and analyse data for patients who have a known clinical finding.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Due to the development of XML and other data models such as OWL and RDF, sharing data is an increasingly common task since these data models allow simple syntactic translation of data between applications. However, in order for data to be shared semantically, there must be a way to ensure that concepts are the same. One approach is to employ commonly usedschemas—called standard schemas —which help guarantee that syntactically identical objects have semantically similar meanings. As a result of the spread of data sharing, there has been widespread adoption of standard schemas in a broad range of disciplines and for a wide variety of applications within a very short period of time. However, standard schemas are still in their infancy and have not yet matured or been thoroughly evaluated. It is imperative that the data management research community takes a closer look at how well these standard schemas have fared in real-world applications to identify not only their advantages, but also the operational challenges that real users face. In this paper, we both examine the usability of standard schemas in a comparison that spans multiple disciplines, and describe our first step at resolving some of these issues in our Semantic Modeling System. We evaluate our Semantic Modeling System through a careful case study of the use of standard schemas in architecture, engineering, and construction, which we conducted with domain experts. We discuss how our Semantic Modeling System can help the broader problem and also discuss a number of challenges that still remain.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Building and maintaining software are not easy tasks. However, thanks to advances in web technologies, a new paradigm is emerging in software development. The Service Oriented Architecture (SOA) is a relatively new approach that helps bridge the gap between business and IT and also helps systems remain exible. However, there are still several challenges with SOA. As the number of available services grows, developers are faced with the problem of discovering the services they need. Public service repositories such as Programmable Web provide only limited search capabilities. Several mechanisms have been proposed to improve web service discovery by using semantics. However, most of these require manually tagging the services with concepts in an ontology. Adding semantic annotations is a non-trivial process that requires a certain skill-set from the annotator and also the availability of domain ontologies that include the concepts related to the topics of the service. These issues have prevented these mechanisms becoming widespread. This thesis focuses on two main problems. First, to avoid the overhead of manually adding semantics to web services, several automatic methods to include semantics in the discovery process are explored. Although experimentation with some of these strategies has been conducted in the past, the results reported in the literature are mixed. Second, Wikipedia is explored as a general-purpose ontology. The benefit of using it as an ontology is assessed by comparing these semantics-based methods to classic term-based information retrieval approaches. The contribution of this research is significant because, to the best of our knowledge, a comprehensive analysis of the impact of using Wikipedia as a source of semantics in web service discovery does not exist. The main output of this research is a web service discovery engine that implements these methods and a comprehensive analysis of the benefits and trade-offs of these semantics-based discovery approaches.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Topic modeling has been widely utilized in the fields of information retrieval, text mining, text classification etc. Most existing statistical topic modeling methods such as LDA and pLSA generate a term based representation to represent a topic by selecting single words from multinomial word distribution over this topic. There are two main shortcomings: firstly, popular or common words occur very often across different topics that bring ambiguity to understand topics; secondly, single words lack coherent semantic meaning to accurately represent topics. In order to overcome these problems, in this paper, we propose a two-stage model that combines text mining and pattern mining with statistical modeling to generate more discriminative and semantic rich topic representations. Experiments show that the optimized topic representations generated by the proposed methods outperform the typical statistical topic modeling method LDA in terms of accuracy and certainty.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

We are pleased to present the papers from the Australasian Health Informatics and Knowledge Management (HIKM) conference stream held on 20 January 2011 in Perth as a session of the Australasian Computer Science Week (ASCW) 2011. Formerly HIKM was named Health Data and Knowledge Management, however the inclusion of the health informatics term is timely given the current health reform. The submissions to HIKM 2011 demonstrated that Australasian researchers lead with many research and development innovations coming to fruition. Some of these innovations can be seen here, and we believe further recognition will accomplish by continuation to HIKM in the future. The HIKM conference is a review of health informatics related research, development and education opportunities. The conference papers were written to communicate with other researchers and share research findings, capturing each and every aspect of the health informatics field. They are namely: conceptual models and architectures, privacy and quality of health data, health workflow management patient journey analysis, health information retrieval, analysis and visualisation, data integration/linking, systems for integrated or coordinated care, electronic health records (EHRs) and personally controlled electronic health records (PCEHRs), health data ontologies, and standardisation in health data and clinical applications.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This report describes the available functionality and use of the ClusterEval evaluation software. It implements novel and standard measures for the evaluation of cluster quality. This software has been used at the INEX XML Mining track and in the MediaEval Social Event Detection task.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The rapid growth of visual information on Web has led to immense interest in multimedia information retrieval (MIR). While advancement in MIR systems has achieved some success in specific domains, particularly the content-based approaches, general Web users still struggle to find the images they want. Despite the success in content-based object recognition or concept extraction, the major problem in current Web image searching remains in the querying process. Since most online users only express their needs in semantic terms or objects, systems that utilize visual features (e.g., color or texture) to search images create a semantic gap which hinders general users from fully expressing their needs. In addition, query-by-example (QBE) retrieval imposes extra obstacles for exploratory search because users may not always have the representative image at hand or in mind when starting a search (i.e. the page zero problem). As a result, the majority of current online image search engines (e.g., Google, Yahoo, and Flickr) still primarily use textual queries to search. The problem with query-based retrieval systems is that they only capture users’ information need in terms of formal queries;; the implicit and abstract parts of users’ information needs are inevitably overlooked. Hence, users often struggle to formulate queries that best represent their needs, and some compromises have to be made. Studies of Web search logs suggest that multimedia searches are more difficult than textual Web searches, and Web image searching is the most difficult compared to video or audio searches. Hence, online users need to put in more effort when searching multimedia contents, especially for image searches. Most interactions in Web image searching occur during query reformulation. While log analysis provides intriguing views on how the majority of users search, their search needs or motivations are ultimately neglected. User studies on image searching have attempted to understand users’ search contexts in terms of users’ background (e.g., knowledge, profession, motivation for search and task types) and the search outcomes (e.g., use of retrieved images, search performance). However, these studies typically focused on particular domains with a selective group of professional users. General users’ Web image searching contexts and behaviors are little understood although they represent the majority of online image searching activities nowadays. We argue that only by understanding Web image users’ contexts can the current Web search engines further improve their usefulness and provide more efficient searches. In order to understand users’ search contexts, a user study was conducted based on university students’ Web image searching in News, Travel, and commercial Product domains. The three search domains were deliberately chosen to reflect image users’ interests in people, time, event, location, and objects. We investigated participants’ Web image searching behavior, with the focus on query reformulation and search strategies. Participants’ search contexts such as their search background, motivation for search, and search outcomes were gathered by questionnaires. The searching activity was recorded with participants’ think aloud data for analyzing significant search patterns. The relationships between participants’ search contexts and corresponding search strategies were discovered by Grounded Theory approach. Our key findings include the following aspects: - Effects of users' interactive intents on query reformulation patterns and search strategies - Effects of task domain on task specificity and task difficulty, as well as on some specific searching behaviors - Effects of searching experience on result expansion strategies A contextual image searching model was constructed based on these findings. The model helped us understand Web image searching from user perspective, and introduced a context-aware searching paradigm for current retrieval systems. A query recommendation tool was also developed to demonstrate how users’ query reformulation contexts can potentially contribute to more efficient searching.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

A user’s query is considered to be an imprecise description of their information need. Automatic query expansion is the process of reformulating the original query with the goal of improving retrieval effectiveness. Many successful query expansion techniques ignore information about the dependencies that exist between words in natural language. However, more recent approaches have demonstrated that by explicitly modeling associations between terms significant improvements in retrieval effectiveness can be achieved over those that ignore these dependencies. State-of-the-art dependency-based approaches have been shown to primarily model syntagmatic associations. Syntagmatic associations infer a likelihood that two terms co-occur more often than by chance. However, structural linguistics relies on both syntagmatic and paradigmatic associations to deduce the meaning of a word. Given the success of dependency-based approaches and the reliance on word meanings in the query formulation process, we argue that modeling both syntagmatic and paradigmatic information in the query expansion process will improve retrieval effectiveness. This article develops and evaluates a new query expansion technique that is based on a formal, corpus-based model of word meaning that models syntagmatic and paradigmatic associations. We demonstrate that when sufficient statistical information exists, as in the case of longer queries, including paradigmatic information alone provides significant improvements in retrieval effectiveness across a wide variety of data sets. More generally, when our new query expansion approach is applied to large-scale web retrieval it demonstrates significant improvements in retrieval effectiveness over a strong baseline system, based on a commercial search engine.