914 resultados para pacs: information retrieval techniques


Relevância:

100.00% 100.00%

Publicador:

Resumo:

In information retrieval, a user's query is often not a complete representation of their real information need. The user's information need is a cognitive construction, however the use of cognitive models to perform query expansion have had little study. In this paper, we present a cognitively motivated query expansion technique that uses semantic features for use in ad hoc retrieval. This model is evaluated against a state-of-the-art query expansion technique. The results show our approach provides significant improvements in retrieval effectiveness for the TREC data sets tested.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Two decades after its inception, Latent Semantic Analysis(LSA) has become part and parcel of every modern introduction to Information Retrieval. For any tool that matures so quickly, it is important to check its lore and limitations, or else stagnation will set in. We focus here on the three main aspects of LSA that are well accepted, and the gist of which can be summarized as follows: (1) that LSA recovers latent semantic factors underlying the document space, (2) that such can be accomplished through lossy compression of the document space by eliminating lexical noise, and (3) that the latter can best be achieved by Singular Value Decomposition. For each aspect we performed experiments analogous to those reported in the LSA literature and compared the evidence brought to bear in each case. On the negative side, we show that the above claims about LSA are much more limited than commonly believed. Even a simple example may show that LSA does not recover the optimal semantic factors as intended in the pedagogical example used in many LSA publications. Additionally, and remarkably deviating from LSA lore, LSA does not scale up well: the larger the document space, the more unlikely that LSA recovers an optimal set of semantic factors. On the positive side, we describe new algorithms to replace LSA (and more recent alternatives as pLSA, LDA, and kernel methods) by trading its l2 space for an l1 space, thereby guaranteeing an optimal set of semantic factors. These algorithms seem to salvage the spirit of LSA as we think it was initially conceived.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Intuitively, any ‘bag of words’ approach in IR should benefit from taking term dependencies into account. Unfortunately, for years the results of exploiting such dependencies have been mixed or inconclusive. To improve the situation, this paper shows how the natural language properties of the target documents can be used to transform and enrich the term dependencies to more useful statistics. This is done in three steps. The term co-occurrence statistics of queries and documents are each represented by a Markov chain. The paper proves that such a chain is ergodic, and therefore its asymptotic behavior is unique, stationary, and independent of the initial state. Next, the stationary distribution is taken to model queries and documents, rather than their initial distributions. Finally, ranking is achieved following the customary language modeling paradigm. The main contribution of this paper is to argue why the asymptotic behavior of the document model is a better representation then just the document’s initial distribution. A secondary contribution is to investigate the practical application of this representation in case the queries become increasingly verbose. In the experiments (based on Lemur’s search engine substrate) the default query model was replaced by the stable distribution of the query. Just modeling the query this way already resulted in significant improvements over a standard language model baseline. The results were on a par or better than more sophisticated algorithms that use fine-tuned parameters or extensive training. Moreover, the more verbose the query, the more effective the approach seems to become.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Usability is a multi-dimensional characteristic of a computer system. This paper focuses on usability as a measurement of interaction between the user and the system. The research employs a task-oriented approach to evaluate the usability of a meta search engine. This engine encourages and accepts queries of unlimited size expressed in natural language. A variety of conventional metrics developed by academic and industrial research, including ISO standards,, are applied to the information retrieval process consisting of sequential tasks. Tasks range from formulating (long) queries to interpreting and retaining search results. Results of the evaluation and analysis of the operation log indicate that obtaining advanced search engine results can be accomplished simultaneously with enhancing the usability of the interactive process. In conclusion, we discuss implications for interactive information retrieval system design and directions for future usability research. © 2008 Academy Publisher.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Purpose – The work presented in this paper aims to provide an approach to classifying web logs by personal properties of users. Design/methodology/approach – The authors describe an iterative system that begins with a small set of manually labeled terms, which are used to label queries from the log. A set of background knowledge related to these labeled queries is acquired by combining web search results on these queries. This background set is used to obtain many terms that are related to the classification task. The system then ranks each of the related terms, choosing those that most fit the personal properties of the users. These terms are then used to begin the next iteration. Findings – The authors identify the difficulties of classifying web logs, by approaching this problem from a machine learning perspective. By applying the approach developed, the authors are able to show that many queries in a large query log can be classified. Research limitations/implications – Testing results in this type of classification work is difficult, as the true personal properties of web users are unknown. Evaluation of the classification results in terms of the comparison of classified queries to well known age-related sites is a direction that is currently being exploring. Practical implications – This research is background work that can be incorporated in search engines or other web-based applications, to help marketing companies and advertisers. Originality/value – This research enhances the current state of knowledge in short-text classification and query log learning. Classification schemes, Computer networks, Information retrieval, Man-machine systems, User interfaces

Relevância:

100.00% 100.00%

Publicador:

Resumo:

It is a big challenge to guarantee the quality of discovered relevance features in text documents for describing user preferences because of the large number of terms, patterns, and noise. Most existing popular text mining and classification methods have adopted term-based approaches. However, they have all suffered from the problems of polysemy and synonymy. Over the years, people have often held the hypothesis that pattern-based methods should perform better than term- based ones in describing user preferences, but many experiments do not support this hypothesis. This research presents a promising method, Relevance Feature Discovery (RFD), for solving this challenging issue. It discovers both positive and negative patterns in text documents as high-level features in order to accurately weight low-level features (terms) based on their specificity and their distributions in the high-level features. The thesis also introduces an adaptive model (called ARFD) to enhance the exibility of using RFD in adaptive environment. ARFD automatically updates the system's knowledge based on a sliding window over new incoming feedback documents. It can efficiently decide which incoming documents can bring in new knowledge into the system. Substantial experiments using the proposed models on Reuters Corpus Volume 1 and TREC topics show that the proposed models significantly outperform both the state-of-the-art term-based methods underpinned by Okapi BM25, Rocchio or Support Vector Machine and other pattern-based methods.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

While the phrase “six degrees of separation” is widely used to characterize a variety of humanderived networks, in this study we show that in patent citation network, related patents are connected with an average distance of 6, whereas an average distance for a random pair of nodes in the graph is approximately 15. We use this information to improve the recall level in prior-art retrieval in the setting of blind relevance feedback without any textual knowledge.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Quantum theory has recently been employed to further advance the theory of information retrieval (IR). A challenging research topic is to investigate the so called quantum-like interference in users’ relevance judgement process, where users are involved to judge the relevance degree of each document with respect to a given query. In this process, users’ relevance judgement for the current document is often interfered by the judgement for previous documents, due to the interference on users’ cognitive status. Research from cognitive science has demonstrated some initial evidence of quantum-like cognitive interference in human decision making, which underpins the user’s relevance judgement process. This motivates us to model such cognitive interference in the relevance judgement process, which in our belief will lead to a better modeling and explanation of user behaviors in relevance judgement process for IR and eventually lead to more user-centric IR models. In this paper, we propose to use probabilistic automaton(PA) and quantum finite automaton (QFA), which are suitable to represent the transition of user judgement states, to dynamically model the cognitive interference when the user is judging a list of documents.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Many user studies in Web information searching have found the significant effect of task types on search strategies. However, little attention was given to Web image searching strategies, especially the query reformulation activity despite that this is a crucial part in Web image searching. In this study, we investigated the effects of topic domains and task types on user’s image searching behavior and query reformulation strategies. Some significant differences in user’s tasks specificity and initial concepts were identified among the task domains. Task types are also found to influence participant’s result reviewing behavior and query reformulation strategies.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In information retrieval (IR) research, more and more focus has been placed on optimizing a query language model by detecting and estimating the dependencies between the query and the observed terms occurring in the selected relevance feedback documents. In this paper, we propose a novel Aspect Language Modeling framework featuring term association acquisition, document segmentation, query decomposition, and an Aspect Model (AM) for parameter optimization. Through the proposed framework, we advance the theory and practice of applying high-order and context-sensitive term relationships to IR. We first decompose a query into subsets of query terms. Then we segment the relevance feedback documents into chunks using multiple sliding windows. Finally we discover the higher order term associations, that is, the terms in these chunks with high degree of association to the subsets of the query. In this process, we adopt an approach by combining the AM with the Association Rule (AR) mining. In our approach, the AM not only considers the subsets of a query as “hidden” states and estimates their prior distributions, but also evaluates the dependencies between the subsets of a query and the observed terms extracted from the chunks of feedback documents. The AR provides a reasonable initial estimation of the high-order term associations by discovering the associated rules from the document chunks. Experimental results on various TREC collections verify the effectiveness of our approach, which significantly outperforms a baseline language model and two state-of-the-art query language models namely the Relevance Model and the Information Flow model

Relevância:

100.00% 100.00%

Publicador:

Resumo:

At NTCIR-9, we participated in the cross-lingual link discovery (Crosslink) task. In this paper we describe our approaches to discovering Chinese, Japanese, and Korean (CJK) cross-lingual links for English documents in Wikipedia. Our experimental results show that a link mining approach that mines the existing link structure for anchor probabilities and relies on the “translation” using cross-lingual document name triangulation performs very well. The evaluation shows encouraging results for our system.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Consider the concept combination ‘pet human’. In word association experiments, human subjects produce the associate ‘slave’ in relation to this combination. The striking aspect of this associate is that it is not produced as an associate of ‘pet’, or ‘human’ in isolation. In other words, the associate ‘slave’ seems to be emergent. Such emergent associations sometimes have a creative character and cognitive science is largely silent about how we produce them. Departing from a dimensional model of human conceptual space, this article will explore concept combinations, and will argue that emergent associations are a result of abductive reasoning within conceptual space, that is, below the symbolic level of cognition. A tensor-based approach is used to model concept combinations allowing such combinations to be formalized as interacting quantum systems. Free association norm data is used to motivate the underlying basis of the conceptual space. It is shown by analogy how some concept combinations may behave like quantum-entangled (non-separable) particles. Two methods of analysis were presented for empirically validating the presence of non-separable concept combinations in human cognition. One method is based on quantum theory and another based on comparing a joint (true theoretic) probability distribution with another distribution based on a separability assumption using a chi-square goodness-of-fit test. Although these methods were inconclusive in relation to an empirical study of bi-ambiguous concept combinations, avenues for further refinement of these methods are identified.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Electronic services are a leitmotif in ‘hot’ topics like Software as a Service, Service Oriented Architecture (SOA), Service oriented Computing, Cloud Computing, application markets and smart devices. We propose to consider these in what has been termed the Service Ecosystem (SES). The SES encompasses all levels of electronic services and their interaction, with human consumption and initiation on its periphery in much the same way the ‘Web’ describes a plethora of technologies that eventuate to connect information and expose it to humans. Presently, the SES is heterogeneous, fragmented and confined to semi-closed systems. A key issue hampering the emergence of an integrated SES is Service Discovery (SD). A SES will be dynamic with areas of structured and unstructured information within which service providers and ‘lay’ human consumers interact; until now the two are disjointed, e.g., SOA-enabled organisations, industries and domains are choreographed by domain experts or ‘hard-wired’ to smart device application markets and web applications. In a SES, services are accessible, comparable and exchangeable to human consumers closing the gap to the providers. This requires a new SD with which humans can discover services transparently and effectively without special knowledge or training. We propose two modes of discovery, directed search following an agenda and explorative search, which speculatively expands knowledge of an area of interest by means of categories. Inspired by conceptual space theory from cognitive science, we propose to implement the modes of discovery using concepts to map a lay consumer’s service need to terminologically sophisticated descriptions of services. To this end, we reframe SD as an information retrieval task on the information attached to services, such as, descriptions, reviews, documentation and web sites - the Service Information Shadow. The Semantic Space model transforms the shadow's unstructured semantic information into a geometric, concept-like representation. We introduce an improved and extended Semantic Space including categorization calling it the Semantic Service Discovery model. We evaluate our model with a highly relevant, service related corpus simulating a Service Information Shadow including manually constructed complex service agendas, as well as manual groupings of services. We compare our model against state-of-the-art information retrieval systems and clustering algorithms. By means of an extensive series of empirical evaluations, we establish optimal parameter settings for the semantic space model. The evaluations demonstrate the model’s effectiveness for SD in terms of retrieval precision over state-of-the-art information retrieval models (directed search) and the meaningful, automatic categorization of service related information, which shows potential to form the basis of a useful, cognitively motivated map of the SES for exploratory search.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

It is a big challenge to clearly identify the boundary between positive and negative streams. Several attempts have used negative feedback to solve this challenge; however, there are two issues for using negative relevance feedback to improve the effectiveness of information filtering. The first one is how to select constructive negative samples in order to reduce the space of negative documents. The second issue is how to decide noisy extracted features that should be updated based on the selected negative samples. This paper proposes a pattern mining based approach to select some offenders from the negative documents, where an offender can be used to reduce the side effects of noisy features. It also classifies extracted features (i.e., terms) into three categories: positive specific terms, general terms, and negative specific terms. In this way, multiple revising strategies can be used to update extracted features. An iterative learning algorithm is also proposed to implement this approach on RCV1, and substantial experiments show that the proposed approach achieves encouraging performance.