969 resultados para Query suggestion
Resumo:
A user’s query is considered to be an imprecise description of their information need. Automatic query expansion is the process of reformulating the original query with the goal of improving retrieval effectiveness. Many successful query expansion techniques ignore information about the dependencies that exist between words in natural language. However, more recent approaches have demonstrated that by explicitly modeling associations between terms significant improvements in retrieval effectiveness can be achieved over those that ignore these dependencies. State-of-the-art dependency-based approaches have been shown to primarily model syntagmatic associations. Syntagmatic associations infer a likelihood that two terms co-occur more often than by chance. However, structural linguistics relies on both syntagmatic and paradigmatic associations to deduce the meaning of a word. Given the success of dependency-based approaches and the reliance on word meanings in the query formulation process, we argue that modeling both syntagmatic and paradigmatic information in the query expansion process will improve retrieval effectiveness. This article develops and evaluates a new query expansion technique that is based on a formal, corpus-based model of word meaning that models syntagmatic and paradigmatic associations. We demonstrate that when sufficient statistical information exists, as in the case of longer queries, including paradigmatic information alone provides significant improvements in retrieval effectiveness across a wide variety of data sets. More generally, when our new query expansion approach is applied to large-scale web retrieval it demonstrates significant improvements in retrieval effectiveness over a strong baseline system, based on a commercial search engine.
Resumo:
Many successful query expansion techniques ignore information about the term dependencies that exist within natural language. However, researchers have recently demonstrated that consistent and significant improvements in retrieval effectiveness can be achieved by explicitly modelling term dependencies within the query expansion process. This has created an increased interest in dependency-based models. State-of-the-art dependency-based approaches primarily model term associations known within structural linguistics as syntagmatic associations, which are formed when terms co-occur together more often than by chance. However, structural linguistics proposes that the meaning of a word is also dependent on its paradigmatic associations, which are formed between words that can substitute for each other without effecting the acceptability of a sentence. Given the reliance on word meanings when a user formulates their query, our approach takes the novel step of modelling both syntagmatic and paradigmatic associations within the query expansion process based on the (pseudo) relevant documents returned in web search. The results demonstrate that this approach can provide significant improvements in web re- trieval effectiveness when compared to a strong benchmark retrieval system.
Resumo:
In most intent recognition studies, annotations of query intent are created post hoc by external assessors who are not the searchers themselves. It is important for the field to get a better understanding of the quality of this process as an approximation for determining the searcher's actual intent. Some studies have investigated the reliability of the query intent annotation process by measuring the interassessor agreement. However, these studies did not measure the validity of the judgments, that is, to what extent the annotations match the searcher's actual intent. In this study, we asked both the searchers themselves and external assessors to classify queries using the same intent classification scheme. We show that of the seven dimensions in our intent classification scheme, four can reliably be used for query annotation. Of these four, only the annotations on the topic and spatial sensitivity dimension are valid when compared with the searcher's annotations. The difference between the interassessor agreement and the assessor-searcher agreement was significant on all dimensions, showing that the agreement between external assessors is not a good estimator of the validity of the intent classifications. Therefore, we encourage the research community to consider using query intent classifications by the searchers themselves as test data.
Resumo:
This paper presents a novel framework to further advance the recent trend of using query decomposition and high-order term relationships in query language modeling, which takes into account terms implicitly associated with different subsets of query terms. Existing approaches, most remarkably the language model based on the Information Flow method are however unable to capture multiple levels of associations and also suffer from a high computational overhead. In this paper, we propose to compute association rules from pseudo feedback documents that are segmented into variable length chunks via multiple sliding windows of different sizes. Extensive experiments have been conducted on various TREC collections and our approach significantly outperforms a baseline Query Likelihood language model, the Relevance Model and the Information Flow model.
Resumo:
Market operators in New Zealand and Australia, such as the New Zealand Exchange (NZX) and the Australian Securities Exchange (ASX), have the regulatory power in their listing rules to issue queries to their market participants to explain unusual fluctuations in trading price and/or volume in the market. The operator will issue a price query where it believes that the market has not been fully informed as to price relevant information. Responsive regulation theory has informed much of the regulatory debate in securities laws in the region. Price queries map onto the lower level of the enforcement pyramid envisaged by responsive regulation and are one strategy that a market operator can use in communicating its compliance expectations to its stakeholders. The issue of a price query may be a precursor to more severe enforcement activities. The aim of this study is to investigate whether increased use of price queries by the securities market operator in New Zealand corresponded with an increase in disclosure frequency by all participating companies. The study finds that an increased use of price queries did correspond with an increase in disclosure frequency. A possible explanation for this finding is that price queries are an effective means of appealing to the factors that motivate corporations, and the individuals who control them, to comply with the law and regulatory requirements. This finding will have implications for both the NZX and the ASX as well as for regulators and policy makers generally.
Resumo:
Novelty-biased cumulative gain (α-NDCG) has become the de facto measure within the information retrieval (IR) community for evaluating retrieval systems in the context of sub-topic retrieval. Setting the incorrect value of parameter α in α-NDCG prevents the measure from behaving as desired in particular circumstances. In fact, when α is set according to common practice (i.e. α = 0.5), the measure favours systems that promote redundant relevant sub-topics rather than provide novel relevant ones. Recognising this characteristic of the measure is important because it affects the comparison and the ranking of retrieval systems. We propose an approach to overcome this problem by defining a safe threshold for the value of α on a query basis. Moreover, we study its impact on system rankings through a comprehensive simulation.
Resumo:
This paper presents a new active learning query strategy for information extraction, called Domain Knowledge Informativeness (DKI). Active learning is often used to reduce the amount of annotation effort required to obtain training data for machine learning algorithms. A key component of an active learning approach is the query strategy, which is used to iteratively select samples for annotation. Knowledge resources have been used in information extraction as a means to derive additional features for sample representation. DKI is, however, the first query strategy that exploits such resources to inform sample selection. To evaluate the merits of DKI, in particular with respect to the reduction in annotation effort that the new query strategy allows to achieve, we conduct a comprehensive empirical comparison of active learning query strategies for information extraction within the clinical domain. The clinical domain was chosen for this work because of the availability of extensive structured knowledge resources which have often been exploited for feature generation. In addition, the clinical domain offers a compelling use case for active learning because of the necessary high costs and hurdles associated with obtaining annotations in this domain. Our experimental findings demonstrated that 1) amongst existing query strategies, the ones based on the classification model’s confidence are a better choice for clinical data as they perform equally well with a much lighter computational load, and 2) significant reductions in annotation effort are achievable by exploiting knowledge resources within active learning query strategies, with up to 14% less tokens and concepts to manually annotate than with state-of-the-art query strategies.
Resumo:
In this paper, we study the Einstein relation for the diffusivity to mobility ratio (DMR) in n-channel inversion layers of non-linear optical materials on the basis of a newly formulated electron dispersion relation by considering their special properties within the frame work of k.p formalism. The results for the n-channel inversion layers of III-V, ternary and quaternary materials form a special case of our generalized analysis. The DMR for n-channel inversion layers of II-VI, IV-VI and stressed materials has been investigated by formulating the respective 2D electron dispersion laws. It has been found, taking n-channel inversion layers of CdGeAs2, Cd(3)AS(2), InAs, InSb, Hg1-xCdxTe, In1-xGaxAsyP1-y lattice matched to InP, CdS, PbTe, PbSnTe, Pb1-xSnxSe and stressed InSb as examples, that the DMR increases with the increasing surface electric field with different numerical values and the nature of the variations are totally band structure dependent. The well-known expression of the DMR for wide gap materials has been obtained as a special case under certain limiting conditions and this compatibility is an indirect test for our generalized formalism. Besides, an experimental method of determining the 2D DMR for n-channel inversion layers having arbitrary dispersion laws has been suggested.
Resumo:
This paper describes the design and implementation of a high-level query language called Generalized Query-By-Rule (GQBR) which supports retrieval, insertion, deletion and update operations. This language, based on the formalism of database logic, enables the users to access each database in a distributed heterogeneous environment, without having to learn all the different data manipulation languages. The compiler has been implemented on a DEC 1090 system in Pascal.
Resumo:
Database management systems offer a very reliable and attractive data organization for fast and economical information storage and processing for diverse applications. It is much more important that the information should be easily accessible to users with varied backgrounds, professional as well as casual, through a suitable data sublanguage. The language adopted here (APPLE) is one such language for relational database systems and is completely nonprocedural and well suited to users with minimum or no programming background. This is supported by an access path model which permits the user to formulate completely nonprocedural queries expressed solely in terms of attribute names. The data description language (DDL) and data manipulation language (DML) features of APPLE are also discussed. The underlying relational database has been implemented with the help of the DATATRIEVE-11 utility for record and domain definition which is available on the PDP-11/35. The package is coded in Pascal and MACRO-11. Further, most of the limitations of the DATATRIEVE-11 utility have been eliminated in the interface package.
Resumo:
Digital image