54 resultados para Query reformulation


Relevância:

10.00% 10.00%

Publicador:

Resumo:

We address the problem of mining interesting phrases from subsets of a text corpus where the subset is specified using a set of features such as keywords that form a query. Previous algorithms for the problem have proposed solutions that involve sifting through a phrase dictionary based index or a document-based index where the solution is linear in either the phrase dictionary size or the size of the document subset. We propose the usage of an independence assumption between query keywords given the top correlated phrases, wherein the pre-processing could be reduced to discovering phrases from among the top phrases per each feature in the query. We then outline an indexing mechanism where per-keyword phrase lists are stored either in disk or memory, so that popular aggregation algorithms such as No Random Access and Sort-merge Join may be adapted to do the scoring at real-time to identify the top interesting phrases. Though such an approach is expected to be approximate, we empirically illustrate that very high accuracies (of over 90%) are achieved against the results of exact algorithms. Due to the simplified list-aggregation, we are also able to provide response times that are orders of magnitude better than state-of-the-art algorithms. Interestingly, our disk-based approach outperforms the in-memory baselines by up to hundred times and sometimes more, confirming the superiority of the proposed method.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

When a user of a microblogging site authors a microblog
post or browses through a microblog post, it provides cues as to what
topic she is interested in at that point in time. Example-based search
that retrieves similar tweets given one exemplary tweet, such as the one
just authored, can help provide the user with relevant content. We investigate
various components of microblog posts, such as the associated
timestamp, author’s social network, and the content of the post, and
develop approaches that harness such factors in finding relevant tweets
given a query tweet. An empirical analysis of such techniques on real
world twitter-data is then presented to quantify the utility of the various
factors in assessing tweet relevance. We observe that content-wise similar
tweets that also contain extra information not already present in the
query, are perceived as useful. We then develop a composite technique
that combines the various approaches by scoring tweets using a dynamic
query-specific linear combination of separate techniques. An empirical
evaluation establishes the effectiveness of the composite technique, and
that it outperforms each of its constituents.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Online forums are becoming a popular way of finding useful
information on the web. Search over forums for existing discussion
threads so far is limited to keyword-based search due
to the minimal effort required on part of the users. However,
it is often not possible to capture all the relevant context in a
complex query using a small number of keywords. Examplebased
search that retrieves similar discussion threads given
one exemplary thread is an alternate approach that can help
the user provide richer context and vastly improve forum
search results. In this paper, we address the problem of
finding similar threads to a given thread. Towards this, we
propose a novel methodology to estimate similarity between
discussion threads. Our method exploits the thread structure
to decompose threads in to set of weighted overlapping
components. It then estimates pairwise thread similarities
by quantifying how well the information in the threads are
mutually contained within each other using lexical similarities
between their underlying components. We compare our
proposed methods on real datasets against state-of-the-art
thread retrieval mechanisms wherein we illustrate that our
techniques outperform others by large margins on popular
retrieval evaluation measures such as NDCG, MAP, Precision@k
and MRR. In particular, consistent improvements of
up to 10% are observed on all evaluation measures

Relevância:

10.00% 10.00%

Publicador:

Resumo:

We consider the problem of linking web search queries to entities from a knowledge base such as Wikipedia. Such linking enables converting a user’s web search session to a footprint in the knowledge base that could be used to enrich the user profile. Traditional methods for entity linking have been directed towards finding entity mentions in text documents such as news reports, each of which are possibly linked to multiple entities enabling the usage of measures like entity set coherence. Since web search queries are very small text fragments, such criteria that rely on existence of a multitude of mentions do not work too well on them. We propose a three-phase method for linking web search queries to wikipedia entities. The first phase does IR-style scoring of entities against the search query to narrow down to a subset of entities that are expanded using hyperlink information in the second phase to a larger set. Lastly, we use a graph traversal approach to identify the top entities to link the query to. Through an empirical evaluation on real-world web search queries, we illustrate that our methods significantly enhance the linking accuracy over state-of-the-art methods.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Massive amount of data that are geo-tagged and associated with text information are being generated at an unprecedented scale. These geo-textual data cover a wide range of topics. Users are interested in receiving up-to-date geo-textual objects (e.g., geo-tagged Tweets) such that their locations meet users’ need and their texts are interesting to users. For example, a user may want to be updated with tweets near her home on the topic “dengue fever headache.” In this demonstration, we present SOPS, the Spatial-Keyword Publish/Subscribe System, that is capable of efficiently processing spatial keyword continuous queries. SOPS supports two types of queries: (1) Boolean Range Continuous (BRC) query that can be used to subscribe the geo-textual objects satisfying a boolean keyword expression and falling in a specified spatial region; (2) Temporal Spatial-Keyword Top-k Continuous (TaSK) query that continuously maintains up-to-date top-k most relevant results over a stream of geo-textual objects. SOPS enables users to formulate their queries and view the real-time results over a stream of geotextual objects by browser-based user interfaces. On the server side, we propose solutions to efficiently processing a large number of BRC queries (tens of millions) and TaSK queries over a stream of geo-textual objects.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The complexity of modern SCADA networks and their associated cyber-attacks requires an expressive but flexible manner for representing both domain knowledge and collected intrusion alerts with the ability to integrate them for enhanced analytical capabilities and better understanding of attacks. This paper proposes an ontology-based approach for contextualized intrusion alerts in SCADA networks. In this approach, three security ontologies were developed to represent and store information on intrusion alerts, Modbus communications, and Modbus attack descriptions. This information is correlated into enriched intrusion alerts using simple ontology logic rules written in Semantic Query-Enhanced Web Rules (SQWRL). The contextualized alerts give analysts the means to better understand evolving attacks and to uncover the semantic relationships between sequences of individual attack events. The proposed system is illustrated by two use case scenarios.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The past decade had witnessed an unprecedented growth in the amount of available digital content, and its volume is expected to continue to grow the next few years. Unstructured text data generated from web and enterprise sources form a large fraction of such content. Many of these contain large volumes of reusable data such as solutions to frequently occurring problems, and general know-how that may be reused in appropriate contexts. In this work, we address issues around leveraging unstructured text data from sources as diverse as the web and the enterprise within the Case-based Reasoning framework. Case-based Reasoning (CBR) provides a framework and methodology for systematic reuse of historical knowledge that is available in the form of problemsolution
pairs, in solving new problems. Here, we consider possibilities of enhancing Textual CBR systems under three main themes: procurement, maintenance and retrieval. We adapt and build upon the stateof-the-art techniques from data mining and natural language processing in addressing various challenges therein. Under procurement, we investigate the problem of extracting cases (i.e., problem-solution pairs) from data sources such as incident/experience
reports. We develop case-base maintenance methods specifically tuned to text targeted towards retaining solutions such that the utility of the filtered case base in solving new problems is maximized. Further, we address the problem of query suggestions for textual case-bases and show that exploiting the problem-solution partition can enhance retrieval effectiveness by prioritizing more useful query suggestions. Additionally, we illustrate interpretable clustering as a tool to drill-down to domain specific text collections (since CBR systems are usually very domain specific) and develop techniques for improved similarity assessment in social media sources such as microblogs. Through extensive empirical evaluations, we illustrate the improvements that we are able to
achieve over the state-of-the-art methods for the respective tasks.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Gene expression connectivity mapping has gained much popularity recently with a number of successful applications in biomedical research testifying its utility and promise. Previously methodological research in connectivity mapping mainly focused on two of the key components in the framework, namely, the reference gene expression profiles and the connectivity mapping algorithms. The other key component in this framework, the query gene signature, has been left to users to construct without much consensus on how this should be done, albeit it has been an issue most relevant to end users. As a key input to the connectivity mapping process, gene signature is crucially important in returning biologically meaningful and relevant results. This paper intends to formulate a standardized procedure for constructing high quality gene signatures from a user’s perspective.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Within the study of domestic violence typological approaches have gained prominence in part as a response to the wider feminist canon that presumes perpetrators are all simply motivated by male power. In this article we use a single case study to query the presumption inherent in the most commonly used typological approaches that offender motivations remain largely static overtime and can be read off easily from self-reports or official records. We conclude by pointing to the need, both for academics and practitioners, to engage interpretively with the specific meanings acts of violence hold for domestic violence perpetrators - informed as they can be by sexist values, perceptions of entitlement and a specific history of conflict, suspicion or grievance – that can change who they are and the way they behave in the aftermath of assaults and breakups, as the foreground of crime is reincorporated into a background narrative.