18 results for keyword spotting
Abstract:
User preferences are important in route search and planning. For example, when a user plans a trip within a city, their preferences can be expressed as the keywords "shopping mall", "restaurant", and "museum", with weights 0.5, 0.4, and 0.1, respectively. The resulting route should best satisfy these weighted preferences. In this paper, we take weighted user preferences into account in route search and present a keyword coverage problem, which finds an optimal route from a source location to a target location such that the keyword coverage is optimized and the budget score satisfies a specified constraint. We prove that this problem is NP-hard. To solve it, we propose an optimal route search based on an A* variant for which we define an admissible heuristic function. Experiments conducted on real-world datasets demonstrate both the efficiency and the accuracy of our proposed algorithms.
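
To make the problem statement concrete, the following is a minimal sketch of weighted keyword coverage under a budget constraint. It is not the A* variant from the paper: it enumerates simple routes by brute force on a toy graph, which is only workable for tiny inputs. The graph, the edge costs, the names `NODE_KEYWORDS`, `EDGES`, `coverage`, and `best_route`, and the definition of coverage as the sum of the weights of the keywords found along the route are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy data: keywords available at each location.
NODE_KEYWORDS: Dict[str, Set[str]] = {
    "s": set(),
    "a": {"shopping mall"},
    "b": {"restaurant"},
    "c": {"museum"},
    "t": set(),
}

# Hypothetical undirected edges with a budget cost (for example, travel time).
EDGES: Dict[Tuple[str, str], float] = {
    ("s", "a"): 2.0,
    ("s", "b"): 1.0,
    ("a", "b"): 1.0,
    ("a", "t"): 2.0,
    ("b", "c"): 1.0,
    ("b", "t"): 3.0,
    ("c", "t"): 1.0,
}

# Keyword weights taken from the example in the abstract.
WEIGHTS: Dict[str, float] = {"shopping mall": 0.5, "restaurant": 0.4, "museum": 0.1}


def neighbors(node: str):
    """Yield (neighbor, cost) pairs for an undirected edge list."""
    for (u, v), cost in EDGES.items():
        if u == node:
            yield v, cost
        elif v == node:
            yield u, cost


def coverage(route: List[str], weights: Dict[str, float]) -> float:
    """Assumed coverage: total weight of the keywords found along the route."""
    covered: Set[str] = set()
    for node in route:
        covered |= NODE_KEYWORDS[node]
    return sum(w for kw, w in weights.items() if kw in covered)


def best_route(
    source: str, target: str, budget: float, weights: Dict[str, float]
) -> Optional[Tuple[float, float, List[str]]]:
    """Return (coverage, cost, route) of the best simple route within budget.

    Ties on coverage are broken by lower cost. Returns None if no route fits.
    """
    best: Optional[Tuple[float, float, List[str]]] = None

    def dfs(node: str, route: List[str], cost: float) -> None:
        nonlocal best
        if cost > budget:
            return  # prune: budget constraint violated
        if node == target:
            score = coverage(route, weights)
            if best is None or (score, -cost) > (best[0], -best[1]):
                best = (score, cost, list(route))
            return
        for nxt, step in neighbors(node):
            if nxt not in route:  # simple routes only
                route.append(nxt)
                dfs(nxt, route, cost + step)
                route.pop()

    dfs(source, [source], 0.0)
    return best
```

Calling `best_route("s", "t", 4.0, WEIGHTS)` on this toy graph searches every simple route from `s` to `t` whose cost stays within 4.0. Since the problem is NP-hard, exhaustive enumeration grows exponentially with graph size, which is the motivation for a heuristic-guided search such as the A* variant described above.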
Abstract:
We address the problem of mining interesting phrases from subsets of a text corpus, where the subset is specified using a set of features, such as keywords, that form a query. Previous algorithms for this problem sift through either a phrase-dictionary-based index or a document-based index, so their cost is linear in either the size of the phrase dictionary or the size of the document subset. We propose an independence assumption between query keywords given the top correlated phrases, under which the pre-processing reduces to discovering phrases from among the top phrases for each feature in the query. We then outline an indexing mechanism in which per-keyword phrase lists are stored either on disk or in memory, so that popular aggregation algorithms such as No Random Access and Sort-merge Join can be adapted to do the scoring in real time and identify the top interesting phrases. Though such an approach is expected to be approximate, we empirically illustrate that very high accuracies (over 90%) are achieved against the results of exact algorithms. Due to the simplified list aggregation, we are also able to provide response times that are orders of magnitude better than state-of-the-art algorithms. Interestingly, our disk-based approach outperforms the in-memory baselines by up to a hundred times and sometimes more, confirming the superiority of the proposed method.
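
The list-aggregation idea can be illustrated with a small sketch. The code below assumes each query keyword has a precomputed list of `(phrase, score)` pairs sorted by phrase, intersects the lists with a sort-merge join, and combines the per-keyword scores by multiplication, as a stand-in for the independence assumption. The scoring function, the sample index, and the name `merge_join_top_k` are hypothetical and are not taken from the paper.

```python
import heapq
from typing import Dict, List, Tuple

PhraseList = List[Tuple[str, float]]  # (phrase, score), sorted by phrase

# Hypothetical per-keyword phrase lists, each sorted by phrase.
INDEX: Dict[str, PhraseList] = {
    "database": [
        ("b-tree index", 0.2),
        ("query optimizer", 0.6),
        ("transaction log", 0.4),
    ],
    "performance": [
        ("cache miss", 0.5),
        ("query optimizer", 0.5),
        ("transaction log", 0.1),
    ],
}


def merge_join_top_k(lists: List[PhraseList], k: int) -> List[Tuple[str, float]]:
    """Intersect phrase-sorted lists and return the k highest combined scores.

    The combined score is the product of the per-list scores, which is an
    assumed scoring rule used here only to illustrate the aggregation step.
    """
    if not lists:
        return []
    positions = [0] * len(lists)
    matches: List[Tuple[str, float]] = []
    while all(pos < len(lst) for pos, lst in zip(positions, lists)):
        heads = [lst[pos][0] for pos, lst in zip(positions, lists)]
        largest = max(heads)
        if all(head == largest for head in heads):
            # Phrase appears in every list: combine its scores.
            score = 1.0
            for pos, lst in zip(positions, lists):
                score *= lst[pos][1]
            matches.append((largest, score))
            positions = [pos + 1 for pos in positions]
        else:
            # Advance every list whose head sorts before the largest head.
            positions = [
                pos + 1 if head < largest else pos
                for pos, head in zip(positions, heads)
            ]
    return heapq.nlargest(k, matches, key=lambda item: item[1])
```

For a two-keyword query, `merge_join_top_k([INDEX["database"], INDEX["performance"]], k=2)` makes a single sequential pass over each list. That access pattern is what makes this style of aggregation suitable for disk-resident lists.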
Abstract:
Online forums are becoming a popular way of finding useful information on the web. Search over forums for existing discussion threads has so far been limited to keyword-based search, owing to the minimal effort it requires on the part of users. However, it is often not possible to capture all the relevant context of a complex query using a small number of keywords. Example-based search, which retrieves similar discussion threads given one exemplary thread, is an alternative approach that can help the user provide richer context and vastly improve forum search results. In this paper, we address the problem of finding threads similar to a given thread. To this end, we propose a novel methodology for estimating similarity between discussion threads. Our method exploits the thread structure to decompose threads into a set of weighted overlapping components. It then estimates pairwise thread similarities by quantifying how well the information in the threads is mutually contained within each other, using lexical similarities between their underlying components. We compare our proposed methods on real datasets against state-of-the-art thread retrieval mechanisms and illustrate that our techniques outperform others by large margins on popular retrieval evaluation measures such as NDCG, MAP, Precision@k, and MRR. In particular, consistent improvements of up to 10% are observed on all evaluation measures.
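
As a rough illustration of similarity by mutual containment, the sketch below represents a thread as a list of weighted text components and scores how well each thread's components are matched by the other's. Every specific is an assumption for illustration: the choice of components and weights, the use of Jaccard overlap on lowercase tokens as the lexical similarity, and the averaging of the two containment directions. The paper's actual decomposition and scoring may differ.

```python
from typing import List, Set, Tuple

Component = Tuple[float, str]  # (weight, text); weights are hypothetical


def tokens(text: str) -> Set[str]:
    """Lowercase whitespace tokenization, chosen here for simplicity."""
    return set(text.lower().split())


def lexical_sim(a: str, b: str) -> float:
    """Jaccard overlap between token sets (an assumed lexical similarity)."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def containment(src: List[Component], dst: List[Component]) -> float:
    """Weighted average of how well each src component is matched in dst."""
    total = sum(weight for weight, _ in src)
    if total == 0 or not dst:
        return 0.0
    matched = sum(
        weight * max(lexical_sim(text, other) for _, other in dst)
        for weight, text in src
    )
    return matched / total


def thread_similarity(a: List[Component], b: List[Component]) -> float:
    """Symmetric score: mean of the two containment directions."""
    return 0.5 * (containment(a, b) + containment(b, a))


# Hypothetical threads: (weight, text) for the title and one reply.
thread_a = [(0.5, "laptop battery drains fast"), (0.5, "try lowering screen brightness")]
thread_b = [(0.5, "battery drains fast on new laptop"), (0.5, "check background apps")]
thread_c = [(0.5, "best pasta recipe"), (0.5, "use fresh basil")]
```

With these inputs, `thread_similarity(thread_a, thread_b)` is higher than `thread_similarity(thread_a, thread_c)`, because the titles of the first two threads share most of their tokens while the third thread shares none.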