930 resultados para Similarity queries
Resumo:
Web sites that rely on databases for their content are now ubiquitous. Query result pages are dynamically generated from these databases in response to user-submitted queries. Automatically extracting structured data from query result pages is a challenging problem, as the structure of the data is not explicitly represented. While humans have shown good intuition in visually understanding data records on a query result page as displayed by a web browser, no existing approach to data record extraction has made full use of this intuition. We propose a novel approach, in which we make use of the common sources of evidence that humans use to understand data records on a displayed query result page. These include structural regularity, and visual and content similarity between data records displayed on a query result page. Based on these observations we propose new techniques that can identify each data record individually, while ignoring noise items, such as navigation bars and adverts. We have implemented these techniques in a software prototype, rExtractor, and tested it using two datasets. Our experimental results show that our approach achieves significantly higher accuracy than previous approaches. Furthermore, it establishes the case for use of vision-based algorithms in the context of data extraction from web sites.
Resumo:
Advocates of semi-structured interview techniques have often argued that rapport may be built, and power inequalities between interviewer and respondent counteracted, by strategic self-disclosure on the part of the interviewer. Strategies that use self-disclosure to construct similarity between interviewer and respondent rely on the presumption that the respondent will in fact interpret the interviewer's behaviour in this way. In this article we examine the role of interviewer self-disclosure using data drawn from three projects involving interviews with young people. We consider how an interviewer's attempts to ‘do similarity’ may be interpreted variously as displays of similarity or, ironically, as indicators of difference by the participant, and map the implications that this may have for subsequent interview dialogue. A particular object of concern relates to the ways in which self-disclosing acts may function in the negotiation of category entitlement within interview interactions.
Resumo:
We employ the impulse approximation for a description of positronium-atom scattering. Our analysis and calculations of Ps-Kr and Ps-Ar collisions provide a theoretical explanation of the similarity between the cross sections for positronium scattering and electron scattering for a range of atomic and molecular targets observed by S. J. Brawley et al. [Science 330, 789 (2010)].
Resumo:
With the proliferation of geo-positioning and geo-tagging techniques, spatio-textual objects that possess both a geographical location and a textual description are gaining in prevalence, and spatial keyword queries that exploit both location and textual description are gaining in prominence. However, the queries studied so far generally focus on finding individual objects that each satisfy a query rather than finding groups of objects where the objects in a group together satisfy a query.
We define the problem of retrieving a group of spatio-textual objects such that the group's keywords cover the query's keywords and such that the objects are nearest to the query location and have the smallest inter-object distances. Specifically, we study three instantiations of this problem, all of which are NP-hard. We devise exact solutions as well as approximate solutions with provable approximation bounds to the problems. In addition, we solve the problems of retrieving top-k groups of three instantiations, and study a weighted version of the problem that incorporates object weights. We present empirical studies that offer insight into the efficiency of the solutions, as well as the accuracy of the approximate solutions.
Resumo:
This book provides a comprehensive tutorial on similarity operators. The authors systematically survey the set of similarity operators, primarily focusing on their semantics, while also touching upon mechanisms for processing them effectively.
The book starts off by providing introductory material on similarity search systems, highlighting the central role of similarity operators in such systems. This is followed by a systematic categorized overview of the variety of similarity operators that have been proposed in literature over the last two decades, including advanced operators such as RkNN, Reverse k-Ranks, Skyline k-Groups and K-N-Match. Since indexing is a core technology in the practical implementation of similarity operators, various indexing mechanisms are summarized. Finally, current research challenges are outlined, so as to enable interested readers to identify potential directions for future investigations.
In summary, this book offers a comprehensive overview of the field of similarity search operators, allowing readers to understand the area of similarity operators as it stands today, and in addition providing them with the background needed to understand recent novel approaches.
Resumo:
A RkNN query returns all objects whose nearest k neighbors
contain the query object. In this paper, we consider RkNN
query processing in the case where the distances between
attribute values are not necessarily metric. Dissimilarities
between objects could then be a monotonic aggregate of dissimilarities
between their values, such aggregation functions
being specified at query time. We outline real world cases
that motivate RkNN processing in such scenarios. We consider
the AL-Tree index and its applicability in RkNN query
processing. We develop an approach that exploits the group
level reasoning enabled by the AL-Tree in RkNN processing.
We evaluate our approach against a Naive approach
that performs sequential scans on contiguous data and an
improved block-based approach that we provide. We use
real-world datasets and synthetic data with varying characteristics
for our experiments. This extensive empirical
evaluation shows that our approach is better than existing
methods in terms of computational and disk access costs,
leading to significantly better response times.
Resumo:
We consider the problem of linking web search queries to entities from a knowledge base such as Wikipedia. Such linking enables converting a user’s web search session to a footprint in the knowledge base that could be used to enrich the user profile. Traditional methods for entity linking have been directed towards finding entity mentions in text documents such as news reports, each of which are possibly linked to multiple entities enabling the usage of measures like entity set coherence. Since web search queries are very small text fragments, such criteria that rely on existence of a multitude of mentions do not work too well on them. We propose a three-phase method for linking web search queries to wikipedia entities. The first phase does IR-style scoring of entities against the search query to narrow down to a subset of entities that are expanded using hyperlink information in the second phase to a larger set. Lastly, we use a graph traversal approach to identify the top entities to link the query to. Through an empirical evaluation on real-world web search queries, we illustrate that our methods significantly enhance the linking accuracy over state-of-the-art methods.