Biblioteca Digital

51 results for Query paging

Clustering-based Query Routing in Cooperative Semi-structured Peer to Peer Networks

Relevance:

20.00%

Publisher:

Abstract:

We consider the problem of resource selection in clustered Peer-to-Peer Information Retrieval (P2P IR) networks with cooperative peers. The clustered P2P IR framework presents a significant departure from general P2P IR architectures by employing clustering to ensure content coherence between resources at the resource selection layer, without disturbing document allocation. We propose that such a property could be leveraged in resource selection by adapting well-studied and popular inverted lists for centralized document retrieval. Accordingly, we propose the Inverted PeerCluster Index (IPI), an approach that adapts the inverted lists, in a straightforward manner, for resource selection in clustered P2P IR. IPI also encompasses a strikingly simple peer-specific scoring mechanism that exploits the said index for resource selection. Through an extensive empirical analysis on P2P IR testbeds, we establish that IPI competes well with the sophisticated state-of-the-art methods in virtually every parameter of interest for the resource selection task, in the context of clustered P2P IR.
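The idea of adapting inverted lists to resource selection can be illustrated with a minimal sketch. It is not the authors' IPI: the function names, the input layout (cluster id mapped to tokenized documents) and the raw term-frequency scoring are all assumptions made for illustration, and the abstract's peer-specific scoring mechanism is not reproduced here.

```python
from collections import Counter, defaultdict


def build_cluster_index(clusters):
    """Build term -> {cluster_id: term frequency}.

    `clusters` maps a cluster id to a list of documents, each document
    being a list of terms. This layout is a hypothetical one.
    """
    index = defaultdict(dict)
    for cluster_id, docs in clusters.items():
        counts = Counter(term for doc in docs for term in doc)
        for term, tf in counts.items():
            index[term][cluster_id] = tf
    return index


def rank_clusters(index, query_terms):
    """Score each cluster by summed term frequency over the query terms.

    Summed raw frequency is an assumed scoring rule, chosen only to keep
    the sketch short. Ties are broken by cluster id.
    """
    scores = Counter()
    for term in query_terms:
        for cluster_id, tf in index.get(term, {}).items():
            scores[cluster_id] += tf
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

A query is then routed to the highest-scoring clusters first; clusters that contain none of the query terms never appear in the ranking.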

Mapping cosmopolis: moral topographies of the medieval city

Relevance:

10.00%

Publisher:

Abstract:

Cosmopolis is a concept that has a long history in many cultures around the globe. It is a mirroring of the 'social' and 'natural' worlds, such that in one is seen the order and the structures of the other -- a mutual 'mapping'. In this paper I examine how the presence of cosmopolis -- a Christianised cosmopolis of the European Middle Ages -- was made evident in the representation and formation of cities at that time. I reveal a dualism between the social and spatial ordering of both city and cosmos which defined and reinforced social and spatial boundaries in urban landscapes, evident for example in the 11th and 12th centuries. Recently, Toulmin (1992) has taken the idea of cosmopolis to argue that it has been a persistent presence in Western - Enlightenment science, philosophy, and religion -- a 'hidden agenda of modernity'. I contend that, as an idea, cosmopolis has a much earlier circulation in European thinking, not least in the Middle Ages. Locating cosmopolis in the medieval and the modern periods then begs a question of what is it that really makes the two distinct and separate? All too often human geographers have emphasised discontinuities between the 'medieval' and 'modern' age, locating the 'rise of modernity' some time in the Enlightenment period. However, what 'mapping' cosmopolis reveals are continuities, binding time and space together, which when looked at begin to help query the modernity concept itself.

Application of connectivity mapping in predictive toxicology based on gene-expression similarity.

Relevance:

10.00%

Publisher:

Abstract:

Connectivity mapping is the process of establishing connections between different biological states using gene-expression profiles or signatures. There are a number of applications but in toxicology the most pertinent is for understanding mechanisms of toxicity. In its essence the process involves comparing a query gene signature generated as a result of exposure of a biological system to a chemical to those in a database that have been previously derived. In the ideal situation the query gene-expression signature is characteristic of the event and will be matched to similar events in the database. Key criteria are therefore the means of choosing the signature to be matched and the means by which the match is made. In this article we explore these concepts with examples applicable to toxicology.

Visually extracting data records from the deep web

Relevance:

10.00%

Publisher:

Abstract:

Web sites that rely on databases for their content are now ubiquitous. Query result pages are dynamically generated from these databases in response to user-submitted queries. Automatically extracting structured data from query result pages is a challenging problem, as the structure of the data is not explicitly represented. While humans have shown good intuition in visually understanding data records on a query result page as displayed by a web browser, no existing approach to data record extraction has made full use of this intuition. We propose a novel approach, in which we make use of the common sources of evidence that humans use to understand data records on a displayed query result page. These include structural regularity, and visual and content similarity between data records displayed on a query result page. Based on these observations we propose new techniques that can identify each data record individually, while ignoring noise items, such as navigation bars and adverts. We have implemented these techniques in a software prototype, rExtractor, and tested it using two datasets. Our experimental results show that our approach achieves significantly higher accuracy than previous approaches. Furthermore, it establishes the case for use of vision-based algorithms in the context of data extraction from web sites.

Race, space and politics in Mid-Victorian Ireland: the ethnologies of Abraham Hume and John McElheran

Relevance:

10.00%

Publisher:

Abstract:

There has been much scholarly debate about the significance and influence of racialist thinking in the political and cultural history of nineteenth-century Ireland. With reference to that ongoing historiographical discussion, this paper considers the racial geographies and opposing political motivations of two Irish ethnologists, Abraham Hume and John McElheran, using their racialist regimes to query some of the common assumptions that have informed disagreements over the role and reach of racial typecasting in mid-nineteenth-century Ireland. As well as examining in detail the racial imaginaries promulgated by Hume and McElheran, the paper also argues for the importance of situating racialist discourse in the spaces in which it was communicated and contested. Further, in highlighting the ways in which Hume and McElheran collapsed together race, class and religion, the paper troubles the utility of a crisp analytical distinction between those disputed categories.

Automatically Annotating Structured Web Data Using a SVM-Based Multiclass Classifier

Relevance:

10.00%

Publisher:

Abstract:

In this paper, we propose a new learning approach to Web data annotation, where a support vector machine-based multiclass classifier is trained to assign labels to data items. For data record extraction, a data section re-segmentation algorithm based on visual and content features is introduced to improve the performance of Web data record extraction. We have implemented the proposed approach and tested it with a large set of Web query result pages in different domains. Our experimental results show that our proposed approach is highly effective and efficient.

Practical Spectrum Aggregation for Secondary Networks with Imperfect Sensing

Relevance:

10.00%

Publisher:

Abstract:

We investigate a collision-sensitive secondary network that intends to opportunistically aggregate and utilize spectrum of a primary network to achieve higher data rates. In opportunistic spectrum access with imperfect sensing of idle primary spectrum, secondary transmission can collide with primary transmission. When the secondary network aggregates more channels in the presence of imperfect sensing, collisions could occur more often, limiting the performance obtained by spectrum aggregation. In this context, we aim to address a fundamental question, namely how much spectrum aggregation is worthwhile under imperfect sensing. For collision occurrence, we focus on two different types of collision: one is imposed by asynchronous transmission, and the other by imperfect spectrum sensing. The collision probability expression has been derived in closed form with various secondary network parameters: primary traffic load, secondary user transmission parameters, spectrum sensing errors, and the number of aggregated sub-channels. In addition, the impact of spectrum aggregation on data rate is analysed under the constraint of collision probability. Then, we solve an optimal spectrum aggregation problem and propose the dynamic spectrum aggregation approach to increase the data rate subject to practical collision constraints. Our simulation results show clearly that the proposed approach outperforms the benchmark that passively aggregates sub-channels without collision awareness.

Retrieving Regions of Interest for User Exploration

Relevance:

10.00%

Publisher:

Abstract:

We consider an application scenario where points of interest (PoIs) each have a web presence and where a web user wants to identify a region that contains PoIs that are relevant to a set of keywords, e.g., in preparation for deciding where to go to conveniently explore the PoIs. Motivated by this, we propose the length-constrained maximum-sum region (LCMSR) query that returns a spatial-network region that is located within a general region of interest, that does not exceed a given size constraint, and that best matches query keywords. Such a query maximizes the total weight of the PoIs in it w.r.t. the query keywords. We show that it is NP-hard to answer this query. We develop an approximation algorithm with a (5 + ε) approximation ratio utilizing a technique that scales node weights into integers. We also propose a more efficient heuristic algorithm and a greedy algorithm. Empirical studies on real data offer detailed insight into the accuracy of the proposed algorithms and show that the proposed algorithms are capable of computing results efficiently and effectively.

Efficient Processing of Spatial Group Keyword Queries

Relevance:

10.00%

Publisher:

Abstract:

With the proliferation of geo-positioning and geo-tagging techniques, spatio-textual objects that possess both a geographical location and a textual description are gaining in prevalence, and spatial keyword queries that exploit both location and textual description are gaining in prominence. However, the queries studied so far generally focus on finding individual objects that each satisfy a query rather than finding groups of objects where the objects in a group together satisfy a query.

We define the problem of retrieving a group of spatio-textual objects such that the group's keywords cover the query's keywords and such that the objects are nearest to the query location and have the smallest inter-object distances. Specifically, we study three instantiations of this problem, all of which are NP-hard. We devise exact solutions as well as approximate solutions with provable approximation bounds to the problems. In addition, we solve the problems of retrieving top-k groups of three instantiations, and study a weighted version of the problem that incorporates object weights. We present empirical studies that offer insight into the efficiency of the solutions, as well as the accuracy of the approximate solutions.

Temporal Spatial-Keyword Top-k publish/subscribe

Relevance:

10.00%

Publisher:

Abstract:

Massive amounts of data that are geo-tagged and associated with text information are being generated at an unprecedented scale. These geo-textual data cover a wide range of topics. Users are interested in receiving up-to-date tweets such that their locations are close to a user-specified location and their texts are interesting to users. For example, a user may want to be updated with tweets near her home on the topic “food poisoning vomiting.” We consider the Temporal Spatial-Keyword Top-k Subscription (TaSK) query. Given a TaSK query, we continuously maintain up-to-date top-k most relevant results over a stream of geo-textual objects (e.g., geo-tagged tweets) for the query. The TaSK query takes into account text relevance, spatial proximity, and recency of geo-textual objects in evaluating its relevance with a geo-textual object. We propose a novel solution to efficiently process a large number of TaSK queries over a stream of geo-textual objects. We evaluate the efficiency of our approach on two real-world datasets and the experimental results show that our solution is able to achieve a reduction of the processing time by 70-80% compared with two baselines.
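A score that combines text relevance, spatial proximity and recency might look like the sketch below. The abstract does not give the scoring function, so everything here is an assumption: the keyword-overlap text score, the linear distance falloff, the exponential half-life decay, and the parameters `alpha`, `max_dist` and `half_life`.

```python
import heapq
import math


def task_score(query_terms, query_loc, obj_terms, obj_loc, age_seconds,
               alpha=0.5, max_dist=10.0, half_life=3600.0):
    """Hypothetical relevance of one geo-textual object to one query."""
    q = set(query_terms)
    if not q:
        return 0.0
    # Text relevance: fraction of query keywords present in the object.
    text = len(q & set(obj_terms)) / len(q)
    # Spatial proximity: 1 at the query location, 0 at max_dist or beyond.
    proximity = max(0.0, 1.0 - math.dist(query_loc, obj_loc) / max_dist)
    # Recency: the score halves every `half_life` seconds.
    recency = 0.5 ** (age_seconds / half_life)
    return (alpha * text + (1.0 - alpha) * proximity) * recency


def top_k(query_terms, query_loc, objects, k):
    """Return the k best (score, object_id) pairs.

    `objects` is an iterable of (object_id, terms, location, age_seconds).
    """
    scored = ((task_score(query_terms, query_loc, terms, loc, age), oid)
              for oid, terms, loc, age in objects)
    return heapq.nlargest(k, scored)
```

This recomputes the top-k from scratch for a single query, which is the naive baseline; processing many standing queries over a stream efficiently is the problem the paper addresses.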

Efficient RkNN Retrieval with Arbitrary Non-Metric Similarity Measures

Relevance:

10.00%

Publisher:

Abstract:

An RkNN query returns all objects whose nearest k neighbors contain the query object. In this paper, we consider RkNN query processing in the case where the distances between attribute values are not necessarily metric. Dissimilarities between objects could then be a monotonic aggregate of dissimilarities between their values, such aggregation functions being specified at query time. We outline real world cases that motivate RkNN processing in such scenarios. We consider the AL-Tree index and its applicability in RkNN query processing. We develop an approach that exploits the group level reasoning enabled by the AL-Tree in RkNN processing. We evaluate our approach against a Naive approach that performs sequential scans on contiguous data and an improved block-based approach that we provide. We use real-world datasets and synthetic data with varying characteristics for our experiments. This extensive empirical evaluation shows that our approach is better than existing methods in terms of computational and disk access costs, leading to significantly better response times.
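The RkNN definition can be made concrete with a brute-force sketch that accepts any dissimilarity function supplied at query time, metric or not. This is a quadratic-time illustration of the definition only, not the AL-Tree approach; the function name and the tie-handling rule (an object qualifies when fewer than k other objects are strictly closer to it than the query) are assumptions.

```python
def rknn(objects, query, k, dissim):
    """Return indices of objects whose k nearest neighbours include `query`.

    `dissim(a, b)` is any caller-supplied dissimilarity; it need not be
    symmetric or satisfy the triangle inequality.
    """
    result = []
    for i, obj in enumerate(objects):
        d_query = dissim(obj, query)
        # Count the other data objects that are strictly closer to obj
        # than the query is.
        closer = sum(1 for j, other in enumerate(objects)
                     if j != i and dissim(obj, other) < d_query)
        if closer < k:
            result.append(i)
    return result
```

For example, with one-dimensional points `[0, 1, 10]`, query `2`, `k = 1` and absolute difference as the dissimilarity, the points `1` and `10` are returned: the query is the nearest neighbour of each, while `0` has `1` strictly closer.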

Fast Rule Mining over Multi-dimensional Windows

Relevance:

10.00%

Publisher:

Abstract:

Association rule mining is an indispensable tool for discovering insights from large databases and data warehouses. The data in a warehouse being multi-dimensional, it is often useful to mine rules over subsets of data defined by selections over the dimensions. Such interactive rule mining over multi-dimensional query windows is difficult since rule mining is computationally expensive. Current methods using pre-computation of frequent itemsets require counting of some itemsets by revisiting the transaction database at query time, which is very expensive. We develop a method (RMW) that identifies the minimal set of itemsets to compute and store for each cell, so that rule mining over any query window may be performed without going back to the transaction database. We give formal proofs that the set of itemsets chosen by RMW is sufficient to answer any query and also prove that it is the optimal set to be computed for one-dimensional queries. We demonstrate through an extensive empirical evaluation that RMW achieves extremely fast query response time compared to existing methods, with only moderate overhead in pre-computation and storage.

Fast Mining of Interesting Phrases from Subsets of Text Corpora

Relevance:

10.00%

Publisher:

Abstract:

We address the problem of mining interesting phrases from subsets of a text corpus where the subset is specified using a set of features such as keywords that form a query. Previous algorithms for the problem have proposed solutions that involve sifting through a phrase dictionary based index or a document-based index where the solution is linear in either the phrase dictionary size or the size of the document subset. We propose the usage of an independence assumption between query keywords given the top correlated phrases, wherein the pre-processing could be reduced to discovering phrases from among the top phrases for each feature in the query. We then outline an indexing mechanism where per-keyword phrase lists are stored either on disk or in memory, so that popular aggregation algorithms such as No Random Access and Sort-merge Join may be adapted to do the scoring in real time to identify the top interesting phrases. Though such an approach is expected to be approximate, we empirically illustrate that very high accuracies (of over 90%) are achieved against the results of exact algorithms. Due to the simplified list-aggregation, we are also able to provide response times that are orders of magnitude better than state-of-the-art algorithms. Interestingly, our disk-based approach outperforms the in-memory baselines by up to a hundred times and sometimes more, confirming the superiority of the proposed method.

Finding Relevant Tweets

Relevance:

10.00%

Publisher:

Abstract:

When a user of a microblogging site authors a microblog post or browses through a microblog post, it provides cues as to what topic she is interested in at that point in time. Example-based search that retrieves similar tweets given one exemplary tweet, such as the one just authored, can help provide the user with relevant content. We investigate various components of microblog posts, such as the associated timestamp, author’s social network, and the content of the post, and develop approaches that harness such factors in finding relevant tweets given a query tweet. An empirical analysis of such techniques on real-world Twitter data is then presented to quantify the utility of the various factors in assessing tweet relevance. We observe that content-wise similar tweets that also contain extra information not already present in the query are perceived as useful. We then develop a composite technique that combines the various approaches by scoring tweets using a dynamic query-specific linear combination of separate techniques. An empirical evaluation establishes the effectiveness of the composite technique, and that it outperforms each of its constituents.

Retrieving Similar Discussion Forum Threads: A Structure based Approach

Relevance:

10.00%

Publisher:

Abstract:

Online forums are becoming a popular way of finding useful information on the web. Search over forums for existing discussion threads so far is limited to keyword-based search due to the minimal effort required on part of the users. However, it is often not possible to capture all the relevant context in a complex query using a small number of keywords. Example-based search that retrieves similar discussion threads given one exemplary thread is an alternate approach that can help the user provide richer context and vastly improve forum search results. In this paper, we address the problem of finding similar threads to a given thread. Towards this, we propose a novel methodology to estimate similarity between discussion threads. Our method exploits the thread structure to decompose threads into a set of weighted overlapping components. It then estimates pairwise thread similarities by quantifying how well the information in the threads is mutually contained within each other using lexical similarities between their underlying components. We compare our proposed methods on real datasets against state-of-the-art thread retrieval mechanisms wherein we illustrate that our techniques outperform others by large margins on popular retrieval evaluation measures such as NDCG, MAP, Precision@k and MRR. In particular, consistent improvements of up to 10% are observed on all evaluation measures.
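Two of the evaluation measures named above, Precision@k and MRR, have short standard definitions that can be sketched as follows. The function names and input layout (ranked lists of ids plus sets of relevant ids) are assumptions for illustration, not the paper's evaluation code.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked items that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for item in ranked[:k] if item in relevant) / k


def mean_reciprocal_rank(rankings, relevant_sets):
    """Average over queries of 1 / rank of the first relevant item.

    A query with no relevant item in its ranking contributes 0.
    """
    if not rankings:
        return 0.0
    total = 0.0
    for ranked, relevant in zip(rankings, relevant_sets):
        for rank, item in enumerate(ranked, start=1):
            if item in relevant:
                total += 1.0 / rank
                break
    return total / len(rankings)
```

Precision@k looks only at how many of the first k results are relevant, while MRR rewards placing the first relevant result as high as possible.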
