Digital Library

997 results for Index Pruning

Indexing without spam

Relevance: 60.00%

Abstract:

The presence of spam in a document ranking is a major issue for Web search engines. Common approaches to coping with spam remove from the document rankings those pages that are likely to contain spam. These approaches are implemented as post-retrieval processes that filter out spam pages only after documents have been retrieved with respect to a user’s query. In this paper we propose removing spam pages at indexing time, thereby obtaining a pruned index that is virtually “spam-free”. We investigate the benefits of this approach from three points of view: indexing time, index size, and retrieval performance. Not surprisingly, we found that the strategy decreases both the time required by the indexing process and the space required for storing the index. More surprisingly, we found that retrieval over a spam-pruned version of a collection’s index shows no difference in retrieval performance compared to traditional post-retrieval spam filtering approaches.
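As a rough illustration of pruning at indexing time, the sketch below builds a small inverted index and skips any document whose spam score reaches a cutoff, so no postings are ever written for it. The function name `build_pruned_index`, the `spam_score` mapping, and the 0.5 threshold are all hypothetical; the abstract does not say how spam is detected or which cutoff is used.

```python
from collections import defaultdict

def build_pruned_index(docs, spam_score, threshold=0.5):
    """Build an inverted index, skipping documents judged likely spam.

    docs: dict mapping doc_id -> text
    spam_score: dict mapping doc_id -> score in [0, 1]
                (hypothetical classifier output, not from the paper)
    threshold: assumed cutoff; documents at or above it are not indexed
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        if spam_score.get(doc_id, 0.0) >= threshold:
            continue  # pruned at indexing time: no postings are written
        for term in text.lower().split():
            index[term].add(doc_id)
    return dict(index)

docs = {
    "d1": "index pruning for web search",
    "d2": "cheap pills cheap pills search",
    "d3": "static index pruning",
}
spam_score = {"d1": 0.1, "d2": 0.9, "d3": 0.2}
index = build_pruned_index(docs, spam_score)
```

Because `d2` never enters the index, its terms cost neither indexing time nor storage, which matches the two savings the abstract reports.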

Reducing the bandwidth requirements of P2P keyword indexing

Relevance: 60.00%

Abstract:

This paper describes the design and evaluation of a federated, peer-to-peer indexing system, which can be used to integrate the resources of local systems into a globally addressable index using a distributed hash table. The salient feature of the indexing system's design is the efficient dissemination of term-document indices using a combination of duplicate elimination, leaf set forwarding, and conventional techniques such as aggressive index pruning, index compression, and batching. Together these indexing strategies help to reduce the number of RPC operations required to locate the nodes responsible for a section of the index, as well as the bandwidth utilization and the latency of the indexing service. Using empirical observation, we evaluate the performance benefits of these cumulative optimizations and show that these design trade-offs can significantly improve indexing performance when using a distributed hash table.

Reducing the bandwidth requirements of P2P keyword indexing

Relevance: 60.00%

Abstract:

This paper describes the design and evaluation of a peer-to-peer indexing system to integrate the resources of local document database systems into a globally addressable index using a distributed hash table. The salient feature of the indexing system's design is the efficient dissemination of term-document indices using a combination of duplicate elimination, ring-based forwarding, and conventional techniques such as aggressive index pruning and batching. Together these indexing strategies help to reduce the number of RPC operations required to locate the nodes responsible for a section of the index, the bandwidth utilization, and the latency of the indexing service.

A grid-based index method for time warping distance

Relevance: 30.00%

Abstract:

Recently, dynamic time warping (DTW) has been recognized as the most robust distance function to measure the similarity between two time series, and this fact has spawned a flurry of research on this topic. Most indexing methods proposed for DTW are based on the R-tree structure. Because of high dimensionality and loose lower bounds for the time warping distance, the pruning power of these tree structures is quite weak, resulting in inefficient search. In this paper, we propose a dimensionality reduction method motivated by observations about the inherent character of each time series. A very compact index file is constructed. By scanning the index file, we obtain a very small candidate set, so that the number of page accesses is dramatically reduced. We demonstrate the effectiveness of our approach on real and synthetic datasets.
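To make the role of lower bounds concrete, the following sketch computes DTW with the textbook dynamic program and uses a deliberately simple endpoint-based lower bound to skip candidates before the full computation. This is not the paper's grid-based method; the names `dtw`, `endpoint_lower_bound`, and `range_search` are illustrative, and the cost function is assumed to be the absolute difference between points.

```python
def dtw(a, b):
    """Classic O(len(a) * len(b)) DTW with absolute-difference point cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def endpoint_lower_bound(a, b):
    """Every warping path aligns first with first and last with last,
    so the larger of those two point costs can never exceed dtw(a, b)."""
    return max(abs(a[0] - b[0]), abs(a[-1] - b[-1]))

def range_search(query, series, radius):
    """Return names of series within `radius` of `query` under DTW."""
    hits = []
    for name, s in series.items():
        if endpoint_lower_bound(query, s) > radius:
            continue  # safely pruned without running the full DTW
        if dtw(query, s) <= radius:
            hits.append(name)
    return hits

series = {"a": [1, 2, 3], "b": [1, 2, 2, 3], "c": [10, 11, 12]}
result = range_search([1, 2, 3], series, radius=0.5)
```

A bound this loose prunes only obviously distant series such as `c`; the abstract's point is that tighter, more compact summaries are needed for the filter to be effective on real data.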

DDR: an index method for large time-series datasets

Relevance: 30.00%

Abstract:

The tree index structure is a traditional method for searching for similar data in large datasets. It is based on the presupposition that most sub-trees are pruned during the search, so that the number of page accesses is reduced. However, time-series datasets generally have very high dimensionality. Because of the so-called curse of dimensionality, pruning becomes less effective as dimensionality grows. Consequently, the tree index structure is not a suitable method for time-series datasets. In this paper, we propose a two-phase (filtering and refinement) method for searching time-series datasets. In the filtering step, quantized time series are used to construct a compact file, which is scanned to filter out irrelevant data. A small set of candidates is passed to the second step for refinement. In this step, we introduce an effective index compression method named grid-based datawise dimensionality reduction (DDR), which attempts to preserve the characteristics of the time series. An experimental comparison with existing techniques demonstrates the utility of our approach.
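The two-phase idea can be sketched as follows, under several assumptions the abstract does not state: series are of equal length, similarity is Euclidean distance, and the compact file stores one grid-cell number per point for a fixed cell width. The DDR compression itself is not reproduced here, and all function names are hypothetical.

```python
import math

def quantize(series, width):
    """Map each value to the index of the grid cell [c*width, (c+1)*width) containing it."""
    return [math.floor(v / width) for v in series]

def cell_lower_bound(query, cells, width):
    """Smallest Euclidean distance the query could have to any series
    whose points fall inside the given cells (never overestimates)."""
    total = 0.0
    for q, c in zip(query, cells):
        lo, hi = c * width, (c + 1) * width
        gap = max(lo - q, q - hi, 0.0)  # 0 when q lies inside the cell
        total += gap * gap
    return math.sqrt(total)

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def filter_and_refine(query, data, radius, width=1.0):
    # Phase 1 (filtering): scan the compact quantized file only.
    # It is rebuilt here for brevity; in practice it would be precomputed.
    compact = {k: quantize(s, width) for k, s in data.items()}
    candidates = [k for k, cells in compact.items()
                  if cell_lower_bound(query, cells, width) <= radius]
    # Phase 2 (refinement): exact distance on the surviving candidates.
    hits = [k for k in candidates if euclidean(query, data[k]) <= radius]
    return candidates, hits

data = {
    "a": [0.2, 1.4, 2.6],
    "b": [0.9, 1.9, 2.9],
    "c": [5.0, 6.0, 7.0],
}
candidates, hits = filter_and_refine([0.3, 1.5, 2.5], data, radius=0.5)
```

Since the filter uses a lower bound, it can admit false candidates such as `b` but cannot discard a true match; only the refinement step touches the full-resolution series.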

Pictorial practical tree and shrub culture: a practical manual giving directions for propagating, planting, and pruning trees and shrubs, and selections for various purposes

Relevance: 30.00%

Abstract:

Advertisements: [2] p., 1st group of paging; [2] p. at end; p. [2-3] of cover.

The American flower garden directory: containing practical directions for the culture of plants in the flower garden, hot-house, garden-house, rooms, or parlour windows, for every month in the year ... Instructions for erecting a hot-house, green-house, and laying out a flower garden. Also, table of soils most congenial to the plants contained in the work. The whole adapted to either large or small gardens, with instructions for preparing the soil, propagating, planting, pruning, training, and fruiting the grape vine. With descriptions of the best sorts for cultivating in the open air

Relevance: 30.00%

Abstract:

Includes index.

Association between overweight or obesity and household income and parental body mass index in Australian youth - analysis of the Australian Nutrition Survey, 1995

Relevance: 20.00%

Prediction of Magnetic Storm Events Using the Dst Index

Relevance: 20.00%

Matching GP Terms to the ICD-10-AM Index

Relevance: 20.00%

The Development and Validation of a Comorbidity Index for Prostate Cancer among Black Men

Relevance: 20.00%

The Arch Index: A Measure of Flat or Fat Feet?

Relevance: 20.00%

Sensitivity and Specificity of the Body Mass Index to Determine Obesity: A Study with Brazilian Men and Women

Relevance: 20.00%

The Use of Body Mass Index to Predict Body Composition in Children

Relevance: 20.00%

Prediction of Fractional Brownian Motion with Hurst Index Less than 1/2

Relevance: 20.00%
