4 resultados para Points distribution in high dimensional space

em Massachusetts Institute of Technology


Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper, we develop a novel index structure to support efficient approximate k-nearest neighbor (KNN) query in high-dimensional databases. In high-dimensional spaces, the computational cost of the distance (e.g., Euclidean distance) between two points contributes a dominant portion of the overall query response time for memory processing. To reduce the distance computation, we first propose a structure (BID) using BIt-Difference to answer approximate KNN query. The BID employs one bit to represent each feature vector of point and the number of bit-difference is used to prune the further points. To facilitate real dataset which is typically skewed, we enhance the BID mechanism with clustering, cluster adapted bitcoder and dimensional weight, named the BID⁺. Extensive experiments are conducted to show that our proposed method yields significant performance advantages over the existing index structures on both real life and synthetic high-dimensional datasets.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Chow and Liu introduced an algorithm for fitting a multivariate distribution with a tree (i.e. a density model that assumes that there are only pairwise dependencies between variables) and that the graph of these dependencies is a spanning tree. The original algorithm is quadratic in the dimesion of the domain, and linear in the number of data points that define the target distribution $P$. This paper shows that for sparse, discrete data, fitting a tree distribution can be done in time and memory that is jointly subquadratic in the number of variables and the size of the data set. The new algorithm, called the acCL algorithm, takes advantage of the sparsity of the data to accelerate the computation of pairwise marginals and the sorting of the resulting mutual informations, achieving speed ups of up to 2-3 orders of magnitude in the experiments.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The task in text retrieval is to find the subset of a collection of documents relevant to a user's information request, usually expressed as a set of words. Classically, documents and queries are represented as vectors of word counts. In its simplest form, relevance is defined to be the dot product between a document and a query vector--a measure of the number of common terms. A central difficulty in text retrieval is that the presence or absence of a word is not sufficient to determine relevance to a query. Linear dimensionality reduction has been proposed as a technique for extracting underlying structure from the document collection. In some domains (such as vision) dimensionality reduction reduces computational complexity. In text retrieval it is more often used to improve retrieval performance. We propose an alternative and novel technique that produces sparse representations constructed from sets of highly-related words. Documents and queries are represented by their distance to these sets. and relevance is measured by the number of common clusters. This technique significantly improves retrieval performance, is efficient to compute and shares properties with the optimal linear projection operator and the independent components of documents.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Methods are presented (1) to partition or decompose a visual scene into the bodies forming it; (2) to position these bodies in three-dimensional space, by combining two scenes that make a stereoscopic pair; (3) to find the regions or zones of a visual scene that belong to its background; (4) to carry out the isolation of objects in (1) when the input has inaccuracies. Running computer programs implement the methods, and many examples illustrate their behavior. The input is a two-dimensional line-drawing of the scene, assumed to contain three-dimensional bodies possessing flat faces (polyhedra); some of them may be partially occluded. Suggestions are made for extending the work to curved objects. Some comparisons are made with human visual perception. The main conclusion is that it is possible to separate a picture or scene into the constituent objects exclusively on the basis of monocular geometric properties (on the basis of pure form); in fact, successful methods are shown.