2 resultados para data complexity
em Massachusetts Institute of Technology
Resumo:
Many current recognition systems use constrained search to locate objects in cluttered environments. Previous formal analysis has shown that the expected amount of search is quadratic in the number of model and data features, if all the data is known to come from a sinlge object, but is exponential when spurious data is included. If one can group the data into subsets likely to have come from a single object, then terminating the search once a "good enough" interpretation is found reduces the expected search to cubic. Without successful grouping, terminated search is still exponential. These results apply to finding instances of a known object in the data. In this paper, we turn to the problem of selecting models from a library, and examine the combinatorics of determining that a candidate object is not present in the data. We show that the expected search is again exponential, implying that naﶥ approaches to indexing are likely to carry an expensive overhead, since an exponential amount of work is needed to week out each of the incorrect models. The analytic results are shown to be in agreement with empirical data for cluttered object recognition.
Resumo:
The task in text retrieval is to find the subset of a collection of documents relevant to a user's information request, usually expressed as a set of words. Classically, documents and queries are represented as vectors of word counts. In its simplest form, relevance is defined to be the dot product between a document and a query vector--a measure of the number of common terms. A central difficulty in text retrieval is that the presence or absence of a word is not sufficient to determine relevance to a query. Linear dimensionality reduction has been proposed as a technique for extracting underlying structure from the document collection. In some domains (such as vision) dimensionality reduction reduces computational complexity. In text retrieval it is more often used to improve retrieval performance. We propose an alternative and novel technique that produces sparse representations constructed from sets of highly-related words. Documents and queries are represented by their distance to these sets. and relevance is measured by the number of common clusters. This technique significantly improves retrieval performance, is efficient to compute and shares properties with the optimal linear projection operator and the independent components of documents.