839 resultados para Hamming distance
Resumo:
The count-min sketch is a useful data structure for recording and estimating the frequency of string occurrences, such as passwords, in sub-linear space with high accuracy. However, it cannot be used to draw conclusions on groups of strings that are similar, for example close in Hamming distance. This paper introduces a variant of the count-min sketch which allows for estimating counts within a specified Hamming distance of the queried string. This variant can be used to prevent users from choosing popular passwords, like the original sketch, but it also allows for a more efficient method of analysing password statistics.
Resumo:
[EN]In this paper we deal with distributions over permutation spaces. The Mallows model is the mode l in use. The associated distance for permutations is the Hamming distance.
Resumo:
This paper describes a new method of indexing and searching large binary signature collections to efficiently find similar signatures, addressing the scalability problem in signature search. Signatures offer efficient computation with acceptable measure of similarity in numerous applications. However, performing a complete search with a given search argument (a signature) requires a Hamming distance calculation against every signature in the collection. This quickly becomes excessive when dealing with large collections, presenting issues of scalability that limit their applicability. Our method efficiently finds similar signatures in very large collections, trading memory use and precision for greatly improved search speed. Experimental results demonstrate that our approach is capable of finding a set of nearest signatures to a given search argument with a high degree of speed and fidelity.
Resumo:
In this paper we present an original approach for finding approximate nearest neighbours in collections of locality-sensitive hashes. The paper demonstrates that this approach makes high-performance nearest-neighbour searching feasible on Web-scale collections and commodity hardware with minimal degradation in search quality.
Resumo:
This thesis studies document signatures, which are small representations of documents and other objects that can be stored compactly and compared for similarity. This research finds that document signatures can be effectively and efficiently used to both search and understand relationships between documents in large collections, scalable enough to search a billion documents in a fraction of a second. Deliverables arising from the research include an investigation of the representational capacity of document signatures, the publication of an open-source signature search platform and an approach for scaling signature retrieval to operate efficiently on collections containing hundreds of millions of documents.
Resumo:
It is well known that n-length stabilizer quantum error correcting codes (QECCs) can be obtained via n-length classical error correction codes (CECCs) over GF(4), that are additive and self-orthogonal with respect to the trace Hermitian inner product. But, most of the CECCs have been studied with respect to the Euclidean inner product. In this paper, it is shown that n-length stabilizer QECCs can be constructed via 371 length linear CECCs over GF(2) that are self-orthogonal with respect to the Euclidean inner product. This facilitates usage of the widely studied self-orthogonal CECCs to construct stabilizer QECCs. Moreover, classical, binary, self-orthogonal cyclic codes have been used to obtain stabilizer QECCs with guaranteed quantum error correcting capability. This is facilitated by the fact that (i) self-orthogonal, binary cyclic codes are easily identified using transform approach and (ii) for such codes lower bounds on the minimum Hamming distance are known. Several explicit codes are constructed including two pure MDS QECCs.
Resumo:
An associative memory with parallel architecture is presented. The neurons are modelled by perceptrons having only binary, rather than continuous valued input. To store m elements each having n features, m neurons each with n connections are needed. The n features are coded as an n-bit binary vector. The weights of the n connections that store the n features of an element has only two values -1 and 1 corresponding to the absence or presence of a feature. This makes the learning very simple and straightforward. For an input corrupted by binary noise, the associative memory indicates the element that is closest (in terms of Hamming distance) to the noisy input. In the case where the noisy input is equidistant from two or more stored vectors, the associative memory indicates two or more elements simultaneously. From some simple experiments performed on the human memory and also on the associative memory, it can be concluded that the associative memory presented in this paper is in some respect more akin to a human memory than a Hopfield model.
Resumo:
We address the problem of designing codes for specific applications using deterministic annealing. Designing a block code over any finite dimensional space may be thought of as forming the corresponding number of clusters over the particular dimensional space. We have shown that the total distortion incurred in encoding a training set is related to the probability of correct reception over a symmetric channel. While conventional deterministic annealing make use of the Euclidean squared error distance measure, we have developed an algorithm that can be used for clustering with Hamming distance as the distance measure, which is required in the error correcting, scenario.
Resumo:
本文提出一种双向联想记忆神经网络的按‘位’加权编码策略,并给出了求取权值的速推算法.它将Kosko双向联想记忆神经网络按海明距离进行模式匹配的原则,修正为按加权海明距离进行模式匹配,从而可以使得对不满足连续性的所谓“病态结构”的一类样本模式集,同样具有良好的联想能力.对二值图象模式存贮、联想的计算机模拟实验表明,此方法具有优良的性能和实用价值。
Resumo:
Object detection and recognition are important problems in computer vision. The challenges of these problems come from the presence of noise, background clutter, large within class variations of the object class and limited training data. In addition, the computational complexity in the recognition process is also a concern in practice. In this thesis, we propose one approach to handle the problem of detecting an object class that exhibits large within-class variations, and a second approach to speed up the classification processes. In the first approach, we show that foreground-background classification (detection) and within-class classification of the foreground class (pose estimation) can be jointly solved with using a multiplicative form of two kernel functions. One kernel measures similarity for foreground-background classification. The other kernel accounts for latent factors that control within-class variation and implicitly enables feature sharing among foreground training samples. For applications where explicit parameterization of the within-class states is unavailable, a nonparametric formulation of the kernel can be constructed with a proper foreground distance/similarity measure. Detector training is accomplished via standard Support Vector Machine learning. The resulting detectors are tuned to specific variations in the foreground class. They also serve to evaluate hypotheses of the foreground state. When the image masks for foreground objects are provided in training, the detectors can also produce object segmentation. Methods for generating a representative sample set of detectors are proposed that can enable efficient detection and tracking. In addition, because individual detectors verify hypotheses of foreground state, they can also be incorporated in a tracking-by-detection frame work to recover foreground state in image sequences. To run the detectors efficiently at the online stage, an input-sensitive speedup strategy is proposed to select the most relevant detectors quickly. The proposed approach is tested on data sets of human hands, vehicles and human faces. On all data sets, the proposed approach achieves improved detection accuracy over the best competing approaches. In the second part of the thesis, we formulate a filter-and-refine scheme to speed up recognition processes. The binary outputs of the weak classifiers in a boosted detector are used to identify a small number of candidate foreground state hypotheses quickly via Hamming distance or weighted Hamming distance. The approach is evaluated in three applications: face recognition on the face recognition grand challenge version 2 data set, hand shape detection and parameter estimation on a hand data set, and vehicle detection and estimation of the view angle on a multi-pose vehicle data set. On all data sets, our approach is at least five times faster than simply evaluating all foreground state hypotheses with virtually no loss in classification accuracy.