973 results for similarity


Relevance:

20.00% 20.00%

Publisher:

Abstract:

Karwath, A., King, R. Homology induction: the use of machine learning to improve sequence similarity searches. BMC Bioinformatics. 23rd April 2002. 3:11 Additional File Describes the title organisms species declaration in one string [http://www.biomedcentral.com/content/supplementary/1471-2105-3-11-S1.doc] Sponsorship: Andreas Karwath and Ross D. King were supported by the EPSRC grant GR/L62849.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

T. Boongoen and Q. Shen. Semi-Supervised OWA Aggregation for Link-Based Similarity Evaluation and Alias Detection. Proceedings of the 18th International Conference on Fuzzy Systems (FUZZ-IEEE'09), pp. 288-293, 2009. Sponsorship: EPSRC

Relevance:

20.00% 20.00%

Publisher:

Abstract:

This paper introduces BoostMap, a method that can significantly reduce retrieval time in image and video database systems that employ computationally expensive distance measures, metric or non-metric. Database and query objects are embedded into a Euclidean space, in which similarities can be rapidly measured using a weighted Manhattan distance. Embedding construction is formulated as a machine learning task, where AdaBoost is used to combine many simple, 1D embeddings into a multidimensional embedding that preserves a significant amount of the proximity structure in the original space. Performance is evaluated in a hand pose estimation system, and a dynamic gesture recognition system, where the proposed method is used to retrieve approximate nearest neighbors under expensive image and video similarity measures. In both systems, BoostMap significantly increases efficiency, with minimal losses in accuracy. Moreover, the experiments indicate that BoostMap compares favorably with existing embedding methods that have been employed in computer vision and database applications, i.e., FastMap and Bourgain embeddings.
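The embedding idea in this abstract can be illustrated with a minimal sketch. It assumes each 1D embedding is simply the distance to a fixed reference object, which is one common choice and not necessarily the construction used in the paper. The names `embed`, `weighted_manhattan`, and `slow_distance` are hypothetical, and the weights are set by hand here, whereas the abstract says AdaBoost is used to combine the 1D embeddings.

```python
from typing import Callable, List, Sequence


def embed(x: float, references: Sequence[float],
          dist: Callable[[float, float], float]) -> List[float]:
    """Map x to a vector with one coordinate per reference object.

    Each coordinate is a 1D embedding: the distance from x to one reference.
    """
    return [dist(x, r) for r in references]


def weighted_manhattan(u: Sequence[float], v: Sequence[float],
                       weights: Sequence[float]) -> float:
    """Weighted L1 distance between two embedded vectors."""
    return sum(w * abs(a - b) for w, a, b in zip(weights, u, v))


def slow_distance(a: float, b: float) -> float:
    """Stand-in for an expensive distance measure (assumption: toy data)."""
    return abs(a - b)


references = [0.0, 10.0]      # hypothetical reference objects
weights = [0.5, 2.0]          # hypothetical weights, hand-set for illustration
database = [1.0, 4.0, 9.0]

# Database objects are embedded once, offline.
embedded_db = [embed(x, references, slow_distance) for x in database]

# At query time only len(references) expensive distances are computed.
query_vec = embed(5.0, references, slow_distance)
nearest = min(range(len(database)),
              key=lambda i: weighted_manhattan(query_vec, embedded_db[i], weights))
print(database[nearest])
```

The point of the design is that the expensive measure is evaluated only against the reference objects, while the comparison against every database object uses the cheap weighted Manhattan distance.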

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Recently the notion of self-similarity has been shown to apply to wide-area and local-area network traffic. In this paper we examine the mechanisms that give rise to self-similar network traffic. We present an explanation for traffic self-similarity by using a particular subset of wide area traffic: traffic due to the World Wide Web (WWW). Using an extensive set of traces of actual user executions of NCSA Mosaic, reflecting over half a million requests for WWW documents, we show evidence that WWW traffic is self-similar. Then we show that the self-similarity in such traffic can be explained based on the underlying distributions of WWW document sizes, the effects of caching and user preference in file transfer, the effect of user "think time", and the superimposition of many such transfers in a local area network. To do this we rely on empirically measured distributions both from our traces and from data independently collected at over thirty WWW sites.
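One standard way to check a series for self-similarity is the aggregated-variance method: average the series over blocks of size m and fit the slope of log variance against log m, which for a self-similar process is 2H - 2, where H is the Hurst parameter. The sketch below is illustrative only and is not claimed to be the test used in the paper; the function names are hypothetical, and the input is synthetic uncorrelated noise, for which H should be close to 0.5 (self-similar traffic would give H noticeably above 0.5).

```python
import math
import random
from typing import List, Sequence


def block_means(series: Sequence[float], m: int) -> List[float]:
    """Average the series over non-overlapping blocks of size m."""
    n_blocks = len(series) // m
    return [sum(series[i * m:(i + 1) * m]) / m for i in range(n_blocks)]


def variance(xs: Sequence[float]) -> float:
    """Population variance."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


def hurst_aggregated_variance(series: Sequence[float],
                              block_sizes: Sequence[int]) -> float:
    """Estimate H from the slope of log Var(block means) versus log m."""
    xs = [math.log(m) for m in block_sizes]
    ys = [math.log(variance(block_means(series, m))) for m in block_sizes]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    # Ordinary least-squares slope of the log-log plot.
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return 1.0 + slope / 2.0


rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(8192)]   # synthetic, not a real trace
h = hurst_aggregated_variance(noise, [1, 2, 4, 8, 16, 32, 64])
print(round(h, 2))
```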

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Long-range dependence has been observed in many recent Internet traffic measurements. In addition, some recent studies have shown that under certain network conditions, TCP itself can produce traffic that exhibits dependence over limited timescales, even in the absence of higher-level variability. In this paper, we use a simple Markovian model to argue that when the loss rate is relatively high, TCP's adaptive congestion control mechanism indeed generates traffic with OFF periods exhibiting power-law shape over several timescales and thus introduces pseudo-long-range dependence into the overall traffic. Moreover, we observe that more variable initial retransmission timeout values for different packets introduce more variable packet inter-arrival times, which increases the burstiness of the overall traffic. We can thus explain why a single TCP connection can produce a time series that can be misidentified as self-similar using standard tests.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Locating hands in sign language video is challenging due to a number of factors. Hand appearance varies widely across signers due to anthropometric variations and varying levels of signer proficiency. Video can be captured under varying illumination, camera resolutions, and levels of scene clutter, e.g., high-res video captured in a studio vs. low-res video gathered by a web cam in a user’s home. Moreover, the signers’ clothing varies, e.g., skin-toned clothing vs. contrasting clothing, short-sleeved vs. long-sleeved shirts, etc. In this work, the hand detection problem is addressed in an appearance matching framework. The Histogram of Oriented Gradient (HOG) based matching score function is reformulated to allow non-rigid alignment between pairs of images to account for hand shape variation. The resulting alignment score is used within a Support Vector Machine hand/not-hand classifier for hand detection. The new matching score function yields improved performance (in ROC area and hand detection rate) over the Vocabulary Guided Pyramid Match Kernel (VGPMK) and the traditional, rigid HOG distance on American Sign Language video gestured by expert signers. The proposed match score function is computationally less expensive (for training and testing), has fewer parameters and is less sensitive to parameter settings than VGPMK. The proposed detector works well on test sequences from an inexpert signer in a non-studio setting with cluttered background.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Nearest neighbor classification using shape context can yield highly accurate results in a number of recognition problems. Unfortunately, the approach can be too slow for practical applications, and thus approximation strategies are needed to make shape context practical. This paper proposes a method for efficient and accurate nearest neighbor classification in non-Euclidean spaces, such as the space induced by the shape context measure. First, a method is introduced for constructing a Euclidean embedding that is optimized for nearest neighbor classification accuracy. Using that embedding, multiple approximations of the underlying non-Euclidean similarity measure are obtained, at different levels of accuracy and efficiency. The approximations are automatically combined to form a cascade classifier, which applies the slower approximations only to the hardest cases. Unlike typical cascade-of-classifiers approaches, which are applied to binary classification problems, our method constructs a cascade for a multiclass problem. Experiments with a standard shape data set indicate that a speedup of two to three orders of magnitude is gained over the standard shape context classifier, with minimal losses in classification accuracy.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Meta-analyses of genome-wide association studies (GWAS) have demonstrated that the same genetic variants can be associated with multiple diseases and other complex traits. We present software called CPAG (Cross-Phenotype Analysis of GWAS) to look for similarities between 700 traits, build trees with informative clusters, and highlight underlying pathways. Clusters are consistent with pre-defined groups and literature-based validation but also reveal novel connections. We report similarity between plasma palmitoleic acid and Crohn's disease and find that specific fatty acids exacerbate enterocolitis in zebrafish. CPAG will become increasingly powerful as more genetic variants are uncovered, leading to a deeper understanding of complex traits. CPAG is freely available at www.sourceforge.net/projects/CPAG/.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Anecdotal evidence suggests that naming errors can occur even when the speaker knows the individual (such as a daughter) very well. Here, we investigate the conditions surrounding these types of errors, or misnamings, in which a person (the misnamer) incorrectly calls a familiar individual (the misnamed) by someone else's name (the named). Across 5 studies including over 1,700 participants, we investigated the prevalence of the phenomenon of misnaming, identified factors underlying why it may occur, and tested potential mechanisms. We included undergraduates and MTurk workers and asked questions of both the misnamed and the misnamer. We find that familiar individuals are often misnamed with the name of another member of the same semantic category; family members are misnamed with another family member's name and friends are misnamed with another friend's name. Phonetic similarity between names also leads to misnamings; however, the size of this effect was smaller than that of the semantic category effect. Overall, the misnaming of familiar individuals is driven by the relationship between the misnamer, misnamed, and named; phonetic similarity between the incorrect name used by the misnamer and the correct name also plays a role in misnaming.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

This paper examines different ways of measuring similarity between software design models for the purpose of software reuse. Current approaches to this problem are discussed and a set of suitable similarity metrics is proposed and evaluated. Work on the optimisation of weights to increase the competence of a CBR system is presented. A graph matching algorithm and associated metrics capturing the structural similarity between UML class diagrams are presented and demonstrated through an example case.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

In this paper, we critically examine a special class of graph matching algorithms that follow the approach of node-similarity measurement. A high-level algorithm framework, the node-similarity graph matching framework (NSGM framework), is proposed, under which many existing graph matching algorithms can be subsumed, including the eigen-decomposition method of Umeyama, the polynomial-transformation method of Almohamad, the hubs and authorities method of Kleinberg, and the Kronecker product successive projection methods of Wyk. In addition, improved algorithms can be developed from the NSGM framework with respect to the corresponding results in graph theory. We also observe that, in general, any algorithm that can be subsumed under the NSGM framework fails to work well for graphs with non-trivial auto-isomorphism structure.
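The closing observation about auto-isomorphisms can be made concrete with a small sketch. It uses one well-known node-similarity iteration, S ← A S Bᵀ + Aᵀ S B followed by normalisation, a generalisation of hubs-and-authorities scoring. This is an assumption chosen for illustration and is not the NSGM framework's own definition, which the abstract does not spell out. The function names are hypothetical, and both graphs are assumed to have at least one edge.

```python
from typing import List

Matrix = List[List[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain dense matrix product."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def transpose(x: Matrix) -> Matrix:
    return [list(row) for row in zip(*x)]


def node_similarity(a: Matrix, b: Matrix, iterations: int = 10) -> Matrix:
    """Score s[i][j]: how similar node i of graph A is to node j of graph B."""
    s = [[1.0] * len(b) for _ in range(len(a))]
    a_t, b_t = transpose(a), transpose(b)
    for _ in range(iterations):
        out_part = matmul(matmul(a, s), b_t)   # agreement through outgoing edges
        in_part = matmul(matmul(a_t, s), b)    # agreement through incoming edges
        s = [[out_part[i][j] + in_part[i][j] for j in range(len(b))]
             for i in range(len(a))]
        norm = sum(v * v for row in s for v in row) ** 0.5
        s = [[v / norm for v in row] for row in s]
    return s


# Undirected path 0-1-2-3; reversing the path is a non-trivial automorphism.
path = [[0.0, 1.0, 0.0, 0.0],
        [1.0, 0.0, 1.0, 0.0],
        [0.0, 1.0, 0.0, 1.0],
        [0.0, 0.0, 1.0, 0.0]]
scores = node_similarity(path, path)
print(scores[0][0], scores[0][3])
```

The two printed scores agree: node 0 looks exactly as similar to node 3 as it does to itself, so a matcher that relies only on these scores cannot tell the identity mapping from the reversed one.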

Relevance:

20.00% 20.00%

Publisher:

Abstract:

This paper examines different ways of measuring similarity between software design models for Case Based Reasoning (CBR) to facilitate reuse of software design and code. The paper considers structural and behavioural aspects of similarity between software design models. Similarity metrics for comparing static class structures are defined and discussed. A graph representation of UML class diagrams and corresponding similarity measures are defined. A full-search graph matching algorithm for measuring the structural similarity of diagrams, based on the identification of the Maximum Common Sub-graph (MCS), is presented. Finally, a simple evaluation of the approach is presented and discussed.
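A full-search MCS comparison can be sketched as follows for very small graphs. The sketch assumes unlabelled nodes, directed edges, an induced common subgraph, and the ratio |MCS| / max(|G1|, |G2|) as the similarity score. These are illustrative choices, not the metrics defined in the paper. The class names in the example are hypothetical, both graphs are assumed to be non-empty, and the search is exponential in the number of nodes.

```python
from itertools import combinations, permutations
from typing import Sequence, Set, Tuple

Edge = Tuple[str, str]


def mcs_size(nodes1: Sequence[str], edges1: Set[Edge],
             nodes2: Sequence[str], edges2: Set[Edge]) -> int:
    """Size of the maximum common induced subgraph, by exhaustive search."""
    for k in range(min(len(nodes1), len(nodes2)), 0, -1):
        for subset in combinations(nodes1, k):
            for image in permutations(nodes2, k):
                mapping = dict(zip(subset, image))
                # Every ordered pair must agree on whether an edge is present.
                if all(((a, b) in edges1) == ((mapping[a], mapping[b]) in edges2)
                       for a in subset for b in subset if a != b):
                    return k
    return 0


def mcs_similarity(nodes1: Sequence[str], edges1: Set[Edge],
                   nodes2: Sequence[str], edges2: Set[Edge]) -> float:
    """Similarity in [0, 1]; 1.0 means one graph embeds the other exactly."""
    return mcs_size(nodes1, edges1, nodes2, edges2) / max(len(nodes1), len(nodes2))


# Hypothetical class diagrams: nodes are classes, edges are associations.
design_a = (["Order", "Customer", "Address"],
            {("Order", "Customer"), ("Customer", "Address")})
design_b = (["Invoice", "Client", "Location", "Country"],
            {("Invoice", "Client"), ("Client", "Location"), ("Location", "Country")})

similarity = mcs_similarity(*design_a, *design_b)
print(similarity)
```

Searching from the largest candidate size downwards means the first match found is maximal, which keeps the sketch short at the cost of worst-case factorial running time.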