153 resultados para Similarity measure

em Queensland University of Technology - ePrints Archive


Relevância:

70.00% 70.00%

Publicador:

Resumo:

In this paper, we classify, review, and experimentally compare major methods that are exploited in the definition, adoption, and utilization of element similarity measures in the context of XML schema matching. We aim at presenting a unified view which is useful when developing a new element similarity measure, when implementing an XML schema matching component, when using an XML schema matching system, and when comparing XML schema matching systems.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Advances in neural network language models have demonstrated that these models can effectively learn representations of words meaning. In this paper, we explore a variation of neural language models that can learn on concepts taken from structured ontologies and extracted from free-text, rather than directly from terms in free-text. This model is employed for the task of measuring semantic similarity between medical concepts, a task that is central to a number of techniques in medical informatics and information retrieval. The model is built with two medical corpora (journal abstracts and patient records) and empirically validated on two ground-truth datasets of human-judged concept pairs assessed by medical professionals. Empirically, our approach correlates closely with expert human assessors ($\approx$ 0.9) and outperforms a number of state-of-the-art benchmarks for medical semantic similarity. The demonstrated superiority of this model for providing an effective semantic similarity measure is promising in that this may translate into effectiveness gains for techniques in medical information retrieval and medical informatics (e.g., query expansion and literature-based discovery).

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Collaborative tagging can help users organize, share and retrieve information in an easy and quick way. For the collaborative tagging information implies user’s important personal preference information, it can be used to recommend personalized items to users. This paper proposes a novel tag-based collaborative filtering approach for recommending personalized items to users of online communities that are equipped with tagging facilities. Based on the distinctive three dimensional relationships among users, tags and items, a new similarity measure method is proposed to generate the neighborhood of users with similar tagging behavior instead of similar implicit ratings. The promising experiment result shows that by using the tagging information the proposed approach outperforms the standard user and item based collaborative filtering approaches.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Recommender Systems is one of the effective tools to deal with information overload issue. Similar with the explicit rating and other implicit rating behaviours such as purchase behaviour, click streams, and browsing history etc., the tagging information implies user’s important personal interests and preferences information, which can be used to recommend personalized items to users. This paper is to explore how to utilize tagging information to do personalized recommendations. Based on the distinctive three dimensional relationships among users, tags and items, a new user profiling and similarity measure method is proposed. The experiments suggest that the proposed approach is better than the traditional collaborative filtering recommender systems using only rating data.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This paper presents a method of voice activity detection (VAD) suitable for high noise scenarios, based on the fusion of two complementary systems. The first system uses a proposed non-Gaussianity score (NGS) feature based on normal probability testing. The second system employs a histogram distance score (HDS) feature that detects changes in the signal through conducting a template-based similarity measure between adjacent frames. The decision outputs by the two systems are then merged using an open-by-reconstruction fusion stage. Accuracy of the proposed method was compared to several baseline VAD methods on a database created using real recordings of a variety of high-noise environments.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In the last few years we have observed a proliferation of approaches for clustering XML docu- ments and schemas based on their structure and content. The presence of such a huge amount of approaches is due to the different applications requiring the XML data to be clustered. These applications need data in the form of similar contents, tags, paths, structures and semantics. In this paper, we first outline the application contexts in which clustering is useful, then we survey approaches so far proposed relying on the abstract representation of data (instances or schema), on the identified similarity measure, and on the clustering algorithm. This presentation leads to draw a taxonomy in which the current approaches can be classified and compared. We aim at introducing an integrated view that is useful when comparing XML data clustering approaches, when developing a new clustering algorithm, and when implementing an XML clustering compo- nent. Finally, the paper moves into the description of future trends and research issues that still need to be faced.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Traditional area-based matching techniques make use of similarity metrics such as the Sum of Absolute Differences(SAD), Sum of Squared Differences (SSD) and Normalised Cross Correlation (NCC). Non-parametric matching algorithms such as the rank and census rely on the relative ordering of pixel values rather than the pixels themselves as a similarity measure. Both traditional area-based and non-parametric stereo matching techniques have an algorithmic structure which is amenable to fast hardware realisation. This investigation undertakes a performance assessment of these two families of algorithms for robustness to radiometric distortion and random noise. A generic implementation framework is presented for the stereo matching problem and the relative hardware requirements for the various metrics investigated.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Grouping users in social networks is an important process that improves matching and recommendation activities in social networks. The data mining methods of clustering can be used in grouping the users in social networks. However, the existing general purpose clustering algorithms perform poorly on the social network data due to the special nature of users' data in social networks. One main reason is the constraints that need to be considered in grouping users in social networks. Another reason is the need of capturing large amount of information about users which imposes computational complexity to an algorithm. In this paper, we propose a scalable and effective constraint-based clustering algorithm based on a global similarity measure that takes into consideration the users' constraints and their importance in social networks. Each constraint's importance is calculated based on the occurrence of this constraint in the dataset. Performance of the algorithm is demonstrated on a dataset obtained from an online dating website using internal and external evaluation measures. Results show that the proposed algorithm is able to increases the accuracy of matching users in social networks by 10% in comparison to other algorithms.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The continuous growth of the XML data poses a great concern in the area of XML data management. The need for processing large amounts of XML data brings complications to many applications, such as information retrieval, data integration and many others. One way of simplifying this problem is to break the massive amount of data into smaller groups by application of clustering techniques. However, XML clustering is an intricate task that may involve the processing of both the structure and the content of XML data in order to identify similar XML data. This research presents four clustering methods, two methods utilizing the structure of XML documents and the other two utilizing both the structure and the content. The two structural clustering methods have different data models. One is based on a path model and other is based on a tree model. These methods employ rigid similarity measures which aim to identifying corresponding elements between documents with different or similar underlying structure. The two clustering methods that utilize both the structural and content information vary in terms of how the structure and content similarity are combined. One clustering method calculates the document similarity by using a linear weighting combination strategy of structure and content similarities. The content similarity in this clustering method is based on a semantic kernel. The other method calculates the distance between documents by a non-linear combination of the structure and content of XML documents using a semantic kernel. Empirical analysis shows that the structure-only clustering method based on the tree model is more scalable than the structure-only clustering method based on the path model as the tree similarity measure for the tree model does not need to visit the parents of an element many times. Experimental results also show that the clustering methods perform better with the inclusion of the content information on most test document collections. To further the research, the structural clustering method based on tree model is extended and employed in XML transformation. The results from the experiments show that the proposed transformation process is faster than the traditional transformation system that translates and converts the source XML documents sequentially. Also, the schema matching process of XML transformation produces a better matching result in a shorter time.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Trees are capable of portraying the semi-structured data which is common in web domain. Finding similarities between trees is mandatory for several applications that deal with semi-structured data. Existing similarity methods examine a pair of trees by comparing through nodes and paths of two trees, and find the similarity between them. However, these methods provide unfavorable results for unordered tree data and result in yielding NP-hard or MAX-SNP hard complexity. In this paper, we present a novel method that encodes a tree with an optimal traversing approach first, and then, utilizes it to model the tree with its equivalent matrix representation for finding similarity between unordered trees efficiently. Empirical analysis shows that the proposed method is able to achieve high accuracy even on the large data sets.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Accurate process model elicitation continues to be a time consuming task, requiring skill on the part of the interviewer to extract explicit and tacit process information from the interviewee. Many errors occur in this elicitation stage that would be avoided by better activity recall, more consistent specification methods and greater engagement in the elicitation process by interviewees. Theories of situated cognition indicate that interactive 3D representations of real work environments engage and prime the cognitive state of the viewer. In this paper, our major contribution is to augment a previous process elicitation methodology with virtual world context metadata, drawn from a 3D simulation of the workplace. We present a conceptual and formal approach for representing this contextual metadata, integrated into a process similarity measure that provides hints for the business analyst to use in later modelling steps. Finally, we conclude with examples from two use cases to illustrate the potential abilities of this approach.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Identifying product families has been considered as an effective way to accommodate the increasing product varieties across the diverse market niches. In this paper, we propose a novel framework to identifying product families by using a similarity measure for a common product design data BOM (Bill of Materials) based on data mining techniques such as frequent mining and clus-tering. For calculating the similarity between BOMs, a novel Extended Augmented Adjacency Matrix (EAAM) representation is introduced that consists of information not only of the content and topology but also of the fre-quent structural dependency among the various parts of a product design. These EAAM representations of BOMs are compared to calculate the similarity between products and used as a clustering input to group the product fami-lies. When applied on a real-life manufacturing data, the proposed framework outperforms a current baseline that uses orthogonal Procrustes for grouping product families.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Acoustic recordings of the environment provide an effective means to monitor bird species diversity. To facilitate exploration of acoustic recordings, we describe a content-based birdcall retrieval algorithm. A query birdcall is a region of spectrogram bounded by frequency and time. Retrieval depends on a similarity measure derived from the orientation and distribution of spectral ridges. The spectral ridge detection method caters for a broad range of birdcall structures. In this paper, we extend previous work by incorporating a spectrogram scaling step in order to improve the detection of spectral ridges. Compared to an existing approach based on MFCC features, our feature representation achieves better retrieval performance for multiple bird species in noisy recordings.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Bioacoustic data can be used for monitoring animal species diversity. The deployment of acoustic sensors enables acoustic monitoring at large temporal and spatial scales. We describe a content-based birdcall retrieval algorithm for the exploration of large data bases of acoustic recordings. In the algorithm, an event-based searching scheme and compact features are developed. In detail, ridge events are detected from audio files using event detection on spectral ridges. Then event alignment is used to search through audio files to locate candidate instances. A similarity measure is then applied to dimension-reduced spectral ridge feature vectors. The event-based searching method processes a smaller list of instances for faster retrieval. The experimental results demonstrate that our features achieve better success rate than existing methods and the feature dimension is greatly reduced.