943 results for speaker clustering


Relevance: 20.00%

Relevance: 20.00%

Abstract:

In this paper, we present a document clustering framework that incorporates instance-level knowledge in the form of pairwise constraints and attribute-level knowledge in the form of keyphrases. First, we initialize weights by metric learning from pairwise constraints; then we learn both kinds of knowledge simultaneously by combining distance-based and constraint-based approaches; finally, we evaluate and select the clustering result according to the degree of user satisfaction. The experimental results demonstrate the effectiveness and potential of the proposed method.
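The weight-initialization step can be pictured with a simple diagonal-metric heuristic. This is a minimal sketch under our own assumptions, not the authors' metric-learning procedure: each feature's weight (the hypothetical `init_feature_weights`) grows when must-linked documents agree on that feature and cannot-linked documents differ on it, and the weights then define a weighted Euclidean distance.

```python
import math

def init_feature_weights(X, must_link, cannot_link, eps=1e-6):
    """Hypothetical heuristic: weight_j ~ (cannot-link spread) / (must-link spread)."""
    d = len(X[0])

    def mean_sq_diff(pairs, j):
        if not pairs:
            return 1.0  # neutral value when no constraints of this kind exist
        return sum((X[a][j] - X[b][j]) ** 2 for a, b in pairs) / len(pairs)

    raw = [(mean_sq_diff(cannot_link, j) + eps) / (mean_sq_diff(must_link, j) + eps)
           for j in range(d)]
    total = sum(raw)
    # Normalize so the weights sum to the number of features.
    return [d * r / total for r in raw]

def weighted_distance(x, y, w):
    """Weighted Euclidean distance under a diagonal metric."""
    return math.sqrt(sum(wj * (xj - yj) ** 2 for wj, xj, yj in zip(w, x, y)))

# Toy data: feature 0 separates the constrained groups, feature 1 is noise.
X = [[0.0, 0.0], [0.1, 1.0], [1.0, 0.0], [1.1, 1.0]]
must_link = [(0, 1), (2, 3)]
cannot_link = [(0, 2), (1, 3)]
w = init_feature_weights(X, must_link, cannot_link)
```

With these toy constraints almost all weight moves to feature 0, so must-linked documents 0 and 1 end up closer than cannot-linked documents 0 and 2.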

Relevance: 20.00%

Abstract:

Network traffic classification is an essential component of network management and security systems. To address the limitations of traditional port-based and payload-based methods, recent studies have focused on alternative approaches. One promising direction is applying machine learning techniques to classify traffic flows based on packet-level and flow-level statistics. In particular, previous papers have shown that clustering can achieve high accuracy and discover unknown application classes. In this work, we present a novel semi-supervised learning method using constrained clustering algorithms. The motivation is that, in the network domain, a great deal of background information is available in addition to the data instances themselves. For example, we might know that flows f1 and f2 use the same application protocol because they visit the same host address at the same port simultaneously. In this case, f1 and f2 should ideally be grouped into the same cluster. We therefore describe these correlations as pairwise must-link constraints and incorporate them into the clustering process. We have applied three constrained variants of the K-Means algorithm, which perform hard or soft constraint satisfaction and metric learning from constraints. A number of real-world traffic traces have been used to show the availability of constraints and to test the proposed approach. The experimental results indicate that incorporating constraints during clustering can significantly improve overall accuracy and cluster purity.
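The flow example translates directly into constraint generation, and hard must-link satisfaction can be sketched by merging linked flows into "chunklets" that K-Means assigns as single units. This is an illustrative sketch, not one of the three variants evaluated in the paper; the flow fields (`dst_ip`, `dst_port`, `start`, `end`, `features`) and the helper names are hypothetical.

```python
import random

def must_links(flows):
    """Pairs of flows with the same destination host and port and overlapping time."""
    pairs = []
    for i in range(len(flows)):
        for j in range(i + 1, len(flows)):
            a, b = flows[i], flows[j]
            same_endpoint = (a["dst_ip"], a["dst_port"]) == (b["dst_ip"], b["dst_port"])
            overlap = a["start"] < b["end"] and b["start"] < a["end"]
            if same_endpoint and overlap:
                pairs.append((i, j))
    return pairs

def chunklets(n, pairs):
    """Union-find: groups closed under the transitivity of must-links."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in pairs:
        parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

def sq_dist(x, c):
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))

def must_link_kmeans(X, k, pairs, iters=20, seed=0):
    """K-Means in which every chunklet is assigned to one cluster as a unit."""
    rng = random.Random(seed)
    groups = chunklets(len(X), pairs)
    centroids = [list(X[i]) for i in rng.sample(range(len(X)), k)]
    labels = [0] * len(X)
    for _ in range(iters):
        for g in groups:
            # Pick the centroid minimizing the chunklet's total squared distance.
            best = min(range(k), key=lambda c: sum(sq_dist(X[i], centroids[c]) for i in g))
            for i in g:
                labels[i] = best
        for c in range(k):
            members = [X[i] for i in range(len(X)) if labels[i] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

flows = [
    {"dst_ip": "10.0.0.1", "dst_port": 443, "start": 0, "end": 10, "features": [1.0, 1.0]},
    {"dst_ip": "10.0.0.1", "dst_port": 443, "start": 5, "end": 12, "features": [5.0, 5.0]},
    {"dst_ip": "10.0.0.2", "dst_port": 53, "start": 0, "end": 1, "features": [5.2, 4.9]},
    {"dst_ip": "10.0.0.3", "dst_port": 80, "start": 3, "end": 4, "features": [0.9, 1.1]},
]
pairs = must_links(flows)
labels = must_link_kmeans([f["features"] for f in flows], 2, pairs)
```

Flows 0 and 1 share an endpoint and overlap in time, so they land in the same cluster even though their statistical features differ; soft-constraint variants would instead trade off such violations against distance.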

Relevance: 20.00%

Abstract:

This paper examines the recovery of user context in indoor environments using existing wireless infrastructure to enable assistive systems. We present a novel approach to the extraction of user context that casts context recovery as an unsupervised clustering problem. A well-known density-based clustering technique, DBSCAN, is adapted to recover user context, including the user's motion state and the significant places the user visits, from WiFi observations consisting of access point IDs and signal strengths. Furthermore, user rhythms, i.e., sequences of places the user visits periodically, are derived from these low-level contexts using a state-of-the-art probabilistic clustering technique, Latent Dirichlet Allocation (LDA), to enable a variety of application services. Experimental results with real data are presented to validate the proposed unsupervised learning approach and demonstrate its applicability.
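A minimal sketch of significant-place discovery with classic DBSCAN over WiFi scans follows. It is not the paper's adapted variant: each scan is assumed to be a dict from access point ID to signal strength in dBm, unheard access points are assumed to read -100 dBm, and the distance is plain Euclidean. Motion-state detection and the LDA step are not shown.

```python
import math

MISSING_DBM = -100.0  # assumed floor for access points absent from a scan

def wifi_distance(a, b):
    """Euclidean distance between two scans given as {ap_id: rssi_dbm}."""
    aps = set(a) | set(b)
    return math.sqrt(sum((a.get(ap, MISSING_DBM) - b.get(ap, MISSING_DBM)) ** 2 for ap in aps))

def dbscan(points, eps, min_pts, dist):
    """Classic DBSCAN: returns one label per point, -1 for noise."""
    labels = [None] * len(points)  # None = not yet visited

    def region(i):
        # All points within eps of point i, including i itself.
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region(i)
        if len(neighbors) < min_pts:
            labels[i] = -1  # noise for now; may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: reachable but not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels

scans = [
    {"ap1": -40, "ap2": -70}, {"ap1": -42, "ap2": -72}, {"ap1": -41, "ap2": -69},  # place A
    {"ap3": -45, "ap2": -60}, {"ap3": -47, "ap2": -61}, {"ap3": -44, "ap2": -62},  # place B
    {"ap4": -80},  # isolated scan, e.g. while walking
]
labels = dbscan(scans, eps=5.0, min_pts=3, dist=wifi_distance)
```

Dense groups of similar scans become candidate places, while isolated scans are labeled noise; the sequence of place labels over time is the kind of low-level context from which rhythms could then be mined.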

Relevance: 20.00%

Abstract:

This article presents experimental results on a new application of a novel clustering technique recently introduced by the authors. Our aim is to facilitate the application of robust and stable consensus functions in information security, where it is often necessary to process large data sets and monitor outcomes in real time, as is required, for example, in intrusion detection. Here we concentrate on the particular case of profiling phishing websites. First, we apply several independent clustering algorithms to a randomized sample of the data to obtain independent initial clusterings; the silhouette index is used to determine the number of clusters. Second, we use a consensus function to combine these independent clusterings into one consensus clustering; feature ranking is used to select a subset of features for the consensus function. Third, we train fast supervised classification algorithms on the resulting consensus clustering so that they can process the whole large data set as well as new data. The precision and recall of the classifiers at the final stage of this scheme are critical to the effectiveness of the whole procedure. We investigated various combinations of three consensus functions, Cluster-Based Graph Formulation (CBGF), Hybrid Bipartite Graph Formulation (HBGF), and Instance-Based Graph Formulation (IBGF), with a variety of supervised classification algorithms. The best precision and recall were obtained by combining the HBGF consensus function with the SMO classifier using a polynomial kernel.
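The three consensus functions differ mainly in which graph they build from the ensemble. The sketch below constructs those graphs as they are commonly defined in the cluster-ensemble literature, which we assume matches the usage here: IBGF links instances by the fraction of clusterings that co-cluster them, CBGF links clusters by Jaccard overlap, and HBGF links each instance to every cluster containing it. The final graph-partitioning step (for example spectral partitioning) is omitted, and all helper names are hypothetical.

```python
def cluster_indicators(clusterings):
    """Binary matrix A[i][c] = 1 if instance i is in cluster c (one column per
    cluster of every clustering in the ensemble)."""
    columns = []
    for labels in clusterings:
        for c in sorted(set(labels)):
            columns.append([1 if lab == c else 0 for lab in labels])
    n = len(clusterings[0])
    return [[col[i] for col in columns] for i in range(n)]

def ibgf_similarity(A, r):
    """Instance-based graph: fraction of the r clusterings that co-cluster i and j."""
    n = len(A)
    return [[sum(x * y for x, y in zip(A[i], A[j])) / r for j in range(n)] for i in range(n)]

def cbgf_similarity(A):
    """Cluster-based graph: Jaccard similarity between cluster member sets."""
    cols = list(zip(*A))

    def jaccard(u, v):
        inter = sum(a & b for a, b in zip(u, v))
        union = sum(a | b for a, b in zip(u, v))
        return inter / union if union else 0.0

    return [[jaccard(u, v) for v in cols] for u in cols]

def hbgf_adjacency(A):
    """Hybrid bipartite graph over n instance vertices followed by m cluster
    vertices; edge (i, n + c) exists iff instance i belongs to cluster c."""
    n, m = len(A), len(A[0])
    W = [[0] * (n + m) for _ in range(n + m)]
    for i in range(n):
        for c in range(m):
            W[i][n + c] = W[n + c][i] = A[i][c]
    return W

clusterings = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 0, 1]]  # three toy base clusterings
A = cluster_indicators(clusterings)
S = ibgf_similarity(A, len(clusterings))
C = cbgf_similarity(A)
W = hbgf_adjacency(A)
```

Note that the first two toy clusterings are identical up to label permutation; all three graphs are invariant to such relabeling, which is why consensus functions work on them rather than on raw labels.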

Relevance: 20.00%

Abstract:

This article experimentally investigates a novel application of a clustering technique recently introduced by the authors, with the aim of using robust and stable consensus functions in information security, where it is often necessary to process large data sets and monitor outcomes in real time, as is required, for example, in intrusion detection. Here we concentrate on the particular case of profiling phishing websites. First, we apply several independent clustering algorithms to a randomized sample of the data to obtain independent initial clusterings; the silhouette index is used to determine the number of clusters. Second, features are ranked by correlation to select a subset for dimensionality reduction. We investigate the effectiveness of the Pearson linear correlation coefficient, the Spearman rank correlation coefficient, and the Goodman–Kruskal correlation coefficient in this application. Third, we use a consensus function to combine the independent initial clusterings into one consensus clustering. Fourth, we train fast supervised classification algorithms on the resulting consensus clustering so that they can process the whole large data set as well as new data. The precision and recall of the classifiers at the final stage of this scheme are critical to the effectiveness of the whole procedure. We investigated various combinations of several correlation coefficients, consensus functions, and supervised classification algorithms.
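The three coefficients can be compared side by side in a short sketch. We assume the Goodman–Kruskal coefficient is the usual gamma statistic (concordant minus discordant pairs over their sum, ignoring ties), and we assume, hypothetically, that features are ranked by absolute correlation with a single reference variable; the abstract does not say what each feature is correlated against.

```python
import math

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def average_ranks(x):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    ranks = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))

def goodman_kruskal_gamma(x, y):
    """(concordant - discordant) / (concordant + discordant); tied pairs ignored."""
    conc = disc = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                conc += 1
            elif s < 0:
                disc += 1
    return (conc - disc) / (conc + disc) if conc + disc else 0.0

def rank_features(columns, target, coef):
    """Feature indices sorted by |coef(feature, target)|, strongest first."""
    scores = [abs(coef(col, target)) for col in columns]
    return sorted(range(len(columns)), key=lambda j: -scores[j])

target = [1, 2, 3, 4, 5]
columns = [[2, 4, 6, 8, 10], [1, 4, 9, 16, 25], [3, 1, 4, 1, 5]]
order = rank_features(columns, target, spearman)
```

The quadratic feature (column 1) is monotone in the target, so Spearman and gamma score it as perfectly correlated while Pearson scores it slightly below 1; this sensitivity to nonlinearity is one reason to compare the coefficients.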

Relevance: 20.00%

Abstract:

In this paper, we conduct an empirical analysis of the effects of different colour models on image segmentation with the fuzzy c-means (FCM) clustering algorithm. A qualitative evaluation method based on human perceptual judgement is used. Two sets of complex images, namely outdoor scenes and satellite imagery, are used for demonstration. These images are employed to examine the characteristics of image segmentation using FCM with eight different colour models. The experimental results are compared and analysed. The CIELAB colour model is found to yield the best outcomes in colour image segmentation with FCM.
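Since CIELAB performed best, the sketch below shows the standard conversion of an 8-bit sRGB pixel to CIELAB; the resulting (L*, a*, b*) vectors would then be the per-pixel inputs to FCM. The sRGB primaries and the D65 reference white are our assumptions, as the abstract does not state the conversion settings.

```python
def srgb_to_lab(r, g, b):
    """Convert an 8-bit sRGB pixel to CIELAB, assuming a D65 reference white."""
    def linearize(c):
        # Undo the sRGB transfer curve.
        c /= 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = linearize(r), linearize(g), linearize(b)
    # Linear sRGB -> CIE XYZ (D65).
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl
    xn, yn, zn = 0.95047, 1.0, 1.08883  # D65 reference white

    def f(t):
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = f(x / xn), f(y / yn), f(z / zn)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)

# Pure sRGB red maps to roughly L* = 53.24, a* = 80.09, b* = 67.20.
lab_red = srgb_to_lab(255, 0, 0)
```

A plausible reason CIELAB suits FCM is that its Euclidean distances approximate perceived colour differences, which matches FCM's distance-based memberships.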

Relevance: 20.00%

Abstract:

Both instance-level knowledge and attribute-level knowledge can improve clustering quality, but how to use both of them effectively is an essential open problem. This paper proposes a wrapper framework for semi-supervised clustering that aims to gracefully integrate both kinds of prior knowledge into the clustering process: instance-level knowledge in the form of pairwise constraints, and attribute-level knowledge in the form of attribute order preferences. The wrapped algorithm is then designed as a semi-supervised clustering process that transforms the clustering problem into an optimization problem. The experimental results demonstrate the effectiveness and potential of the proposed method.
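The abstract does not give the objective that the wrapper optimizes. As a purely hypothetical illustration of how both kinds of knowledge can enter one optimization problem, the sketch scores a candidate solution by weighted within-cluster squared error, plus a penalty per violated pairwise constraint, plus a hinge penalty for each violated attribute order preference (read as "attribute i should weigh at least as much as attribute j"). The function name and penalty forms are our assumptions.

```python
def penalized_objective(X, labels, centroids, weights, must_link, cannot_link,
                        attr_prefs, constraint_penalty=1.0, order_penalty=1.0):
    """Hypothetical objective: weighted SSE + constraint and preference penalties.

    attr_prefs holds (i, j) pairs meaning weights[i] should be >= weights[j].
    """
    # Weighted within-cluster squared error.
    sse = sum(
        sum(w * (xv - cv) ** 2 for w, xv, cv in zip(weights, x, centroids[labels[n]]))
        for n, x in enumerate(X)
    )
    # Count violated instance-level constraints.
    violated_pairs = (sum(labels[a] != labels[b] for a, b in must_link)
                      + sum(labels[a] == labels[b] for a, b in cannot_link))
    # Hinge penalty: how far each attribute-level preference is violated.
    order_violation = sum(max(0.0, weights[j] - weights[i]) for i, j in attr_prefs)
    return sse + constraint_penalty * violated_pairs + order_penalty * order_violation

X = [[0.0, 0.0], [1.0, 0.0], [10.0, 0.0]]
centroids = [[0.5, 0.0], [10.0, 0.0]]
good = penalized_objective(X, [0, 0, 1], centroids, [1.0, 1.0], [(0, 1)], [(1, 2)], [(0, 1)])
bad = penalized_objective(X, [0, 1, 1], centroids, [1.0, 1.0], [(0, 1)], [(1, 2)], [(0, 1)])
```

A wrapper could then search over labels and weights (for example by alternating assignment and weight updates) to minimize such a score, keeping the underlying clustering algorithm unchanged.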

Relevance: 20.00%

Abstract:

In this paper, we describe a new image segmentation approach that integrates color and texture features using the fuzzy c-means clustering algorithm. To demonstrate the applicability of the proposed approach to satellite image retrieval, we design and develop an interactive region-based image query system. A database of 400 multispectral satellite images is used to evaluate the performance of the system. The results are analyzed and discussed, and a performance comparison with other methods is included. The results show that the proposed approach improves both the quality of the segmentation and the retrieval performance.
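The clustering core of such an approach is standard fuzzy c-means. A minimal sketch follows, assuming each pixel (or image block) is described by a hypothetical feature vector that concatenates color components with a texture measure; the paper's actual features and any weighting between them are not specified in the abstract.

```python
import random

def fuzzy_c_means(X, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means. Returns (centers, memberships U[n][c])."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    # Random initial memberships, each row normalized to sum to 1.
    U = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        s = sum(row)
        U.append([u / s for u in row])
    centers = [[0.0] * d for _ in range(c)]
    for _ in range(iters):
        # Centers: means of the data weighted by membership^m.
        for k in range(c):
            w = [U[i][k] ** m for i in range(n)]
            sw = sum(w)
            centers[k] = [sum(w[i] * X[i][j] for i in range(n)) / sw for j in range(d)]
        # Memberships: u_ik = 1 / sum_l (d_ik / d_il)^(2 / (m - 1)).
        max_change = 0.0
        for i in range(n):
            dists = [max(1e-12, sum((X[i][j] - centers[k][j]) ** 2 for j in range(d)) ** 0.5)
                     for k in range(c)]
            for k in range(c):
                new = 1.0 / sum((dists[k] / dists[l]) ** (2 / (m - 1)) for l in range(c))
                max_change = max(max_change, abs(new - U[i][k]))
                U[i][k] = new
        if max_change < tol:
            break
    return centers, U

# Hypothetical per-pixel vectors: [color feature, texture feature].
X = [[0.1, 0.2], [0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
centers, U = fuzzy_c_means(X, 2)
hard = [max(range(2), key=lambda k: row[k]) for row in U]  # defuzzified segment labels
```

Keeping the soft memberships `U`, rather than only the hard labels, lets a region-based query system weigh ambiguous boundary pixels less heavily.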

Relevance: 20.00%

Abstract:

Speaker recognition is the process of automatically recognizing a speaker by analyzing the individual information contained in speech waves. In this paper, we discuss the development of an intelligent system for text-dependent speaker recognition. The system comprises two main modules: a wavelet-based signal-processing module for extracting features from speech waves, and an artificial-neural-network-based classifier module for identifying and categorizing the speakers. Wavelets are used to de-noise and compress the speech signals; we use the Daubechies wavelet family. The extracted features are then fed to the neural-network-based classifier to identify the speakers. We implemented the Fuzzy ARTMAP (FAM) network in the classifier module to categorize the de-noised and compressed signals. The proposed intelligent learning system has been applied to a case study of a text-dependent speaker recognition problem.
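The de-noising and compression step can be sketched with a single level of the Daubechies D4 (db2) transform. The abstract names only the Daubechies family, so the order (D4), the single decomposition level, periodic boundary handling, and soft thresholding of detail coefficients are all our assumptions; the Fuzzy ARTMAP classifier is not shown.

```python
import math

# Daubechies D4 (db2) scaling filter; the wavelet filter follows by the
# quadrature-mirror relation g[k] = (-1)^k * h[3 - k].
_S3 = math.sqrt(3.0)
_N = 4 * math.sqrt(2.0)
H = [(1 + _S3) / _N, (3 + _S3) / _N, (3 - _S3) / _N, (1 - _S3) / _N]
G = [H[3], -H[2], H[1], -H[0]]

def dwt_d4(x):
    """One level of the periodic D4 transform: (approximation, detail)."""
    n = len(x)
    if n < 4 or n % 2:
        raise ValueError("signal length must be even and at least 4")
    approx = [sum(H[k] * x[(2 * i + k) % n] for k in range(4)) for i in range(n // 2)]
    detail = [sum(G[k] * x[(2 * i + k) % n] for k in range(4)) for i in range(n // 2)]
    return approx, detail

def idwt_d4(approx, detail):
    """Inverse of dwt_d4: the periodic transform is orthogonal, so apply its transpose."""
    n = 2 * len(approx)
    x = [0.0] * n
    for i in range(len(approx)):
        for k in range(4):
            x[(2 * i + k) % n] += H[k] * approx[i] + G[k] * detail[i]
    return x

def soft_threshold(v, t):
    """Shrink v toward zero by t; values smaller than t in magnitude become 0."""
    return math.copysign(max(abs(v) - t, 0.0), v)

def denoise(x, t):
    """Soft-threshold the detail coefficients, then reconstruct."""
    approx, detail = dwt_d4(x)
    return idwt_d4(approx, [soft_threshold(d, t) for d in detail])

signal = [math.sin(0.3 * i) for i in range(16)]
approx, detail = dwt_d4(signal)  # approx alone is a 2x-compressed representation
restored = denoise(signal, 0.0)  # t = 0 gives perfect reconstruction
```

Raising `t` removes more small, noise-like detail coefficients, while keeping only `approx` halves the number of values passed to the classifier.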