65 resultados para speaker clustering


Relevância:

20.00% 20.00%

Publicador:

Resumo:

We propose a new approach to clustering. Our idea is to map cluster formation to coalition formation in cooperative games, and to use the Shapley value of the patterns to identify clusters and cluster representatives. We show that the underlying game is convex and this leads to an efficient biobjective clustering algorithm that we call BiGC. The algorithm yields high-quality clustering with respect to average point-to-center distance (potential) as well as average intracluster point-to-point distance (scatter). We demonstrate the superiority of BiGC over state-of-the-art clustering algorithms (including the center based and the multiobjective techniques) through a detailed experimentation using standard cluster validity criteria on several benchmark data sets. We also show that BiGC satisfies key clustering properties such as order independence, scale invariance, and richness.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Clustering techniques which can handle incomplete data have become increasingly important due to varied applications in marketing research, medical diagnosis and survey data analysis. Existing techniques cope up with missing values either by using data modification/imputation or by partial distance computation, often unreliable depending on the number of features available. In this paper, we propose a novel approach for clustering data with missing values, which performs the task by Symmetric Non-Negative Matrix Factorization (SNMF) of a complete pair-wise similarity matrix, computed from the given incomplete data. To accomplish this, we define a novel similarity measure based on Average Overlap similarity metric which can effectively handle missing values without modification of data. Further, the similarity measure is more reliable than partial distances and inherently possesses the properties required to perform SNMF. The experimental evaluation on real world datasets demonstrates that the proposed approach is efficient, scalable and shows significantly better performance compared to the existing techniques.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Motivated by multi-distribution divergences, which originate in information theory, we propose a notion of `multipoint' kernels, and study their applications. We study a class of kernels based on Jensen type divergences and show that these can be extended to measure similarity among multiple points. We study tensor flattening methods and develop a multi-point (kernel) spectral clustering (MSC) method. We further emphasize on a special case of the proposed kernels, which is a multi-point extension of the linear (dot-product) kernel and show the existence of cubic time tensor flattening algorithm in this case. Finally, we illustrate the usefulness of our contributions using standard data sets and image segmentation tasks.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We propose apractical, feature-level and score-level fusion approach by combining acoustic and estimated articulatory information for both text independent and text dependent speaker verification. From a practical point of view, we study how to improve speaker verification performance by combining dynamic articulatory information with the conventional acoustic features. On text independent speaker verification, we find that concatenating articulatory features obtained from measured speech production data with conventional Mel-frequency cepstral coefficients (MFCCs) improves the performance dramatically. However, since directly measuring articulatory data is not feasible in many real world applications, we also experiment with estimated articulatory features obtained through acoustic-to-articulatory inversion. We explore both feature level and score level fusion methods and find that the overall system performance is significantly enhanced even with estimated articulatory features. Such a performance boost could be due to the inter-speaker variation information embedded in the estimated articulatory features. Since the dynamics of articulation contain important information, we included inverted articulatory trajectories in text dependent speaker verification. We demonstrate that the articulatory constraints introduced by inverted articulatory features help to reject wrong password trials and improve the performance after score level fusion. We evaluate the proposed methods on the X-ray Microbeam database and the RSR 2015 database, respectively, for the aforementioned two tasks. Experimental results show that we achieve more than 15% relative equal error rate reduction for both speaker verification tasks. (C) 2015 Elsevier Ltd. All rights reserved.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Homogeneous temperature regions are necessary for use in hydrometeorological studies. The regions are often delineated by analysing statistics derived from time series of maximum, minimum or mean temperature, rather than attributes influencing temperature. This practice cannot yield meaningful regions in data-sparse areas. Further, independent validation of the delineated regions for homogeneity in temperature is not possible, as temperature records form the basis to arrive at the regions. To address these issues, a two-stage clustering approach is proposed in this study to delineate homogeneous temperature regions. First stage of the approach involves (1) determining correlation structure between observed temperature over the study area and possible predictors (large-scale atmospheric variables) influencing the temperature and (2) using the correlation structure as the basis to delineate sites in the study area into clusters. Second stage of the approach involves analysis on each of the clusters to (1) identify potential predictors (large-scale atmospheric variables) influencing temperature at sites in the cluster and (2) partition the cluster into homogeneous fuzzy temperature regions using the identified potential predictors. Application of the proposed approach to India yielded 28 homogeneous regions that were demonstrated to be effective when compared to an alternate set of 6 regions that were previously delineated over the study area. Intersite cross-correlations of monthly maximum and minimum temperatures in the existing regions were found to be weak and negative for several months, which is undesirable. This problem was not found in the case of regions delineated using the proposed approach. Utility of the proposed regions in arriving at estimates of potential evapotranspiration for ungauged locations in the study area is demonstrated.