812 results for Capacitated clustering
Abstract:
We propose a new approach to clustering. Our idea is to map cluster formation to coalition formation in cooperative games, and to use the Shapley value of the patterns to identify clusters and cluster representatives. We show that the underlying game is convex, which leads to an efficient biobjective clustering algorithm that we call BiGC. The algorithm yields high-quality clustering with respect to average point-to-center distance (potential) as well as average intracluster point-to-point distance (scatter). We demonstrate the superiority of BiGC over state-of-the-art clustering algorithms (including center-based and multiobjective techniques) through detailed experiments using standard cluster validity criteria on several benchmark data sets. We also show that BiGC satisfies key clustering properties such as order independence, scale invariance, and richness.
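The link between Shapley values and cluster representatives can be illustrated with a small sketch. The abstract does not state the characteristic function, so the code assumes a hypothetical one: the worth of a coalition is the sum of pairwise similarities among its points, with a made-up similarity `exp(-d^2)`. For any game of that pairwise form, each point's Shapley value equals half the sum of its similarities to all other points, and the code checks this closed form against a brute-force computation over all orderings. It is not the BiGC algorithm itself.

```python
from itertools import permutations
from math import exp, factorial


def similarity(p, q):
    # Hypothetical similarity (an assumption): decays with squared Euclidean distance.
    return exp(-sum((a - b) ** 2 for a, b in zip(p, q)))


def coalition_value(points, members):
    # Assumed characteristic function: sum of pairwise similarities inside the coalition.
    return sum(
        similarity(points[i], points[j])
        for pos, i in enumerate(members)
        for j in members[pos + 1:]
    )


def shapley_exact(points):
    # Average marginal contribution of each point over every join order.
    n = len(points)
    phi = [0.0] * n
    for order in permutations(range(n)):
        joined = []
        for i in order:
            before = coalition_value(points, joined)
            joined.append(i)
            phi[i] += coalition_value(points, joined) - before
    return [total / factorial(n) for total in phi]


def shapley_closed_form(points):
    # For pairwise-sum games, each pair's worth is split equally between its two members.
    n = len(points)
    return [
        0.5 * sum(similarity(points[i], points[j]) for j in range(n) if j != i)
        for i in range(n)
    ]


# Illustrative data: a tight group of three points and a distant pair.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (3.0, 3.0), (3.1, 3.0)]
exact = shapley_exact(points)
closed = shapley_closed_form(points)
representative = closed.index(max(closed))  # point with the largest Shapley value
```

Under these assumptions, the point with the largest Shapley value sits in the densest part of the data, which is why it is a natural candidate for a cluster representative.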
Abstract:
Clustering techniques that can handle incomplete data have become increasingly important owing to varied applications in marketing research, medical diagnosis, and survey data analysis. Existing techniques cope with missing values either by data modification/imputation or by partial distance computation, approaches that are often unreliable, depending on the number of features available. In this paper, we propose a novel approach for clustering data with missing values, which performs the task by Symmetric Non-Negative Matrix Factorization (SNMF) of a complete pairwise similarity matrix computed from the given incomplete data. To accomplish this, we define a novel similarity measure, based on the Average Overlap similarity metric, that can effectively handle missing values without modifying the data. Further, the similarity measure is more reliable than partial distances and inherently possesses the properties required to perform SNMF. The experimental evaluation on real-world datasets demonstrates that the proposed approach is efficient and scalable, and that it performs significantly better than the existing techniques.
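For context, the partial distance baseline mentioned above can be sketched as follows. This is the common textbook form, assumed here rather than taken from the paper: the squared differences over the features observed in both records are rescaled by the ratio of total to shared features. Missing entries are marked with `None`, a convention chosen only for this illustration. The sketch does not implement the proposed Average Overlap measure or SNMF.

```python
from math import sqrt


def partial_distance(x, y):
    """Partial Euclidean distance between two records with missing values.

    Missing entries are represented as None (an assumption of this sketch).
    Returns None when the two records share no observed feature.
    """
    shared = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if not shared:
        return None  # no common feature: the distance is undefined
    scale = len(x) / len(shared)  # compensate for the skipped features
    return sqrt(scale * sum((a - b) ** 2 for a, b in shared))


d = partial_distance([1.0, None, 3.0, 4.0], [1.0, 2.0, None, 0.0])
undefined = partial_distance([None, 1.0], [2.0, None])
```

The estimate rests on fewer terms as more features go missing, and it is undefined when two records share none, which matches the reliability concern raised in the abstract.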
Abstract:
Motivated by multi-distribution divergences, which originate in information theory, we propose a notion of 'multipoint' kernels and study their applications. We study a class of kernels based on Jensen-type divergences and show that these can be extended to measure similarity among multiple points. We study tensor flattening methods and develop a multi-point (kernel) spectral clustering (MSC) method. We further highlight a special case of the proposed kernels, a multi-point extension of the linear (dot-product) kernel, and show that a cubic-time tensor flattening algorithm exists in this case. Finally, we illustrate the usefulness of our contributions using standard data sets and image segmentation tasks.
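As one concrete example of a multi-distribution divergence, the sketch below computes the Jensen-Shannon divergence of several discrete distributions with equal weights: the entropy of their mixture minus the mean of their entropies. The abstract does not say which Jensen-type divergences or which kernel construction the authors use, so this shows only the underlying divergence and not their kernel.

```python
from math import log


def entropy(p):
    # Shannon entropy in nats; zero-probability entries contribute nothing.
    return -sum(v * log(v) for v in p if v > 0)


def jensen_shannon(distributions):
    """Equal-weight Jensen-Shannon divergence among any number of distributions."""
    n = len(distributions)
    mixture = [sum(column) / n for column in zip(*distributions)]
    return entropy(mixture) - sum(entropy(p) for p in distributions) / n


identical = jensen_shannon([[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]])
disjoint = jensen_shannon([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
```

The divergence is zero when all distributions coincide and reaches `log(n)` when `n` distributions have disjoint supports, so it measures the dissimilarity of a whole group at once rather than of a single pair.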
Abstract:
Homogeneous temperature regions are necessary for use in hydrometeorological studies. The regions are often delineated by analysing statistics derived from time series of maximum, minimum or mean temperature, rather than attributes influencing temperature. This practice cannot yield meaningful regions in data-sparse areas. Further, independent validation of the delineated regions for homogeneity in temperature is not possible, because the temperature records themselves form the basis for arriving at the regions. To address these issues, a two-stage clustering approach is proposed in this study to delineate homogeneous temperature regions. The first stage of the approach involves (1) determining the correlation structure between observed temperature over the study area and possible predictors (large-scale atmospheric variables) influencing the temperature and (2) using the correlation structure as the basis for grouping sites in the study area into clusters. The second stage involves analysis of each cluster to (1) identify potential predictors (large-scale atmospheric variables) influencing temperature at sites in the cluster and (2) partition the cluster into homogeneous fuzzy temperature regions using the identified potential predictors. Application of the proposed approach to India yielded 28 homogeneous regions that were demonstrated to be effective when compared to an alternative set of 6 regions previously delineated over the study area. Intersite cross-correlations of monthly maximum and minimum temperatures in the existing regions were found to be weak and negative for several months, which is undesirable. This problem was not found in the regions delineated using the proposed approach. The utility of the proposed regions in arriving at estimates of potential evapotranspiration for ungauged locations in the study area is demonstrated.
Abstract:
This article analyzes the relationship between the spatial clustering of the income distribution and inequality in the provinces of Argentina. The aim of this work is to use spatial techniques to analyze the extent to which spatial clustering of the income distribution affects income inequality in a regional context in Argentina. In general, the inequality literature implicitly treats each region or province as an independent entity, and the potential for interaction across space has often been ignored. Spatial autocorrelation, meanwhile, occurs when the spatial distribution of the variable of interest exhibits a systematic pattern. I compute three measures of global spatial autocorrelation, Moran's I, Geary's c, and the Getis and Ord G, as the degree of provincial clustering between 1991 and 2002. The main conclusion is that there is evidence that provinces with relatively high (low) inequality tend to be located near other provinces with high (low) inequality more often than would be expected by chance. Each province should therefore not be viewed as an independent observation, as has been implicitly assumed in previous studies of regional income inequality.
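A minimal sketch of global Moran's I, one of the three measures named above, may help fix ideas. It uses the standard formula I = (n / W) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2, where W is the sum of all spatial weights. The four regions, their chain-shaped adjacency, and the inequality values are invented for illustration and are not the article's data.

```python
def morans_i(values, weights):
    """Global Moran's I for values observed at n regions.

    weights[i][j] is the spatial weight between regions i and j
    (here 1 for neighbours and 0 otherwise, with a zero diagonal).
    """
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    cross = sum(
        weights[i][j] * deviations[i] * deviations[j]
        for i in range(n)
        for j in range(n)
    )
    variance_sum = sum(d * d for d in deviations)
    return (n / total_weight) * cross / variance_sum


# Hypothetical layout: four regions in a row, each adjacent to the next.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = morans_i([0.50, 0.48, 0.40, 0.38], chain)    # similar values side by side
alternating = morans_i([0.50, 0.40, 0.50, 0.40], chain)  # high and low values interleaved
```

A positive value indicates that neighbouring regions tend to have similar inequality levels, which is the pattern the article reports, while a negative value indicates that neighbours tend to differ.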