74 resultados para Incremental Clustering
Resumo:
Clustering techniques are used in regional flood frequency analysis (RFFA) to partition watersheds into natural groups or regions with similar hydrologic responses. The linear Kohonen's self‐organizing feature map (SOFM) has been applied as a clustering technique for RFFA in several recent studies. However, it is seldom possible to interpret clusters from the output of an SOFM, irrespective of its size and dimensionality. In this study, we demonstrate that SOFMs may, however, serve as a useful precursor to clustering algorithms. We present a two‐level. SOFM‐based clustering approach to form regions for FFA. In the first level, the SOFM is used to form a two‐dimensional feature map. In the second level, the output nodes of SOFM are clustered using Fuzzy c‐means algorithm to form regions. The optimal number of regions is based on fuzzy cluster validation measures. Effectiveness of the proposed approach in forming homogeneous regions for FFA is illustrated through application to data from watersheds in Indiana, USA. Results show that the performance of the proposed approach to form regions is better than that based on classical SOFM.
Resumo:
This paper presents a novel Second Order Cone Programming (SOCP) formulation for large scale binary classification tasks. Assuming that the class conditional densities are mixture distributions, where each component of the mixture has a spherical covariance, the second order statistics of the components can be estimated efficiently using clustering algorithms like BIRCH. For each cluster, the second order moments are used to derive a second order cone constraint via a Chebyshev-Cantelli inequality. This constraint ensures that any data point in the cluster is classified correctly with a high probability. This leads to a large margin SOCP formulation whose size depends on the number of clusters rather than the number of training data points. Hence, the proposed formulation scales well for large datasets when compared to the state-of-the-art classifiers, Support Vector Machines (SVMs). Experiments on real world and synthetic datasets show that the proposed algorithm outperforms SVM solvers in terms of training time and achieves similar accuracies.
Resumo:
We present two online algorithms for maintaining a topological order of a directed n-vertex acyclic graph as arcs are added, and detecting a cycle when one is created. Our first algorithm handles m arc additions in O(m(3/2)) time. For sparse graphs (m/n = O(1)), this bound improves the best previous bound by a logarithmic factor, and is tight to within a constant factor among algorithms satisfying a natural locality property. Our second algorithm handles an arbitrary sequence of arc additions in O(n(5/2)) time. For sufficiently dense graphs, this bound improves the best previous bound by a polynomial factor. Our bound may be far from tight: we show that the algorithm can take Omega(n(2)2 root(2lgn)) time by relating its performance to a generalization of the k-levels problem of combinatorial geometry. A completely different algorithm running in Theta (n(2) log n) time was given recently by Bender, Fineman, and Gilbert. We extend both of our algorithms to the maintenance of strong components, without affecting the asymptotic time bounds.
Resumo:
This paper presents hierarchical clustering algorithms for land cover mapping problem using multi-spectral satellite images. In unsupervised techniques, the automatic generation of number of clusters and its centers for a huge database is not exploited to their full potential. Hence, a hierarchical clustering algorithm that uses splitting and merging techniques is proposed. Initially, the splitting method is used to search for the best possible number of clusters and its centers using Mean Shift Clustering (MSC), Niche Particle Swarm Optimization (NPSO) and Glowworm Swarm Optimization (GSO). Using these clusters and its centers, the merging method is used to group the data points based on a parametric method (k-means algorithm). A performance comparison of the proposed hierarchical clustering algorithms (MSC, NPSO and GSO) is presented using two typical multi-spectral satellite images - Landsat 7 thematic mapper and QuickBird. From the results obtained, we conclude that the proposed GSO based hierarchical clustering algorithm is more accurate and robust.
Resumo:
Lack of supervision in clustering algorithms often leads to clusters that are not useful or interesting to human reviewers. We investigate if supervision can be automatically transferred for clustering a target task, by providing a relevant supervised partitioning of a dataset from a different source task. The target clustering is made more meaningful for the human user by trading-off intrinsic clustering goodness on the target task for alignment with relevant supervised partitions in the source task, wherever possible. We propose a cross-guided clustering algorithm that builds on traditional k-means by aligning the target clusters with source partitions. The alignment process makes use of a cross-task similarity measure that discovers hidden relationships across tasks. When the source and target tasks correspond to different domains with potentially different vocabularies, we propose a projection approach using pivot vocabularies for the cross-domain similarity measure. Using multiple real-world and synthetic datasets, we show that our approach improves clustering accuracy significantly over traditional k-means and state-of-the-art semi-supervised clustering baselines, over a wide range of data characteristics and parameter settings.
Resumo:
This paper presents an experimental study on damage assessment of reinforced concrete (RC) beams subjected to incremental cyclic loading. During testing acoustic emissions (AEs) were recorded. The analysis of the AE released was carried out by using parameters relaxation ratio, load ratio and calm ratio. Digital image correlation (DIC) technique and tracking with available MATLAB program were used to measure the displacement and surface strains in concrete. Earlier researchers classified the damage in RC beams using Kaiser effect, crack mouth opening displacement and proposed a standard. In general (or in practical situations), multiple cracks occur in reinforced concrete beams. In the present study damage assessment in RC beams was studied according to different limit states specified by the code of practice IS-456:2000 and AE technique. Based on the two ratios namely load ratio and calm ratio and when the deflection reached approximately 85% of the maximum allowable deflection it was observed that the RC beams were heavily damaged. The combination of AE and DIC techniques has the potential to provide the state of damage in RC structures.
Resumo:
In this paper, we develop a game theoretic approach for clustering features in a learning problem. Feature clustering can serve as an important preprocessing step in many problems such as feature selection, dimensionality reduction, etc. In this approach, we view features as rational players of a coalitional game where they form coalitions (or clusters) among themselves in order to maximize their individual payoffs. We show how Nash Stable Partition (NSP), a well known concept in the coalitional game theory, provides a natural way of clustering features. Through this approach, one can obtain some desirable properties of the clusters by choosing appropriate payoff functions. For a small number of features, the NSP based clustering can be found by solving an integer linear program (ILP). However, for large number of features, the ILP based approach does not scale well and hence we propose a hierarchical approach. Interestingly, a key result that we prove on the equivalence between a k-size NSP of a coalitional game and minimum k-cut of an appropriately constructed graph comes in handy for large scale problems. In this paper, we use feature selection problem (in a classification setting) as a running example to illustrate our approach. We conduct experiments to illustrate the efficacy of our approach.
Resumo:
In this paper, we approach the classical problem of clustering using solution concepts from cooperative game theory such as Nucleolus and Shapley value. We formulate the problem of clustering as a characteristic form game and develop a novel algorithm DRAC (Density-Restricted Agglomerative Clustering) for clustering. With extensive experimentation on standard data sets, we compare the performance of DRAC with that of well known algorithms. We show an interesting result that four prominent solution concepts, Nucleolus, Shapley value, Gately point and \tau-value coincide for the defined characteristic form game. This vindicates the choice of the characteristic function of the clustering game and also provides strong intuitive foundation for our approach.
Resumo:
This paper presents an improved hierarchical clustering algorithm for land cover mapping problem using quasi-random distribution. Initially, Niche Particle Swarm Optimization (NPSO) with pseudo/quasi-random distribution is used for splitting the data into number of cluster centers by satisfying Bayesian Information Criteria (BIC). Themain objective is to search and locate the best possible number of cluster and its centers. NPSO which highly depends on the initial distribution of particles in search space is not been exploited to its full potential. In this study, we have compared more uniformly distributed quasi-random with pseudo-random distribution with NPSO for splitting data set. Here to generate quasi-random distribution, Faure method has been used. Performance of previously proposed methods namely K-means, Mean Shift Clustering (MSC) and NPSO with pseudo-random is compared with the proposed approach - NPSO with quasi distribution(Faure). These algorithms are used on synthetic data set and multi-spectral satellite image (Landsat 7 thematic mapper). From the result obtained we conclude that use of quasi-random sequence with NPSO for hierarchical clustering algorithm results in a more accurate data classification.
Resumo:
In this paper, a comparative study is carried using three nature-inspired algorithms namely Genetic Algorithm (GA), Particle Swarm Optimization (PSO) and Cuckoo Search (CS) on clustering problem. Cuckoo search is used with levy flight. The heavy-tail property of levy flight is exploited here. These algorithms are used on three standard benchmark datasets and one real-time multi-spectral satellite dataset. The results are tabulated and analysed using various techniques. Finally we conclude that under the given set of parameters, cuckoo search works efficiently for majority of the dataset and levy flight plays an important role.
Resumo:
This paper illustrates the application of a new technique, based on Support Vector Clustering (SVC) for the direct identification of coherent synchronous generators in a large interconnected Multi-Machine Power Systems. The clustering is based on coherency measures, obtained from the time domain responses of the generators following system disturbances. The proposed clustering algorithm could be integrated into a wide-area measurement system that enables fast identification of coherent clusters of generators for the construction of dynamic equivalent models. An application of the proposed method is demonstrated on a practical 15 generators 72-bus system, an equivalent of Indian Southern grid in an attempt to show the effectiveness of this clustering approach. The effects of short circuit fault locations on coherency are also investigated.
Resumo:
We address the problem of detecting cells in biological images. The problem is important in many automated image analysis applications. We identify the problem as one of clustering and formulate it within the framework of robust estimation using loss functions. We show how suitable loss functions may be chosen based on a priori knowledge of the noise distribution. Specifically, in the context of biological images, since the measurement noise is not Gaussian, quadratic loss functions yield suboptimal results. We show that by incorporating the Huber loss function, cells can be detected robustly and accurately. To initialize the algorithm, we also propose a seed selection approach. Simulation results show that Huber loss exhibits better performance compared with some standard loss functions. We also provide experimental results on confocal images of yeast cells. The proposed technique exhibits good detection performance even when the signal-to-noise ratio is low.
Resumo:
When document corpus is very large, we often need to reduce the number of features. But it is not possible to apply conventional Non-negative Matrix Factorization(NMF) on billion by million matrix as the matrix may not fit in memory. Here we present novel Online NMF algorithm. Using Online NMF, we reduced original high-dimensional space to low-dimensional space. Then we cluster all the documents in reduced dimension using k-means algorithm. We experimentally show that by processing small subsets of documents we will be able to achieve good performance. The method proposed outperforms existing algorithms.