305 resultados para clustering algorithm
Resumo:
This paper presents an improved hierarchical clustering algorithm for land cover mapping problem using quasi-random distribution. Initially, Niche Particle Swarm Optimization (NPSO) with pseudo/quasi-random distribution is used for splitting the data into number of cluster centers by satisfying Bayesian Information Criteria (BIC). Themain objective is to search and locate the best possible number of cluster and its centers. NPSO which highly depends on the initial distribution of particles in search space is not been exploited to its full potential. In this study, we have compared more uniformly distributed quasi-random with pseudo-random distribution with NPSO for splitting data set. Here to generate quasi-random distribution, Faure method has been used. Performance of previously proposed methods namely K-means, Mean Shift Clustering (MSC) and NPSO with pseudo-random is compared with the proposed approach - NPSO with quasi distribution(Faure). These algorithms are used on synthetic data set and multi-spectral satellite image (Landsat 7 thematic mapper). From the result obtained we conclude that use of quasi-random sequence with NPSO for hierarchical clustering algorithm results in a more accurate data classification.
Resumo:
This paper illustrates the application of a new technique, based on Support Vector Clustering (SVC) for the direct identification of coherent synchronous generators in a large interconnected Multi-Machine Power Systems. The clustering is based on coherency measures, obtained from the time domain responses of the generators following system disturbances. The proposed clustering algorithm could be integrated into a wide-area measurement system that enables fast identification of coherent clusters of generators for the construction of dynamic equivalent models. An application of the proposed method is demonstrated on a practical 15 generators 72-bus system, an equivalent of Indian Southern grid in an attempt to show the effectiveness of this clustering approach. The effects of short circuit fault locations on coherency are also investigated.
Resumo:
We propose a new approach to clustering. Our idea is to map cluster formation to coalition formation in cooperative games, and to use the Shapley value of the patterns to identify clusters and cluster representatives. We show that the underlying game is convex and this leads to an efficient biobjective clustering algorithm that we call BiGC. The algorithm yields high-quality clustering with respect to average point-to-center distance (potential) as well as average intracluster point-to-point distance (scatter). We demonstrate the superiority of BiGC over state-of-the-art clustering algorithms (including the center based and the multiobjective techniques) through a detailed experimentation using standard cluster validity criteria on several benchmark data sets. We also show that BiGC satisfies key clustering properties such as order independence, scale invariance, and richness.
Resumo:
Among the multiple advantages and applications of remote sensing, one of the most important uses is to solve the problem of crop classification, i.e., differentiating between various crop types. Satellite images are a reliable source for investigating the temporal changes in crop cultivated areas. In this letter, we propose a novel bat algorithm (BA)-based clustering approach for solving crop type classification problems using a multispectral satellite image. The proposed partitional clustering algorithm is used to extract information in the form of optimal cluster centers from training samples. The extracted cluster centers are then validated on test samples. A real-time multispectral satellite image and one benchmark data set from the University of California, Irvine (UCI) repository are used to demonstrate the robustness of the proposed algorithm. The performance of the BA is compared with two other nature-inspired metaheuristic techniques, namely, genetic algorithm and particle swarm optimization. The performance is also compared with the existing hybrid approach such as the BA with K-means. From the results obtained, it can be concluded that the BA can be successfully applied to solve crop type classification problems.
Resumo:
Core Vector Machine(CVM) is suitable for efficient large-scale pattern classification. In this paper, a method for improving the performance of CVM with Gaussian kernel function irrespective of the orderings of patterns belonging to different classes within the data set is proposed. This method employs a selective sampling based training of CVM using a novel kernel based scalable hierarchical clustering algorithm. Empirical studies made on synthetic and real world data sets show that the proposed strategy performs well on large data sets.
Resumo:
This paper investigates a new Glowworm Swarm Optimization (GSO) clustering algorithm for hierarchical splitting and merging of automatic multi-spectral satellite image classification (land cover mapping problem). Amongst the multiple benefits and uses of remote sensing, one of the most important has been its use in solving the problem of land cover mapping. Image classification forms the core of the solution to the land cover mapping problem. No single classifier can prove to classify all the basic land cover classes of an urban region in a satisfactory manner. In unsupervised classification methods, the automatic generation of clusters to classify a huge database is not exploited to their full potential. The proposed methodology searches for the best possible number of clusters and its center using Glowworm Swarm Optimization (GSO). Using these clusters, we classify by merging based on parametric method (k-means technique). The performance of the proposed unsupervised classification technique is evaluated for Landsat 7 thematic mapper image. Results are evaluated in terms of the classification efficiency - individual, average and overall.
Resumo:
Summary form only given. A scheme for code compression that has a fast decompression algorithm, which can be implemented using simple hardware, is proposed. The effectiveness of the scheme on the TMS320C62x architecture that includes the overheads of a line address table (LAT) is evaluated and obtained compression rates ranging from 70% to 80%. Two schemes for decompression are proposed. The basic idea underlying the scheme is a simple clustering algorithm that partially maps a block of instructions into a set of clusters. The clustering algorithm is a greedy algorithm based on the frequency of occurrence of various instructions.
Resumo:
This paper describes a new method of color text localization from generic scene images containing text of different scripts and with arbitrary orientations. A representative set of colors is first identified using the edge information to initiate an unsupervised clustering algorithm. Text components are identified from each color layer using a combination of a support vector machine and a neural network classifier trained on a set of low-level features derived from the geometric, boundary, stroke and gradient information. Experiments on camera-captured images that contain variable fonts, size, color, irregular layout, non-uniform illumination and multiple scripts illustrate the robustness of the method. The proposed method yields precision and recall of 0.8 and 0.86 respectively on a database of 100 images. The method is also compared with others in the literature using the ICDAR 2003 robust reading competition dataset.
Resumo:
This paper presents a new hierarchical clustering algorithm for crop stage classification using hyperspectral satellite image. Amongst the multiple benefits and uses of remote sensing, one of the important application is to solve the problem of crop stage classification. Modern commercial imaging satellites, owing to their large volume of satellite imagery, offer greater opportunities for automated image analysis. Hence, we propose a unsupervised algorithm namely Hierarchical Artificial Immune System (HAIS) of two steps: splitting the cluster centers and merging them. The high dimensionality of the data has been reduced with the help of Principal Component Analysis (PCA). The classification results have been compared with K-means and Artificial Immune System algorithms. From the results obtained, we conclude that the proposed hierarchical clustering algorithm is accurate.
Resumo:
In this paper, we present a methodology for identifying best features from a large feature space. In high dimensional feature space nearest neighbor search is meaningless. In this feature space we see quality and performance issue with nearest neighbor search. Many data mining algorithms use nearest neighbor search. So instead of doing nearest neighbor search using all the features we need to select relevant features. We propose feature selection using Non-negative Matrix Factorization(NMF) and its application to nearest neighbor search. Recent clustering algorithm based on Locally Consistent Concept Factorization(LCCF) shows better quality of document clustering by using local geometrical and discriminating structure of the data. By using our feature selection method we have shown further improvement of performance in the clustering.
Resumo:
This paper primarily intends to develop a GIS (geographical information system)-based data mining approach for optimally selecting the locations and determining installed capacities for setting up distributed biomass power generation systems in the context of decentralized energy planning for rural regions. The optimal locations within a cluster of villages are obtained by matching the installed capacity needed with the demand for power, minimizing the cost of transportation of biomass from dispersed sources to power generation system, and cost of distribution of electricity from the power generation system to demand centers or villages. The methodology was validated by using it for developing an optimal plan for implementing distributed biomass-based power systems for meeting the rural electricity needs of Tumkur district in India consisting of 2700 villages. The approach uses a k-medoid clustering algorithm to divide the total region into clusters of villages and locate biomass power generation systems at the medoids. The optimal value of k is determined iteratively by running the algorithm for the entire search space for different values of k along with demand-supply matching constraints. The optimal value of the k is chosen such that it minimizes the total cost of system installation, costs of transportation of biomass, and transmission and distribution. A smaller region, consisting of 293 villages was selected to study the sensitivity of the results to varying demand and supply parameters. The results of clustering are represented on a GIS map for the region.
Resumo:
The presence of a large number of spectral bands in the hyperspectral images increases the capability to distinguish between various physical structures. However, they suffer from the high dimensionality of the data. Hence, the processing of hyperspectral images is applied in two stages: dimensionality reduction and unsupervised classification techniques. The high dimensionality of the data has been reduced with the help of Principal Component Analysis (PCA). The selected dimensions are classified using Niche Hierarchical Artificial Immune System (NHAIS). The NHAIS combines the splitting method to search for the optimal cluster centers using niching procedure and the merging method is used to group the data points based on majority voting. Results are presented for two hyperspectral images namely EO-1 Hyperion image and Indian pines image. A performance comparison of this proposed hierarchical clustering algorithm with the earlier three unsupervised algorithms is presented. From the results obtained, we deduce that the NHAIS is efficient.
Resumo:
In this paper, we present a novel algorithm for piecewise linear regression which can learn continuous as well as discontinuous piecewise linear functions. The main idea is to repeatedly partition the data and learn a linear model in each partition. The proposed algorithm is similar in spirit to k-means clustering algorithm. We show that our algorithm can also be viewed as a special case of an EM algorithm for maximum likelihood estimation under a reasonable probability model. We empirically demonstrate the effectiveness of our approach by comparing its performance with that of the state of art algorithms on various datasets. (C) 2014 Elsevier Inc. All rights reserved.
Resumo:
Structural information over the entire course of binding interactions based on the analyses of energy landscapes is described, which provides a framework to understand the events involved during biomolecular recognition. Conformational dynamics of malectin's exquisite selectivity for diglucosylated N-glycan (Dig-N-glycan), a highly flexible oligosaccharide comprising of numerous dihedral torsion angles, are described as an example. For this purpose, a novel approach based on hierarchical sampling for acquiring metastable molecular conformations constituting low-energy minima for understanding the structural features involved in a biologic recognition is proposed. For this purpose, four variants of principal component analysis were employed recursively in both Cartesian space and dihedral angles space that are characterized by free energy landscapes to select the most stable conformational substates. Subsequently, k-means clustering algorithm was implemented for geometric separation of the major native state to acquire a final ensemble of metastable conformers. A comparison of malectin complexes was then performed to characterize their conformational properties. Analyses of stereochemical metrics and other concerted binding events revealed surface complementarity, cooperative and bidentate hydrogen bonds, water-mediated hydrogen bonds, carbohydrate-aromatic interactions including CH-pi and stacking interactions involved in this recognition. Additionally, a striking structural transition from loop to beta-strands in malectin CRD upon specific binding to Dig-N-glycan is observed. The interplay of the above-mentioned binding events in malectin and Dig-N-glycan supports an extended conformational selection model as the underlying binding mechanism.
Resumo:
The k-means algorithm is an extremely popular technique for clustering data. One of the major limitations of the k-means is that the time to cluster a given dataset D is linear in the number of clusters, k. In this paper, we employ height balanced trees to address this issue. Specifically, we make two major contributions, (a) we propose an algorithm, RACK (acronym for RApid Clustering using k-means), which takes time favorably comparable with the fastest known existing techniques, and (b) we prove an expected bound on the quality of clustering achieved using RACK. Our experimental results on large datasets strongly suggest that RACK is competitive with the k-means algorithm in terms of quality of clustering, while taking significantly less time.