305 resultados para Algorithm clustering

em Indian Institute of Science - Bangalore - Índia


Relevância:

40.00% 40.00%

Publicador:

Resumo:

Partitional clustering algorithms, which partition the dataset into a pre-defined number of clusters, can be broadly classified into two types: algorithms which explicitly take the number of clusters as input and algorithms that take the expected size of a cluster as input. In this paper, we propose a variant of the k-means algorithm and prove that it is more efficient than standard k-means algorithms. An important contribution of this paper is the establishment of a relation between the number of clusters and the size of the clusters in a dataset through the analysis of our algorithm. We also demonstrate that the integration of this algorithm as a pre-processing step in classification algorithms reduces their running-time complexity.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

The k-means algorithm is an extremely popular technique for clustering data. One of the major limitations of the k-means is that the time to cluster a given dataset D is linear in the number of clusters, k. In this paper, we employ height balanced trees to address this issue. Specifically, we make two major contributions, (a) we propose an algorithm, RACK (acronym for RApid Clustering using k-means), which takes time favorably comparable with the fastest known existing techniques, and (b) we prove an expected bound on the quality of clustering achieved using RACK. Our experimental results on large datasets strongly suggest that RACK is competitive with the k-means algorithm in terms of quality of clustering, while taking significantly less time.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Relative geometric arrangements of the sample points, with reference to the structure of the imbedding space, produce clusters. Hence, if each sample point is imagined to acquire a volume of a small M-cube (called pattern-cell), depending on the ranges of its (M) features and number (N) of samples; then overlapping pattern-cells would indicate naturally closer sample-points. A chain or blob of such overlapping cells would mean a cluster and separate clusters would not share a common pattern-cell between them. The conditions and an analytic method to find such an overlap are developed. A simple, intuitive, nonparametric clustering procedure, based on such overlapping pattern-cells is presented. It may be classified as an agglomerative, hierarchical, linkage-type clustering procedure. The algorithm is fast, requires low storage and can identify irregular clusters. Two extensions of the algorithm, to separate overlapping clusters and to estimate the nature of pattern distributions in the sample space, are also indicated.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

This paper presents hierarchical clustering algorithms for land cover mapping problem using multi-spectral satellite images. In unsupervised techniques, the automatic generation of number of clusters and its centers for a huge database is not exploited to their full potential. Hence, a hierarchical clustering algorithm that uses splitting and merging techniques is proposed. Initially, the splitting method is used to search for the best possible number of clusters and its centers using Mean Shift Clustering (MSC), Niche Particle Swarm Optimization (NPSO) and Glowworm Swarm Optimization (GSO). Using these clusters and its centers, the merging method is used to group the data points based on a parametric method (k-means algorithm). A performance comparison of the proposed hierarchical clustering algorithms (MSC, NPSO and GSO) is presented using two typical multi-spectral satellite images - Landsat 7 thematic mapper and QuickBird. From the results obtained, we conclude that the proposed GSO based hierarchical clustering algorithm is more accurate and robust.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A computationally efficient agglomerative clustering algorithm based on multilevel theory is presented. Here, the data set is divided randomly into a number of partitions. The samples of each such partition are clustered separately using hierarchical agglomerative clustering algorithm to form sub-clusters. These are merged at higher levels to get the final classification. This algorithm leads to the same classification as that of hierarchical agglomerative clustering algorithm when the clusters are well separated. The advantages of this algorithm are short run time and small storage requirement. It is observed that the savings, in storage space and computation time, increase nonlinearly with the sample size.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

K-means algorithm is a well known nonhierarchical method for clustering data. The most important limitations of this algorithm are that: (1) it gives final clusters on the basis of the cluster centroids or the seed points chosen initially, and (2) it is appropriate for data sets having fairly isotropic clusters. But this algorithm has the advantage of low computation and storage requirements. On the other hand, hierarchical agglomerative clustering algorithm, which can cluster nonisotropic (chain-like and concentric) clusters, requires high storage and computation requirements. This paper suggests a new method for selecting the initial seed points, so that theK-means algorithm gives the same results for any input data order. This paper also describes a hybrid clustering algorithm, based on the concepts of multilevel theory, which is nonhierarchical at the first level and hierarchical from second level onwards, to cluster data sets having (i) chain-like clusters and (ii) concentric clusters. It is observed that this hybrid clustering algorithm gives the same results as the hierarchical clustering algorithm, with less computation and storage requirements.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The keyword based search technique suffers from the problem of synonymic and polysemic queries. Current approaches address only theproblem of synonymic queries in which different queries might have the same information requirement. But the problem of polysemic queries,i.e., same query having different intentions, still remains unaddressed. In this paper, we propose the notion of intent clusters, the members of which will have the same intention. We develop a clustering algorithm that uses the user session information in query logs in addition to query URL entries to identify cluster of queries having the same intention. The proposed approach has been studied through case examples from the actual log data from AOL, and the clustering algorithm is shown to be successful in discerning the user intentions.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper discusses a method for scaling SVM with Gaussian kernel function to handle large data sets by using a selective sampling strategy for the training set. It employs a scalable hierarchical clustering algorithm to construct cluster indexing structures of the training data in the kernel induced feature space. These are then used for selective sampling of the training data for SVM to impart scalability to the training process. Empirical studies made on real world data sets show that the proposed strategy performs well on large data sets.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The problem of denoising damage indicator signals for improved operational health monitoring of systems is addressed by applying soft computing methods to design filters. Since measured data in operational settings is contaminated with noise and outliers, pattern recognition algorithms for fault detection and isolation can give false alarms. A direct approach to improving the fault detection and isolation is to remove noise and outliers from time series of measured data or damage indicators before performing fault detection and isolation. Many popular signal-processing approaches do not work well with damage indicator signals, which can contain sudden changes due to abrupt faults and non-Gaussian outliers. Signal-processing algorithms based on radial basis function (RBF) neural network and weighted recursive median (WRM) filters are explored for denoising simulated time series. The RBF neural network filter is developed using a K-means clustering algorithm and is much less computationally expensive to develop than feedforward neural networks trained using backpropagation. The nonlinear multimodal integer-programming problem of selecting optimal integer weights of the WRM filter is solved using genetic algorithm. Numerical results are obtained for helicopter rotor structural damage indicators based on simulated frequencies. Test signals consider low order polynomial growth of damage indicators with time to simulate gradual or incipient faults and step changes in the signal to simulate abrupt faults. Noise and outliers are added to the test signals. The WRM and RBF filters result in a noise reduction of 54 - 71 and 59 - 73% for the test signals considered in this study, respectively. Their performance is much better than the moving average FIR filter, which causes significant feature distortion and has poor outlier removal capabilities and shows the potential of soft computing methods for specific signal-processing applications.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The problem of denoising damage indicator signals for improved operational health monitoring of systems is addressed by applying soft computing methods to design filters. Since measured data in operational settings is contaminated with noise and outliers, pattern recognition algorithms for fault detection and isolation can give false alarms. A direct approach to improving the fault detection and isolation is to remove noise and outliers from time series of measured data or damage indicators before performing fault detection and isolation. Many popular signal-processing approaches do not work well with damage indicator signals, which can contain sudden changes due to abrupt faults and non-Gaussian outliers. Signal-processing algorithms based on radial basis function (RBF) neural network and weighted recursive median (WRM) filters are explored for denoising simulated time series. The RBF neural network filter is developed using a K-means clustering algorithm and is much less computationally expensive to develop than feedforward neural networks trained using backpropagation. The nonlinear multimodal integer-programming problem of selecting optimal integer weights of the WRM filter is solved using genetic algorithm. Numerical results are obtained for helicopter rotor structural damage indicators based on simulated frequencies. Test signals consider low order polynomial growth of damage indicators with time to simulate gradual or incipient faults and step changes in the signal to simulate abrupt faults. Noise and outliers are added to the test signals. The WRM and RBF filters result in a noise reduction of 54 - 71 and 59 - 73% for the test signals considered in this study, respectively. Their performance is much better than the moving average FIR filter, which causes significant feature distortion and has poor outlier removal capabilities and shows the potential of soft computing methods for specific signal-processing applications. (C) 2005 Elsevier B. V. All rights reserved.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A method for determining the mutual nearest neighbours (MNN) and mutual neighbourhood value (mnv) of a sample point, using the conventional nearest neighbours, is suggested. A nonparametric, hierarchical, agglomerative clustering algorithm is developed using the above concepts. The algorithm is simple, deterministic, noniterative, requires low storage and is able to discern spherical and nonspherical clusters. The method is applicable to a wide class of data of arbitrary shape, large size and high dimensionality. The algorithm can discern mutually homogenous clusters. Strong or weak patterns can be discerned by properly choosing the neighbourhood width.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A nonparametric, hierarchical, disaggregative clustering algorithm is developed using a novel similarity measure, called the mutual neighborhood value (MNV), which takes into account the conventional nearest neighbor ranks of two samples with respect to each other. The algorithm is simple, noniterative, requires low storage, and needs no specification of the expected number of clusters. The algorithm appears very versatile as it is capable of discerning spherical and nonspherical clusters, linearly nonseparable clusters, clusters with unequal populations, and clusters with lowdensity bridges. Changing of the neighborhood size enables discernment of strong or weak patterns.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The paper deals with a model-theoretic approach to clustering. The approach can be used to generate cluster description based on knowledge alone. Such a process of generating descriptions would be extremely useful in clustering partially specified objects. A natural byproduct of the proposed approach is that missing values of attributes of an object can be estimated with ease in a meaningful fashion. An important feature of the approach is that noisy objects can be detected effectively, leading to the formation of natural groups. The proposed algorithm is applied to a library database consisting of a collection of books.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The K-means algorithm for clustering is very much dependent on the initial seed values. We use a genetic algorithm to find a near-optimal partitioning of the given data set by selecting proper initial seed values in the K-means algorithm. Results obtained are very encouraging and in most of the cases, on data sets having well separated clusters, the proposed scheme reached a global minimum.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In the knowledge-based clustering approaches reported in the literature, explicit know ledge, typically in the form of a set of concepts, is used in computing similarity or conceptual cohesiveness between objects and in grouping them. We propose a knowledge-based clustering approach in which the domain knowledge is also used in the pattern representation phase of clustering. We argue that such a knowledge-based pattern representation scheme reduces the complexity of similarity computation and grouping phases. We present a knowledge-based clustering algorithm for grouping hooks in a library.