89 resultados para Evolutionary clustering

em Indian Institute of Science - Bangalore - Índia


Relevância:

40.00% 40.00%

Publicador:

Resumo:

n this paper, a multistage evolutionary scheme is proposed for clustering in a large data base, like speech data. This is achieved by clustering a small subset of the entire sample set in each stage and treating the cluster centroids so obtained as samples, together with another subset of samples not considered previously, as input data to the next stage. This is continued till the whole sample set is exhausted. The clustering is accomplished by constructing a fuzzy similarity matrix and using the fuzzy techniques proposed here. The technique is illustrated by an efficient scheme for voiced-unvoiced-silence classification of speech.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Clustering has been the most popular method for data exploration. Clustering is partitioning the data set into sub-partitions based on some measures say the distance measure, each partition has its own significant information. There are a number of algorithms explored for this purpose, one such algorithm is the Particle Swarm Optimization(PSO) which is a population based heuristic search technique derived from swarm intelligence. In this paper we present an improved version of the Particle Swarm Optimization where, each feature of the data set is given significance accordingly by adding some random weights, which also minimizes the distortions in the dataset if any. The performance of the above proposed algorithm is evaluated using some benchmark datasets from Machine Learning Repository. The experimental results shows that our proposed methodology performs significantly better than the previously performed experiments.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The work reported hen was motivated by a desire to verify the existence of structure - specifically MP-rich clusters induced by sodium bromide (NaBr) in the ternary liquid mixture 3-methylpyridine (Mf) + water(W) + NaBr. We present small-angle X-ray scattering (SAXS) measurements in this mixture. These measurements were obtained at room temperature (similar to 298 K) in the one-phase region (below the relevant lower consolute points, T(L)s) at different values of X (i.e., X = 0.02 - 0.17), where X is the weight fraction of NaBr in the mixture. Cluster-size distribution, estimated on the assumption that the clusters are spherical, shows systematic behaviour in that the peak of the distribution shifts rewards larger values of cluster radius as X increases. The largest spatial extent of the clusters (similar to 4.5 nm) is seen at X = 0.17. Data analysis assuming arbitrary shapes and sizes of clusters gives a limiting value of cluster size (- 4.5 nm) that is not very sensitive to X. It is suggested that the cluster size determined may not be the same as the usual critical-point fluctuations far removed from the critical point (T-L). The influence of the additional length scale due to clustering is discussed from the standpoint of crossover from Ising to mean-field critical behaviour, when moving away from the T-L.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We consider a two timescale model of learning by economic agents wherein active or 'ontogenetic' learning by individuals takes place on a fast scale and passive or 'phylogenetic' learning by society as a whole on a slow scale, each affecting the evolution of the other. The former is modelled by the Monte Carlo dynamics of physics, while the latter is modelled by the replicator dynamics of evolutionary biology. Various qualitative aspects of the dynamics are studied in some simple cases, both analytically and numerically, and its role as a useful modelling device is emphasized.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper the notion of conceptual cohesiveness is precised and used to group objects semantically, based on a knowledge structure called ‘cohesion forest’. A set of axioms is proposed which should be satisfied to make the generated clusters meaningful.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A computationally efficient agglomerative clustering algorithm based on multilevel theory is presented. Here, the data set is divided randomly into a number of partitions. The samples of each such partition are clustered separately using hierarchical agglomerative clustering algorithm to form sub-clusters. These are merged at higher levels to get the final classification. This algorithm leads to the same classification as that of hierarchical agglomerative clustering algorithm when the clusters are well separated. The advantages of this algorithm are short run time and small storage requirement. It is observed that the savings, in storage space and computation time, increase nonlinearly with the sample size.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

K-means algorithm is a well known nonhierarchical method for clustering data. The most important limitations of this algorithm are that: (1) it gives final clusters on the basis of the cluster centroids or the seed points chosen initially, and (2) it is appropriate for data sets having fairly isotropic clusters. But this algorithm has the advantage of low computation and storage requirements. On the other hand, hierarchical agglomerative clustering algorithm, which can cluster nonisotropic (chain-like and concentric) clusters, requires high storage and computation requirements. This paper suggests a new method for selecting the initial seed points, so that theK-means algorithm gives the same results for any input data order. This paper also describes a hybrid clustering algorithm, based on the concepts of multilevel theory, which is nonhierarchical at the first level and hierarchical from second level onwards, to cluster data sets having (i) chain-like clusters and (ii) concentric clusters. It is observed that this hybrid clustering algorithm gives the same results as the hierarchical clustering algorithm, with less computation and storage requirements.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The concept of feature selection in a nonparametric unsupervised learning environment is practically undeveloped because no true measure for the effectiveness of a feature exists in such an environment. The lack of a feature selection phase preceding the clustering process seriously affects the reliability of such learning. New concepts such as significant features, level of significance of features, and immediate neighborhood are introduced which result in meeting implicitly the need for feature slection in the context of clustering techniques.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The concept of feature selection in a nonparametric unsupervised learning environment is practically undeveloped because no true measure for the effectiveness of a feature exists in such an environment. The lack of a feature selection phase preceding the clustering process seriously affects the reliability of such learning. New concepts such as significant features, level of significance of features, and immediate neighborhood are introduced which result in meeting implicitly the need for feature slection in the context of clustering techniques.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A new clustering technique, based on the concept of immediato neighbourhood, with a novel capability to self-learn the number of clusters expected in the unsupervized environment, has been developed. The method compares favourably with other clustering schemes based on distance measures, both in terms of conceptual innovations and computational economy. Test implementation of the scheme using C-l flight line training sample data in a simulated unsupervized mode has brought out the efficacy of the technique. The technique can easily be implemented as a front end to established pattern classification systems with supervized learning capabilities to derive unified learning systems capable of operating in both supervized and unsupervized environments. This makes the technique an attractive proposition in the context of remotely sensed earth resources data analysis wherein it is essential to have such a unified learning system capability.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Background: The members of cupin superfamily exhibit large variations in their sequences, functions, organization of domains, quaternary associations and the nature of bound metal ion, despite having a conserved beta-barrel structural scaffold. Here, an attempt has been made to understand structure-function relationships among the members of this diverse superfamily and identify the principles governing functional diversity. The cupin superfamily also contains proteins for which the structures are available through world-wide structural genomics initiatives but characterized as ``hypothetical''. We have explored the feasibility of obtaining clues to functions of such proteins by means of comparative analysis with cupins of known structure and function. Methodology/Principal Findings: A 3-D structure-based phylogenetic approach was undertaken. Interestingly, a dendrogram generated solely on the basis of structural dissimilarity measure at the level of domain folds was found to cluster functionally similar members. This clustering also reflects an independent evolution of the two domains in bicupins. Close examination of structural superposition of members across various functional clusters reveals structural variations in regions that not only form the active site pocket but are also involved in interaction with another domain in the same polypeptide or in the oligomer. Conclusions/Significance: Structure-based phylogeny of cupins can influence identification of functions of proteins of yet unknown function with cupin fold. This approach can be extended to other proteins with a common fold that show high evolutionary divergence. This approach is expected to have an influence on the function annotation in structural genomics initiatives.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Partitional clustering algorithms, which partition the dataset into a pre-defined number of clusters, can be broadly classified into two types: algorithms which explicitly take the number of clusters as input and algorithms that take the expected size of a cluster as input. In this paper, we propose a variant of the k-means algorithm and prove that it is more efficient than standard k-means algorithms. An important contribution of this paper is the establishment of a relation between the number of clusters and the size of the clusters in a dataset through the analysis of our algorithm. We also demonstrate that the integration of this algorithm as a pre-processing step in classification algorithms reduces their running-time complexity.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Acyl carrier protein is an integral component of many cellular metabolic processes. A number of studies have reported self-acylation behavior in acyl carrier proteins. Although AM exhibit high levels of similarity in their primary and tertiary structures, self-acylation behavior is restricted to only some ACPs that can be classified into two major families based on their function. The first family of ACPs is involved in polyketide biosynthesis, whereas the second family participates in fatty acid synthesis. Facilitated by the growing number of genome sequences available for analyses, large-scale phylogenetic studies were used in these studies to uncover as to how self-acylation behavior of acyl carrier proteins is linked with the evolution of metabolic pathways in organisms. These studies show that self-acylation behavior in acyl carrier proteins was lost during the course of evolution, with certain organisms and organelles viz. plastids, retaining it for specified functions. (C) 2009 IUBMB IUBMB Life, 61(8): 853-859, 2009

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The k-means algorithm is an extremely popular technique for clustering data. One of the major limitations of the k-means is that the time to cluster a given dataset D is linear in the number of clusters, k. In this paper, we employ height balanced trees to address this issue. Specifically, we make two major contributions, (a) we propose an algorithm, RACK (acronym for RApid Clustering using k-means), which takes time favorably comparable with the fastest known existing techniques, and (b) we prove an expected bound on the quality of clustering achieved using RACK. Our experimental results on large datasets strongly suggest that RACK is competitive with the k-means algorithm in terms of quality of clustering, while taking significantly less time.