89 results for Evolutionary clustering


Relevance:

20.00%

Publisher:

Abstract:

Several recently discovered peculiar Type Ia supernovae seem to demand an altogether new formation theory that might help explain the puzzling dissimilarities between them and the standard Type Ia supernovae. The most striking aspect of the observational analysis is the necessity of invoking super-Chandrasekhar white dwarfs with masses of approximately 2.1-2.8 M⊙, where M⊙ is the mass of the Sun, as their most probable progenitors. Strongly magnetized white dwarfs having super-Chandrasekhar masses have already been established as potential candidates for the progenitors of peculiar Type Ia supernovae. Owing to the Landau quantization of the underlying electron degenerate gas, theoretical results yielded the observationally inferred mass range. Here, we sketch a possible evolutionary scenario by which super-Chandrasekhar white dwarfs could be formed by accretion on to a commonly observed magnetized white dwarf, invoking the phenomenon of flux freezing. This opens multiple possible evolution scenarios ending in supernova explosions of super-Chandrasekhar white dwarfs having masses within the range stated above. We point out that our proposal has observational support, such as the recent discovery of a large number of magnetized white dwarfs by the Sloan Digital Sky Survey.

Relevance:

20.00%

Publisher:

Abstract:

When a document corpus is very large, we often need to reduce the number of features. However, conventional Non-negative Matrix Factorization (NMF) cannot be applied to a billion-by-million matrix, because the matrix may not fit in memory. Here we present a novel Online NMF algorithm. Using Online NMF, we reduce the original high-dimensional space to a low-dimensional space. We then cluster all the documents in the reduced dimension using the k-means algorithm. We show experimentally that good performance can be achieved by processing small subsets of documents. The proposed method outperforms existing algorithms.
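The idea of factorizing a corpus one small batch at a time can be illustrated with a minimal sketch. This is not the algorithm from the abstract, whose details are not given here; it is a generic mini-batch NMF with multiplicative updates, and all names (`encode`, `online_nmf`, `recon_error`) and parameter values are assumptions made for illustration. Only one batch is held in memory at a time, together with two small running statistics.

```python
import random

def matmul(a, b):
    """Plain-list matrix product a @ b."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def encode(batch, H, iters=50, eps=1e-9):
    """Non-negative codes W (one row per document) with the basis H held fixed."""
    Ht = transpose(H)
    HHt, XHt = matmul(H, Ht), matmul(batch, Ht)
    W = [[1.0] * len(H) for _ in batch]
    for _ in range(iters):
        WHHt = matmul(W, HHt)
        # Multiplicative update: W <- W * (X H^T) / (W H H^T)
        W = [[w * n / (d + eps) for w, n, d in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, XHt, WHHt)]
    return W

def online_nmf(batches, n_features, k, inner=20, eps=1e-9, seed=0):
    """Learn a k x n_features non-negative basis H from a stream of batches."""
    rng = random.Random(seed)
    H = [[rng.random() + 0.1 for _ in range(n_features)] for _ in range(k)]
    A = [[0.0] * k for _ in range(k)]            # running sum of W^T W
    B = [[0.0] * n_features for _ in range(k)]   # running sum of W^T X
    for batch in batches:
        W = encode(batch, H)
        Wt = transpose(W)
        A = [[a + u for a, u in zip(ra, ru)] for ra, ru in zip(A, matmul(Wt, W))]
        B = [[b + u for b, u in zip(rb, ru)] for rb, ru in zip(B, matmul(Wt, batch))]
        for _ in range(inner):
            AH = matmul(A, H)
            # Multiplicative update: H <- H * B / (A H)
            H = [[h * b / (ah + eps) for h, b, ah in zip(rh, rb, rah)]
                 for rh, rb, rah in zip(H, B, AH)]
    return H

def recon_error(X, W, H):
    """Squared Frobenius error of the reconstruction W @ H."""
    WH = matmul(W, H)
    return sum((x - y) ** 2 for rx, ry in zip(X, WH) for x, y in zip(rx, ry))

# Toy corpus: four documents over four terms, two underlying "topics".
docs = [[2.0, 1.0, 0.0, 0.0], [4.0, 2.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 3.0], [0.0, 0.0, 2.0, 6.0]]
mixed = [[docs[0], docs[2]], [docs[1], docs[3]]]
H = online_nmf(mixed * 5, n_features=4, k=2)
codes = encode(docs, H)   # low-dimensional representation to pass to k-means
```

In this sketch the rows of `codes` play the role of the reduced representation that would then be clustered with k-means. One caveat of multiplicative updates is that an entry of `H` that reaches zero stays zero, which is why the toy batches mix documents from both topics.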

Relevance:

20.00%

Publisher:

Abstract:

The role of crystallite size and clustering in influencing the stability of the structures of a large-tetragonality ferroelectric system, 0.6BiFeO3-0.4PbTiO3, was investigated. The system exhibits a cubic phase for a crystallite size of approximately 25 nm, three times larger than the critical size reported for one of its end members, PbTiO3. With an increased degree of clustering for the same average crystallite size, partial stabilization of the ferroelectric tetragonal phase takes place. The results suggest that clustering helps in reducing the depolarization energy without the need for increasing the crystallite size of free particles.

Relevance:

20.00%

Publisher:

Abstract:

Chebyshev-inequality-based convex relaxations of Chance-Constrained Programs (CCPs) are shown to be useful for learning classifiers on massive datasets. In particular, an algorithm that integrates efficient clustering procedures and CCP approaches for computing classifiers on large datasets is proposed. The key idea is to identify high-density regions or clusters from individual class-conditional densities and then use a CCP formulation to learn a classifier on the clusters. The CCP formulation ensures that most of the data points in a cluster are correctly classified by employing a Chebyshev-inequality-based convex relaxation. This relaxation is heavily dependent on the second-order statistics. However, this formulation, and in general such relaxations that depend on the second-order moments, are susceptible to moment estimation errors. One of the contributions of the paper is to propose several formulations that are robust to such errors. In particular, a generic way of making such formulations robust to moment estimation errors is illustrated using two novel confidence sets. An important contribution is to show that when either of the confidence sets is employed, for the special case of a spherical normal distribution of clusters, the robust variant of the formulation can be posed as a second-order cone program. Empirical results show that the robust formulations achieve accuracies comparable to those obtained with true moments, even when moment estimates are erroneous. Results also illustrate the benefits of employing the proposed methodology for robust classification of large-scale datasets.
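For orientation, the standard Chebyshev-type bound that underlies relaxations of this kind can be written as below. The notation is an assumption introduced here, not taken from the abstract: a linear classifier $(w, b)$, a cluster with mean $\mu$ and covariance $\Sigma$, and a required probability $\eta \in [0, 1)$ of correct classification. The paper's exact formulation may differ.

```latex
% Chance constraint over all distributions with the given first two moments:
\inf_{x \sim (\mu, \Sigma)} \Pr\left( w^{\top} x - b \ge 0 \right) \ge \eta
\quad \Longleftrightarrow \quad
w^{\top} \mu - b \;\ge\; \kappa \sqrt{ w^{\top} \Sigma\, w },
\qquad
\kappa = \sqrt{ \frac{\eta}{1 - \eta} } .
```

The right-hand inequality is a second-order cone constraint in $(w, b)$, and it depends on the data only through $\mu$ and $\Sigma$, which is why errors in the estimated moments matter for formulations of this type.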

Relevance:

20.00%

Publisher:

Abstract:

Learning from Positive and Unlabelled examples (LPU) has emerged as an important problem in data mining and information retrieval applications. Existing techniques are not ideally suited for real world scenarios where the datasets are linearly inseparable, as they either build linear classifiers or the non-linear classifiers fail to achieve the desired performance. In this work, we propose to extend maximum margin clustering ideas and present an iterative procedure to design a non-linear classifier for LPU. In particular, we build a least squares support vector classifier, suitable for handling this problem due to symmetry of its loss function. Further, we present techniques for appropriately initializing the labels of unlabelled examples and for enforcing the ratio of positive to negative examples while obtaining these labels. Experiments on real-world datasets demonstrate that the non-linear classifier designed using the proposed approach gives significantly better generalization performance than the existing relevant approaches for LPU.

Relevance:

20.00%

Publisher:

Abstract:

We analytically study the role played by the network topology in sustaining cooperation in a society of myopic agents in an evolutionary setting. In our model, each agent plays the Prisoner's Dilemma (PD) game with its neighbors, as specified by a network. Cooperation is the incumbent strategy, whereas defectors are the mutants. Starting with a population of cooperators, some agents are switched to defection. The agents then play the PD game with their neighbors and compute their fitness. After this, an evolutionary rule, or imitation dynamic, is used to update the agent strategy. A defector switches back to cooperation if it has a cooperator neighbor with higher fitness. The network is said to sustain cooperation if almost all defectors switch to cooperation. Earlier work on the sustenance of cooperation has largely consisted of simulation studies, and we seek to complement this body of work by providing analytical insight into the same question. We find that in order to sustain cooperation, a network should satisfy some properties such as small average diameter, densification, and irregularity. Real-world networks have been empirically shown to exhibit these properties, and are thus candidates for the sustenance of cooperation. We also analyze some specific graphs to determine whether or not they sustain cooperation. In particular, we find that scale-free graphs belonging to a certain family sustain cooperation, whereas Erdős-Rényi random graphs do not. To the best of our knowledge, ours is the first analytical attempt to determine which networks sustain cooperation in a population of myopic agents in an evolutionary setting.
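The update rule described in this abstract can be sketched in a few lines. The payoff values (temptation 5, reward 3, punishment 1, sucker 0) and the choice of fitness as the sum of payoffs over all neighbors are assumptions made for illustration; the abstract does not state them. The names `PAYOFF`, `fitness` and `imitation_step` are likewise hypothetical.

```python
# Assumed PD payoffs, keyed by (my strategy, neighbor's strategy).
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def fitness(adj, strategy, payoff=PAYOFF):
    """Fitness of each agent: total payoff from playing PD with every neighbor."""
    return {v: sum(payoff[(strategy[v], strategy[u])] for u in adj[v]) for v in adj}

def imitation_step(adj, strategy, payoff=PAYOFF):
    """A defector switches to cooperation if some cooperator neighbor is fitter."""
    fit = fitness(adj, strategy, payoff)
    updated = dict(strategy)
    for v, nbrs in adj.items():
        if strategy[v] == "D" and any(
            strategy[u] == "C" and fit[u] > fit[v] for u in nbrs
        ):
            updated[v] = "C"
    return updated

# Irregular graph: a star with hub 0; leaf 1 is the mutant defector.
star = {0: [1, 2, 3, 4, 5], 1: [0], 2: [0], 3: [0], 4: [0], 5: [0]}
star_strategy = {v: "C" for v in star}
star_strategy[1] = "D"

# Regular graph: a triangle; agent 0 is the mutant defector.
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
triangle_strategy = {0: "D", 1: "C", 2: "C"}

print(imitation_step(star, star_strategy))          # leaf 1 reverts to "C"
print(imitation_step(triangle, triangle_strategy))  # agent 0 stays "D"
```

Under these assumed payoffs, the hub of the star earns 12 against the defecting leaf's 5, so the defector reverts, while on the triangle the defector earns 10 against each cooperator's 3 and persists. This toy contrast is in line with the abstract's remark that irregularity helps, but it is not a substitute for the paper's analysis.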

Relevance:

20.00%

Publisher:

Abstract:

Data clustering is a common technique for statistical data analysis, used in many fields, including machine learning and data mining. Clustering is the grouping of a data set or, more precisely, the partitioning of a data set into subsets (clusters), so that the data in each subset (ideally) share some common trait according to some defined distance measure. In this paper we present a genetically improved version of the particle swarm optimization (PSO) algorithm, a population-based heuristic search technique derived from the analysis of particle swarm intelligence and the concepts of genetic algorithms (GA). The algorithm combines the concepts of PSO, such as velocity and position update rules, with the concepts of GA, such as selection, crossover and mutation. The performance of the proposed algorithm is evaluated using some benchmark datasets from the Machine Learning Repository. The performance of our method is better than that of the k-means and PSO algorithms.
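One way such a combination could look is sketched below. The abstract does not say how the PSO and GA steps are interleaved, so the scheme here (a PSO move, then the worse half of the swarm replaced by mutated one-point crossovers of the better half) is an assumption, as are the function names and all parameter values. Each particle encodes `k` centroids, and the cost is the within-cluster sum of squared errors.

```python
import random

def sse(centroids, data):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(
        min(sum((x - c) ** 2 for x, c in zip(p, cen)) for cen in centroids)
        for p in data
    )

def ga_pso_cluster(data, k, n_particles=12, iters=40,
                   w=0.7, c1=1.5, c2=1.5, p_mut=0.1, seed=0):
    rng = random.Random(seed)
    dim = len(data[0])
    size = k * dim  # a particle is a flat list of k centroids

    def decode(pos):
        return [pos[i * dim:(i + 1) * dim] for i in range(k)]

    def cost(pos):
        return sse(decode(pos), data)

    # Initialise each particle from k randomly chosen data points.
    pos = [[c for p in rng.sample(data, k) for c in p] for _ in range(n_particles)]
    vel = [[0.0] * size for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [cost(p) for p in pos]
    g = min(range(n_particles), key=pbest_cost.__getitem__)
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]
    history = [gbest_cost]

    for _ in range(iters):
        # PSO: velocity and position updates.
        for i in range(n_particles):
            for d in range(size):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
        # GA: selection keeps the better half; crossover and mutation
        # produce replacements for the worse half.
        order = sorted(range(n_particles), key=lambda i: cost(pos[i]))
        parents = order[:n_particles // 2]
        for i in order[n_particles // 2:]:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, size)
            child = pos[a][:cut] + pos[b][cut:]
            pos[i] = [x + rng.gauss(0.0, 0.1) if rng.random() < p_mut else x
                      for x in child]
        # Update personal and global bests.
        for i in range(n_particles):
            c = cost(pos[i])
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], c
            if c < gbest_cost:
                gbest, gbest_cost = pos[i][:], c
        history.append(gbest_cost)
    return decode(gbest), history

points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
centroids, history = ga_pso_cluster(points, k=2)
```

`history` records the best cost found so far after each iteration, so it never increases. Whether this particular interleaving matches the published method is not something the abstract lets us confirm.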

Relevance:

20.00%

Publisher:

Abstract:

Data clustering groups data so that data which are similar to each other are in the same group and data which are dissimilar to each other are in different groups. Since clustering is generally a subjective activity, it is possible to get different clusterings of the same data depending on the need. This paper attempts to find the best clustering of the data by first carrying out feature selection and then using only the selected features for clustering. Particle Swarm Optimization (PSO) has been used for clustering, with feature selection carried out simultaneously. The performance of the proposed algorithm is evaluated on some benchmark data sets. The experimental results show that the proposed methodology outperforms previous approaches, such as basic PSO and k-means, for the clustering problem.

Relevance:

20.00%

Publisher:

Abstract:

This study investigates the application of support vector clustering (SVC) for the direct identification of coherent synchronous generators in large interconnected multi-machine power systems. The clustering is based on a coherency measure, which indicates the degree of coherency between any pair of generators. The proposed SVC algorithm processes the coherency measure matrix, which is formulated using the generator rotor measurements, to cluster the coherent generators. The proposed approach is demonstrated on the IEEE 10-generator, 39-bus system and an equivalent 35-generator, 246-bus system of the practical Indian southern grid. The effects of the number of data samples and of fault locations are also examined to determine the accuracy of the proposed approach. An extended comparison with other clustering techniques is also included, to show the effectiveness of the proposed approach in grouping the data into coherent groups of generators. The effectiveness of the coherent clusters obtained with the proposed approach is compared in terms of a set of clustering validity indicators and in terms of a statistical assessment based on the coherency degree of a generator pair.

Relevance:

20.00%

Publisher:

Abstract:

Conformational changes in proteins are extremely important for their biochemical functions. The correlation between inherent conformational variations in a protein and conformational differences in its homologues of known structure is still unclear. In this study, we have used a structural alphabet called Protein Blocks (PBs). PBs are used to abstract protein 3-D structures into 1-D strings over a 16-letter alphabet (a-p) based on the dihedral angles of overlapping pentapeptides. We have analyzed the variations in local conformations in terms of PBs represented in the ensembles of 801 protein structures determined using NMR spectroscopy. In the analysis of concatenated data over all the residues in all the NMR ensembles, we observe that the overall nature of inherent local structural variations in NMR ensembles is similar to the nature of local structural differences in homologous proteins, with a high correlation coefficient of 0.94. High correlation at the alignment positions corresponding to helical and beta-sheet regions is to be expected. However, the correlation coefficient obtained by considering only the loop regions is also quite high (0.91). Surprisingly, segregated position-wise analysis shows that this high correlation does not hold true for loop regions at the structurally equivalent positions in NMR ensembles and their homologues of known structure. This suggests that the general nature of local structural changes is unique; however, most of the local structural variations in loop regions of NMR ensembles do not correlate with their local structural differences at structurally equivalent positions in homologues.

Relevance:

20.00%

Publisher:

Abstract:

We develop iterative diffraction tomography algorithms, which are similar to the distorted Born algorithms, for inverting scattered intensity data. Within the Born approximation, the unknown scattered field is expressed as a multiplicative perturbation to the incident field. With this, the forward equation becomes stable, which helps us compute nearly oscillation-free solutions that have immediate bearing on the accuracy of the Jacobian computed for use in a deterministic Gauss-Newton (GN) reconstruction. However, since the data are inherently noisy and the sensitivity of measurement to refractive index away from the detectors is poor, we report a derivative-free evolutionary stochastic scheme, providing strictly additive updates in order to bridge the measurement-prediction misfit, to arrive at the refractive index distribution from intensity transport data. The superiority of the stochastic algorithm over the GN scheme for similar settings is demonstrated by the reconstruction of the refractive index profile from simulated and experimentally acquired intensity data. © 2014 Optical Society of America

Relevance:

20.00%

Publisher:

Abstract:

Regionalization approaches are widely used in water resources engineering to identify hydrologically homogeneous groups of watersheds that are referred to as regions. Pooled information from sites (depicting watersheds) in a region forms the basis to estimate quantiles associated with hydrological extreme events at ungauged/sparsely gauged sites in the region. Conventional regionalization approaches can be effective when watersheds (data points) corresponding to different regions can be separated using straight lines or linear planes in the space of watershed-related attributes. In this paper, a kernel-based Fuzzy c-means (KFCM) clustering approach is presented for use in situations where such linear separation of regions cannot be accomplished. The approach uses kernel-based functions to map the data points from the attribute space to a higher-dimensional space where they can be separated into regions by linear planes. A procedure to determine the optimal number of regions with the KFCM approach is suggested. Further, formulations to estimate flood quantiles at ungauged sites with the approach are developed. The effectiveness of the approach is demonstrated through Monte-Carlo simulation experiments and a case study on watersheds in the United States. Comparison of results with those based on conventional Fuzzy c-means clustering, the Region-of-influence approach and a prior study indicates that the KFCM approach outperforms the other approaches in forming regions that are closer to being statistically homogeneous and in estimating flood quantiles at ungauged sites. Key Points: A kernel-based regionalization approach is presented for flood frequency analysis. A kernel procedure to estimate flood quantiles at ungauged sites is developed. A set of fuzzy regions is delineated in Ohio, USA.
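A kernel-based fuzzy c-means iteration can be sketched as follows. This uses one common variant from the literature, with a Gaussian kernel and cluster prototypes kept in the original attribute space, in which the kernel-induced squared distance is 2(1 - K(x, v)). Whether the paper uses this variant is an assumption; the function names, the fuzzifier `m = 2` and the kernel width `sigma = 1` are illustrative choices only.

```python
import math

def gaussian_kernel(x, v, sigma):
    """K(x, v) = exp(-||x - v||^2 / (2 sigma^2))."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, v)) / (2.0 * sigma ** 2))

def kfcm(data, centers, m=2.0, sigma=1.0, iters=50, eps=1e-12):
    """Alternate membership and prototype updates for a fixed number of sweeps."""
    U = []
    for _ in range(iters):
        # Membership update: u_ik proportional to (1 - K(x_k, v_i))^(-1/(m-1)).
        U = []
        for x in data:
            inv = [(1.0 - gaussian_kernel(x, v, sigma) + eps) ** (-1.0 / (m - 1.0))
                   for v in centers]
            total = sum(inv)
            U.append([t / total for t in inv])
        # Prototype update: mean of the points weighted by u_ik^m * K(x_k, v_i).
        new_centers = []
        for i, v in enumerate(centers):
            wts = [(U[k][i] ** m) * gaussian_kernel(x, v, sigma)
                   for k, x in enumerate(data)]
            tot = sum(wts)
            new_centers.append(
                [sum(wt * x[d] for wt, x in zip(wts, data)) / tot
                 for d in range(len(v))]
            )
        centers = new_centers
    return U, centers

# Toy "watersheds" described by two attributes, forming two obvious groups.
sites = [(0.0, 0.0), (0.2, 0.0), (5.0, 5.0), (5.2, 5.0)]
memberships, prototypes = kfcm(sites, centers=[(0.0, 0.0), (5.0, 5.0)])
```

Each row of `memberships` sums to one and gives the degree to which a site belongs to each fuzzy region. The returned memberships are those computed in the last sweep, just before the final prototype update.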

Relevance:

20.00%

Publisher:

Abstract:

Significance: The bi-domain protein tyrosine phosphatases (PTPs) exemplify functional evolution in signaling proteins for optimal spatiotemporal signal transduction. Bi-domain PTPs are products of gene duplication. The catalytic activity, however, is often localized to one PTP domain. The inactive PTP domain adopts multiple functional roles. These include modulation of catalytic activity, substrate specificity, and stability of the bi-domain enzyme. In some cases, the inactive PTP domain is a receptor for redox stimuli. Since multiple bi-domain PTPs are concurrently active in related cellular pathways, a stringent regulatory mechanism and selective cross-talk are essential to ensure fidelity in signal transduction. Recent Advances: The inactive PTP domain is an activator for the catalytic PTP domain in some cases, whereas it reduces catalytic activity in other bi-domain PTPs. The relative orientation of the two domains provides a conformational rationale for this regulatory mechanism. Recent structural and biochemical data reveal that these PTP domains participate in substrate recruitment. The inactive PTP domain has also been demonstrated to undergo substantial conformational rearrangement and oligomerization under oxidative stress. Critical Issues and Future Directions: The role of the inactive PTP domain in coupling environmental stimuli with catalytic activity needs to be further examined. Another aspect that merits attention is the role of this domain in substrate recruitment. These aspects have been poorly characterized in vivo. These lacunae currently restrict our understanding of neo-functionalization of the inactive PTP domain in the bi-domain enzyme. It appears likely that more data from these research themes could form the basis for understanding the fidelity in intracellular signal transduction.

Relevance:

20.00%

Publisher:

Abstract:

With the preponderance of multidomain proteins in eukaryotic genomes, it is essential to recognize the constituent domains and their functions. Often function involves communications across the domain interfaces, and knowledge of the interacting sites is essential to our understanding of the structure-function relationship. Using evolutionary information extracted from homologous domains in at least two diverse domain architectures (single and multidomain), we predict the interface residues corresponding to domains from the two-domain proteins. We also use information from the three-dimensional structures of individual domains of two-domain proteins to train a naive Bayes classifier to predict the interfacial residues. Our predictions are highly accurate (≈85%) and specific (≈95%) to the domain-domain interfaces. This method is specific to multidomain proteins which contain domains in more than one protein architectural context. Using predicted residues to constrain domain-domain interaction, rigid-body docking was able to provide us with accurate full-length protein structures with correct orientation of domains. We believe that these results can be of considerable interest toward rational protein and interaction design, apart from providing us with valuable information on the nature of interactions. Proteins 2014; 82:1219-1234. © 2013 Wiley Periodicals, Inc.

Relevance:

20.00%

Publisher:

Abstract:

Crystal structure determination of the lectin domain of MSMEG_3662 from Mycobacterium smegmatis and its complexes with mannose and methyl-alpha-mannose, the first effort of its kind on a mycobacterial lectin, reveals a structure very similar to beta-prism II fold lectins from plant sources, but with extensive unprecedented domain swapping in dimer formation. The two subunits in a dimer often show small differences in structure, but the two domains, not always related by 2-fold symmetry, have the same structure. Each domain carries three sugar-binding sites, similar to those in plant lectins, one on each Greek key motif. The occurrence of beta-prism II fold lectins in bacteria, with characteristics similar to those from plants, indicates that this family of lectins is of ancient origin and had evolved into a mature system before bacteria and plants diverged. In plants, the number of binding sites per domain varies between one and three, whereas the number is two in the recently reported lectin domains from Pseudomonas putida and Pseudomonas aeruginosa. An analysis of the sequences of the lectins and the lectin domains shows that the level of sequence similarity among the three Greek keys in each domain has a correlation with the number of binding sites in it. Furthermore, sequence conservation among the lectins from different species is the highest for that Greek key which carries a binding site in all of them. Thus, it would appear that carbohydrate binding influences the course of the evolution of the lectin.