838 results for Evolutionary clustering
Abstract:
The pattern of expression of the genes involved in the utilization of aryl beta-glucosides such as arbutin and salicin is different in the genus Shigella compared to Escherichia coli. The results presented here indicate that the homologue of the cryptic bgl operon of E. coli is conserved in Shigella sonnei and is the primary system involved in beta-glucoside utilization in the organism. The organization of the bgl genes in S. sonnei is similar to that of E. coli; however, there are three major differences in terms of their pattern of expression. (i) The bglB gene, encoding phospho-beta-glucosidase B, is insertionally inactivated in S. sonnei. As a result, mutational activation of the silent bgl promoter confers an Arbutin-positive (Arb(+)) phenotype to the cells in a single step; however, acquiring a Salicin-positive (Sal(+)) phenotype requires the reversion or suppression of the bglB mutation in addition. (ii) Unlike in E. coli, a majority of the activating mutations (conferring the Arb(+) phenotype) map within the unlinked hns locus, whereas activation of the E. coli bgl operon under the same conditions is predominantly due to insertions within the bglR locus. (iii) Although the bgl promoter is silent in the wild-type strain of S. sonnei (as in the case of E. coli), transcriptional and functional analyses indicated a higher basal level of transcription of the downstream genes. This was correlated with a 1 bp deletion within the putative Rho-independent terminator present in the leader sequence preceding the homologue of the bglG gene. The possible evolutionary implications of these differences for the maintenance of the genes in the cryptic state are discussed.
Abstract:
A method for determining the mutual nearest neighbours (MNN) and mutual neighbourhood value (mnv) of a sample point, using the conventional nearest neighbours, is suggested. A nonparametric, hierarchical, agglomerative clustering algorithm is developed using the above concepts. The algorithm is simple, deterministic, noniterative, requires low storage and is able to discern spherical and nonspherical clusters. The method is applicable to a wide class of data of arbitrary shape, large size and high dimensionality. The algorithm can discern mutually homogenous clusters. Strong or weak patterns can be discerned by properly choosing the neighbourhood width.
Abstract:
A nonparametric, hierarchical, disaggregative clustering algorithm is developed using a novel similarity measure, called the mutual neighborhood value (MNV), which takes into account the conventional nearest neighbor ranks of two samples with respect to each other. The algorithm is simple, noniterative, requires low storage, and needs no specification of the expected number of clusters. The algorithm appears very versatile as it is capable of discerning spherical and nonspherical clusters, linearly nonseparable clusters, clusters with unequal populations, and clusters with low-density bridges. Changing the neighborhood size enables discernment of strong or weak patterns.
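The rank-based similarity described in this abstract can be illustrated with a minimal sketch. It assumes, beyond what the abstract states, that the MNV of two samples is the sum of their mutual nearest-neighbour ranks (if x_j is the m-th nearest neighbour of x_i and x_i is the n-th nearest neighbour of x_j, the value is m + n) and that distances are Euclidean. The function names are hypothetical, and the clustering procedure itself is not reproduced.

```python
from math import dist


def neighbour_ranks(points):
    """ranks[i][j] = position of point j in point i's nearest-neighbour list (1 = closest)."""
    n = len(points)
    ranks = [[0] * n for _ in range(n)]
    for i in range(n):
        # Sort every other point by Euclidean distance from point i (assumed metric).
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: dist(points[i], points[j]),
        )
        for rank, j in enumerate(others, start=1):
            ranks[i][j] = rank
    return ranks


def mutual_neighbourhood_value(points):
    """Assumed definition: MNV(i, j) = rank of j seen from i + rank of i seen from j."""
    ranks = neighbour_ranks(points)
    n = len(points)
    return [
        [ranks[i][j] + ranks[j][i] if i != j else 0 for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    samples = [(0.0,), (1.0,), (3.0,), (10.0,)]
    for row in mutual_neighbourhood_value(samples):
        print(row)
```

Under this assumed definition the smallest possible value is 2, reached when two samples are each other's nearest neighbour. The measure is symmetric even though the underlying ranks are not.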
Abstract:
The paper deals with a model-theoretic approach to clustering. The approach can be used to generate cluster descriptions based on knowledge alone. Such a process of generating descriptions would be extremely useful in clustering partially specified objects. A natural byproduct of the proposed approach is that missing values of attributes of an object can be estimated with ease in a meaningful fashion. An important feature of the approach is that noisy objects can be detected effectively, leading to the formation of natural groups. The proposed algorithm is applied to a library database consisting of a collection of books.
Abstract:
The 3' terminal 1255 nt sequence of Physalis mottle virus (PhMV) genomic RNA has been determined from a set of overlapping cDNA clones. The open reading frame (ORF) at the 3' terminus corresponds to the amino acid sequence of the coat protein (CP) determined earlier except for the absence of the dipeptide, Lys-Leu, at position 110-111. In addition, the sequence upstream of the CP gene contains the message coding for 178 amino acid residues of the C-terminus of the putative replicase protein (RP). The sequence downstream of the CP gene contains an untranslated region whose terminal 80 nucleotides can be folded into a characteristic tRNA-like structure. A phylogenetic tree constructed after aligning separately the sequence of the CP, the replicase protein (RP) and the tRNA-like structure determined in this study with the corresponding sequences of other tymoviruses shows that PhMV, wrongly named belladonna mottle virus [BDMV(I)], is a separate tymovirus and not another strain of BDMV(E) as originally envisaged. The phylogenetic tree in all three cases is identical, showing that any subset of genomic sequence of sufficient length can be used for establishing evolutionary relationships among tymoviruses.
Abstract:
Relative geometric arrangements of the sample points, with reference to the structure of the imbedding space, produce clusters. Hence, if each sample point is imagined to acquire a volume of a small M-cube (called pattern-cell), depending on the ranges of its (M) features and number (N) of samples; then overlapping pattern-cells would indicate naturally closer sample-points. A chain or blob of such overlapping cells would mean a cluster and separate clusters would not share a common pattern-cell between them. The conditions and an analytic method to find such an overlap are developed. A simple, intuitive, nonparametric clustering procedure, based on such overlapping pattern-cells is presented. It may be classified as an agglomerative, hierarchical, linkage-type clustering procedure. The algorithm is fast, requires low storage and can identify irregular clusters. Two extensions of the algorithm, to separate overlapping clusters and to estimate the nature of pattern distributions in the sample space, are also indicated.
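The idea of chaining overlapping pattern-cells can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's analytic method: each sample is given an axis-aligned cell centred on it, the per-feature cell widths (`sides`) are supplied by the caller because the abstract does not give the formula relating them to the feature ranges and N, and two cells are taken to overlap when the samples differ by less than the cell width in every feature. The function name is hypothetical.

```python
def pattern_cell_clusters(points, sides):
    """Group samples whose cells overlap, directly or through a chain of cells.

    points: list of M-dimensional tuples.
    sides:  per-feature cell widths (hypothetical input; the paper derives
            them from the feature ranges and the number of samples).
    Returns a sorted list of clusters, each a list of sample indices.
    """
    n = len(points)
    parent = list(range(n))  # union-find forest over sample indices

    def find(i):
        # Follow parent links to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            # Cells centred on i and j overlap if they are closer than one
            # cell width along every feature axis (assumed overlap test).
            if all(abs(a - b) < s for a, b, s in zip(points[i], points[j], sides)):
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


if __name__ == "__main__":
    samples = [(0.0, 0.0), (0.5, 0.2), (1.0, 0.4), (5.0, 5.0), (5.3, 5.1)]
    print(pattern_cell_clusters(samples, sides=(0.6, 0.6)))
```

In the usage example, samples 0 and 2 do not overlap directly but end up together through sample 1, which is the chain behaviour the abstract describes. The pairwise loop is quadratic in N and is kept only for clarity.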
Abstract:
Clustering is a process of partitioning a given set of patterns into meaningful groups. The clustering process can be viewed as consisting of the following three phases: (i) feature selection phase, (ii) classification phase, and (iii) description generation phase. Conventional clustering algorithms implicitly use knowledge about the clustering environment to a large extent in the feature selection phase. This reduces the need for the environmental knowledge in the remaining two phases, permitting the use of a simple numerical measure of similarity in the classification phase. Conceptual clustering algorithms proposed by Michalski and Stepp [IEEE Trans. PAMI, PAMI-5, 396–410 (1983)] and Stepp and Michalski [Artif. Intell., pp. 43–69 (1986)] make use of the knowledge about the clustering environment in the form of a set of predefined concepts to compute the conceptual cohesiveness during the classification phase. Michalski and Stepp [IEEE Trans. PAMI, PAMI-5, 396–410 (1983)] have argued that the results obtained with the conceptual clustering algorithms are superior to conventional methods of numerical classification. However, this claim was not supported by the experimental results obtained by Dale [IEEE Trans. PAMI, PAMI-7, 241–244 (1985)]. In this paper a theoretical framework, based on an intuitively appealing set of axioms, is developed to characterize the equivalence between conceptual clustering and conventional clustering. In other words, it is shown that any classification obtained using conceptual clustering can also be obtained using conventional clustering and vice versa.
Abstract:
Species identification forms the basis for understanding the diversity of the living world, but it is also a prerequisite for understanding many evolutionary patterns and processes. The most promising approach for correctly delimiting and identifying species is to integrate many types of information in the same study. Our aim was to test how cuticular hydrocarbons, traditional morphometrics, genetic polymorphisms in nuclear markers (allozymes and DNA microsatellites) and DNA barcoding (partial mitochondrial COI gene) perform in delimiting species. As an example, we used two closely related Formica ants, F. fusca and F. lemani, sampled from a sympatric population in the northern part of their distribution. Morphological characters vary and overlap in different parts of their distribution areas, but cuticular hydrocarbons include a strong taxonomic signal and our aim is to test the degree to which morphological and genetic data correspond to the chemical data. In the morphological analysis, species were best separated by the combined number of hairs on pronotum and mesonotum, but individual workers overlapped in hair numbers, as previously noted by several authors. Nests of the two species were separated but not clustered according to species in a Principal Component Analysis made on nuclear genetic data. However, model-based Bayesian clustering resulted in perfect separation of the species and gave no indication of hybridization. Furthermore, F. lemani and F. fusca did not share any mitochondrial haplotypes, and the species were perfectly separated in a phylogenetic tree. We conclude that F. fusca and F. lemani are valid species that can be separated in our study area relatively well with all methods employed. However, the unusually small genetic differentiation in nuclear markers (FST = 0.12) shows that they are closely related, and occasional hybridization between F. fusca and F. lemani cannot be ruled out.
Abstract:
Here we rederive the hierarchy of equations for the evolution of distribution functions of various orders using a convenient parameterization. We use this to obtain equations for two- and three-point correlation functions in powers of a small parameter, viz., the initial density contrast. The correspondence of the lowest order solutions of these equations to the results from the linear theory of density perturbations is shown for an Ω = 1 universe. These equations are then used to calculate, to the lowest order, the induced three-point correlation function that arises from Gaussian initial conditions in an Ω = 1 universe. We obtain an expression which explicitly exhibits the spatial structure of the induced three-point correlation function. It is seen that the spatial structure of this quantity is independent of the value of Ω. We also calculate the triplet momentum. We find that the induced three-point correlation function does not have the "hierarchical" form often assumed. We discuss possibilities of using the induced three-point correlation to interpret observational data. The formalism developed here can also be used to test the validity of different schemes to close the
Abstract:
The complete amino-acid sequence of sheep liver cytosolic serine hydroxymethyltransferase was determined from an analysis of tryptic, chymotryptic, CNBr and hydroxylamine peptides. Each subunit of sheep liver serine hydroxymethyltransferase consisted of 483 amino-acid residues. A comparison of this sequence with 8 other serine hydroxymethyltransferases revealed that a possible gene duplication event could have occurred after the divergence of animals and fungi. This analysis also showed independent duplication of SHMT genes in Neurospora crassa. At the secondary structural level, all the serine hydroxymethyltransferases belong to the alpha/beta category of proteins. The predicted secondary structure of sheep liver serine hydroxymethyltransferase was similar to that of the observed structure of tryptophan synthase, another pyridoxal 5'-phosphate containing enzyme, suggesting that sheep liver serine hydroxymethyltransferase might have a similar pyridoxal 5'-phosphate binding domain. In addition, a conserved glycine rich region, G L Q G G P, was identified in all the serine hydroxymethyltransferases and could be important in pyridoxal 5'-phosphate binding. A comparison of the cytosolic serine hydroxymethyltransferases from rabbit and sheep liver with other proteins sequenced from both these sources showed that serine hydroxymethyltransferase was a highly conserved protein. It was slightly less conserved than cytochrome c but better conserved than myoglobin, both of which are well known evolutionary markers. C67 and C203 were specifically protected by pyridoxal 5'-phosphate against modification with [C-14]iodoacetic acid, while C247 and C261 were buried in the native serine hydroxymethyltransferase. However, the cysteines are not conserved among the various serine hydroxymethyltransferases. The exact role of the cysteines in the reaction catalyzed by serine hydroxymethyltransferase remains to be elucidated.
Abstract:
This paper presents a dan-based evolutionary approach for solving control problems. Three selected control problems, viz. linear-quadratic, harvest, and push-cart problems, are solved using the proposed approach. Results are compared with those of the evolutionary programming (EP) approach. In most of the cases, the proposed approach is successful in obtaining (near) optimal solutions for these selected problems.
Abstract:
In the knowledge-based clustering approaches reported in the literature, explicit knowledge, typically in the form of a set of concepts, is used in computing similarity or conceptual cohesiveness between objects and in grouping them. We propose a knowledge-based clustering approach in which the domain knowledge is also used in the pattern representation phase of clustering. We argue that such a knowledge-based pattern representation scheme reduces the complexity of the similarity computation and grouping phases. We present a knowledge-based clustering algorithm for grouping books in a library.
Abstract:
We use the BBGKY hierarchy equations to calculate, perturbatively, the lowest order nonlinear correction to the two-point correlation and the pair velocity for Gaussian initial conditions in a critical density matter-dominated cosmological model. We compare our results with the results obtained using the hydrodynamic equations that neglect pressure and find that the two match, indicating that there are no effects of multistreaming at this order of perturbation. We analytically study the effect of small scales on the large scales by calculating the nonlinear correction for a Dirac delta function initial two-point correlation. We find that the induced two-point correlation has an x^(-6) behavior at large separations. We have considered a class of initial conditions where the initial power spectrum at small k has the form k^n with 0 < n ≤ 3 and have numerically calculated the nonlinear correction to the two-point correlation, its average over a sphere and the pair velocity over a large dynamical range. We find that at small separations the effect of the nonlinear term is to enhance the clustering, whereas at intermediate scales it can act to either increase or decrease the clustering. At large scales we find a simple formula that gives a very good fit for the nonlinear correction in terms of the initial function. This formula explicitly exhibits the influence of small scales on large scales and because of this coupling the perturbative treatment breaks down at large scales much before one would expect it to if the nonlinearity were local in real space. We physically interpret this formula in terms of a simple diffusion process. We have also investigated the case n = 0, and we find that it differs from the other cases in certain respects.
We investigate a recently proposed scaling property of gravitational clustering, and we find that the lowest order nonlinear terms cause deviations from the scaling relations that are strictly valid in the linear regime. The approximate validity of these relations in the nonlinear regime in N-body simulations cannot be understood at this order of evolution.
Abstract:
In this article, we present a novel application of a quantum clustering (QC) technique to objectively cluster the conformations, sampled by molecular dynamics simulations performed on different ligand bound structures of the protein. We further portray each conformational population in terms of dynamically stable network parameters which beautifully capture the ligand induced variations in the ensemble in atomistic detail. The conformational populations thus identified by the QC method and verified by network parameters are evaluated for different ligand bound states of the protein pyrrolysyl-tRNA synthetase (DhPylRS) from D. hafniense. The ligand/environment induced re-distribution of protein conformational ensembles forms the basis for understanding several important biological phenomena such as allostery and enzyme catalysis. The atomistic level characterization of each population in the conformational ensemble in terms of the re-orchestrated networks of amino acids is a challenging problem, especially when the changes are minimal at the backbone level. Here we demonstrate that the QC method is sensitive to such subtle changes and is able to cluster MD snapshots which are similar at the side-chain interaction level. Although we have applied these methods on simulation trajectories of a modest time scale (20 ns each), we emphasize that our methodology provides a general approach towards an objective clustering of large-scale MD simulation data and may be applied to probe multistate equilibria at higher time scales, and to problems related to protein folding for any protein or protein-protein/RNA/DNA complex of interest with a known structure.