5 resultados para multivariate statistical analysis
em National Center for Biotechnology Information - NCBI
Resumo:
Two objects with homologous landmarks are said to be of the same shape if the configurations of landmarks of one object can be exactly matched with that of the other by translation, rotation/reflection, and scaling. The observations on an object are coordinates of its landmarks with reference to a set of orthogonal coordinate axes in an appropriate dimensional space. The origin, choice of units, and orientation of the coordinate axes with respect to an object may be different from object to object. In such a case, how do we quantify the shape of an object, find the mean and variation of shape in a population of objects, compare the mean shapes in two or more different populations, and discriminate between objects belonging to two or more different shape distributions. We develop some methods that are invariant to translation, rotation, and scaling of the observations on each object and thereby provide generalizations of multivariate methods for shape analysis.
Resumo:
Two objects with homologous landmarks are said to be of the same shape if the configuration of landmarks of one object can be exactly matched with that of the other by translation, rotation/reflection, and scaling. In an earlier paper, the authors proposed statistical analysis of shape by considering logarithmic differences of all possible Euclidean distances between landmarks. Tests of significance for differences in the shape of objects and methods of discrimination between populations were developed with such data. In the present paper, the corresponding statistical methodology is developed by triangulation of the landmarks and by considering the angles as natural measurements of shape. This method is applied to the study of sexual dimorphism in hominids.
Resumo:
The availability of complete genome sequences and mRNA expression data for all genes creates new opportunities and challenges for identifying DNA sequence motifs that control gene expression. An algorithm, “MobyDick,” is presented that decomposes a set of DNA sequences into the most probable dictionary of motifs or words. This method is applicable to any set of DNA sequences: for example, all upstream regions in a genome or all genes expressed under certain conditions. Identification of words is based on a probabilistic segmentation model in which the significance of longer words is deduced from the frequency of shorter ones of various lengths, eliminating the need for a separate set of reference data to define probabilities. We have built a dictionary with 1,200 words for the 6,000 upstream regulatory regions in the yeast genome; the 500 most significant words (some with as few as 10 copies in all of the upstream regions) match 114 of 443 experimentally determined sites (a significance level of 18 standard deviations). When analyzing all of the genes up-regulated during sporulation as a group, we find many motifs in addition to the few previously identified by analyzing the subclusters individually to the expression subclusters. Applying MobyDick to the genes derepressed when the general repressor Tup1 is deleted, we find known as well as putative binding sites for its regulatory partners.
Resumo:
Sm and Sm-like proteins are members of a family of small proteins that is widespread throughout eukaryotic kingdoms. These proteins form heteromers with one another and bind, as heteromeric complexes, to various RNAs, recognizing primarily short U-rich stretches. Interestingly, completion of several genome projects revealed that archaea also contain genes that may encode Sm-like proteins. Herein, we studied the properties of one Sm-like protein derived from the archaebacterium Archaeoglobus fulgidus and overexpressed in Escherichia coli. This single small protein closely reflects the properties of an Sm or Sm-like protein heteromer. It binds to RNA with a high specificity for oligo(U), and assembles onto the RNA to form a complex that exhibits, as judged by electron microscopy, a ring-like structure similar to the ones observed with the Sm core ribonucleoprotein and the like Sm (LSm) protein heteromer. Importantly, multivariate statistical analysis of negative-stain electron-microscopic images revealed a sevenfold symmetry for the observed ring structure, indicating that the proteins form a homoheptamer. These results support the structural model of the Sm proteins derived from crystallographic studies on Sm heterodimers and demonstrate that the Sm protein family evolved from a single ancestor that was present before the eukaryotic and archaeal kingdoms separated.
Resumo:
Methods of structural and statistical analysis of the relation between the sequence and secondary and three-dimensional structures are developed. About 5000 secondary structures of immunoglobulin molecules from the Kabat data base were predicted. Two statistical analyses of amino acids reveal 47 universal positions in strands and loops. Eight universally conservative positions out of the 47 are singled out because they contain the same amino acid in > 90% of all chains. The remaining 39 positions, which we term universally alternative positions, were divided into five groups: hydrophobic, charged and polar, aromatic, hydrophilic, and Gly-Ala, corresponding to the residues that occupied them in almost all chains. The analysis of residue-residue contacts shows that the 47 universal positions can be distinguished by the number and types of contacts. The calculations of contact maps in the 29 antibody structures revealed that residues in 24 of these 47 positions have contacts only with residues of antiparallel beta-strands in the same beta-sheet and residues in the remaining 23 positions always have far-away contacts with residues from other beta-sheets as well. In addition, residues in 6 of the 47 universal positions are also involved in interactions with residues of the other variable or constant domains.