4 results for statistical analysis
in National Center for Biotechnology Information - NCBI
Abstract:
Two objects with homologous landmarks are said to be of the same shape if the configuration of landmarks of one object can be exactly matched with that of the other by translation, rotation/reflection, and scaling. In an earlier paper, the authors proposed statistical analysis of shape by considering logarithmic differences of all possible Euclidean distances between landmarks. Tests of significance for differences in the shape of objects and methods of discrimination between populations were developed with such data. In the present paper, the corresponding statistical methodology is developed by triangulation of the landmarks and by considering the angles as natural measurements of shape. This method is applied to the study of sexual dimorphism in hominids.
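As a rough illustration of why angles are natural shape measurements, the sketch below computes the interior angles of a single landmark triangle and checks that they survive a similarity transformation. The function names, the toy coordinates, and the restriction to two dimensions are assumptions made for this example; the abstract does not specify how the triangulation is chosen or how the tests are built.

```python
import math

def triangle_angles(p, q, r):
    """Interior angles (radians) at vertices p, q, r of triangle pqr."""
    def angle_at(a, b, c):
        # Angle at vertex a between the rays a->b and a->c.
        v1 = (b[0] - a[0], b[1] - a[1])
        v2 = (c[0] - a[0], c[1] - a[1])
        dot = v1[0] * v2[0] + v1[1] * v2[1]
        cos_t = dot / (math.hypot(*v1) * math.hypot(*v2))
        return math.acos(max(-1.0, min(1.0, cos_t)))  # clamp rounding error
    return (angle_at(p, q, r), angle_at(q, r, p), angle_at(r, p, q))

def similarity(point, scale, theta, shift, reflect=False):
    """Hypothetical helper: optional reflection, then rotation, scaling, translation."""
    x, y = point
    if reflect:
        y = -y
    c, s = math.cos(theta), math.sin(theta)
    return (scale * (c * x - s * y) + shift[0],
            scale * (s * x + c * y) + shift[1])

# Toy landmarks (assumed values, not data from the paper).
landmarks = [(0.0, 0.0), (4.0, 0.0), (1.0, 3.0)]
moved = [similarity(pt, 2.5, 0.7, (10.0, -3.0), reflect=True) for pt in landmarks]

angles_a = triangle_angles(*landmarks)
angles_b = triangle_angles(*moved)
```

Up to floating-point error, `angles_a` and `angles_b` agree, so the angles describe shape alone and carry no information about position, orientation, or size.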
Abstract:
The availability of complete genome sequences and mRNA expression data for all genes creates new opportunities and challenges for identifying DNA sequence motifs that control gene expression. An algorithm, “MobyDick,” is presented that decomposes a set of DNA sequences into the most probable dictionary of motifs or words. This method is applicable to any set of DNA sequences: for example, all upstream regions in a genome or all genes expressed under certain conditions. Identification of words is based on a probabilistic segmentation model in which the significance of longer words is deduced from the frequency of shorter ones of various lengths, eliminating the need for a separate set of reference data to define probabilities. We have built a dictionary with 1,200 words for the 6,000 upstream regulatory regions in the yeast genome; the 500 most significant words (some with as few as 10 copies in all of the upstream regions) match 114 of 443 experimentally determined sites (a significance level of 18 standard deviations). When analyzing all of the genes up-regulated during sporulation as a group, we find many motifs in addition to the few previously identified by analyzing the expression subclusters individually. Applying MobyDick to the genes derepressed when the general repressor Tup1 is deleted, we find known as well as putative binding sites for its regulatory partners.
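The idea of judging longer words against the frequencies of shorter ones can be sketched with a much simpler baseline than the probabilistic segmentation model described above. The example below is not MobyDick: it scores each k-letter word against the count expected if single letters occurred independently, using a Poisson-style z-score. The function name `word_z_scores`, the scoring formula, and the toy sequence are all assumptions made for illustration.

```python
import math
from collections import Counter

def word_z_scores(sequences, k):
    """Score every observed k-mer against an independent-letter baseline."""
    # Single-letter frequencies estimated from the same sequences,
    # so no separate reference data set is needed.
    letters = Counter("".join(sequences))
    total = sum(letters.values())
    freq = {base: n / total for base, n in letters.items()}

    # Overlapping k-mer counts and the number of k-mer start positions.
    observed = Counter(
        seq[i:i + k] for seq in sequences for i in range(len(seq) - k + 1)
    )
    positions = sum(max(len(seq) - k + 1, 0) for seq in sequences)

    scores = {}
    for word, count in observed.items():
        expected = positions * math.prod(freq[base] for base in word)
        # Poisson approximation: variance is taken equal to the expected count.
        scores[word] = (count - expected) / math.sqrt(expected)
    return scores

# Toy input (assumed, not yeast data): a sequence built from a repeated word.
toy = ["ACGTACGTACGTACGT"]
scores = word_z_scores(toy, 4)
best = max(scores, key=scores.get)
```

A full dictionary method would also have to account for overlapping words and for longer words built from shorter dictionary entries, which this baseline ignores.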
Abstract:
Two objects with homologous landmarks are said to be of the same shape if the configuration of landmarks of one object can be exactly matched with that of the other by translation, rotation/reflection, and scaling. The observations on an object are coordinates of its landmarks with reference to a set of orthogonal coordinate axes in an appropriate dimensional space. The origin, choice of units, and orientation of the coordinate axes with respect to an object may be different from object to object. In such a case, how do we quantify the shape of an object, find the mean and variation of shape in a population of objects, compare the mean shapes in two or more different populations, and discriminate between objects belonging to two or more different shape distributions? We develop some methods that are invariant to translation, rotation, and scaling of the observations on each object and thereby provide generalizations of multivariate methods for shape analysis.
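One way to obtain measurements with the required invariance is to work with logarithms of inter-landmark Euclidean distances and remove their common level, since scaling adds the same constant to every log distance. The sketch below shows this under stated assumptions: centering on the mean log distance is only one possible choice of differencing, and the helper names and toy coordinates are hypothetical rather than taken from the paper.

```python
import math
from itertools import combinations

def centered_log_distances(landmarks):
    """Log of every pairwise distance, minus the mean log distance."""
    logs = [math.log(math.dist(a, b)) for a, b in combinations(landmarks, 2)]
    mean_log = sum(logs) / len(logs)
    # Subtracting the mean removes the additive log(scale) term.
    return [value - mean_log for value in logs]

def transform(point, scale=3.0, shift=(5.0, -2.0)):
    """Hypothetical similarity map: rotate by 90 degrees, scale, translate."""
    x, y = point
    return (scale * -y + shift[0], scale * x + shift[1])

# Toy configuration of four landmarks (assumed values).
config = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.5, 3.0)]
same_shape = [transform(pt) for pt in config]

shape_a = centered_log_distances(config)
shape_b = centered_log_distances(same_shape)
```

Here `shape_a` and `shape_b` agree up to rounding, so ordinary multivariate summaries such as means and covariances can be computed on these vectors without first aligning the objects.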
Abstract:
Methods of structural and statistical analysis of the relation between the sequence and secondary and three-dimensional structures are developed. About 5000 secondary structures of immunoglobulin molecules from the Kabat database were predicted. Two statistical analyses of amino acids reveal 47 universal positions in strands and loops. Eight universally conservative positions out of the 47 are singled out because they contain the same amino acid in more than 90% of all chains. The remaining 39 positions, which we term universally alternative positions, were divided into five groups: hydrophobic, charged and polar, aromatic, hydrophilic, and Gly-Ala, corresponding to the residues that occupied them in almost all chains. The analysis of residue-residue contacts shows that the 47 universal positions can be distinguished by the number and types of contacts. The calculations of contact maps in the 29 antibody structures revealed that residues in 24 of these 47 positions have contacts only with residues of antiparallel beta-strands in the same beta-sheet and residues in the remaining 23 positions always have far-away contacts with residues from other beta-sheets as well. In addition, residues in 6 of the 47 universal positions are also involved in interactions with residues of the other variable or constant domains.
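The 90% criterion for conservative positions can be illustrated with a small sketch. It assumes the chains are already aligned to a common numbering and given as equal-length strings with "-" marking gaps. The function name `conserved_positions` and the toy chains are hypothetical, and the abstract does not describe the authors' actual procedure at this level of detail.

```python
from collections import Counter

def conserved_positions(aligned_chains, threshold=0.90):
    """Map position -> residue for columns where one residue exceeds threshold."""
    length = len(aligned_chains[0])
    result = {}
    for pos in range(length):
        column = [chain[pos] for chain in aligned_chains]
        residue, count = Counter(column).most_common(1)[0]
        # Strictly more than the threshold, and gaps do not count as residues.
        if residue != "-" and count / len(column) > threshold:
            result[pos] = residue
    return result

# Toy alignment of ten four-residue chains (assumed, not Kabat data).
chains = [
    "CQWL", "CSWL", "CTWL", "CAWL", "CQWL",
    "CSWL", "CTWL", "CAWL", "CQWL", "CSWV",
]
conserved = conserved_positions(chains)
```

In this toy alignment, positions 0 and 2 qualify, while position 3 holds the same residue in exactly 90% of chains and is therefore excluded by the strict inequality.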