36 results for base sequence
Abstract:
Methods are developed for the structural and statistical analysis of the relation between sequence and both secondary and three-dimensional structure. About 5000 secondary structures of immunoglobulin molecules from the Kabat database were predicted. Two statistical analyses of amino acids reveal 47 universal positions in strands and loops. Eight of these 47 positions are singled out as universally conservative because they contain the same amino acid in > 90% of all chains. The remaining 39 positions, which we term universally alternative positions, were divided into five groups (hydrophobic, charged and polar, aromatic, hydrophilic, and Gly-Ala) according to the type of residue that occupies them in almost all chains. Analysis of residue-residue contacts shows that the 47 universal positions can be distinguished by the number and types of their contacts. Contact maps calculated for 29 antibody structures revealed that residues at 24 of these 47 positions contact only residues of antiparallel beta-strands in the same beta-sheet, whereas residues at the remaining 23 positions always also make long-range contacts with residues from other beta-sheets. In addition, residues at 6 of the 47 universal positions interact with residues of the other variable or constant domains.
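The conservation criterion above (one amino acid in more than 90% of chains) can be sketched as a column scan over aligned sequences. This is a minimal illustration, not the authors' pipeline: it assumes the chains are already aligned to a common position numbering (equal-length strings, '-' for gaps), and the function name `conserved_positions` and the toy data are hypothetical.

```python
from collections import Counter


def conserved_positions(aligned_chains, threshold=0.9):
    """Return (position, residue, fraction) for alignment columns in which a
    single amino acid occupies more than `threshold` of all chains.

    Assumes every chain is aligned to a common numbering (equal-length
    strings, '-' for gaps); gaps count against conservation.
    """
    n_chains = len(aligned_chains)
    length = len(aligned_chains[0])
    if any(len(chain) != length for chain in aligned_chains):
        raise ValueError("all chains must be aligned to the same length")

    result = []
    for pos in range(length):
        column = Counter(chain[pos] for chain in aligned_chains)
        residue, count = column.most_common(1)[0]  # most frequent symbol
        fraction = count / n_chains
        if residue != "-" and fraction > threshold:  # strict "> 90%" cutoff
            result.append((pos, residue, fraction))
    return result


# Toy alignment of 10 chains (hypothetical, not Kabat data).
chains = ["CW" + x for x in "AAAAAAAAAG"]  # column 2: 'A' in 9 of 10 chains
print(conserved_positions(chains))  # [(0, 'C', 1.0), (1, 'W', 1.0)]
```

Column 2 holds 'A' in exactly 9 of 10 chains, so the strict > 90% cutoff excludes it and only columns 0 and 1 are reported.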
Abstract:
A new method for computing evolutionary distances between DNA sequences is proposed. In contrast to classical methods, the underlying model does not assume that sequence base compositions (A, C, G, and T contents) are at equilibrium, and it therefore allows base composition to differ among the compared sequences. As we show with both simulated and empirical data, this makes the method more efficient than the usual ones at recovering phylogenetic trees from sequence data when base composition is heterogeneous within the data set. When applied to small-subunit ribosomal RNA sequences from several prokaryotic and eukaryotic organisms, the method provides evidence for an early divergence of the microsporidian Vairimorpha necatrix in the eukaryotic lineage.
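The distance itself is not reproduced here. As a hedged illustration of the problem it addresses, the sketch below measures how much G+C content varies across a set of sequences, the kind of heterogeneity that violates the equilibrium-composition assumption of classical distances. The function names are hypothetical; sequences are assumed to be plain A/C/G/T strings, and any other symbols are ignored.

```python
def base_composition(seq):
    """Fractions of A, C, G and T in a DNA sequence (other symbols ignored)."""
    counts = {b: 0 for b in "ACGT"}
    for base in seq.upper():
        if base in counts:
            counts[base] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T")
    return {b: c / total for b, c in counts.items()}


def gc_content(seq):
    """G+C fraction of a sequence."""
    comp = base_composition(seq)
    return comp["G"] + comp["C"]


def gc_spread(seqs):
    """Largest difference in G+C content among the given sequences."""
    values = [gc_content(s) for s in seqs]
    return max(values) - min(values)


# Two toy sequences with very different G+C contents (about 0.8 vs 0.2).
print(gc_spread(["GGCCGGCCAT", "AATTAATTGC"]))
```

A large spread signals that a method assuming a single equilibrium composition may be misled, which is the situation the proposed distance is designed for.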
Abstract:
We have previously reported an enhanced version of sequencing by hybridization (SBH), termed positional SBH (PSBH). PSBH uses partially duplex probes containing single-stranded 3' overhangs instead of simple single-stranded probes. Stacking interactions between the duplex probe and a single-stranded target allow the required single-stranded overhang to be as short as 5 bases. Here we demonstrate the use of PSBH to capture relatively long single-stranded DNA targets and to perform standard solid-state Sanger sequencing on the resulting primer-template complexes without ligation. Our results indicate that only 5 bases of known terminal sequence are required for priming. In addition, the partially duplex probes can capture their specific target from a mixture of five single-stranded targets with different 3'-terminal sequences. This suggests that the PSBH approach could be used to sequence mixtures of DNA targets without prior purification.
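Target capture by a partially duplex probe can be modelled, very roughly, as exact base pairing between the probe's 5-base 3' overhang and the target's 3'-terminal bases. The sketch below assumes perfect Watson-Crick matching with no mismatch tolerance or thermodynamic scoring, writes all sequences 5'->3', and uses hypothetical function names and toy targets.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a 5'->3' DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def captured_targets(overhang, targets):
    """Names of targets whose 3'-terminal bases pair perfectly with the
    probe's single-stranded 3' overhang.

    Because pairing is antiparallel, the target's last len(overhang) bases
    must equal the reverse complement of the overhang.
    """
    k = len(overhang)
    site = reverse_complement(overhang)
    return [name for name, seq in targets.items() if seq.upper()[-k:] == site]


# Hypothetical mixture of targets differing at their 3' ends.
mixture = {
    "t1": "GGGTTTAACGT",
    "t2": "GGGTTTAACGA",
    "t3": "ccccaacgt",
}
print(captured_targets("ACGTT", mixture))  # ['t1', 't3']
```

Only targets ending in AACGT (the reverse complement of the ACGTT overhang) are selected, mirroring in a very simplified way how a probe picks its specific target out of a mixture.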
Abstract:
An extensive sequence comparison of the chloroplast ndhF gene from all major clades of the largest flowering plant family (Asteraceae) shows that this gene provides approximately 3 times more phylogenetic information than rbcL, because it is substantially longer and evolves twice as fast. The 5' region of ndhF (1380 bp) differs markedly from the 3' region (855 bp) and resembles rbcL in both the rate and the pattern of sequence change. The 3' region is more A+T-rich, has higher levels of nonsynonymous base substitution, and shows a greater transversion bias at all codon positions. These differences probably reflect different functional constraints on the 5' and 3' regions of ndhF. The two substitution patterns within ndhF are particularly advantageous for phylogenetic reconstruction, because the conserved and variable segments can be used for older and more recent groups, respectively. Phylogenetic analyses of 94 ndhF sequences resolved relationships much better than previous molecular and morphological phylogenies of the Asteraceae. The ndhF tree identified five major clades: (i) the Calyceraceae is the sister family of Asteraceae; (ii) the Barnadesioideae is monophyletic and is the sister group to the rest of the family; (iii) the Cichorioideae and its two basal tribes Mutisieae and Cardueae are paraphyletic; (iv) four tribes of Cichorioideae (Lactuceae, Arctoteae, Liabeae, and Vernonieae) form a monophyletic group, which is the sister clade of the Asteroideae; and (v) the Asteroideae is monophyletic and includes three major clades.
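The regional comparisons described above (A+T content, and transitions versus transversions at each codon position) can be sketched as follows. This is a hypothetical illustration, assuming two equal-length, in-frame aligned coding sequences with '-' for gaps; it does not separate synonymous from nonsynonymous changes, and the function names are not from the study.

```python
PURINES = set("AG")
PYRIMIDINES = set("CT")


def is_transition(a, b):
    """True for purine<->purine or pyrimidine<->pyrimidine changes."""
    return a != b and ({a, b} <= PURINES or {a, b} <= PYRIMIDINES)


def at_content(seq):
    """A+T fraction, ignoring gaps and ambiguous symbols."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    return sum(b in "AT" for b in bases) / len(bases)


def ts_tv_by_codon_position(seq1, seq2):
    """Count transitions ('ts') and transversions ('tv') at codon positions
    1-3 between two in-frame aligned coding sequences; sites with gaps or
    ambiguous bases are skipped."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    counts = {pos: {"ts": 0, "tv": 0} for pos in (1, 2, 3)}
    for i, (a, b) in enumerate(zip(seq1.upper(), seq2.upper())):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue
        kind = "ts" if is_transition(a, b) else "tv"
        counts[i % 3 + 1][kind] += 1  # i % 3 == 0 is the first codon position
    return counts


print(ts_tv_by_codon_position("ATGAAA", "GTGAAC"))
```

To contrast the two regions of ndhF, one could apply these functions separately to the first 1380 aligned bases and to the remaining 855; mapping those lengths onto a gapped alignment is itself an assumption.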
Abstract:
The correspondence between the transversion/transition ratio and the composition of neighboring bases in chloroplast DNA is examined. For 18 noncoding regions of the chloroplast genome, alignments between rice (Oryza sativa) and maize (Zea mays) were generated by two different methods. The difficulties of aligning noncoding DNA are discussed, and the alignments are analyzed in a way that reduces alignment artifacts. Because sequence divergence is < 10%, multiple substitutions at a site are assumed to be rare. Observed substitutions were analyzed with respect to the A+T content of the two immediately flanking bases. As this content increases, the proportion of transversions also increases. When both the 5'- and 3'-flanking nucleotides are G or C (A+T content of 0), only 25% of the observed substitutions are transversions; when both are A or T (A+T content of 2), 57% are transversions. Thus the influence of flanking base composition on substitutions, previously reported for a single noncoding region, is a general feature of the chloroplast genome.
Abstract:
We present a method for predicting protein folding class based on a global description of the protein chain and a voting process. The best descriptors were selected using a computer-simulated neural network trained on a database of 83 folding classes. The protein-chain descriptors include the overall composition, transition, and distribution of amino acid attributes such as relative hydrophobicity, predicted secondary structure, and predicted solvent exposure. Cross-validation testing was performed on 15 of the largest classes. Proteins were assigned to the correct class (correct positive prediction) with an average accuracy of 71.7%, whereas the inverse prediction, that a protein does not belong to a particular class (correct negative prediction), was 90-95% accurate. When tested on the 254 structures used in this study, the correct class was among the top two predictions in 91% of cases.
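The composition, transition and distribution descriptors can be sketched for a single attribute once each amino acid is assigned to one of three classes. The three-way hydrophobicity grouping below is an assumption for illustration and may not match the study's, the quartile convention in the distribution part is one of several possible, and the neural network and voting steps are not shown.

```python
import math

# Assumed three-class hydrophobicity grouping (illustrative only).
GROUPS = {
    **{aa: 1 for aa in "RKEDQN"},   # polar
    **{aa: 2 for aa in "GASTPHY"},  # neutral
    **{aa: 3 for aa in "CLVIMFW"},  # hydrophobic
}


def ctd(sequence, groups=GROUPS):
    """Composition, transition and distribution descriptors for one attribute."""
    labels = [groups[aa] for aa in sequence.upper() if aa in groups]
    n = len(labels)
    if n < 2:
        raise ValueError("need at least two classified residues")
    classes = sorted(set(groups.values()))

    # Composition: fraction of residues in each class.
    composition = [labels.count(c) / n for c in classes]

    # Transition: fraction of adjacent residue pairs that switch between
    # classes i and j (either direction), for each unordered class pair.
    pairs = [(i, j) for k, i in enumerate(classes) for j in classes[k + 1:]]
    steps = list(zip(labels, labels[1:]))
    transition = [
        sum(1 for x, y in steps if {x, y} == {i, j}) / (n - 1) for i, j in pairs
    ]

    # Distribution: relative positions (1-based, divided by n) of the first,
    # 25%, 50%, 75% and last residue of each class.
    distribution = []
    for c in classes:
        positions = [p + 1 for p, label in enumerate(labels) if label == c]
        for frac in (0.0, 0.25, 0.5, 0.75, 1.0):
            if not positions:
                distribution.append(0.0)
                continue
            index = max(0, math.ceil(frac * len(positions)) - 1)
            distribution.append(positions[index] / n)
    return composition, transition, distribution


comp, trans, dist = ctd("RRAALL")
print(len(comp) + len(trans) + len(dist))  # 21 descriptors for this attribute
```

With three classes this gives 3 + 3 + 15 = 21 numbers per attribute; concatenating vectors for several attributes yields a fixed-length input suitable for a classifier.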