716 results for Sequences (Mathematics)
Abstract:
Sequences from the tuf gene coding for the elongation factor EF-Tu were amplified and sequenced from the genomic DNA of Pirellula marina and Isosphaera pallida, two species of bacteria within the order Planctomycetales. A near-complete (1140-bp) sequence was obtained from Pi. marina and a partial (759-bp) sequence was obtained for I. pallida. Alignment of the deduced Pi. marina EF-Tu amino acid sequence against reference sequences demonstrated the presence of a unique 11-amino acid sequence motif not present in any other division of the domain Bacteria. Pi. marina shared the highest percentage amino acid sequence identity with I. pallida but showed only a low percentage identity with other members of the domain Bacteria. This is consistent with the concept of the planctomycetes as a unique division of the Bacteria. Neither primary sequence comparison of EF-Tu nor phylogenetic analysis supports any close relationship between planctomycetes and the chlamydiae, which has previously been postulated on the basis of 16S rRNA. Phylogenetic analysis of aligned EF-Tu amino acid sequences performed using distance, maximum-parsimony, and maximum-likelihood approaches yielded contradictory results with respect to the position of planctomycetes relative to other bacteria. It is hypothesized that long-branch attraction effects due to unequal evolutionary rates and mutational saturation effects may account for some of the contradictions.
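The percentage identity comparisons mentioned above can be illustrated with a minimal sketch. It assumes two sequences that are already aligned to equal length, with "-" marking gaps, and counts identity only over columns where neither sequence has a gap (one common convention, not necessarily the one used in the study). The function name and the toy sequences are hypothetical and are not EF-Tu data.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical residues over gap-free alignment columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns containing a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


# Invented toy alignment (10 columns, one of them gapped).
toy_a = "MSKEKF-ERT"
toy_b = "MAKEKFTERS"
identity = percent_identity(toy_a, toy_b)
```

Other conventions (for example, dividing by the full alignment length) give different numbers, so the denominator should always be stated alongside an identity value.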
Abstract:
A possible role in RNA replication for interactions between conserved complementary (cyclization) sequences in the 5'- and 3'-terminal regions of Flavivirus RNA was previously suggested but never tested in vivo. Using the M-fold program for RNA secondary-structure predictions, we examined for the first time the base-pairing interactions between the covalently linked 5' genomic region (first ~160 nucleotides) and the 3' untranslated region (last ~115 nucleotides) for a range of mosquito-borne Flavivirus species. Base-pairing occurred as predicted for the previously proposed conserved cyclization sequences. In order to obtain experimental evidence of the predicted interactions, the putative cyclization sequences (5' or 3') in the replicon RNA of the mosquito-borne Kunjin virus were mutated either separately, to destroy base-pairing, or simultaneously, to restore the complementarity. None of the RNAs with separate mutations in only the 5' or only the 3' cyclization sequences was able to replicate after transfection into BHK cells, while replicon RNA with simultaneous compensatory mutations in both cyclization sequences was replication competent. This was detected by immunofluorescence for expression of the major nonstructural protein NS3 and by Northern blot analysis for amplification and accumulation of replicon RNA. We then used the M-fold program to analyze RNA secondary structure of the covalently linked 5'- and 3'-terminal regions of three tick-borne virus species and identified a previously undescribed additional pair of conserved complementary sequences in locations similar to those of the mosquito-borne species. They base-paired with ΔG values of approximately -20 kcal, equivalent to or greater in stability than those calculated for the originally proposed cyclization sequences.
The results show that the base-pairing between 5' and 3' complementary sequences, rather than the nucleotide sequence per se, is essential for the replication of mosquito-borne Kunjin virus RNA and that more than one pair of cyclization sequences might be involved in the replication of the tick-borne Flavivirus species.
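The logic of the compensatory-mutation experiment can be sketched in a few lines: a mutation on one side destroys complementarity, while matching mutations on both sides restore it. This is a toy illustration only. It checks perfect antiparallel Watson-Crick pairing (G-U wobble pairs and folding energies, which M-fold accounts for, are ignored), and the sequences and function names are invented for the example rather than taken from Kunjin virus.

```python
# Watson-Crick partners for RNA bases.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA string (5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def can_pair(five_prime, three_prime):
    """True if the two segments form a perfect antiparallel duplex."""
    return three_prime == reverse_complement(five_prime)


# Hypothetical "wild-type" pair and hypothetical mutants.
wild_5 = "UCAAUAUG"
wild_3 = reverse_complement(wild_5)
mut_5 = "UGUUAUAG"                   # mutated 5' segment
mut_3 = reverse_complement(mut_5)    # compensatory 3' mutation

pairing = {
    "wild/wild": can_pair(wild_5, wild_3),
    "mut5/wild3": can_pair(mut_5, wild_3),
    "wild5/mut3": can_pair(wild_5, mut_3),
    "mut5/mut3": can_pair(mut_5, mut_3),
}
```

In this toy model only the wild/wild and mut5/mut3 combinations pair, mirroring the observation that base-pairing, rather than the nucleotide sequence itself, is what matters.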
Abstract:
A new algorithm, PfAGSS, for predicting 3' splice sites in Plasmodium falciparum genomic sequences is described. Application of this program to the published P. falciparum chromosome 2 and 3 data suggests that existing programs result in a high error rate in assigning 3' intron boundaries. (C) 2001 Elsevier Science B.V. All rights reserved.
Abstract:
By spliced alignment of human DNA and transcript sequence data we constructed a data set of transcript-confirmed exons and introns from 2793 genes, 796 of which (28%) were seen to have multiple isoforms. We find that over one-third of human exons can translate in more than one frame, and that this is highly correlated with G+C content. Introns containing adenosine at donor site position +3 (A3), rather than guanosine (G3), are more common in low G+C regions, while the converse is true in high G+C regions. These two classes of introns are shown to have distinct lengths, consensus sequences and correlations among splice signals, leading to the hypothesis that A3 donor sites are associated with exon definition, and G3 donor sites with intron definition. Minor classes of introns, including GC-AG, U12-type GT-AG, weak, and putative AG-dependent introns are identified and characterized. Cassette exons are more prevalent in low G+C regions, while exon isoforms are more prevalent in high G+C regions. Cassette exon events outnumber other alternative events, while exon isoform events involve truncation twice as often as extension, and occur at acceptor sites twice as often as at donor sites. Alternative splicing is usually associated with weak splice signals, and in a majority of cases, preserves the coding frame. The reported characteristics of constitutive and alternative splice signals, and the hypotheses offered regarding alternative splicing and genome organization, have important implications for experimental research into RNA processing. The 'AltExtron' data sets are available at http://www.bit.uq.edu.au/altExtron/ and http://www.ebi.ac.uk/~thanaraj/altExtron/.
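The two quantities this abstract relates, regional G+C content and the nucleotide at donor site position +3, are simple to compute. The sketch below assumes each intron is given as a DNA string starting at its first nucleotide, so that position +3 is the third character. The function names and the example introns are hypothetical and are not drawn from the AltExtron data.

```python
def gc_content(dna):
    """Fraction of G and C nucleotides in a DNA string."""
    if not dna:
        raise ValueError("empty sequence")
    dna = dna.upper()
    return (dna.count("G") + dna.count("C")) / len(dna)


def donor_class(intron):
    """Classify an intron by the nucleotide at donor position +3.

    Returns "A3", "G3", or "other". Position +1 is the first intron base.
    """
    if len(intron) < 3:
        raise ValueError("intron too short to have a +3 position")
    base = intron[2].upper()
    if base == "A":
        return "A3"
    if base == "G":
        return "G3"
    return "other"


# Invented example introns (GT...AG).
toy_introns = ["GTAAGTTTTCTTTCAG", "GTGAGCGCCCGCCCAG", "GTCAGTATTTTTGTAG"]
summary = [(donor_class(i), round(gc_content(i), 3)) for i in toy_introns]
```

Tabulating the class labels against G+C content over a large intron set is one way to look for the kind of association the abstract reports.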
Abstract:
We present a novel maximum-likelihood-based algorithm for estimating the distribution of alignment scores from the scores of unrelated sequences in a database search. Using a new method for measuring the accuracy of p-values, we show that our maximum-likelihood-based algorithm is more accurate than existing regression-based and lookup table methods. We explore a more sophisticated way of modeling and estimating the score distributions (using a two-component mixture model and expectation maximization), but conclude that this does not improve significantly over simply ignoring scores with small E-values during estimation. Finally, we measure the classification accuracy of p-values estimated in different ways and observe that inaccurate p-values can, somewhat paradoxically, lead to higher classification accuracy. We explain this paradox and argue that statistical accuracy, not classification accuracy, should be the primary criterion in comparisons of similarity search methods that return p-values that adjust for target sequence length.
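As a rough illustration of maximum-likelihood estimation of a score distribution, the sketch below fits a Gumbel (extreme value) distribution to a list of scores and converts a score into a p-value. Two assumptions are made that the abstract does not state: that the scores of unrelated sequences follow a Gumbel distribution with location mu and rate lambda, and that all scores are used (no censoring of small E-values, no length adjustment). It is a textbook fit, not the authors' algorithm, and the function names are hypothetical.

```python
import math


def fit_gumbel_ml(scores, iterations=200):
    """Maximum-likelihood estimates (mu, lam) for a Gumbel distribution.

    Solves 1/lam - mean(x) + sum(x*exp(-lam*x)) / sum(exp(-lam*x)) = 0
    by bisection; the left-hand side is strictly decreasing in lam.
    """
    n = len(scores)
    smallest = min(scores)
    if n < 2 or max(scores) == smallest:
        raise ValueError("need at least two distinct scores")
    mean = sum(scores) / n
    # Shift by the minimum so every exponent is <= 0 (no overflow).
    shifted = [s - smallest for s in scores]

    def g(lam):
        weights = [math.exp(-lam * d) for d in shifted]
        wsum = sum(weights)
        wmean = sum(d * w for d, w in zip(shifted, weights)) / wsum + smallest
        return 1.0 / lam - mean + wmean

    lo, hi = 1e-9, 1.0
    while g(hi) > 0:          # expand until the root is bracketed
        hi *= 2.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    mu = smallest - math.log(sum(math.exp(-lam * d) for d in shifted) / n) / lam
    return mu, lam


def p_value(score, mu, lam):
    """P(S >= score) under the fitted Gumbel distribution."""
    return -math.expm1(-math.exp(-lam * (score - mu)))
```

The abstract's suggestion of ignoring scores with small E-values during estimation would correspond to fitting on a censored sample, which this sketch does not attempt.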
Abstract:
Phylogenetic relationships among 75 species of Lentibulariaceae, representing the three recognized genera, were assessed by cladistic analysis of DNA sequences from the plastid rps16 intron and the trnL-F region. Sequence data from the two loci were analyzed both separately and in combination. Consensus trees from all analyses are congruent, and parsimony jackknife results demonstrate strong support for relationships both between and within each of the three demonstrably monophyletic genera. The genus Pinguicula is sister to a Genlisea-Utricularia clade; the phylogenetic structure within this clade closely follows Taylor's recent sectional delimitations based on morphology. Three principal clades are shown within Utricularia, with the basal sections Polypompholyx and Pleiochasia together forming the sister lineage of the remaining Utricularia species. Of the fundamental morphological specializations, the stoloniferous growth form apparently arose independently within Genlisea and Utricularia three times, and within Utricularia itself, perhaps more than once. The epiphytic habit has evolved independently at least three times, in Pinguicula, in Utricularia section Phyllaria, and within the two sections Orchidioides and Iperua (in the latter as bromeliad tank-epiphytes). The suspended aquatic habit may have evolved independently within sections Utricularia and Vesiculina. Biogeographic optimization on the phylogeny demonstrates patterns commonly associated with the boreotropics hypothesis and limits the spatial origin of Lentibulariaceae to temperate Eurasia or tropical America.
Abstract:
Darwin's paradigm holds that the diversity of present-day organisms has arisen via a process of genetic descent with modification, as on a bifurcating tree. Evidence is accumulating that genes are sometimes transferred not along lineages but rather across lineages. To the extent that this is so, Darwin's paradigm can apply only imperfectly to genomes, potentially complicating or perhaps undermining attempts to reconstruct historical relationships among genomes (i.e., a genome tree). Whether most genes in a genome have arisen via treelike (vertical) descent or by lateral transfer across lineages can be tested if enough complete genome sequences are used. We define a phylogenetically discordant sequence (PDS) as an open reading frame (ORF) that exhibits patterns of similarity relationships statistically distinguishable from those of most other ORFs in the same genome. PDSs represent between 6.0 and 16.8% (mean, 10.8%) of the analyzable ORFs in the genomes of 28 bacteria, eight archaea, and one eukaryote (Saccharomyces cerevisiae). In this study we developed and assessed a distance-based approach, based on mean pairwise sequence similarity, for generating genome trees. Exclusion of PDSs improved bootstrap support for basal nodes but altered few topological features, indicating that there is little systematic bias among PDSs. Many but not all features of the genome tree from which PDSs were excluded are consistent with the 16S rRNA tree.
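A distance derived from mean pairwise sequence similarity can be sketched as follows. The example assumes that per-ORF similarity scores between each pair of genomes have already been computed and scaled to the range 0 to 1, and it defines distance as one minus the mean similarity. Both the scaling and that definition are assumptions made for illustration, since the abstract does not give the formula. ORFs flagged as phylogenetically discordant can be excluded before averaging. All names and numbers are invented.

```python
def genome_distances(similarities, excluded=frozenset()):
    """Distance between genome pairs as 1 - mean per-ORF similarity.

    similarities: {(genome_a, genome_b): {orf_id: similarity in [0, 1]}}
    excluded:     ORF identifiers to leave out (e.g. flagged PDSs)
    """
    distances = {}
    for pair, per_orf in similarities.items():
        kept = [s for orf, s in per_orf.items() if orf not in excluded]
        if not kept:
            raise ValueError(f"no ORFs left for genome pair {pair}")
        distances[pair] = 1.0 - sum(kept) / len(kept)
    return distances


# Invented toy data: three genomes, three shared ORFs.
toy = {
    ("A", "B"): {"orf1": 0.9, "orf2": 0.8, "orf3": 0.1},
    ("A", "C"): {"orf1": 0.5, "orf2": 0.4, "orf3": 0.9},
    ("B", "C"): {"orf1": 0.5, "orf2": 0.5, "orf3": 0.2},
}
all_orfs = genome_distances(toy)
without_pds = genome_distances(toy, excluded={"orf3"})
```

The resulting distance matrix could then be passed to any distance-based tree-building method; comparing trees built from `all_orfs` and `without_pds` parallels the comparison described in the abstract.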
Abstract:
Only a small proportion of the mouse genome is transcribed into mature messenger RNA transcripts. There is an international collaborative effort to identify all full-length mRNA transcripts from the mouse, and to ensure that each is represented in a physical collection of clones. Here we report the manual annotation of 60,770 full-length mouse complementary DNA sequences. These are clustered into 33,409 'transcriptional units', contributing 90.1% of a newly established mouse transcriptome database. Of these transcriptional units, 4,258 are new protein-coding and 11,665 are new non-coding messages, indicating that non-coding RNA is a major component of the transcriptome. 41% of all transcriptional units showed evidence of alternative splicing. In protein-coding transcripts, 79% of splice variations altered the protein product. Whole-transcriptome analyses resulted in the identification of 2,431 sense-antisense pairs. The present work, completely supported by physical clones, provides the most comprehensive survey of a mammalian transcriptome so far, and is a valuable resource for functional genomics.
Abstract:
The small GTPases R-Ras and H-Ras are highly homologous proteins with contrasting biological properties; for example, they differentially modulate integrin affinity: H-Ras suppresses integrin activation in fibroblasts whereas R-Ras can reverse this effect of H-Ras. To gain insight into the sequences directing this divergent phenotype, we investigated a panel of H-Ras/R-Ras chimeras and found that sequences in the R-Ras hypervariable C-terminal region including amino acids 175-203 are required for the R-Ras ability to increase integrin activation in CHO cells; however, the proline-rich site in this region, previously reported to bind the adaptor protein Nck, was not essential for this effect. In addition, we found that the GTPase TC21 behaved similarly to R-Ras. Because the C-termini of Ras proteins can control their subcellular localization, we compared the localization of H-Ras and R-Ras. In contrast to H-Ras, which migrates out of lipid rafts upon activation, we found that activated R-Ras remained localized to lipid rafts. However, functionally distinct H-Ras/R-Ras chimeras containing different C-terminal R-Ras segments localized to lipid rafts irrespective of their integrin phenotype. (C) 2003 Elsevier Inc. All rights reserved.
Abstract:
Epstein-Barr virus (EBV)-encoded oncogene latent membrane protein (LMP) 1, which is consistently expressed in multiple EBV-associated malignancies, has been proposed as a potential target antigen for any future vaccine designed to control these malignancies. However, the high degree of genetic variation in the LMP1 sequence has been considered a major impediment for its use as a potential immunotherapeutic target for the treatment of EBV-associated malignancies. In the present study, we have employed a highly efficient strategy, based on ex vivo functional assays, to conduct an extensive sequence-wide analysis of LMP1-specific T-cell responses in a large panel of healthy virus carriers of diverse ethnic origin and nasopharyngeal carcinoma patients. By comparing the frequencies of T cells specific for overlapping peptides spanning LMP1, we mapped a number of novel HLA class I- and class II-restricted LMP1 T-cell epitopes, including an epitope with dual HLA class I restriction. More importantly, extensive sequence analysis of LMP1 revealed that the majority of the T-cell epitopes were highly conserved in EBV isolates from Caucasian, Papua New Guinean, African, and Southeast Asian populations, while unique geographically constrained genetic variation was observed within one HLA A2 supertype-restricted epitope. These findings indicate that conserved LMP1 epitopes should be considered in designing epitope-based immunotherapeutic strategies against EBV-associated malignancies in different ethnic populations.