945 results for GENOME SEQUENCE
Abstract:
We characterized the consensus sequence and structure of a long terminal repeat (LTR) retrotransposon from the genome of the human blood fluke, Schistosoma japonicum, and have named this element Gulliver. The full-length consensus Gulliver LTR retrotransposon was 4788 bp, and it was flanked at its 5'- and 3'-ends by LTRs of 259 bp. Each LTR included RNA polymerase II promoter sequences, a CAAT signal and a TATA box. Gulliver exhibited features characteristic of a functional LTR retrotransposon, including two read-through (termination) ORFs encoding retroviral gag and pol proteins of 312 and 1071 amino acid residues, respectively. The gag ORF encoded motifs conserved in nucleic acid binding proteins, while the pol ORF encoded conserved domains of aspartic protease, reverse transcriptase (RT), RNaseH and integrase, in that order, a pol pattern conserved in the gypsy lineage of LTR retrotransposons. Whereas the sequence and structure of Gulliver were similar to those of gypsy, phylogenetic analysis revealed that Gulliver did not group particularly closely with the gypsy family. Rather, its closest relatives were an LTR retrotransposon from Caenorhabditis elegans, mag from Bombyx mori and, to a lesser extent, easel from the salmon Oncorhynchus keta. Dot blot hybridizations indicated that Gulliver was present at between 100 and several thousand copies in the S. japonicum genome, and Southern hybridization analysis suggested that it is probably also present in the genome of Schistosoma mansoni. Transcripts encoding the RT domain of Gulliver were detected by RT-PCR in larval and adult stages of S. japonicum, indicating that (at least) the RT domain of Gulliver is transcribed. This is the first report of the sequence and structure of an LTR retrotransposon from any schistosome or indeed from any species belonging to the phylum Platyhelminthes. (C) 2001 Elsevier Science B.V. All rights reserved.
Abstract:
The first chordates appear in the fossil record at the time of the Cambrian explosion, nearly 550 million years ago. The modern ascidian tadpole represents a plausible approximation to these ancestral chordates. To illuminate the origins of chordates and vertebrates, we generated a draft of the protein-coding portion of the genome of the most studied ascidian, Ciona intestinalis. The Ciona genome contains ~16,000 protein-coding genes, similar to the number in other invertebrates, but only half that found in vertebrates. Vertebrate gene families are typically found in simplified form in Ciona, suggesting that ascidians contain the basic ancestral complement of genes involved in cell signaling and development. The ascidian genome has also acquired a number of lineage-specific innovations, including a group of genes engaged in cellulose metabolism that are related to those in bacteria and fungi.
Abstract:
Darwin's paradigm holds that the diversity of present-day organisms has arisen via a process of genetic descent with modification, as on a bifurcating tree. Evidence is accumulating that genes are sometimes transferred not along lineages but rather across lineages. To the extent that this is so, Darwin's paradigm can apply only imperfectly to genomes, potentially complicating or perhaps undermining attempts to reconstruct historical relationships among genomes (i.e., a genome tree). Whether most genes in a genome have arisen via treelike (vertical) descent or by lateral transfer across lineages can be tested if enough complete genome sequences are used. We define a phylogenetically discordant sequence (PDS) as an open reading frame (ORF) that exhibits patterns of similarity relationships statistically distinguishable from those of most other ORFs in the same genome. PDSs represent between 6.0 and 16.8% (mean, 10.8%) of the analyzable ORFs in the genomes of 28 bacteria, eight archaea, and one eukaryote (Saccharomyces cerevisiae). In this study we developed and assessed a distance-based approach, based on mean pairwise sequence similarity, for generating genome trees. Exclusion of PDSs improved bootstrap support for basal nodes but altered few topological features, indicating that there is little systematic bias among PDSs. Many but not all features of the genome tree from which PDSs were excluded are consistent with the 16S rRNA tree.
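The abstract describes genome-to-genome distances "based on mean pairwise sequence similarity", with PDSs optionally excluded, but does not state how the tree itself is built. The sketch below is only an illustration under stated assumptions: distance is taken as 1 minus the mean per-ORF similarity (similarities assumed to lie in [0, 1]), and UPGMA average linkage is used as a stand-in clustering step. The function names `similarity_distances` and `upgma` are hypothetical.

```python
from itertools import combinations


def similarity_distances(scores, excluded=frozenset()):
    """Turn per-ORF similarity scores into genome-to-genome distances.

    scores maps a genome pair (a, b) to {orf_id: similarity in [0, 1]}.
    ORFs listed in `excluded` (e.g. ORFs flagged as PDSs) are ignored.
    Distance is assumed to be 1 - mean similarity, stored under both key orders.
    """
    dist = {}
    for (a, b), per_orf in scores.items():
        kept = [s for orf, s in per_orf.items() if orf not in excluded]
        if not kept:
            raise ValueError(f"no ORFs left for pair {(a, b)}")
        dist[(a, b)] = dist[(b, a)] = 1.0 - sum(kept) / len(kept)
    return dist


def upgma(taxa, dist):
    """Build a rooted tree (nested tuples) by UPGMA average linkage."""
    size = {t: 1 for t in taxa}  # current cluster -> number of leaves
    d = dict(dist)
    while len(size) > 1:
        # Merge the closest pair of current clusters.
        a, b = min(combinations(size, 2), key=lambda p: d[p])
        merged = (a, b)
        na, nb = size.pop(a), size.pop(b)
        # Distance from the new cluster is the size-weighted average.
        for c in size:
            avg = (na * d[(a, c)] + nb * d[(b, c)]) / (na + nb)
            d[(merged, c)] = d[(c, merged)] = avg
        size[merged] = na + nb
    return next(iter(size))
```

Comparing the trees returned with and without a set of excluded ORF identifiers mirrors, in miniature, the abstract's test of whether removing PDSs changes the topology; bootstrap support would additionally require resampling ORFs, which this sketch omits.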
Abstract:
A newly described non-long terminal repeat (non-LTR) retrotransposon was isolated from the genome of the Oriental schistosome, Schistosoma japonicum. At least 1000 partial copies of the element, which was named pido, were dispersed throughout the genome of S. japonicum. As is usual with non-LTR retrotransposons, many pido elements are expected to be 5'-truncated. A 3564-bp consensus sequence of the truncated pido element was assembled from several genomic fragments that contained pido-hybridizing sequences. The sequence encoded part of the first open reading frame (ORF), the entire second ORF and, at its 3'-terminus, a tandemly repetitive, A-rich (TA₆TA₅TA₈) tail. The ORF1 of pido encoded a nucleic acid binding protein, and ORF2 encoded a retroviral-like polyprotein that included apurinic/apyrimidinic endonuclease (EN) and reverse transcriptase (RT) domains, in that order. Based on its sequence and structure, and on phylogenetic analyses of both the RT and EN domains, pido belongs to the chicken repeat 1 (CR1)-like lineage of elements known from the chicken, turtle, puffer fish, mosquitoes and other taxa. pido shared equal similarity with CR1 from chicken, an uncharacterized retrotransposon from Caenorhabditis elegans and SR1 (a non-LTR retrotransposon) from the related blood fluke Schistosoma mansoni; the level of similarity between pido and SR1 indicated that these two schistosome retrotransposons were related but not orthologous. The findings indicate that schistosomes have been colonized by at least two discrete CR1-like elements. Whereas pido did not appear to have tight target-site specificity, at least one copy of pido has inserted into the 3'-untranslated region of a protein-encoding gene (GenBank AW736757) of as yet unknown identity. mRNA encoding the RT of pido was detected by reverse transcription-polymerase chain reaction in the egg, miracidium and adult developmental stages of S. japonicum, indicating that the RT domain was transcribed and suggesting that pido was actively replicating and mobile within the S. japonicum genome. (C) 2002 Elsevier Science B.V. All rights reserved.
Abstract:
Our previous studies have shown that two distinct genotypes of Sindbis (SIN) virus occur in Australia. One of these, the Oriental/Australian type, circulates throughout most of the Australian continent, whereas the recently identified south-west (SW) genetic type appears to be restricted to a distinct geographic region located in the temperate south-west of Australia. We have now determined the complete nucleotide and translated amino acid sequences of a SW isolate of SIN virus (SW6562) and performed comparative analyses with other SIN viruses at the genomic level. The genome of SW6562 is 11,569 nucleotides in length, excluding the cap nucleotide and poly(A) tail. Overall, this virus differs from the prototype SIN virus (strain AR339) by 23% in nucleotide sequence and 12.5% in amino acid sequence. Partial sequences of four regions of the genome of four SW isolates were determined and compared with the corresponding sequences from a number of SIN isolates from different regions of the world. These regions are the non-structural protein 3 (nsP3) gene, the E2 gene, the capsid gene, and the repeated sequence elements (RSE) of the 3'UTR. These comparisons revealed that the SW SIN viruses were more closely related to South African and European strains than to other Australian isolates of SIN virus. Thus the SW genotype of SIN virus may have been introduced into this region of Australia by viremic humans or migratory birds and subsequently evolved independently in the region. The sequence data also revealed that the SW genotype contains a unique deletion in the RSE of the 3'UTR region of the genome. Previous studies have shown that deletions in this region of the SIN genome can have significant effects on virus replication in mosquito and avian cells, which may explain the restricted distribution of this genotype of SIN virus.
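Figures such as "differs by 23% in nucleotide sequence" are percent differences over aligned positions. The abstract does not say how alignment gaps were treated, so the sketch below assumes the two sequences are already aligned to equal length and simply skips any column containing a gap (one common convention for an uncorrected p-distance). The function name `percent_difference` is hypothetical.

```python
def percent_difference(seq_a, seq_b, gap="-"):
    """Percent of aligned, ungapped positions at which two sequences differ.

    Assumes seq_a and seq_b are already aligned (same length). Columns in
    which either sequence has a gap are skipped rather than counted.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    compared = differing = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x == gap or y == gap:
            continue
        compared += 1
        differing += x != y  # True counts as 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * differing / compared
```

For example, `percent_difference("ACGTACGT", "ACGTTCGA")` compares 8 columns with 2 mismatches and returns 25.0. The same function applies unchanged to aligned amino acid sequences.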
Abstract:
To help understand the mechanisms of gene rearrangement in the mitochondrial (mt) genomes of hemipteroid insects, we sequenced the mt genome of the plague thrips, Thrips imaginis (Thysanoptera). This genome is circular, 15,407 bp long, and has many unusual features, including (1) rRNA genes inverted and distant from one another, (2) an extra gene for tRNA-Ser, (3) a tRNA-Val lacking a D-arm, (4) two pseudo-tRNA genes, (5) duplicate control regions, and (6) translocations and/or inversions of 24 of the 37 genes. The mechanism of rRNA gene transcription in T. imaginis may differ from that of other arthropods, since the two rRNA genes are inverted and distant from one another. Further, the rRNA genes are not adjacent or even close to either of the two control regions. Tandem duplication and deletion is a plausible model for the evolution of the duplicate control regions and for the gene translocations, but intramitochondrial recombination may account for the gene inversions in T. imaginis. All 18 genes between control regions #1 and #2 have translocated and/or inverted, whereas only six of the 20 genes outside this region have translocated and/or inverted. Moreover, the extra tRNA gene and the two pseudo-tRNA genes are either in this region or immediately adjacent to one of the control regions. These observations suggest that tandem duplication and deletion may be facilitated by the duplicate control regions and may have occurred a number of times in the lineage leading to T. imaginis. T. imaginis shares two novel gene boundaries with a lepidopsocid species from another order of hemipteroid insects, the Psocoptera. The available evidence suggests that these shared gene boundaries evolved by convergence and thus are not informative for the interordinal phylogeny of hemipteroid insects. We discuss the potential of hemipteroid insects as a model system for studies of the evolution of animal mt genomes and outline some fundamental questions that may be addressed with this system.
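A "novel gene boundary" shared by two species is an adjacency of two genes present in both gene orders but absent from the ancestral arrangement. The sketch below shows one way to count these, assuming each circular mt gene order is a list of gene names with a leading "-" marking genes on the opposite strand, and that an ancestral reference order is supplied. It treats the boundary (a, b) and the same boundary read from the other strand, (-b, -a), as identical. All names and gene orders here are hypothetical toy inputs, not the thrips or psocopteran data.

```python
def _flip(gene):
    """Reverse a gene's strand label: 'cox1' <-> '-cox1'."""
    return gene[1:] if gene.startswith("-") else "-" + gene


def boundaries(order, circular=True):
    """Set of gene boundaries (adjacent gene pairs) in a signed gene order.

    A boundary read on the opposite strand, (-b, -a), is the same boundary
    as (a, b), so each pair is stored in one canonical orientation.
    """
    pairs = list(zip(order, order[1:]))
    if circular and len(order) > 1:
        pairs.append((order[-1], order[0]))  # close the circular genome
    result = set()
    for a, b in pairs:
        other = (_flip(b), _flip(a))
        result.add(min((a, b), other))
    return result


def shared_novel_boundaries(order_x, order_y, ancestral):
    """Boundaries present in both orders but absent from the ancestral order."""
    return (boundaries(order_x) & boundaries(order_y)) - boundaries(ancestral)
```

With the toy orders `ancestral = ["a", "b", "c", "d"]`, `x = ["a", "c", "b", "d"]` and `y = ["c", "b", "a", "d"]`, the only shared novel boundary is c|b. Such a count says nothing by itself about whether a shared boundary arose once or by convergence, which is the question the abstract addresses with additional evidence.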
Abstract:
We describe a novel approach to explore DNA nucleotide sequence data, aiming to produce high-level categorical and structural information about the underlying chromosomes, genomes and species. The article starts by analyzing chromosomal data through histograms of fixed-length DNA sequences. After the DNA histograms are created, the correlation between each pair of histograms is computed, producing a global correlation matrix. These data are then used as input to several data processing methods for information extraction and tabular/graphical output generation. A set of 18 species is processed, and the extensive results reveal that the proposed method is able to generate significant and diversified outputs, in good accordance with current scientific knowledge in domains such as genomics and phylogenetics.
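The pipeline described above (histograms of fixed-length DNA words, then pairwise correlations collected into a matrix) can be sketched as follows. This is an illustration under assumptions, not the authors' code: it assumes the histogram counts overlapping length-k words over the alphabet A/C/G/T and that "correlation" means the Pearson coefficient. The function names are hypothetical.

```python
from itertools import product
from math import sqrt


def kmer_histogram(seq, k):
    """Count every length-k DNA word (A/C/G/T only) in seq.

    Returns a list of 4**k counts in a fixed lexicographic word order, so
    histograms from different sequences line up position by position.
    """
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {w: i for i, w in enumerate(words)}
    counts = [0] * len(words)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # windows containing N or other symbols are skipped
            counts[j] += 1
    return counts


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length count vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for a constant histogram")
    return cov / (sx * sy)


def correlation_matrix(seqs, k):
    """Symmetric matrix of pairwise histogram correlations."""
    hists = [kmer_histogram(s, k) for s in seqs]
    return [[pearson(h1, h2) for h2 in hists] for h1 in hists]
```

Fixing the word order once (via `product`) is what makes histograms comparable across chromosomes; the resulting matrix is the kind of input that clustering or multidimensional scaling steps would then consume.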
Abstract:
The first extensive catalog of structural variation in humans was recently released. It showed that large stretches of genomic DNA that vary considerably in copy number are extremely abundant, so it is conceivable that they play a major role in functional variation. Consistent with this, genomic insertions and deletions have been shown to contribute to phenotypic differences by modifying the expression levels not only of genes within the aneuploid segments but also of neighboring genes with normal copy number. In this report, we review the possible mechanisms behind this latter effect.
Abstract:
We have isolated a clone of Trypanosoma cruzi genomic DNA, lambda 3b2-5, which contains sequences that are reiterated in the genome. Northern blot analysis showed that clone 3b2-5 hybridizes to different mRNA species of 1,200-5,000 bases. The number of mRNA species hybridizing to clone 3b2-5 exceeds its coding capacity, showing that this clone carries sequences that are common to several mRNA species and conserved in the poly(A)+ RNA. These sequences are not homologous to the T. cruzi spliced leader sequence, since clone 3b2-5 does not hybridize to a synthetic 20-nucleotide oligomer complementary to the spliced leader sequence. Clone 3b2-5 does not hybridize to DNA and RNA from several genera of Trypanosomatidae or from other Trypanosoma species, indicating that it carries T. cruzi species-specific sequences.
Abstract:
Functional RNA structures play an important role both in the context of noncoding RNA transcripts and as regulatory elements in mRNAs. Here we present a computational study to detect functional RNA structures within the ENCODE regions of the human genome. Since structural RNAs in general lack characteristic signals in primary sequence, comparative approaches evaluating evolutionary conservation of structures are most promising. We have used three recently introduced programs based on either a phylogenetic stochastic context-free grammar (EvoFold) or energy-directed folding (RNAz and AlifoldZ), yielding several thousand candidate structures (corresponding to approximately 2.7% of the ENCODE regions). EvoFold has its highest sensitivity in highly conserved and relatively AU-rich regions, while RNAz favors slightly GC-rich regions, resulting in a relatively small overlap between methods. Comparison with the GENCODE annotation points to functional RNAs in all genomic contexts, with a slightly increased density in 3'-UTRs. While we estimate a significant false discovery rate of approximately 50%-70%, many of the predictions can be further substantiated by additional criteria: 248 loci are predicted by both RNAz and EvoFold, and an additional 239 RNAz or EvoFold predictions are supported by the (more stringent) AlifoldZ algorithm. Five hundred seventy RNAz structure predictions fall into regions that also show signs of selection pressure at the sequence level (i.e., conserved elements). More than 700 predictions overlap with noncoding transcripts detected by oligonucleotide tiling arrays. One hundred seventy-five selected candidates were tested by RT-PCR in six tissues, and expression could be verified in 43 cases (24.6%).
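Statements such as "248 loci are predicted by both RNAz and EvoFold" or "more than 700 predictions overlap with noncoding transcripts" rest on an interval-overlap test between two sets of genomic loci. The abstract does not state the exact overlap criterion, so this sketch assumes half-open `(chrom, start, end)` coordinates and counts any overlap of at least 1 bp. The function name `overlapping` is hypothetical, and the pairwise scan is quadratic per chromosome, which is fine for a sketch but not for genome-scale inputs.

```python
from collections import defaultdict


def overlapping(preds_a, preds_b):
    """Return the predictions in preds_a that overlap any interval in preds_b.

    Each prediction is (chrom, start, end) with half-open [start, end)
    coordinates; two intervals overlap if they share at least one base.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in preds_b:
        by_chrom[chrom].append((start, end))
    return [
        (chrom, start, end)
        for chrom, start, end in preds_a
        if any(s < end and start < e for s, e in by_chrom[chrom])
    ]
```

The validation rate quoted in the abstract is plain arithmetic on the same kind of counts: 43 verified out of 175 tested is 43 / 175 ≈ 0.246, i.e. 24.6%.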
Abstract:
Tandemly repeated DNA sequences are found in the genomes of higher eukaryotes and have also been demonstrated in Trypanosoma cruzi. Repeated DNA sequences are potentially useful for the diagnostic detection of T. cruzi (A. Gonzales et al., 1984, Proc. Natl. Acad. Sci. USA, 81: 3356-3360). We have isolated two clones from a genomic library of T. cruzi (Y strain): one contains a family of at least seven copies of a repetitive sequence of approximately 600 base pairs, and the other contains an independent copy of the same sequence. One copy of the repeat (HSP) and the independent clone (HCR) were sequenced by the Sanger procedure (Fig.). This sequence hybridized to the four T. cruzi strains tested and did not hybridize to eleven species of trypanosomatids from five different genera, making it a good candidate for diagnostic assays. GenBank accession numbers: HSP#m31919, HCR#31920.
Abstract:
Microarray transcript profiling and RNA interference are two new technologies crucial for large-scale gene function studies in multicellular eukaryotes. Both rely on sequence-specific hybridization between complementary nucleic acid strands, prompting us to create a collection of gene-specific sequence tags (GSTs) that represent at least 21,500 Arabidopsis genes and are compatible with both approaches. The GSTs were carefully selected to ensure that each of them shared no significant similarity with any other region in the Arabidopsis genome. They were synthesized by PCR amplification from genomic DNA. Spotted microarrays fabricated from the GSTs show good dynamic range, specificity, and sensitivity in transcript profiling experiments. The GSTs have also been transferred to bacterial plasmid vectors via recombinational cloning protocols. These cloned GSTs constitute an ideal starting point for a variety of functional approaches, including reverse genetics. We have subcloned GSTs on a large scale into vectors designed for gene silencing in plant cells. We show that in planta expression of GST hairpin RNA results in the expected phenotypes in silenced Arabidopsis lines. These versatile GST resources provide novel and powerful tools for functional genomics.
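The abstract requires each GST to share "no significant similarity with any other region" of the genome but does not describe the similarity criterion used. The sketch below substitutes a much cruder, exact-match proxy purely for illustration: a candidate tag at `genome[start:end]` is rejected if any of its length-k words, on either strand, also occurs at a position outside the tag. The word length k = 20 is an arbitrary assumption, and the names `revcomp` and `is_gene_specific` are hypothetical. Real similarity searches also catch mismatched and gapped matches, which this proxy misses.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_gene_specific(genome, start, end, k=20):
    """True if no length-k word of genome[start:end] (either strand)
    occurs elsewhere in genome.

    Windows that overlap the tag region itself are skipped, so the tag
    is not counted as matching its own locus.
    """
    tag = genome[start:end]
    words = {tag[i:i + k] for i in range(len(tag) - k + 1)}
    words |= {revcomp(w) for w in words}  # hybridization is strand-agnostic
    for i in range(len(genome) - k + 1):
        if start - k < i < end:  # window [i, i + k) overlaps the tag
            continue
        if genome[i:i + k] in words:
            return False
    return True
```

Checking the reverse complement matters here because a hairpin or a spotted probe can hybridize to either strand, so a tag duplicated in inverted orientation elsewhere would be just as non-specific as a direct copy.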