951 results for Noncoding Regions
Abstract:
Three novel families of transposable elements, Wukong, Wujin, and Wuneng, are described in the yellow fever mosquito, Aedes aegypti. Their copy numbers range from 2,100 to 3,000 per haploid genome. There are high degrees of sequence similarity within each family, and many structural but not sequence similarities between families. The common structural characteristics include small size, no coding potential, terminal inverted repeats, potential to form a stable secondary structure, A+T richness, and putative 2- to 4-bp A+T-biased specific target sites. Evidence of previous mobility is presented for the Wukong elements. Elements of these three families are associated with 7 of 16 fully or partially sequenced Ae. aegypti genes. Characteristics of these mosquito elements indicate strong similarities to the miniature inverted-repeat transposable elements (MITEs) recently found to be associated with plant genes. MITE-like elements have also been reported in two species of Xenopus and in Homo sapiens. This characterization of multiple families of highly repetitive MITE-like elements in an invertebrate extends the range of these elements in eukaryotic genomes. A hypothesis is presented relating genome size and organization to the presence of highly reiterated MITE families. The association of MITE-like elements with Ae. aegypti genes shows the same bias toward noncoding regions as in plants. This association has potentially important implications for the evolution of gene regulation.
Abstract:
Although most eukaryotic mRNAs need a functional cap binding complex eIF4F for efficient 5′ end-dependent scanning to initiate translation, picornaviral, hepatitis C viral, and a few cellular RNAs have been shown to be translated by internal ribosome entry, a mechanism that can operate in the presence of low levels of functional eIF4F. To identify cellular mRNAs that can be translated when eIF4F is depleted or in low abundance and that, therefore, may contain internal ribosome entry sites, mRNAs that remained associated with polysomes were isolated from human cells after infection with poliovirus and were identified by using a cDNA microarray. Approximately 200 of the 7000 mRNAs analyzed remained associated with polysomes under these conditions. Among the gene products encoded by these polysome-associated mRNAs were immediate-early transcription factors, kinases, and phosphatases of the mitogen-activated protein kinase pathways and several protooncogenes, including c-myc and Pim-1. In addition, the mRNA encoding Cyr61, a secreted factor that can promote angiogenesis and tumor growth, was selectively mobilized into polysomes when eIF4F concentrations were reduced, although its overall abundance changed only slightly. Subsequent tests confirmed the presence of internal ribosome entry sites in the 5′ noncoding regions of both Cyr61 and Pim-1 mRNAs. Overall, this study suggests that diverse mRNAs whose gene products have been implicated in a variety of stress responses, including inflammation, angiogenesis, and the response to serum, can use translational initiation mechanisms that require little or no intact cap binding protein complex eIF4F.
Abstract:
The alcohol dehydrogenase (Adh) gene family is much more complex in Pinus banksiana than in angiosperms, with at least seven expressed genes organized as two tightly linked clusters. Intron number and position are highly conserved between P. banksiana and angiosperms. Unlike angiosperm Adh genes, numerous duplications, as large as 217 bp, were observed within the noncoding regions of P. banksiana Adh genes and may be a common feature of conifer genes. A high frequency of duplication over a wide range of scales may contribute to the large genome size of conifers.
Abstract:
We have developed a system for generation of infectious bursal disease virus (IBDV), a segmented double-stranded RNA virus of the Birnaviridae family, with the use of synthetic transcripts derived from cloned cDNA. Independent full-length cDNA clones were constructed that contained the entire coding and noncoding regions of RNA segments A and B of two distinguishable IBDV strains of serotype I. Segment A encodes all of the structural (VP2, VP4, and VP3) and nonstructural (VP5) proteins, whereas segment B encodes the RNA-dependent RNA polymerase (VP1). Synthetic RNAs of both segments were produced by in vitro transcription of linearized plasmids with T7 RNA polymerase. Transfection of Vero cells with combined plus-sense transcripts of both segments generated infectious virus as early as 36 hr after transfection. The infectivity and specificity of the recovered chimeric virus were ascertained by the appearance of cytopathic effect in chicken embryo cells, by immunofluorescence staining of infected Vero cells with rabbit anti-IBDV serum, and by nucleotide sequence analysis of the recovered virus, respectively. In addition, transfectant viruses containing genetically tagged sequences in either segment A or segment B of IBDV were generated to confirm the feasibility of this system. The development of a reverse genetics system for double-stranded RNA viruses will greatly facilitate studies of the regulation of viral gene expression, pathogenesis, and design of a new generation of live vaccines.
Abstract:
Several recent reports indicate that mobile elements are frequently found in and flanking many wild-type plant genes. To determine the extent of this association, we performed computer-based systematic searches to identify mobile elements in the genes of two "model" plants, Oryza sativa (domesticated rice) and Arabidopsis thaliana. Whereas 32 common sequences belonging to nine putative mobile element families were found in the noncoding regions of rice genes, none were found in Arabidopsis genes. Five of the nine families (Gaijin, Castaway, Ditto, Wanderer, and Explorer) are first described in this report, while the other four were described previously (Tourist, Stowaway, p-SINE1, and Amy/LTP). Sequence similarity, structural similarity, and documentation of past mobility strongly suggest that many of the rice common sequences are bona fide mobile elements. Members of four of the new rice mobile element families are similar in some respects to members of the previously identified inverted-repeat element families, Tourist and Stowaway. Together these elements are the most prevalent type of transposons found in the rice genes surveyed and form a unique collection of inverted-repeat transposons we refer to as miniature inverted-repeat transposable elements or MITEs. The sequence and structure of MITEs are clearly distinct from short or long interspersed nuclear elements (SINEs or LINEs), the most common transposable elements associated with mammalian nuclear genes. Mobile elements, therefore, are associated with both animal and plant genes, but the identity of these elements is strikingly different.
Abstract:
In this report we show that yeast expressing brome mosaic virus (BMV) replication proteins 1a and 2a and replicating a BMV RNA3 derivative can be extracted to yield a template-dependent BMV RNA-dependent RNA polymerase (RdRp) able to synthesize (-)-strand RNA from BMV (+)-strand RNA templates added in vitro. This virus-specific yeast-derived RdRp mirrored the template selectivity and other characteristics of RdRp from BMV-infected plants. Equivalent extracts from yeast expressing 1a and 2a but lacking RNA3 contained normal amounts of 1a and 2a but had no RdRp activity on BMV RNAs added in vitro. To determine which RNA3 sequences were required in vivo to yield RdRp activity, we tested deletions throughout RNA3, including the 5', 3', and intercistronic noncoding regions, which contain the cis-acting elements required for RNA3 replication in vivo. RdRp activity was obtained only from cells expressing 1a, 2a, and RNA3 derivatives retaining both 3' and intercistronic noncoding sequences. Strong correlation between extracted RdRp activity and BMV (-)-strand RNA accumulation in vivo was found for all RNA3 derivatives tested. Thus, extractable in vitro RdRp activity paralleled formation of a complex capable of viral RNA synthesis in vivo. The results suggest that assembly of active RdRp requires not only viral proteins but also viral RNA, either to directly contribute some nontemplate function or to recruit essential host factors into the RdRp complex, and that sequences at both the 3'-terminal initiation site and distant internal sites of RNA3 templates may participate in RdRp assembly and initiation of (-)-strand synthesis.
Abstract:
To better understand the evolution of mitochondrial (mt) genomes in the Acari (mites and ticks), we sequenced the mt genome of the chigger mite, Leptotrombidium pallidum (Arthropoda: Acari: Acariformes). This genome is highly rearranged relative to that of the hypothetical ancestor of the arthropods and the other species of Acari studied. The mt genome of L. pallidum has two genes for large subunit rRNA, a pseudogene for small subunit rRNA, and four nearly identical large noncoding regions. Nineteen of the 22 tRNAs encoded by this genome apparently lack either a T-arm or a D-arm. Further, the mt genome of L. pallidum has two distantly separated sections with identical sequences but opposite orientations of transcription. This arrangement cannot be accounted for by homologous recombination or by previously known mechanisms of mt gene rearrangement. The most plausible explanation for the origin of this arrangement is illegitimate inter-mtDNA recombination, which has not been reported previously in animals. In light of the evidence from previous experiments on recombination in nuclear and mt genomes of animals, we propose a model of illegitimate inter-mtDNA recombination to account for the novel gene content and gene arrangement in the mt genome of L. pallidum.
Abstract:
In Mesoamerica, tropical dry forest is a highly threatened habitat, and species endemic to this environment are under extreme pressure. The tree species Lonchocarpus costaricensis is endemic to the dry northwest of Costa Rica and southwest Nicaragua. It is a locally important species but, as land has been cleared for agriculture, populations have experienced considerable reduction and fragmentation. To assess current levels and distribution of genetic diversity in the species, a combination of chloroplast-specific (cpDNA) and whole genome DNA markers (amplified fragment length polymorphism, AFLP) was used to fingerprint 121 individual trees in 6 populations. Two cpDNA haplotypes were identified, distributed among populations such that populations at the extremes of the distribution showed the lowest diversity. A large number (487) of AFLP markers were obtained and indicated that diversity levels were highest in the two coastal populations (Cobano and Matapalo; H = 0.23 and 0.28, respectively). Population differentiation was low overall, F-ST = 0.12, although Matapalo was strongly differentiated from all other populations (F-ST = 0.16-0.22), apart from Cobano (F-ST = 0.11). Spatial genetic structure was present in both datasets at different scales: cpDNA was structured at a range-wide distribution scale, whilst AFLP data revealed genetic neighbourhoods on a population scale. In general, the habitat degradation of recent times appears not to have yet impacted diversity levels in mature populations. However, although no data on seed or saplings were collected, it seems likely that reproductive mechanisms in the species will have been affected by land clearance. It is recommended that efforts should be made to conserve the extant genetic resource base and further research undertaken to investigate diversity levels in the progeny generation.
Abstract:
Recently, we identified a large number of ultraconserved (uc) sequences in noncoding regions of human, mouse, and rat genomes that appear to be essential for vertebrate and amniote ontogeny. Here, we used similar methods to identify ultraconserved genomic regions between the insect species Drosophila melanogaster and Drosophila pseudoobscura, as well as the more distantly related Anopheles gambiae. As with vertebrates, ultraconserved sequences in insects appear to occur primarily in intergenic and intronic sequences, and at intron-exon junctions. The sequences are significantly associated with genes encoding developmental regulators and transcription factors, but are less frequent and are smaller in size than in vertebrates. The longest identical, nongapped orthologous match between the three genomes was found within the homothorax (hth) gene. This sequence spans an internal exon-intron junction, with the majority located within the intron, and is predicted to form a highly stable stem-loop RNA structure. Real-time quantitative PCR analysis of different hth splice isoforms and Northern blotting showed that the conserved element is associated with a high incidence of intron retention in hth pre-mRNA, suggesting that the conserved intronic element is critically important in the post-transcriptional regulation of hth expression in Diptera.
Abstract:
An important topic in genomic sequence analysis is the identification of protein coding regions. In this context, several coding DNA model-independent methods based on the occurrence of specific patterns of nucleotides at coding regions have been proposed. Nonetheless, these methods have not been completely suitable due to their dependence on an empirically predefined window length required for a local analysis of a DNA region. We introduce a method based on a modified Gabor-wavelet transform (MGWT) for the identification of protein coding regions. This novel transform is tuned to analyze periodic signal components and presents the advantage of being independent of the window length. We compared the performance of the MGWT with other methods by using eukaryote data sets. The results show that MGWT outperforms all assessed model-independent methods with respect to identification accuracy. These results indicate that the source of at least part of the identification errors produced by the previous methods is the fixed working scale. The new method not only avoids this source of errors but also makes a tool available for detailed exploration of the nucleotide occurrence.
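Illustrative note: the idea of scanning for the 3-bp periodicity of coding DNA at several window scales at once can be sketched in a few lines of Python. This is not the authors' MGWT implementation. It is a simplified stand-in that assumes a Gaussian-windowed complex exponential fixed at frequency 1/3, applied to the four base-indicator sequences and summed over scales. The function name `period3_score` and the default scale values are hypothetical choices made for this sketch.

```python
import cmath
import math


def period3_score(seq, scales=(10, 20, 40)):
    """Per-position period-3 strength, summed over bases and window scales.

    Simplified sketch only: the scales (Gaussian standard deviations, in bp)
    are assumed values, not parameters taken from the abstract.
    """
    seq = seq.upper()
    n = len(seq)
    omega = 2 * math.pi / 3  # angular frequency of a 3-bp periodicity
    scores = [0.0] * n
    for base in "ACGT":
        # Binary indicator sequence: 1 where this base occurs, else 0.
        indicator = [1.0 if ch == base else 0.0 for ch in seq]
        for scale in scales:
            half = 3 * scale  # truncate the Gaussian at three standard deviations
            kernel = [
                cmath.exp(1j * omega * k) * math.exp(-(k * k) / (2.0 * scale * scale))
                for k in range(-half, half + 1)
            ]
            norm = sum(abs(w) for w in kernel)  # makes scales comparable
            for pos in range(n):
                acc = 0j
                for k in range(-half, half + 1):
                    j = pos + k
                    if 0 <= j < n and indicator[j]:
                        acc += kernel[k + half]
                scores[pos] += abs(acc / norm) ** 2
    return scores
```

Calling `period3_score("ATG" * 60 + "ACGT" * 45)` returns one score per nucleotide. Positions inside the codon-like `ATG` repeat score far higher than positions inside the period-4 `ACGT` repeat. Because several scales are summed, no single window length has to be chosen in advance.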
Abstract:
The oncogene GLI1 is involved in the formation of basal cell carcinoma and other tumor types as a result of the aberrant signaling of the Sonic hedgehog-Patched pathway. In this study, we have identified alternative GLI1 transcripts that differ in their 5' untranslated regions (UTRs) and are generated by exon skipping. These are denoted α-UTR, β-UTR, and γ-UTR according to the number of noncoding exons possessed (three, two, and one, respectively). The α- and β-UTR forms represent the major Gli1 transcripts expressed in mouse tissues, whereas the γ-UTR is present at relatively low levels but is markedly induced in mouse skin treated with 12-O-tetradecanoylphorbol 13-acetate. Transcripts corresponding to the murine β and γ forms were identified in human tissues, but significantly, only the γ-UTR form was present in basal cell carcinomas and in proliferating cultures of a keratinocyte cell line. Flow cytometry analysis determined that the γ-UTR variant expresses a heterologous reporter gene 14- to 23-fold higher than the α-UTR and 5- to 13-fold higher than the β-UTR in a variety of cell types. Because expression of the γ-UTR variant correlates with proliferation, consistent with a role for GLI1 in growth promotion, up-regulation of GLI1 expression through skipping of 5' noncoding exons may be an important tumorigenic mechanism.
Abstract:
Functional RNA structures play an important role both in the context of noncoding RNA transcripts as well as regulatory elements in mRNAs. Here we present a computational study to detect functional RNA structures within the ENCODE regions of the human genome. Since structural RNAs in general lack characteristic signals in primary sequence, comparative approaches evaluating evolutionary conservation of structures are most promising. We have used three recently introduced programs based on either phylogenetic-stochastic context-free grammar (EvoFold) or energy directed folding (RNAz and AlifoldZ), yielding several thousand candidate structures (corresponding to approximately 2.7% of the ENCODE regions). EvoFold has its highest sensitivity in highly conserved and relatively AU-rich regions, while RNAz favors slightly GC-rich regions, resulting in a relatively small overlap between methods. Comparison with the GENCODE annotation points to functional RNAs in all genomic contexts, with a slightly increased density in 3'-UTRs. While we estimate a significant false discovery rate of approximately 50%-70%, many of the predictions can be further substantiated by additional criteria: 248 loci are predicted by both RNAz and EvoFold, and an additional 239 RNAz or EvoFold predictions are supported by the (more stringent) AlifoldZ algorithm. Five hundred seventy RNAz structure predictions fall into regions that show signs of selection pressure also on the sequence level (i.e., conserved elements). More than 700 predictions overlap with noncoding transcripts detected by oligonucleotide tiling arrays. One hundred seventy-five selected candidates were tested by RT-PCR in six tissues, and expression could be verified in 43 cases (24.6%).
Abstract:
This report presents systematic empirical annotation of transcript products from 399 annotated protein-coding loci across the 1% of the human genome targeted by the Encyclopedia of DNA elements (ENCODE) pilot project using a combination of 5' rapid amplification of cDNA ends (RACE) and high-density resolution tiling arrays. We identified previously unannotated and often tissue- or cell-line-specific transcribed fragments (RACEfrags), both 5' distal to the annotated 5' terminus and internal to the annotated gene bounds for the vast majority (81.5%) of the tested genes. Half of the distal RACEfrags span large segments of genomic sequences away from the main portion of the coding transcript and often overlap with the upstream-annotated gene(s). Notably, at least 20% of the resultant novel transcripts have changes in their open reading frames (ORFs), most of them fusing ORFs of adjacent transcripts. A significant fraction of distal RACEfrags show expression levels comparable to those of known exons of the same locus, suggesting that they are not part of very minority splice forms. These results have significant implications concerning (1) our current understanding of the architecture of protein-coding genes; (2) our views on locations of regulatory regions in the genome; and (3) the interpretation of sequence polymorphisms mapping to regions hitherto considered to be "noncoding," ultimately relating to the identification of disease-related sequence alterations.