943 results for RNA secondary structure
Abstract:
Skipping of internal exons during removal of introns from pre-mRNA must be avoided for proper expression of most eukaryotic genes. Despite significant understanding of the mechanics of intron removal, mechanisms that ensure inclusion of internal exons in multi-intron pre-mRNAs remain mysterious. Using a natural two-intron yeast gene, we have identified distinct RNA–RNA complementarities within each intron that prevent exon skipping and ensure inclusion of internal exons. We show that these complementarities are positioned to act as intron identity elements, bringing together only the appropriate 5′ splice sites and branchpoints. Destroying either intron self-complementarity allows exon skipping to occur, and restoring the complementarity using compensatory mutations rescues exon inclusion, indicating that the elements act through formation of RNA secondary structure. Introducing new pairing potential between regions near the 5′ splice site of intron 1 and the branchpoint of intron 2 dramatically enhances exon skipping. Similar elements identified in single intron yeast genes contribute to splicing efficiency. Our results illustrate how intron secondary structure serves to coordinate splice site pairing and enforce exon inclusion. We suggest that similar elements in vertebrate genes could assist in the splicing of very large introns and in the evolution of alternative splicing.
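The compensatory-mutation logic described in this abstract can be illustrated with a minimal sketch. It assumes two short, hypothetical RNA segments (they are not the yeast intron sequences from the study) and simply counts antiparallel Watson-Crick or G-U pairs; the names `paired_positions` and `mutate` are illustrative only.

```python
# Watson-Crick pairs plus the G-U wobble pair commonly allowed in RNA helices.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def paired_positions(five_prime: str, three_prime: str) -> int:
    """Count positions at which two equal-length segments can pair antiparallel."""
    if len(five_prime) != len(three_prime):
        raise ValueError("segments must have equal length")
    # Antiparallel pairing: the first base of one arm faces the last base of the other.
    return sum((a, b) in PAIRS for a, b in zip(five_prime, reversed(three_prime)))


def mutate(seq: str, pos: int, base: str) -> str:
    """Return a copy of seq with the base at index pos replaced."""
    return seq[:pos] + base + seq[pos + 1:]


# Hypothetical segments: one near a 5' splice site, one near a branchpoint.
near_5ss = "GGACUC"
near_bp = "GAGUCC"

wild_type = paired_positions(near_5ss, near_bp)
# A single substitution in one arm destroys one pair.
broken = paired_positions(mutate(near_5ss, 2, "U"), near_bp)
# The compensatory substitution in the facing position restores it.
rescued = paired_positions(mutate(near_5ss, 2, "U"), mutate(near_bp, 3, "A"))

print(wild_type, broken, rescued)
```

In this toy case the single mutation removes one of the six pairs and the compensatory change restores it, which mirrors the test for structure-dependent function: the helix matters rather than the particular bases.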
Abstract:
Signal recognition particle (SRP) is a stable cytoplasmic ribonucleoprotein complex that serves to translocate secretory proteins across membranes during translation. The SRP Database (SRPDB) provides compilations of SRP components, ordered alphabetically and phylogenetically. Alignments emphasize phylogenetically-supported base pairs in SRP RNA and conserved residues in the proteins. Data are provided in various formats including a column arrangement for improved access and simplified computational usability. Included are motifs for identification of new sequences, SRP RNA secondary structure diagrams, 3-D models and links to high-resolution structures. This release includes 11 new SRP RNA sequences (total of 129), two protein SRP9 sequences (total of seven), two protein SRP14 sequences (total of 10), two protein SRP19 sequences (total of 16), 10 new SRP54 (ffh) sequences (total of 66), two protein SRP68 sequences (total of seven) and two protein SRP72 sequences (total of nine). Seven sequences of the SRP receptor α-subunit and its FtsY homolog (total of 51) are new. Also considered are β-subunit of SRP receptor, Flhf, Hbsu, CaM kinase II and cpSRP43. Access to SRPDB is at http://psyche.uthct.edu/dbs/SRPDB/SRPDB.html and the European mirror http://www.medkem.gu.se/dbs/SRPDB/SRPDB.html
Abstract:
RNA secondary structure folding algorithms predict the existence of connected networks of RNA sequences with identical structure. On such networks, evolving populations split into subpopulations, which diffuse independently in sequence space. This demands a distinction between two mutation thresholds: one at which genotypic information is lost and one at which phenotypic information is lost. In between, diffusion enables the search of vast areas in genotype space while still preserving the dominant phenotype. By this dynamic the success of phenotypic adaptation becomes much less sensitive to the initial conditions in genotype space.
Abstract:
Cystic fibrosis is caused by mutations in the cystic fibrosis transmembrane conductance regulator (CFTR) gene, which encodes a chloride channel present in many cells. In cardiomyocytes, we report that multiple exon 1 usage and alternative splicing produce four CFTR transcripts with different 5'-untranslated regions: CFTRTRAD-139, CFTR-1C/-1A, CFTR-1C, and CFTR-1B. CFTR transcripts containing the novel upstream exons (exons -1C, -1B, and -1A) represent more than 90% of cardiac expressed CFTR mRNA. Regulation of cardiac CFTR expression, in response to developmental and pathological stimuli, is exclusively due to the modulation of CFTR-1C and CFTR-1C/-1A expression. Upstream open reading frames have been identified in the 5'-untranslated regions of all CFTR transcripts that, in conjunction with adjacent stem-loop structures, modulate the efficiency of translation initiation at the AUG codon of the main CFTR coding region in CFTRTRAD-139 and CFTR-1C/-1A transcripts. Exon(-1A), only present in CFTR-1C/-1A transcripts, encodes an AUG codon that is in-frame with the main CFTR open reading frame, the efficient translation of which produces a novel CFTR protein isoform with a curtailed amino terminus. As the expression of this CFTR transcript parallels the spatial and temporal distribution of the cAMP-activated whole-cell current density in normal and diseased hearts, we suggest that CFTR-1C/-1A provides the molecular basis for the cardiac cAMP-activated chloride channel. Our findings provide further insight into the complex nature of in vivo CFTR expression, to which multiple mRNA transcripts, protein isoforms, and post-transcriptional regulatory mechanisms are now added.
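As a rough illustration of how upstream open reading frames and in-frame upstream AUG codons might be located in a 5'-untranslated region, the sketch below scans a transcript for AUG codons upstream of the main start codon. The sequences and the function name `find_uorfs` are hypothetical and are not taken from the CFTR transcripts.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def find_uorfs(mrna: str, main_start: int):
    """List (start, end, in_frame_with_main) for every AUG upstream of main_start.

    end is the index just past the first in-frame stop codon, or None if the
    reading frame opened by that AUG runs off the end of the sequence.
    """
    uorfs = []
    for start in range(0, main_start - 2):
        if mrna[start:start + 3] != "AUG":
            continue
        end = None
        # Walk codon by codon from the upstream AUG until a stop codon is met.
        for i in range(start + 3, len(mrna) - 2, 3):
            if mrna[i:i + 3] in STOP_CODONS:
                end = i + 3
                break
        in_frame = (main_start - start) % 3 == 0
        uorfs.append((start, end, in_frame))
    return uorfs


# Hypothetical transcript with a short out-of-frame uORF before the main AUG at index 12.
short_uorf = find_uorfs("CCAUGCCCUAACAUGGCUGCUUAA", 12)
# Hypothetical transcript whose upstream AUG is in frame with the main AUG at index 6.
in_frame_aug = find_uorfs("AUGGCCAUGGCUUAA", 6)

print(short_uorf)
print(in_frame_aug)
```

An upstream AUG that is in frame and reaches the main stop codon would change the amino terminus of the encoded protein, whereas an out-of-frame uORF that terminates early can only affect how efficiently ribosomes reach the main start codon.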
Abstract:
Ribozyme activity in vivo depends on achieving high-level expression, intracellular stability, target colocalization, and cleavage site access. At present, target site selection is problematic because of unforeseeable secondary and tertiary RNA structures that prevent cleavage. To overcome this design obstacle, we wished to engineer a ribozyme that could access any chosen site. To create this ribozyme, the constitutive transport element (CTE), an RNA motif that has the ability to interact with intracellular RNA helicases, was attached to our ribozymes so that the helicase-bound, hybrid ribozymes would be produced in cells. This modification significantly enhanced ribozyme activity in vivo, permitting cleavage of sites previously found to be inaccessible. To confer cleavage enhancement, the CTE must retain helicase-binding activity. Binding experiments demonstrated the likely involvement of RNA helicase(s). We found that attachment of the RNA motif to our tRNA ribozymes leads to cleavage in vivo at the chosen target site regardless of the local RNA secondary or tertiary structure.
Abstract:
Telomerase RNAs (TERs) are highly divergent between species, varying in size and sequence composition. Here, we identify a candidate for the telomerase RNA component of the genus Leishmania, which includes species that cause leishmaniasis, a neglected tropical disease. Combining a thorough computational screen with RNA-seq evidence, we mapped a non-coding RNA gene to a syntenic locus on chromosome 25 of five Leishmania species; this locus shares partial synteny with both the Trypanosoma brucei TER locus and a locus of Crithidia fasciculata that contains a putative TER candidate. Using target-driven molecular biology approaches, we detected a ∼2,100 nt transcript (LeishTER) that contains a 5' spliced leader (SL) cap, a putative 3' polyA tail and a predicted C/D box snoRNA domain. LeishTER is expressed at similar levels in the logarithmic and stationary growth phases of promastigote forms. A 5'SL capped LeishTER co-immunoprecipitated and co-localized with the telomerase protein component (TERT) in a cell cycle-dependent manner. Prediction of its secondary structure strongly suggests the existence of a bona fide single-stranded template sequence and a conserved C[U/C]GUCA motif-containing helix II, representing the template boundary element. This study paves the way for further investigations on the biogenesis of parasite TERT ribonucleoproteins (RNPs) and their role in parasite telomere biology.
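The degenerate motif C[U/C]GUCA mentioned above maps directly onto a regular expression. The sketch below shows one way such a motif could be searched for; the input sequences are invented for illustration, and the helper name `motif_positions` is an assumption, not part of the study's pipeline.

```python
import re

# C[U/C]GUCA: the second position may be U or C.
TBE_MOTIF = re.compile(r"C[UC]GUCA")


def motif_positions(sequence: str):
    """Return 0-based start indices of the motif, accepting RNA or DNA input."""
    rna = sequence.upper().replace("T", "U")
    return [match.start() for match in TBE_MOTIF.finditer(rna)]


# Hypothetical sequences, one written as RNA and one as lowercase DNA.
rna_hits = motif_positions("GGCUGUCAAACCGUCAGG")
dna_hits = motif_positions("ccgtca")

print(rna_hits, dna_hits)
```

A sequence match alone does not show that the motif sits in a helix; in practice it would be combined with a structure prediction, as the abstract describes.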
Abstract:
First, we modeled the structure of an RNA family with a graph grammar in order to identify the sequences that belong to it. Several other modeling methods have been developed, such as stochastic context-free grammars, covariance models, secondary structure profiles, and constraint networks. These methods are based on classical secondary structure, whereas our graph grammars are based on nucleotide cyclic motifs. To exemplify our model, we used the ribosomal loop E, which contains the Sarcin-Ricin motif, a motif that has been widely studied since its discovery by X-ray crystallography in the early 1990s. We built a graph grammar for the structure of the Sarcin-Ricin motif and derived all the sequences that can fold into it. The biological relevance of these sequences was confirmed by comparison with the sequences of an alignment of more than 800 bacterial ribosomal sequences. This comparison raised alternative alignments for a few of the sequences, which we supported with secondary and tertiary structure predictions. Nucleotide cyclic motifs were observed by members of our laboratory in RNAs whose tertiary structure has been solved experimentally. A study of the sequences and tertiary structures of each cycle making up the Sarcin-Ricin structure revealed that the sequence space depends strongly on the interactions among all nucleotides that are close in three-dimensional space, that is, not only between two adjacent base pairs. The number of sequences generated by the graph grammar is smaller than the number generated by methods based on classical secondary structure.
This suggests the importance of context for the relationship between sequence and structure, hence the use of a contextual graph grammar, which is more expressive than context-free grammars. The graph grammars we developed account only for tertiary structure and neglect the interactions of specific chemical groups with extra-molecular elements, such as other macromolecules or ligands. Second, to account for these interactions, we developed a model that takes into account the position of chemical groups on the surface of tertiary structures. The hypothesis is that chemical groups at positions that are conserved in predetermined active sequences, and that are displaced in sequences inactive for a specific function, are more likely to be involved in interactions with factors. Continuing with the loop E example, we searched for the groups of this loop that could be involved in interactions with elongation factors. Once the groups are identified, three-dimensional modeling can be used to predict the sequences that correctly position these groups in their tertiary structures. A few models exist to address this problem, such as molecular descriptors, nucleotide adjacency matrices, and thermodynamics-based models. However, all these models use an oversimplified representation of RNA structure, which limits their applicability. We applied our model to the tertiary structures of a set of variants of a sequence of an instance of the Sarcin-Ricin motif from a bacterial ribosome. Wool's team at the University of Chicago has already studied this instance experimentally by testing the viability of 12 variants. They identified 4 viable and 8 lethal variants.
We used this set of 12 sequences to train our model and determined a set of properties essential to their biological function. For each variant in the training set, we built tertiary structure models. We then measured the partial charges of the atoms exposed on the surface and encoded this information in vectors. We used principal component analysis to transform the vectors into a set of uncorrelated variables, called the principal components. Using the weighted Euclidean distance and the nearest-neighbor algorithm, we applied leave-one-out cross-validation to choose the best parameters for predicting the activity of a new sequence by matching it to these principal components. Finally, we confirmed the predictive power of the model using a new set of 8 variants whose viability was verified experimentally in our laboratory. In conclusion, graph grammars make it possible to model the relationship between the sequence and the structure of an RNA structural element, such as the loop E containing the Sarcin-Ricin motif of the ribosome. Applications range from correcting and assisting sequence alignment to designing sequences with a predetermined structure. We also developed a model to account for the specific interactions tied to a given biological function, namely those with surrounding factors. Our model is based on the conservation of the exposure of the chemical groups involved in these interactions. This model allowed us to predict the biological activity of a set of variants of the ribosomal loop E, which binds elongation factors.
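The parameter-selection step described in this abstract (weighted Euclidean distance, nearest neighbor, leave-one-out cross-validation) can be sketched in a few lines. The feature vectors below are invented stand-ins for principal-component scores, the candidate weight sets are arbitrary, and the principal component analysis itself is omitted; none of this reproduces the thesis's data or code.

```python
import math


def weighted_distance(u, v, weights):
    """Weighted Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(u, v, weights)))


def nearest_label(query, samples, weights):
    """Label of the sample whose vector is closest to query."""
    return min(samples, key=lambda s: weighted_distance(query, s[0], weights))[1]


def loocv_accuracy(samples, weights):
    """Hold out each sample in turn and classify it from the remaining ones."""
    hits = 0
    for i, (vector, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        hits += nearest_label(vector, rest, weights) == label
    return hits / len(samples)


# Hypothetical (component-1, component-2) scores with viability labels.
data = [
    ((0.0, 0.0), "viable"),
    ((0.2, 5.0), "viable"),
    ((1.0, 0.1), "lethal"),
    ((1.2, 5.1), "lethal"),
]

# Candidate weightings of the two components; the best one maximizes LOOCV accuracy.
candidates = [(1.0, 1.0), (1.0, 0.0)]
best = max(candidates, key=lambda w: loocv_accuracy(data, w))

prediction = nearest_label((0.05, 3.0), data, best)
print(best, loocv_accuracy(data, best), prediction)
```

In this toy data set only the first component separates the classes, so the weighting that ignores the second component wins the cross-validation. With as few as 12 training variants, leave-one-out is a natural choice because it uses every sample for both training and validation.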
Abstract:
The gymnophthalmid lizard genera Calyptommatus and Nothobachia are restricted to sandy open habitats on the margins of the Sao Francisco River, northeastern Brazil. Phylogenetic relationships and geographic distribution of the four recognized species of Calyptommatus were analyzed using partial sequences of the mitochondrial cyt b, 12S, and 16S rRNA genes, taking allopatric populations of the monotypic Nothobachia ablephara as the outgroup. In Calyptommatus a basal split separated C. sinebrachiatus, a species restricted to the eastern bank of the river, from the three other species. In this clade, C. confusionibus, found on the western margin, was recovered as the sister group of the two other species, C. leiolepis and C. nicterus, from opposite margins. According to approximate date estimates, C. sinebrachiatus would have separated from the other congeneric species 4.4-6.5 my ago, and C. nicterus, also from the eastern bank, would have diverged 1.8-2.6 my ago from C. leiolepis, its sister species on the opposite margin. C. confusionibus and C. leiolepis, both from western sandy areas, would have differentiated 2.8-5.0 my ago. Divergence times of about 3.0-4.0 my were estimated for allopatric populations of Nothobachia restricted to the western margin. Significant differences in 16S rRNA secondary structure relative to other vertebrates are reported. Distinct evolutionary patterns are proposed for different taxa in those sandy areas, probably related to historical changes in the course of the Sao Francisco River. (C) 2010 Elsevier Inc. All rights reserved.
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
Chemists have long sought to extrapolate the power of biological catalysis and recognition to synthetic systems. These efforts have focused largely on low molecular weight catalysts and receptors; however, biological systems themselves rely almost exclusively on polymers, proteins and RNA, to perform complex chemical functions. Proteins and RNA are unique in their ability to adopt compact, well-ordered conformations, and specific folding provides precise spatial orientation of the functional groups that comprise the “active site”. These features suggest that identification of new polymer backbones with discrete and predictable folding propensities (“foldamers”) will provide a basis for design of molecular machines with unique capabilities. The foldamer approach complements current efforts to design unnatural properties into polypeptides and polynucleotides. The aim of this thesis is the synthesis and conformational study of new classes of foldamers, using a peptidomimetic approach. Moreover, their aptitude to be used as ionophores, catalysts, and nanobiomaterials was analyzed in solution and in the solid state. The thesis is divided into thematic chapters, which are summarized below. It begins with a very general introduction, which is useful, but not strictly necessary, to the expert reader. It is worth mentioning that paragraph I.3 is the starting point of this work and that paragraph I.5 is required to better understand the results of chapters 4 and 5. Chapter 1 reports the synthesis and conformational analysis of a novel class of foldamers containing (S)-β3-homophenylglycine [(S)-β3-hPhg] and D-4-carboxy-oxazolidin-2-one (D-Oxd) residues in alternate order. The experimental conformational analysis performed in solution by IR, 1H NMR, and CD spectroscopy unambiguously proved that these oligomers fold into ordered structures with increasing sequence length. 
Theoretical calculations employing ab initio MO theory suggest a helix with 11-membered hydrogen-bonded rings as the preferred secondary structure type. The novel structures enrich the field of peptidic foldamers and might be useful in the mimicry of native peptides. In chapter 2, cyclo-(L-Ala-D-Oxd)3 and cyclo-(L-Ala-D-Oxd)4 were prepared in the liquid phase with good overall yields and were used for the chelation of bivalent ions (Ca2+, Mg2+, Cu2+, Zn2+ and Hg2+); their chelating ability was analyzed with ESI-MS, CD and 1H NMR techniques, and the best results were obtained with cyclo-(L-Ala-D-Oxd)3 and Mg2+ or Ca2+. Chapter 3 describes an application of oligopeptides as catalysts for aldol reactions. Paragraph 3.1 concerns the use of prolinamides as catalysts of the cross-aldol addition of hydroxyacetone to aromatic aldehydes, whereas paragraphs 3.2 and 3.3 cover the catalyzed aldol addition of acetone to isatins. By means of DFT and AIM calculations, the steric and stereoelectronic effects that control the enantioselectivity in the cross-aldol addition of acetone to isatin catalyzed by L-proline have been studied, also in the presence of small quantities of water. Chapter 4 reports the synthesis and analysis of a new fiber-like material, obtained from the self-aggregation of the dipeptide Boc-L-Phe-D-Oxd-OBn, which spontaneously forms uniform fibers consisting of parallel infinite linear chains arising from single intermolecular N-H···O=C hydrogen bonds. This is the absolute borderline case of a parallel β-sheet structure. Longer oligomers of the same series, with general formula Boc-(L-Phe-D-Oxd)n-OBn (where n = 2-5), are described in chapter 5. Their properties in solution and in the solid state were analyzed in relation to their propensity to form intramolecular hydrogen bonds. 
Chapter 6 reports the synthesis of imidazolidin-2-one-4-carboxylate and (tetrahydro)-pyrimidin-2-one-5-carboxylate via an efficient modification of the Hofmann rearrangement. The reaction affords the desired compounds from protected asparagine or glutamine in good to high yield, using PhI(OAc)2 as the source of iodine(III).
Abstract:
In many application domains data can be naturally represented as graphs. When the application of analytical solutions to a given problem is unfeasible, machine learning techniques can be a viable way to solve it. Classical machine learning techniques are defined for data represented in vectorial form. Recently some of them have been extended to deal directly with structured data. Among those techniques, kernel methods have shown promising results in terms of both computational complexity and predictive performance. Kernel methods avoid an explicit mapping to a vectorial form by relying on kernel functions, which informally are functions that compute a similarity measure between two entities. However, defining good kernels for graphs is a challenging problem because it is difficult to find a good tradeoff between computational complexity and expressiveness. Another problem we face is learning on data streams, where a potentially unbounded sequence of data is generated by some source. There are three main contributions in this thesis. The first is the definition of a new family of kernels for graphs based on Directed Acyclic Graphs (DAGs). We analyzed two kernels from this family, achieving state-of-the-art results in terms of both computation and classification on real-world datasets. The second contribution makes the application of learning algorithms to streams of graphs feasible. Moreover, we defined a principled approach to memory management. The third contribution is the application of machine learning techniques for structured data to non-coding RNA function prediction. In this setting, the secondary structure is thought to carry relevant information. However, existing methods that consider the secondary structure have prohibitively high computational complexity. We propose to apply kernel methods to this domain, obtaining state-of-the-art results.
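To make the idea of a graph kernel concrete, here is a deliberately simple sketch: each graph is summarized by counts of node labels and of one-hop neighborhood signatures, and the kernel is the dot product of those counts. This is a generic illustration in the spirit of subtree-style kernels, not the DAG-based kernels defined in the thesis; the two toy RNA graphs and all function names are assumptions.

```python
from collections import Counter


def features(labels, edges):
    """Count node labels and (label, sorted neighbour labels) signatures."""
    neighbours = {node: [] for node in labels}
    for a, b in edges:
        neighbours[a].append(labels[b])
        neighbours[b].append(labels[a])
    counts = Counter(labels.values())
    for node, label in labels.items():
        counts[(label, tuple(sorted(neighbours[node])))] += 1
    return counts


def kernel(g1, g2):
    """Similarity of two graphs as the dot product of their feature counts."""
    f1, f2 = features(*g1), features(*g2)
    return sum(f1[key] * f2[key] for key in f1.keys() & f2.keys())


# Hypothetical four-nucleotide RNA: nodes are bases, edges are backbone links.
labels = {0: "G", 1: "A", 2: "A", 3: "C"}
backbone = [(0, 1), (1, 2), (2, 3)]

strand = (labels, backbone)               # unpaired strand
hairpin = (labels, backbone + [(0, 3)])   # same sequence with a G-C base pair

print(kernel(hairpin, hairpin), kernel(hairpin, strand))
```

The two graphs share a sequence but differ in one base-pair edge, so the cross-similarity is lower than the self-similarity. This is the sense in which a kernel on structure graphs can capture information that a purely sequence-based representation would miss.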
Abstract:
In populations that are small and asexual, mutations with slight negative effects on fitness will drift to fixation more often than in large or sexual populations in which they will be eliminated by selection. If such mutations occur in substantial numbers, the combined effects of long-term asexuality and small population size may result in substantial accumulation of mildly deleterious substitutions. Prokaryotic endosymbionts of animals that are transmitted maternally for very long periods are effectively asexual and experience smaller effective population size than their free-living relatives. The contrast between such endosymbionts and related free-living bacteria allows us to test whether a population structure imposing frequent bottlenecks and asexuality does lead to an accumulation of slightly deleterious substitutions. Here we show that several independently derived insect endosymbionts, each with a long history of maternal transmission, have accumulated destabilizing base substitutions in the highly conserved 16S rRNA. Stabilities of Domain I of this subunit are 15–25% lower in endosymbionts than in closely related free-living bacteria. By mapping destabilizing substitutions onto a reconstructed phylogeny, we show that decreased ribosomal stability has evolved separately in each endosymbiont lineage. Our phylogenetic approach allows us to demonstrate statistical significance for this pattern: becoming endosymbiotic predictably results in decreased stability of rRNA secondary structure.
Abstract:
The overall folded (global) structure of mRNA may be critical to translation and turnover control mechanisms, but it has received little experimental attention. Presented here is a comparative analysis of the basic features of the global secondary structure of a synthetic mRNA and the same intracellular eukaryotic mRNA by dimethyl sulfate (DMS) structure probing. Synthetic MFA2 mRNA of Saccharomyces cerevisiae first was examined by using both enzymes and chemical reagents to determine single-stranded and hybridized regions; RNAs with and without a poly(A) tail were compared. A folding pattern was obtained with the aid of the mfold program package that identified the model that best satisfied the probing data. A long-range structural interaction involving the 5′ and 3′ untranslated regions and causing a juxtaposition of the ends of the RNA, was examined further by a useful technique involving oligo(dT)-cellulose chromatography and antisense oligonucleotides. DMS chemical probing of A and C nucleotides of intracellular MFA2 mRNA was then done. The modification data support a very similar intracellular structure. When low reactivity of A and C residues is found in the synthetic RNA, ≈70% of the same sites are relatively more resistant to DMS modification in vivo. A slightly higher sensitivity to DMS is found in vivo for some of the A and C nucleotides predicted to be hybridized from the synthetic structural model. With this small mRNA, the translation process and mRNA-binding proteins do not block DMS modifications, and all A and C nucleotides are modified the same or more strongly than with the synthetic RNA.
Abstract:
Several models have been proposed for the mechanism of transcript termination by Escherichia coli RNA polymerase at rho-independent terminators. Yager and von Hippel (Yager, T. D. & von Hippel, P. H. (1991) Biochemistry 30, 1097–118) postulated that the transcription complex is stabilized by enzyme–nucleic acid interactions and the favorable free energy of a 12-bp RNA–DNA hybrid but is destabilized by the free energy required to maintain an extended transcription bubble. Termination, by their model, is viewed simply as displacement of the RNA transcript from the hybrid helix by reformation of the DNA helix. We have proposed an alternative model where the RNA transcript is stably bound to RNA polymerase primarily through interactions with two single-strand specific RNA-binding sites; termination is triggered by formation of an RNA hairpin that reduces binding of the RNA to one RNA-binding site and, ultimately, leads to its ejection from the complex. To distinguish between these models, we have tested whether E. coli RNA polymerase can terminate transcription at rho-independent terminators on single-stranded DNA. RNA polymerase cannot form a transcription bubble on these templates; thus, the Yager–von Hippel model predicts that intrinsic termination will not occur. We find that transcript elongation on single-stranded DNA templates is hindered somewhat by DNA secondary structure. However, E. coli RNA polymerase efficiently terminates and releases transcripts at several rho-independent terminators on such templates at the same positions as termination occurs on duplex DNAs. Therefore, neither the nontranscribed DNA strand nor the transcription bubble is essential for rho-independent termination by E. coli RNA polymerase.
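Since the alternative model turns on formation of an RNA hairpin in the transcript, a small sketch of how candidate stem-loops can be located in a sequence may help. It looks only for perfect Watson-Crick stems of a fixed length separated by a short loop; the sequence, the default parameters, and the name `find_hairpins` are hypothetical, and no free-energy calculation is attempted.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def find_hairpins(rna, stem=4, min_loop=3, max_loop=8):
    """Return (start, loop_length) for each perfect stem-loop found in rna."""
    hits = []
    for start in range(len(rna) - 2 * stem - min_loop + 1):
        left = rna[start:start + stem]
        # The 3' arm must be the reverse complement of the 5' arm.
        target = "".join(COMPLEMENT[base] for base in reversed(left))
        for loop in range(min_loop, max_loop + 1):
            right_start = start + stem + loop
            if rna[right_start:right_start + stem] == target:
                hits.append((start, loop))
    return hits


# Hypothetical transcript: a GCCC/GGGC stem closed by a GAAA loop, then a U run.
transcript = "CCGCCCGAAAGGGCUUUUUU"
hairpins = find_hairpins(transcript)

print(hairpins)
```

A scan of this kind only proposes candidates; whether a given hairpin actually forms and triggers release is the experimental question the abstract addresses.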