976 results for Sequence Features
Abstract:
Background: One of the least common types of alternative splicing is the complete retention of an intron in a mature transcript. Intron retention (IR) is believed to be the result of intron, rather than exon, definition associated with failure of the recognition of weak splice sites flanking short introns. Although studies on individual retained introns have been published, few systematic surveys of large amounts of data have been conducted on the mechanisms that lead to IR. Results: To understand how sequence features are associated with or control IR, and to produce a generalized model that could reveal previously unknown signals that regulate this type of alternative splicing, we partitioned intron retention events observed in human cDNAs into two groups based on the relative abundance of both isoforms and compared relevant features. We found that a higher frequency of IR in human is associated with individual introns that have weaker splice sites, genes with shorter intron lengths, higher expression levels and lower density of both a set of exon splicing silencers (ESSs) and the intronic splicing enhancer GGG. Both groups of retained introns presented events conserved in mouse, in which the retained introns were also short and presented weaker splice sites. Conclusion: Although our results confirmed that weaker splice sites are associated with IR, they showed that this feature alone cannot explain a non-negligible fraction of events. Our analysis suggests that cis-regulatory elements are likely to play a crucial role in regulating IR and also reveals previously unknown features that seem to influence its occurrence. These results highlight the importance of considering the interplay among these features in the regulation of the relative frequency of IR.
Abstract:
Background: In protein sequence classification, identification of the sequence motifs or n-grams that can precisely discriminate between classes is a more interesting scientific question than the classification itself. A number of classification methods aim at accurate classification but fail to explain which sequence features indeed contribute to the accuracy. We hypothesize that sequences in lower denominations (n-grams) can be used to explore the sequence landscape and to identify class-specific motifs that discriminate between classes during classification. Discriminative n-grams are short peptide sequences that are highly frequent in one class but are either minimally present or absent in other classes. In this study, we present a new substitution-based scoring function for identifying discriminative n-grams that are highly specific to a class. Results: We present a scoring function based on discriminative n-grams that can effectively discriminate between classes. The scoring function, initially, harvests the entire set of 4- to 8-grams from the protein sequences of different classes in the dataset. Similar n-grams of the same size are combined to form new n-grams, where the similarity is defined by positive amino acid substitution scores in the BLOSUM62 matrix. Substitution has resulted in a large increase in the number of discriminatory n-grams harvested. Due to the unbalanced nature of the dataset, the frequencies of the n-grams are normalized using a dampening factor, which gives more weightage to the n-grams that appear in fewer classes and vice-versa. After the n-grams are normalized, the scoring function identifies discriminative 4- to 8-grams for each class that are frequent enough to be above a selection threshold. By mapping these discriminative n-grams back to the protein sequences, we obtained contiguous n-grams that represent short class-specific motifs in protein sequences. 
Our method fared well compared to an existing motif-finding method known as Wordspy. We have validated our enriched set of class-specific motifs against the functionally important motifs obtained from the NLSdb, Prosite and ELM databases. We demonstrate that this method is generic and can thus be widely applied to detect class-specific motifs in many protein sequence classification tasks. Conclusion: The proposed scoring function and methodology are able to identify class-specific motifs using discriminative n-grams derived from the protein sequences. The use of amino acid substitution scores for similarity detection, and of the dampening factor to normalize the unbalanced datasets, has a significant effect on the performance of the scoring function. Our multipronged validation tests demonstrate that this method can detect class-specific motifs from a wide variety of protein sequence classes, with a potential application to detecting proteome-specific motifs of different organisms.
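The harvesting and dampening steps described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' scoring function: the abstract does not give the formula, so the IDF-style logarithmic dampening factor, the function names and the toy sequences are all assumptions, and the BLOSUM62-based merging of similar n-grams is left out.

```python
from collections import Counter
import math


def harvest_ngrams(sequences, n_min=4, n_max=8):
    """Count every contiguous n-gram (n_min..n_max residues) in a list of sequences."""
    counts = Counter()
    for seq in sequences:
        for n in range(n_min, n_max + 1):
            for i in range(len(seq) - n + 1):
                counts[seq[i:i + n]] += 1
    return counts


def discriminative_ngrams(classes, n_min=4, n_max=8, threshold=0.0):
    """Return {class: {ngram: score}} for n-grams scoring above `threshold`.

    score = relative frequency in the class * log(num_classes / classes_containing)
    The log term is a hypothetical IDF-style dampening factor (an assumption):
    it is largest for n-grams seen in few classes and zero for n-grams seen in all.
    """
    per_class = {c: harvest_ngrams(seqs, n_min, n_max) for c, seqs in classes.items()}
    num_classes = len(per_class)
    # Number of classes in which each n-gram occurs at least once.
    class_presence = Counter()
    for counts in per_class.values():
        class_presence.update(counts.keys())
    result = {}
    for c, counts in per_class.items():
        total = sum(counts.values()) or 1
        scored = {}
        for gram, k in counts.items():
            damp = math.log(num_classes / class_presence[gram])
            score = (k / total) * damp
            if score > threshold:
                scored[gram] = score
        result[c] = scored
    return result


# Toy data, invented for illustration only.
toy = {
    "nuclear": ["MPKKKRKVAA", "GGPKKKRKVL"],
    "cytosolic": ["MAAAGGLLAA", "MGGLLAAVVA"],
}
hits = discriminative_ngrams(toy, n_min=4, n_max=4)
```

In this toy run the 4-gram `KKRK` is retained for the "nuclear" class only, while an n-gram present in every class would receive a score of zero and be dropped.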
Abstract:
Ribosome profiling (Ribo-seq), a promising technology for exploring ribosome decoding rates, is characterized by the presence of infrequent high peaks in ribosome footprint density and by long alignment gaps. Here, to reduce the impact of data heterogeneity, we introduce a simple normalization method, Ribo-seq Unit Step Transformation (RUST). RUST is robust and outperforms other normalization techniques in the presence of heterogeneous noise. We illustrate how RUST can be used for identifying mRNA sequence features that affect ribosome footprint densities globally. We show that a few parameters extracted with RUST are sufficient for predicting experimental densities with high accuracy. Importantly, the application of RUST to 30 publicly available Ribo-seq data sets revealed a substantial variation in sequence determinants of ribosome footprint frequencies, questioning the reliability of Ribo-seq as an accurate representation of local ribosome densities without prior quality control. This emphasizes our incomplete understanding of how protocol parameters affect ribosome footprint densities.
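The abstract does not spell out the transformation itself. The sketch below assumes a unit step applied per ORF, in which each codon scores 1 if its footprint count exceeds the mean count of that ORF and 0 otherwise. The function name and the example profile are hypothetical, and this may differ in detail from the published method.

```python
def unit_step_transform(codon_counts):
    """Binarize per-codon footprint counts for one ORF.

    Assumption: a codon scores 1 if its count exceeds the ORF's mean count,
    else 0. An empty profile yields an empty list.
    """
    if not codon_counts:
        return []
    mean = sum(codon_counts) / len(codon_counts)
    return [1 if c > mean else 0 for c in codon_counts]


# Invented profile with one extreme peak (250 reads on the third codon).
profile = [2, 3, 250, 1, 0, 4, 2]
binary = unit_step_transform(profile)
```

The point of such a step function is that a single extreme peak contributes the same value (1) as any other above-average codon, which limits the influence of the infrequent high peaks mentioned in the abstract.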
Abstract:
Signal peptides and transmembrane helices both contain a stretch of hydrophobic amino acids. This common feature makes it difficult for signal peptide and transmembrane helix predictors to correctly assign identity to stretches of hydrophobic residues near the N-terminal methionine of a protein sequence. The inability to reliably distinguish between N-terminal transmembrane helix and signal peptide is an error with serious consequences for the prediction of protein secretory status or transmembrane topology. In this study, we report a new method for differentiating protein N-terminal signal peptides and transmembrane helices. Based on the sequence features extracted from hydrophobic regions (amino acid frequency, hydrophobicity, and the start position), we set up discriminant functions and examined them on non-redundant datasets with jackknife tests. This method can incorporate other signal peptide prediction methods and achieve higher prediction accuracy. For Gram-negative bacterial proteins, 95.7% of N-terminal signal peptides and transmembrane helices can be correctly predicted (coefficient 0.90). Given a sensitivity of 90%, transmembrane helices can be identified from signal peptides with a precision of 99% (coefficient 0.92). For eukaryotic proteins, 94.2% of N-terminal signal peptides and transmembrane helices can be correctly predicted with coefficient 0.83. Given a sensitivity of 90%, transmembrane helices can be identified from signal peptides with a precision of 87% (coefficient 0.85). The method can be used to complement current transmembrane protein prediction and signal peptide prediction methods to improve their prediction accuracies. (C) 2003 Elsevier Inc. All rights reserved.
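As an illustration of the kind of features named in this abstract (amino acid frequency, hydrophobicity and start position of the hydrophobic region), the following Python sketch locates the most hydrophobic N-terminal window and summarizes it. The Kyte-Doolittle hydropathy scale, the 15-residue window, the 60-residue N-terminal limit and all function names are assumptions introduced here; the abstract does not state which scale or parameters were used, and the discriminant functions themselves are not reproduced.

```python
from collections import Counter

# Kyte-Doolittle hydropathy values (swapped in as an assumption).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_region_features(seq, window=15, n_term=60):
    """Describe the most hydrophobic window within the first `n_term` residues.

    Returns a dict with the window start (0-based), its mean hydropathy and
    the relative frequency of each residue in the window, or None for an
    empty sequence. Unknown residues count as 0.0 hydropathy.
    """
    region = seq[:n_term]
    if not region:
        return None
    best_start, best_mean = 0, float("-inf")
    for i in range(max(1, len(region) - window + 1)):
        segment = region[i:i + window]
        mean = sum(KD.get(a, 0.0) for a in segment) / len(segment)
        if mean > best_mean:  # strict comparison keeps the first best window
            best_start, best_mean = i, mean
    segment = region[best_start:best_start + window]
    freq = {a: n / len(segment) for a, n in Counter(segment).items()}
    return {"start": best_start, "mean_hydropathy": best_mean, "frequency": freq}


# Invented sequence: a 15-residue leucine stretch starting at position 3.
example = "MKK" + "L" * 15 + "DDDDEEEE"
feats = hydrophobic_region_features(example)
```

A feature vector of this kind could then be passed to any discriminant function; which classifier and weights separate signal peptides from transmembrane helices is left open here.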
Abstract:
BACKGROUND: In mammals, ChIP-seq studies of RNA polymerase II (PolII) occupancy have been performed to reveal how recruitment, initiation and pausing of PolII may control transcription rates, but the focus is rarely on obtaining finely resolved profiles that can portray the progression of PolII through sequential promoter states. RESULTS: Here, we analyze PolII binding profiles from high-coverage ChIP-seq on promoters of actively transcribed genes in mouse and humans. We show that the enrichment of PolII near transcription start sites exhibits a stereotypical bimodal structure, with one peak near active transcription start sites and a second peak 110 base pairs downstream from the first. Using an empirical model that reliably quantifies the spatial PolII signal, gene by gene, we show that the first PolII peak allows for refined positioning of transcription start sites, which is corroborated by mRNA sequencing. This bimodal signature is found both in mouse and humans. Analysis of the pausing-related factors NELF and DSIF suggests that the downstream peak reflects widespread pausing at the +1 nucleosome barrier. Several features of the bimodal pattern are correlated with sequence features such as CpG content and TATA boxes, as well as the histone mark H3K4me3. CONCLUSIONS: We thus show how high coverage DNA sequencing experiments can reveal as-yet unnoticed bimodal spatial features of PolII accumulation that are frequent at individual mammalian genes and reminiscent of transcription initiation and pausing. The initiation-pausing hypothesis is corroborated by evidence from run-on sequencing and immunoprecipitation in other cell types and species.
Abstract:
BACKGROUND: The vast majority of the 1.1 million Alu elements are retrotranspositionally inactive; only a few loci, referred to as 'source elements', can generate new Alu insertions. The first step in identifying the active Alu sources is to determine the loci transcribed by RNA polymerase III (pol III). Previous genome-wide analyses from normal and transformed cell lines identified multiple Alu loci occupied by pol III factors, making them candidate source elements. FINDINGS: Analysis of the data from these genome-wide studies determined that the majority of pol III-bound Alus belonged to the older subfamilies Alu S and Alu J, which varied between cell lines from 62.5% to 98.7% of the identified loci. The pol III-bound Alus were further scored for estimated retrotransposition potential (ERP) based on the absence or presence of selected sequence features associated with Alu retrotransposition capability. Our analyses indicate that most of the pol III-bound Alu loci candidates identified lack the sequence characteristics important for retrotransposition. CONCLUSIONS: These data suggest that Alu expression likely varies by cell type, growth conditions and transformation state. This variation could extend to the same cell line presenting different Alu expression patterns in different laboratories. The vast majority of Alu loci potentially transcribed by RNA pol III lack important sequence features for retrotransposition, and the majority of potentially active Alu loci in the genome (those with a high ERP score) belong to young Alu subfamilies. Our observations suggest that, in an in vivo scenario, the contribution of Alu activity to somatic genetic damage may vary significantly between individuals and tissues.
Abstract:
An efficient synthetic strategy for various functionalized azabicyclo[X.Y.0]alkanone compounds has been developed. The strategy involves preparing dipeptides by coupling vinyl-, allyl-, homoallyl- and homohomoallylglycine units, followed by ring-closing metathesis to give 8-, 9- and 10-membered macrocyclic lactams, which undergo transannular iodolactamization to afford bicyclic peptide mimics bearing an iodo group. Transition-metal-catalyzed cross-couplings were developed for the synthesis of enantiomerically pure ω-unsaturated amino acids from iodoalanine. Mechanistic study suggests that iodide attack occurs on the less sterically hindered face of the unsaturated macrocyclic lactam, leading to an iodonium intermediate. Cyclization then proceeds along a pathway that minimizes diaxial interactions and allylic strain. Iodolactamization of the various unsaturated macrocyclic lactams gave 5,5- and 6,6-iodobicyclic amino acids regio- and diastereoselectively. In addition, a bridged anti-Bredt azabicyclo[4.3.1]alkane imidate was synthesized from a nine-membered unsaturated macrocyclic lactam. Crystallographic and spectroscopic analyses of the 8-, 9- and 10-membered macrocycles, the 5,5-iodobicyclic compound and the bridged imidate demonstrate the potential of these rigidified dipeptides to serve as mimics of the central residues of type I, II′, II and VI β-turns.
Abstract:
Objective: The study aims to investigate a possible correlation between the main clinical and ophthalmological characteristics, age and Robin sequence in patients with the Stickler syndrome. Introduction: The Stickler syndrome is an autosomal dominant genetic disorder, characterised by ocular, orofacial and skeletal anomalies and/or auditory loss. Patients with Robin sequence features and respiratory complications are frequently diagnosed with the Stickler syndrome. The heterogeneous phenotypic manifestations may present a challenge for early clinical diagnosis. Methods: We performed a retrospective study of the 98 patients with the Stickler syndrome, between November 1995 and June 2009. The data were compared to investigate their ocular alterations and association with the Robin sequence. To be included, patients had to present with the following triad: cleft palate, facial features (hypoplastic midface, micrognathia and prominent eyes) and ocular anomalies (myopia and/or abnormalities of the retina). Results: Fifty-one percent of the patients presenting with Robin sequence features had been diagnosed with the Stickler syndrome. Ocular alterations were found in 50% of the patients. Discussion: The Robin sequence may appear as an isolated condition or associated with other features, or else as part of other known syndromes. Currently, the diagnosis of the Stickler syndrome is based on clinical signs. Affected individuals eventually develop hearing loss, retinal detachment and blindness. The ophthalmological complications associated are usually progressive and can lead to blindness.
Abstract:
Regulation of cytoplasmic deadenylation, the first step in mRNA turnover, has a direct impact on the fate of gene expression. AU-rich elements (AREs) found in the 3′ untranslated regions of many labile mRNAs are the most common RNA-destabilizing elements known in mammalian cells. Based on their sequence features and functional properties, AREs can be divided into three classes. Class I and class III AREs direct synchronous deadenylation, whereas class II AREs direct asynchronous deadenylation with the formation of poly(A)-intermediates. Through a systematic mutagenesis study, we found that a cluster of five or six copies of AUUUA motifs forming various degrees of reiteration is the key feature dictating the choice between asynchronous and synchronous deadenylation. A 20–30 nt AU-rich sequence immediately 5′ to this cluster of AUUUA motifs can greatly enhance its destabilizing ability and is an integral part of the AREs. These two features are the defining characteristics of class II AREs.
Current methods for studying the decay mechanism of AREs have several limitations. Taking advantage of a tetracycline-regulated promoter, we developed a new transcriptional pulse strategy, the Tet-system. By controlling the time and the amount of Tet addition, a pulse of RNA can be generated. Using this new system, we showed that AREs function in both growth- and density-arrested cells. The new strategy offers for the first time an opportunity to investigate the control of mRNA deadenylation and decay kinetics in mammalian cells under physiologically relevant conditions.
A member of the heterogeneous nuclear RNA-binding protein family, hnRNP D0/AUF1 displays specific affinities for ARE sequences in vitro, but its in vivo function in ARE-mediated mRNA decay is unclear. AUF1/hnRNP D0 comprises at least four isoforms derived by alternative RNA splicing, each with a different affinity for ARE sequences in vitro. Here, we examined the in vivo effect of the AUF1/hnRNP D0 isoforms on the degradation of ARE-containing mRNA. Our results showed that all four isoforms exert RNA-stabilizing effects of varying strength in NIH3T3 cells, which are positively correlated with their binding affinities for ARE sequences. Further experiments indicated that AUF1/hnRNP D0 has a general role in modulating the stability of cytoplasmic mRNAs in mammalian cells.
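The cluster of reiterated AUUUA motifs that this abstract identifies as the key feature of class II AREs can be located with a simple scan. The Python sketch below is only an illustration: the function names, the `max_gap` clustering parameter and the example sequence are hypothetical, and the abstract does not define how far apart motifs may lie while still counting as one cluster.

```python
import re


def auuua_positions(rna):
    """Return 0-based start positions of every AUUUA pentamer, overlaps included.

    DNA input is accepted: T is converted to U before scanning.
    """
    rna = rna.upper().replace("T", "U")
    # A zero-width lookahead finds overlapping matches such as AUUUAUUUA.
    return [m.start() for m in re.finditer(r"(?=AUUUA)", rna)]


def largest_cluster(positions, max_gap=10):
    """Size of the largest group of motifs whose consecutive starts are
    at most `max_gap` nucleotides apart (hypothetical clustering rule)."""
    best = run = 1 if positions else 0
    for prev, cur in zip(positions, positions[1:]):
        run = run + 1 if cur - prev <= max_gap else 1
        best = max(best, run)
    return best


# Invented 3' UTR fragment with five overlapping AUUUA copies.
utr = "GC" + "AUUUA" + "UUUA" * 4 + "GC"
starts = auuua_positions(utr)
cluster_size = largest_cluster(starts)
```

Here the five copies share their flanking A residues, so the starts fall every four nucleotides and form a single cluster of size five, the lower bound of the "five or six copies" described above.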
Abstract:
ORF slr0798, now designated ziaA, from Synechocystis PCC 6803 encodes a polypeptide with sequence features of heavy metal transporting P-type ATPases. Increased Zn2+ tolerance and reduced 65Zn accumulation was observed in Synechococcus PCC 7942, strain R2-PIM8(smt), containing ziaA and upstream regulatory sequences, compared with control cells. Conversely, reduced Zn2+ tolerance was observed following disruption of ziaA in Synechocystis PCC 6803, and ziaA-mediated restoration of Zn2+ tolerance has subsequently been used as a selectable marker for transformation. Nucleotide sequences upstream of ziaA, fused to a promoterless lacZ gene, conferred Zn2+-dependent β-galactosidase activity when introduced into R2-PIM8(smt). The product of ORF sll0792, designated ZiaR, is a Zn2+-responsive repressor of ziaA transcription. Reporter gene constructs lacking ziaR conferred elevated Zn2+-independent expression from the ziaA operator–promoter in R2-PIM8(smt). Gel retardation assays detected ZiaR-dependent complexes forming with the zia operator–promoter and ZiaR–DNA binding was enhanced by treatment with a metal-chelator in vitro. Two mutants of ZiaR (C71S/C73S and H116R) bound to, and repressed expression from, the ziaA operator–promoter but were unable to sense Zn2+. Metal coordination to His-imidazole and Cys-thiolate ligands at these residues of ZiaR is thus implicated in Zn2+-perception by Synechocystis PCC 6803.
Abstract:
The database, called HyPaLib (for Hybrid Pattern Library), contains annotated structural elements characteristic for certain classes of structural and/or functional RNAs. These elements are described in a language specifically designed for this purpose. The language allows convenient specification of hybrid patterns, i.e. motifs consisting of sequence features and structural elements together with sequence similarity and thermodynamic constraints. We are currently developing software tools that allow a user to search sequence databases for any pattern in HyPaLib, thus providing functionality which is similar to PROSITE, but dedicated to the more complex patterns in RNA sequences. HyPaLib is available at http://bibiserv.techfak.uni-bielefeld.de/HyPa/.
Abstract:
The M78 protein of murine cytomegalovirus exhibits sequence features of a G protein-coupled receptor. It is synthesized with early kinetics, it becomes partially colocalized with Golgi markers, and it is incorporated into viral particles. We have constructed a viral substitution mutant, SMsubM78, which lacks most of the M78 ORF. The mutant produces a reduced yield in cultured 10.1 fibroblast and IC21 macrophage cell lines. The defect is multiplicity dependent and greater in the macrophage cell line. Consistent with its growth defect in cultured cells, the mutant exhibits reduced pathogenicity in mice, generating less infectious progeny than wild-type virus in all organs assayed. SMsubM78 fails to efficiently activate accumulation of the viral m123 immediate-early mRNA in infected macrophages. M78 facilitates the accumulation of the immediate-early mRNA in cycloheximide-treated cells, arguing that it acts in the absence of de novo protein synthesis. We conclude that the M78 G protein-coupled receptor homologue is delivered to cells as a constituent of the virion, and it acts to facilitate the accumulation of immediate-early mRNA.
Abstract:
Several human neurological disorders are associated with proteins containing abnormally long runs of glutamine residues. Strikingly, most of these proteins contain two or more additional long runs of amino acids other than glutamine. We screened the current human, mouse, Drosophila, yeast, and Escherichia coli protein sequence databases and identified all proteins containing multiple long homopeptides. This search found multiple long homopeptides in about 12% of Drosophila proteins but in only about 1.7% of human, mouse, and yeast proteins and none among E. coli proteins. Most of these sequences show other unusual sequence features, including multiple charge clusters and excessive counts of homopeptides of length ≥ 2 amino acid residues. Intriguingly, a large majority of the identified Drosophila proteins are essential developmental proteins and, in particular, most play a role in central nervous system development. Almost half of the human and mouse proteins identified are homeotic homologs. The role of long homopeptides in fine-tuning protein conformation for multiple functional activities is discussed. The relative contributions of strand slippage and of dynamic mutation are also addressed. Several new experiments are proposed.
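A screen of the kind described in this abstract amounts to finding runs of identical residues and keeping proteins with more than one long run. The Python sketch below shows one way to do this. The abstract does not state the length cutoff for a "long" homopeptide, so the `min_len=5` default, the function names and the example sequence are assumptions.

```python
from itertools import groupby


def homopeptide_runs(protein, min_len=5):
    """Return (residue, start, length) for each run of identical residues
    at least `min_len` long; `start` is 0-based."""
    runs = []
    pos = 0
    for residue, group in groupby(protein):
        length = sum(1 for _ in group)
        if length >= min_len:
            runs.append((residue, pos, length))
        pos += length
    return runs


def has_multiple_long_homopeptides(protein, min_len=5, min_runs=2):
    """True if the protein contains at least `min_runs` long homopeptides."""
    return len(homopeptide_runs(protein, min_len)) >= min_runs


# Invented sequence with a glutamine run plus two further runs.
example = "MS" + "Q" * 7 + "AG" + "H" * 6 + "KL" + "A" * 6 + "P"
runs = homopeptide_runs(example)
```

Applying `has_multiple_long_homopeptides` to every entry of a protein database and dividing the hits by the database size would give per-species percentages comparable in kind to those quoted above, subject to the chosen cutoff.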
Abstract:
The thiol tripeptides, glutathione (GSH) and homoglutathione (hGSH), perform multiple roles in legumes, including protection against toxicity of free radicals and heavy metals. The three genes involved in the synthesis of GSH and hGSH in the model legume, Lotus japonicus, have been fully characterized and appear to be present as single copies in the genome. The gamma-glutamylcysteine synthetase (gammaecs) gene was mapped on the long arm of chromosome 4 (70.0 centimorgans [cM]) and consists of 15 exons, whereas the glutathione synthetase (gshs) and homoglutathione synthetase (hgshs) genes were mapped on the long arm of chromosome 1 (81.3 cM) and found to be arranged in tandem, with a separation of approximately 8 kb. Both genes consist of 12 exons of exactly the same size (except exon 1, which is similar). Two types of transcripts were detected for the gshs gene, which putatively encode proteins localized in the plastids and cytosol. Promoter regions contain cis-acting regulatory elements that may be involved in the plant's response to light, hormones, and stress. Determination of transcript levels, enzyme activities, and thiol contents in nodules, roots, and leaves revealed that gammaecs and hgshs are expressed in all three plant organs, whereas gshs is significantly functional only in nodules. This strongly suggests an important role of GSH in the rhizobia-legume symbiosis.