917 results for Peptide secondary structure
Abstract:
Background The residue-wise contact order (RWCO) describes the sequence separations between a residue of interest and its contacting residues in a protein sequence. It is a new kind of one-dimensional protein structure that represents the extent of long-range contacts and is considered a generalization of contact order. Together with secondary structure, accessible surface area, the B factor, and contact number, RWCO provides comprehensive and indispensable information for reconstructing the protein three-dimensional structure from a set of one-dimensional structural properties. Accurately predicting RWCO values could have many important applications in protein three-dimensional structure prediction and protein folding rate prediction, and could give deep insights into protein sequence-structure relationships. Results We developed a novel approach to predict residue-wise contact order values in proteins based on support vector regression (SVR), starting from primary amino acid sequences. We explored seven different sequence encoding schemes to examine their effects on the prediction performance: local sequence in the form of PSI-BLAST profiles, local sequence plus amino acid composition, local sequence plus molecular weight, local sequence plus secondary structure predicted by PSIPRED, local sequence plus molecular weight and amino acid composition, local sequence plus molecular weight and predicted secondary structure, and local sequence plus molecular weight, amino acid composition and predicted secondary structure. When using local sequences with multiple sequence alignments in the form of PSI-BLAST profiles, we could predict the RWCO distribution with a Pearson correlation coefficient (CC) between the predicted and observed RWCO values of 0.55 and a root mean square error (RMSE) of 0.82, based on a well-defined dataset of 680 protein sequences.
Moreover, by incorporating global features such as molecular weight and amino acid composition, we could further improve the prediction performance to a CC of 0.57 and an RMSE of 0.79. In addition, incorporating the secondary structure predicted by PSIPRED was found to significantly improve the prediction performance and yielded the best prediction accuracy, with a CC of 0.60 and an RMSE of 0.78, which is at least comparable to that of other existing methods. Conclusion The SVR method shows a prediction performance competitive with, or at least comparable to, the previously developed linear regression-based methods for predicting RWCO values. In contrast to support vector classification (SVC), SVR is very good at estimating the raw value profiles of the samples. The successful application of the SVR approach in this study reinforces the fact that support vector regression is a powerful tool for extracting the protein sequence-structure relationship and for estimating protein structural profiles from amino acid sequences.
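To make the quantities in this abstract concrete, the following is a minimal Python sketch of how an RWCO profile and the two reported evaluation metrics (Pearson CC and RMSE) could be computed. It is not the authors' implementation. The abstract does not state the contact definition, so the 8.0 distance cutoff, the use of one representative atom per residue, the minimum sequence separation, and the normalization by chain length are all assumptions, as are the function names `rwco`, `pearson_cc` and `rmse`.

```python
import math


def rwco(coords, cutoff=8.0, min_sep=1):
    """Residue-wise contact order for each residue.

    coords  : list of (x, y, z) tuples, one representative atom per residue
              (assumption: the abstract does not say which atom is used).
    cutoff  : contact distance threshold (assumed value).
    min_sep : minimum sequence separation for a pair to count (assumed).

    For residue i, sums |i - j| over all residues j in contact with i and
    divides by the chain length (assumed normalization).
    """
    n = len(coords)
    values = []
    for i in range(n):
        total = 0
        for j in range(n):
            sep = abs(i - j)
            if sep >= min_sep and math.dist(coords[i], coords[j]) < cutoff:
                total += sep
        values.append(total / n)
    return values


def pearson_cc(pred, obs):
    """Pearson correlation coefficient between predicted and observed values."""
    n = len(pred)
    mean_p = sum(pred) / n
    mean_o = sum(obs) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(pred, obs))
    sd_p = math.sqrt(sum((p - mean_p) ** 2 for p in pred))
    sd_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs))
    return cov / (sd_p * sd_o)


def rmse(pred, obs):
    """Root mean square error between predicted and observed values."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(pred))


# Toy chain of four residues on a line; the last one is far from the rest.
toy_coords = [(0, 0, 0), (3, 0, 0), (6, 0, 0), (20, 0, 0)]
toy_profile = rwco(toy_coords)
```

In a real evaluation, `pred` would hold the SVR outputs and `obs` the RWCO values derived from the known structures; the toy coordinates above only illustrate the bookkeeping.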
Abstract:
miRDeep and its variants are widely used to quantify known and novel microRNA (miRNA) from small RNA sequencing (RNAseq) data. This article describes miRDeep*, our integrated miRNA identification tool, which is modeled on miRDeep but improves the precision of detecting novel miRNAs by introducing new strategies to identify precursor miRNAs. miRDeep* has a user-friendly graphical interface and accepts raw data in FastQ and Sequence Alignment Map (SAM) or the binary equivalent (BAM) format. Known and novel miRNA expression levels, as measured by the number of reads, are displayed in an interface that shows each RNAseq read relative to the pre-miRNA hairpin. The pre-miRNA secondary structure and read locations for each predicted miRNA are shown and saved in a separate figure file. Moreover, the target genes of known and novel miRNAs are predicted using the TargetScan algorithm, and the targets are ranked according to the confidence score. miRDeep* is an integrated standalone application in which sequence alignment, pre-miRNA secondary structure calculation and graphical display are coded purely in Java. The application can be run on a normal personal computer with 1.5 GB of memory. Further, we show that miRDeep* outperformed existing miRNA prediction tools using our LNCaP and other small RNAseq datasets. miRDeep* is freely available online at http://www.australianprostatecentre.org/research/software/mirdeep-star
Abstract:
Potato leafroll virus (PLRV) is a positive-strand RNA virus that generates subgenomic RNAs (sgRNA) for expression of 3' proximal genes. Small RNA (sRNA) sequencing and mapping of the PLRV-derived sRNAs revealed coverage of the entire viral genome with the exception of four distinctive gaps. Remarkably, these gaps mapped to areas of the PLRV genome with extensive secondary structures, such as the internal ribosome entry site and the 5' transcriptional start sites of sgRNA1 and sgRNA2. The last gap mapped to ~500 nt from the 3' terminus of the PLRV genome and suggested the possible presence of an additional sgRNA for PLRV. Quantitative real-time PCR and northern blot analysis confirmed the expression of sgRNA3, and subsequent analyses placed its 5' transcriptional start site at position 5347 of the PLRV genome. A regulatory role is proposed for the PLRV sgRNA3, as it encodes an RNA-binding protein with specificity for the 5' end of PLRV genomic RNA. © 2013.
Abstract:
A library containing approximately 40,000 small RNA sequences was constructed for Brassica napus. Analysis of 3025 sequences obtained from this library resulted in the identification of 11 conserved miRNA families, which were validated by secondary structure prediction using surrounding sequences in the Brassica genome. Two 21 nt small RNA sequences reside within the arm of a pre-miRNA-like stem-loop structure, making them likely candidates for novel non-conserved miRNAs in B. napus. Most of the conserved miRNAs were expressed at similar levels in an F1 hybrid B. napus line and its four double haploid progeny that showed marked variations in phenotypes, but many were differentially expressed between B. napus and Arabidopsis. The miR169 family was expressed at high levels in young leaves and stems, but was undetectable in roots and mature leaves, suggesting that miR169 expression is developmentally regulated in B. napus. © 2007 Federation of European Biochemical Societies.
Abstract:
In this paper, the complete mitochondrial genome of Acraea issoria (Lepidoptera: Nymphalidae: Heliconiinae: Acraeini) is reported; it is a circular molecule of 15,245 bp. For A. issoria, genes are arranged in the same order and orientation as in the completely sequenced mitochondrial genomes of other lepidopteran species, except for the presence of an extra copy of tRNAIle(AUR)b in the control region. All protein-coding genes of the A. issoria mitogenome start with a typical ATN codon and terminate in the common stop codon TAA, except that the COI gene uses TTG as its initiation codon and terminates in a single T residue. All tRNA genes possess the typical cloverleaf secondary structure except for tRNASer(AGN), which has a simple loop and lacks the DHU stem. The sequence, organization and other features of this mitochondrial genome, including nucleotide composition and codon usage, are also reported and compared with those of other sequenced lepidopteran mitochondrial genomes. There are some short microsatellite-like repeat regions (e.g., (TA)9, polyA and polyT) scattered in the control region; however, the conspicuous macro-repeat units commonly found in other insect species are absent.
Abstract:
Background Small RNA sequencing is commonly used to identify novel miRNAs and to determine their expression levels in plants. There are several miRNA identification tools for animals such as miRDeep, miRDeep2 and miRDeep*. miRDeep-P was developed to identify plant miRNA using miRDeep’s probabilistic model of miRNA biogenesis, but it depends on several third party tools and lacks a user-friendly interface. The objective of our miRPlant program is to predict novel plant miRNA, while providing a user-friendly interface with improved accuracy of prediction. Result We have developed a user-friendly plant miRNA prediction tool called miRPlant. We show using 16 plant miRNA datasets from four different plant species that miRPlant has at least a 10% improvement in accuracy compared to miRDeep-P, which is the most popular plant miRNA prediction tool. Furthermore, miRPlant uses a Graphical User Interface for data input and output, and identified miRNA are shown with all RNAseq reads in a hairpin diagram. Conclusions We have developed miRPlant which extends miRDeep* to various plant species by adopting suitable strategies to identify hairpin excision regions and hairpin structure filtering for plants. miRPlant does not require any third party tools such as mapping or RNA secondary structure prediction tools. miRPlant is also the first plant miRNA prediction tool that dynamically plots miRNA hairpin structure with small reads for identified novel miRNAs. This feature will enable biologists to visualize novel pre-miRNA structure and the location of small RNA reads relative to the hairpin. Moreover, miRPlant can be easily used by biologists with limited bioinformatics skills.
Abstract:
Coleoptera is the most diverse group of insects with over 360,000 described species divided into four suborders: Adephaga, Archostemata, Myxophaga, and Polyphaga. In this study, we present six new complete mitochondrial genome (mtgenome) descriptions, including a representative of each suborder, and analyze the evolution of mtgenomes from a comparative framework using all available coleopteran mtgenomes. We propose a modification of atypical cox1 start codons based on sequence alignment to better reflect the conservation observed across species as well as findings of TTG start codons in other genes. We also analyze tRNA-Ser(AGN) anticodons, usually GCU in arthropods, and report a conserved UCU anticodon as a possible synapomorphy across Polyphaga. We further analyze the secondary structure of tRNA-Ser(AGN) and present a consensus structure and an updated covariance model that allows tRNAscan-SE (via the COVE software package) to locate and fold these atypical tRNAs with much greater consistency. We also report secondary structure predictions for both rRNA genes based on conserved stems. All six species of beetle have the same gene order as the ancestral insect. We report noncoding DNA regions, including a small gap region of about 20 bp between tRNA-Ser(UCN) and nad1 that is present in all six genomes, and present results of a base composition analysis.
Abstract:
We present a machine learning model that predicts a structural disruption score from a protein's primary structure. SCHEMA was introduced by Frances Arnold and colleagues as a method for determining putative recombination sites of a protein on the basis of the full (PDB) description of its structure. The present method provides an alternative to SCHEMA that is able to determine the same score from sequence data only. Circumventing the need to resolve the full structure enables the exploration of as yet unresolved and even hypothetical sequences for protein design efforts. Deriving the SCHEMA score from a primary structure is achieved using a two-step approach: first predicting a secondary structure from the sequence, and then predicting the SCHEMA score from the predicted secondary structure. The correlation coefficient for the prediction is 0.88, which indicates the feasibility of replacing SCHEMA with little loss of precision.
Abstract:
The predicted secondary structure of sub-genomic RNA in dengue virus defective interfering (D.I.) particles from patients, or generated in vitro, resembled that of the 3′ and 5′ regions of wild type dengue virus (DENV) genomes. While these structures in the sub-genomic RNA were found to be essential for its replication, their nucleotide sequences were not, so long as any new sequences maintained wild type RNA secondary structure. These observations suggested that these sub-genomic fragments of RNA from dengue viruses were replicated in the same manner as the full length genomes of their wild type, “helper”, viruses and that they probably represent the smallest fragments of DENV RNA that can be replicated during a natural infection. While D.I. particles containing sub-genomic RNA are completely parasitic, the relationship between wild type and D.I. DENV may be symbiotic, with the D.I. particles enhancing the transmission of infectious DENV.
Abstract:
Background Strand-specific RNAseq data is now more common in RNAseq projects. Visualizing RNAseq data has become an important part of the analysis of sequencing data. The most widely used visualization tool is the UCSC genome browser, which introduced the custom track concept that enables researchers to simultaneously visualize gene expression at a particular locus from multiple experiments. The objective of our software tool is to provide a user-friendly interface for the visualization of RNAseq datasets. Results This paper introduces a visualization tool (RNASeqBrowser) that incorporates and extends the functionality of the UCSC genome browser. For example, RNASeqBrowser simultaneously displays read coverage, SNPs, InDels and raw read tracks with other BED and wiggle tracks, all being dynamically built from the BAM file. Paired reads are also connected in the browser to enable easier identification of novel exon/intron borders and chimaeric transcripts. Strand-specific RNAseq data is also supported by RNASeqBrowser, which displays reads above (positive-strand transcripts) or below (negative-strand transcripts) a central line. Finally, RNASeqBrowser was designed for ease of use by users with few bioinformatics skills, and incorporates the features of many genome browsers into one platform. Conclusions The features of RNASeqBrowser: (1) RNASeqBrowser integrates the UCSC genome browser and NGS visualization tools such as IGV. It extends the functionality of the UCSC genome browser by adding several new types of tracks to show NGS data such as individual raw reads, SNPs and InDels. (2) RNASeqBrowser can dynamically generate RNA secondary structure. It is useful for identifying non-coding RNA such as miRNA. (3) Overlaying NGS wiggle data is helpful in displaying differential expression and is simple to implement in RNASeqBrowser. (4) NGS data accumulates a lot of raw reads. Thus, RNASeqBrowser collapses exact duplicate reads to reduce visualization space.
Ordinary PCs can show many windows of individual NGS raw reads without much delay. (5) Multiple popup windows of individual raw reads provide users with more viewing space. This avoids the approach of existing tools (such as IGV), which squeeze all raw reads into one window, and will be helpful for visualizing multiple datasets simultaneously. RNASeqBrowser and its manual are freely available at http://www.australianprostatecentre.org/research/software/rnaseqbrowser or http://sourceforge.net/projects/rnaseqbrowser/
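Feature (4) above, collapsing exact duplicate reads, can be illustrated with a short Python sketch. This is not RNASeqBrowser's code (the abstract gives no implementation details); it assumes that "exact duplicate" means an identical read sequence and ignores mapping position and strand, which a real browser would probably also take into account. The function name `collapse_reads` is hypothetical.

```python
from collections import Counter


def collapse_reads(reads):
    """Collapse identical read sequences into (sequence, count) pairs.

    Assumption: duplicates are identified by sequence alone.
    The result is ordered by descending count, then alphabetically,
    so the output is deterministic.
    """
    counts = Counter(reads)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


raw_reads = ["ACGT", "ACGT", "TTGA", "ACGT", "TTGA", "GGCC"]
collapsed = collapse_reads(raw_reads)
```

Six raw reads become three display rows here, each annotated with its multiplicity, which is the space saving the abstract refers to.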
Abstract:
Copy number variations (CNVs) as described in the healthy population are purported to contribute significantly to genetic heterogeneity. Recent studies have described CNVs using lymphoblastoid cell lines or by applying specifically developed algorithms to interrogate previously described data. However, the full extent of CNVs remains unclear. Using a high-density SNP array, we have undertaken a comprehensive investigation of chromosome 18 for CNV discovery and for characterisation of CNV distribution and association with chromosome architecture. We identified 399 CNVs, of which losses represent 98%, 58% are less than 2.5 kb in size and 71% are intergenic. Intronic deletions account for the majority of copy number changes with gene involvement. Furthermore, one-third of CNVs do not have putative breakpoints within repetitive sequences. We conclude that replicative processes, mediated either by repetitive elements or microhomology, account for the majority of CNVs in the healthy population. Genomic instability involving the formation of a non-B structure is demonstrated in one region.
Abstract:
Elucidation of the detailed structural features and sequence requirements for alpha helices of various lengths could be very important in understanding secondary structure formation in proteins and, hence, in the protein folding mechanism. An algorithm to characterize the geometry of an alpha helix from its C-alpha coordinates has been developed and used to analyze the structures of long alpha helices (number of residues greater than or equal to 25) found in globular proteins, the crystal structure coordinates of which are available from the Brookhaven Protein Data Bank. All long alpha helices can be unambiguously characterized as belonging to one of three classes: linear, curved, or kinked, with a majority being curved. Analysis of the sequences of these helices reveals that the long alpha helices have unique sequence characteristics that distinguish them from the short alpha helices in globular proteins. The distribution and statistical propensities of individual amino acids to occur in long alpha helices are different from those found in short alpha helices, with amino acids having longer side chains and/or having a greater number of functional groups occurring more frequently in these helices. The sequences of the long alpha helices can be correlated with their gross structural features, i.e., whether they are curved, linear, or kinked, and, in the case of the curved helices, with their curvature.
Abstract:
We study the secondary structure of RNA determined by Watson-Crick pairing without pseudo-knots using Milnor invariants of links. We focus on the first non-trivial invariant, which we call the Heisenberg invariant. The Heisenberg invariant, which is an integer, can be interpreted in terms of the Heisenberg group as well as in terms of lattice paths. We show that the Heisenberg invariant gives a lower bound on the number of unpaired bases in an RNA secondary structure. We also show that the Heisenberg invariant can predict allosteric structures for RNA. Namely, if the Heisenberg invariant is large, then there are widely separated local maxima (i.e., allosteric structures) for the number of Watson-Crick pairs found.
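The objects this abstract works with (a pseudoknot-free set of base pairs and the number of unpaired bases that the invariant bounds from below) can be sketched in Python. This sketch does not compute the Heisenberg invariant, whose definition is not given in the abstract. It only parses a structure written in the common dot-bracket notation, checks that no two pairs cross, and counts unpaired bases. The function names are hypothetical and the dot-bracket input format is an assumption.

```python
def parse_dot_bracket(structure):
    """Return the sorted list of base pairs (i, j), i < j, 0-based."""
    stack, pairs = [], []
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % pos)
            pairs.append((stack.pop(), pos))
        elif symbol != ".":
            raise ValueError("unexpected symbol %r" % symbol)
    if stack:
        raise ValueError("unbalanced '(' at position %d" % stack[-1])
    return sorted(pairs)


def is_pseudoknot_free(pairs):
    """True if no two pairs (i, j) and (k, l) cross, i.e. i < k < j < l."""
    for index, (i, j) in enumerate(pairs):
        for k, l in pairs[index + 1:]:
            if i < k < j < l or k < i < l < j:
                return False
    return True


def unpaired_count(length, pairs):
    """Number of bases that take part in no pair."""
    return length - 2 * len(pairs)


hairpin = "((..))."
hairpin_pairs = parse_dot_bracket(hairpin)
```

A plain dot-bracket string is pseudoknot-free by construction, so `is_pseudoknot_free` is mainly useful for pair lists that come from another source.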
Abstract:
B. cereus is one of the most frequently occurring bacteria in foods. It produces several heat-labile enterotoxins and one stable non-protein toxin, cereulide (emetic), which may be pre-formed in food. Cereulide is a heat-stable peptide whose structure and mechanism of action were elucidated in the past decade. Until this work, cereulide was detected by biological assays. With my mentors, I developed the first quantitative chemical assay for cereulide. The assay is based on liquid chromatography (HPLC) combined with ion trap mass spectrometry, and the calibration is done with valinomycin and purified cereulide. To detect and quantitate valinomycin and cereulide, their [NH4+] adducts, m/z 1128.9 and m/z 1171 respectively, were used. This was a breakthrough in cereulide research and became a very powerful tool of investigation. This tool made it possible to prove for the first time that the toxin produced by B. cereus in heat-treated food caused human illness. Until this thesis work (Paper II), cereulide-producing B. cereus strains were believed to represent a homogeneous group of clonal strains. The cereulide-producing strains investigated in those studies originated mostly from food poisoning incidents. We used strains of many origins and analyzed them using a polyphasic approach. We found that the cereulide-producing B. cereus strains are genetically and biologically more diverse than assumed in earlier studies. The strains diverge in the adenylate kinase (adk) gene (two sequence types), in ribopatterns obtained with EcoRI and PvuII (three patterns), and in tyrosine decomposition, haemolysis and lecithin hydrolysis (two phenotypes). Our study was the first demonstration of diversity within the cereulide-producing strains of B. cereus. To manage the risk of cereulide production in food, an understanding is needed of the factors that may upregulate cereulide production in a given food matrix and of the environmental factors affecting it.
As a contribution in this direction, we adjusted the growth environment and measured cereulide production by strains selected for diversity. For most of the producer strains, the temperature range in which cereulide was produced was narrower than that for growth. Most strains produced the most cereulide at room temperature (20-23 °C). The exceptions were two faecal isolates, which produced the same amount of cereulide from 23 °C up to 39 °C. We also found that at 37 °C the choice of growth media for cereulide production differed from that at room temperature. Food composition and temperature may thus be key to understanding cereulide production in foods as well as in the gut. We investigated the [K+], [Na+] and amino acid contents of six growth media. Statistical evaluation indicated a significant positive correlation between the ratio [K+]:[Na+] and the production of cereulide, but only when the concentrations of glycine and [Na+] were constant. Of the amino acids, only glycine correlated positively with high cereulide production. Glycine is used worldwide as a food additive (E 640), flavor modifier, humectant and acidity regulator, and is permitted in European Union countries, with no regulatory quantitative limitation, in most types of foods. B. subtilis group members are endospore-forming bacteria ubiquitous in the environment, similar to B. cereus in this respect. Bacillus species other than B. cereus have only sporadically been identified as causative agents of food-borne illnesses. We found (Paper IV) that food-borne isolates of B. subtilis and B. mojavensis produced amylosin. It is possible that amylosin was the agent responsible for the food-borne illness, since no other toxic substance was found in the strains. This is the first report of amylosin production by strains isolated from food.
We found that the temperature requirement for amylosin production was higher for the B. subtilis strain F 2564/96, a mesophilic producer, than for B. mojavensis strains eela 2293 and B 31, psychrotolerant producers. We also found that an atmosphere with low oxygen did not prevent the production of amylosin. Ready-to-eat foods packaged in a micro-aerophilic atmosphere and/or stored at temperatures above 10 °C may thus pose a risk when toxigenic strains of B. subtilis or B. mojavensis are present.
Abstract:
The far-ultraviolet region circular dichroic spectrum of serine hydroxymethyltransferase from monkey liver showed that the protein is in an α-helical conformation. The near-ultraviolet circular dichroic spectrum revealed two negative bands originating from the tertiary conformational environment of the aromatic amino acid residues. Addition of urea or guanidinium chloride perturbed the characteristic fluorescence and far-ultraviolet circular dichroic spectrum of the enzyme. The decrease in (θ)222 and enzyme activity followed identical patterns with increasing concentrations of urea, whereas with guanidinium chloride, the loss of enzyme activity preceded the loss of secondary structure. 2-Chloroethanol, trifluoroethanol and sodium dodecyl sulphate enhanced the mean residue ellipticity values. In addition, sodium dodecyl sulphate also caused a perturbation of the fluorescence emission spectrum of the enzyme. Extremes of pH decreased the –(θ)222 value. Plots of –(θ)222 and enzyme activity as a function of pH showed maximal values at pH 7.4-7.5. These results suggested the prevalence of "conformational flexibility" in the structure of serine hydroxymethyltransferase.