43 results for secondary structure analysis
Abstract:
Our previous studies using trans-complementation analysis of Kunjin virus (KUN) full-length cDNA clones harboring in-frame deletions in the NS3 gene demonstrated the inability of these defective complemented RNAs to be packaged into virus particles (W. J. Liu, P. L. Sedlak, N. Kondratieva, and A. A. Khromykh, J. Virol. 76:10766-10775). In this study we aimed to establish whether this requirement for NS3 in RNA packaging is determined by the secondary RNA structure of the NS3 gene or by the essential role of the translated NS3 gene product. Multiple silent mutations of three computer-predicted stable RNA structures in the NS3 coding region of KUN replicon RNA aimed at disrupting RNA secondary structure without affecting amino acid sequence did not affect RNA replication and packaging into virus-like particles in the packaging cell line, thus demonstrating that the predicted conserved RNA structures in the NS3 gene do not play a role in RNA replication and/or packaging. In contrast, double frameshift mutations in the NS3 coding region of full-length KUN RNA, producing scrambled NS3 protein but retaining secondary RNA structure, resulted in the loss of ability of these defective RNAs to be packaged into virus particles in complementation experiments in KUN replicon-expressing cells. Furthermore, the more robust complementation-packaging system based on established stable cell lines producing large amounts of complemented replicating NS3-deficient replicon RNAs and infection with KUN virus to provide structural proteins also failed to detect any secreted virus-like particles containing packaged NS3-deficient replicon RNAs. These results have now firmly established the requirement of KUN NS3 protein translated in cis for genome packaging into virus particles.
Abstract:
Motivation: Conformational flexibility is essential to the function of many proteins, e.g. catalytic activity. To assist efforts in determining and exploring the functional properties of a protein, it is desirable to automatically identify regions that are prone to undergo conformational changes. It was recently shown that a probabilistic predictor of continuum secondary structure is more accurate than categorical predictors for structurally ambivalent sequence regions, suggesting that such models are suited to characterize protein flexibility. Results: We develop a computational method for identifying regions that are prone to conformational change directly from the amino acid sequence. The method uses the entropy of the probabilistic output of an 8-class continuum secondary structure predictor. Results for 171 unique amino acid sequences with well-characterized variable structure (identified in the 'Macromolecular movements database') indicate that the method is highly sensitive at identifying flexible protein regions, but false positives remain a problem. The method can be used to explore conformational flexibility of proteins (including hypothetical or synthetic ones) whose structure is yet to be determined experimentally.
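The entropy-based idea described above can be illustrated with a minimal sketch. It assumes each residue already has a probability vector over the 8 secondary structure classes; the function names, the threshold and the `min_len` parameter are hypothetical and are not taken from the paper.

```python
import math


def residue_entropy(probs):
    """Shannon entropy (in bits) of one residue's class probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def flexible_regions(profile, threshold, min_len=1):
    """Return (start, end) index ranges, end exclusive, of consecutive
    residues whose entropy is at or above the threshold.

    profile   -- list of per-residue probability vectors
    threshold -- entropy cutoff in bits (hypothetical tuning parameter)
    min_len   -- shortest run that is reported (hypothetical parameter)
    """
    regions = []
    start = None
    for i, probs in enumerate(profile):
        high = residue_entropy(probs) >= threshold
        if high and start is None:
            start = i  # a high-entropy run begins here
        elif not high and start is not None:
            if i - start >= min_len:
                regions.append((start, i))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(profile) - start >= min_len:
        regions.append((start, len(profile)))
    return regions
```

A residue whose probability mass sits on one class has entropy 0, while a uniform distribution over 8 classes has the maximum of 3 bits, so high-entropy runs mark the structurally ambivalent positions.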
Abstract:
Background: The structure of proteins may change as a result of the inherent flexibility of some protein regions. We develop and explore probabilistic machine learning methods for predicting a continuum secondary structure, i.e. assigning probabilities to the conformational states of a residue. We train our methods using data derived from high-quality NMR models. Results: Several probabilistic models not only successfully estimate the continuum secondary structure, but also provide a categorical output on par with models directly trained on categorical data. Importantly, models trained on the continuum secondary structure are also better than their categorical counterparts at identifying the conformational state for structurally ambivalent residues. Conclusion: Cascaded probabilistic neural networks trained on the continuum secondary structure exhibit better accuracy in structurally ambivalent regions of proteins, while sustaining an overall classification accuracy on par with standard, categorical prediction methods.
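As a rough illustration of what a continuum secondary structure target can look like, the sketch below turns per-model categorical assignments into per-residue state probabilities. It assumes the probabilities are simply the fraction of NMR models assigning each state, which the abstract does not confirm; the function name and the three-letter state alphabet are hypothetical.

```python
from collections import Counter


def continuum_from_models(assignments, states="HEC"):
    """Estimate per-residue state probabilities from an ensemble.

    assignments -- one string per model, each giving a state letter
                   for every residue (all strings the same length)
    states      -- state alphabet to report (assumed: H, E, C)
    """
    length = len(assignments[0])
    if any(len(model) != length for model in assignments):
        raise ValueError("all models must cover the same residues")
    profile = []
    for pos in range(length):
        # Count how many models assign each state at this position.
        counts = Counter(model[pos] for model in assignments)
        profile.append({s: counts[s] / len(assignments) for s in states})
    return profile
```

Residues on which all models agree get a probability of 1 for a single state, whereas structurally ambivalent residues get their probability spread over several states.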
Abstract:
The PotE protein is a putrescine-ornithine antiporter found in many gram-negative bacteria. It is a member of the APA family of transporters and has 12 predicted alpha-helical transmembrane spanning segments (TMS). While the substrate binding site has previously been mapped to a region near the surface of the cytoplasmic lipid layer, no structural feature within the periplasmic domains of PotE has been shown to be important for function. We examined the role of the only large outer loop, situated between transmembrane spanning segments 7 and 8, in putrescine uptake. Deletion of the highly conserved amino acids in the region closest to transmembrane spanning segment 7 produced a protein with little activity. Glycine-scanning mutagenesis of this region showed that Val(249) and Leu(254) were required for optimal transporter function. The V249G mutant transported putrescine at a lower maximal rate than wild-type (WT) but with the same substrate binding affinity. In contrast, the L254G mutant had a higher substrate affinity. A series of Val(249) mutants indicated that the hydrophobicity of this residue, which is located at or near the membrane surface, is important for PotE function. Secondary structure predictions of the large outer loop indicated the presence of a hydrophobic alpha-helix in the centre with a hydrophobic region at each end, suggesting that the loop was not entirely exposed to the aqueous periplasmic space. The study shows that loop 7-8 is important for PotE function, possibly by forming a re-entrant loop in the channel of the transporter. (C) 2003 Elsevier Ltd. All rights reserved.
Abstract:
Background: Protein tertiary structure can be partly characterized via each amino acid's contact number measuring how residues are spatially arranged. The contact number of a residue in a folded protein is a measure of its exposure to the local environment, and is defined as the number of C-beta atoms in other residues within a sphere around the C-beta atom of the residue of interest. Contact number is partly conserved between protein folds and thus is useful for protein fold and structure prediction. In turn, each residue's contact number can be partially predicted from primary amino acid sequence, assisting tertiary fold analysis from sequence data. In this study, we provide a more accurate contact number prediction method from protein primary sequence. Results: We predict contact number from protein sequence using a novel support vector regression algorithm. Using protein local sequences with multiple sequence alignments (PSI-BLAST profiles), we demonstrate a correlation coefficient between predicted and observed contact numbers of 0.70, which outperforms previously achieved accuracies. Including additional information about sequence weight and amino acid composition further improves prediction accuracies significantly with the correlation coefficient reaching 0.73. If residues are classified as being either contacted or non-contacted, the prediction accuracies are all greater than 77%, regardless of the choice of classification thresholds. Conclusion: The successful application of support vector regression to the prediction of protein contact number reported here, together with previous applications of this approach to the prediction of protein accessible surface area and B-factor profile, suggests that a support vector regression approach may be very useful for determining the structure-function relation between primary sequence and higher order consecutive protein structural and functional properties.
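The contact number definition given above (C-beta atoms of other residues within a sphere around a residue's C-beta atom) can be sketched as follows. The abstract does not state the sphere radius, so the 10.0 default here is only a placeholder assumption, and the function name is hypothetical. This computes observed contact numbers from coordinates; it is not the support vector regression predictor.

```python
import math


def contact_numbers(cb_coords, radius=10.0):
    """Count, for each residue, the C-beta atoms of other residues
    lying within `radius` of its own C-beta atom.

    cb_coords -- list of (x, y, z) C-beta coordinates, one per residue
    radius    -- sphere radius in the same units as the coordinates
                 (placeholder value, not taken from the paper)
    """
    n = len(cb_coords)
    counts = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            # Each close pair contributes one contact to both residues.
            if math.dist(cb_coords[i], cb_coords[j]) <= radius:
                counts[i] += 1
                counts[j] += 1
    return counts
```

The pairwise loop is quadratic in the number of residues, which is acceptable for single chains; a spatial grid or k-d tree would be the usual choice for larger structures.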
Abstract:
Selection of machine learning techniques requires a certain sensitivity to the requirements of the problem. In particular, the problem can be made more tractable by deliberately using algorithms that are biased toward solutions of the requisite kind. In this paper, we argue that recurrent neural networks have a natural bias toward a problem domain of which biological sequence analysis tasks are a subset. We use experiments with synthetic data to illustrate this bias. We then demonstrate that this bias can be exploited using a data set of protein sequences containing several classes of subcellular localization targeting peptides. The results show that, compared with feed-forward networks, recurrent neural networks will generally perform better on sequence analysis tasks. Furthermore, as the patterns within the sequence become more ambiguous, the choice of specific recurrent architecture becomes more critical.
Abstract:
The caseins (alpha(s1), alpha(s2), beta, and kappa) are phosphoproteins present in bovine milk that have been studied for over a century and whose structures remain obscure. Here we describe the chemical synthesis and structure elucidation of the N-terminal segment (1-44) of bovine kappa-casein, the protein which maintains the micellar structure of the caseins. Kappa-casein (1-44) was synthesised by highly optimised Boc solid-phase peptide chemistry and characterised by mass spectrometry. Structure elucidation was carried out by circular dichroism (CD) and nuclear magnetic resonance (NMR) spectroscopy. CD analysis demonstrated that the segment was ill defined in aqueous medium, but in 30% trifluoroethanol it exhibited considerable helical structure. Further, NMR analysis showed the presence of a helical segment containing 26 residues that extends from Pro(8) to Arg(34). This is the first report which demonstrates extensive secondary structure within the casein class of proteins. (c) 2006 Elsevier Inc. All rights reserved.
Abstract:
Conotoxins are small conformationally constrained peptides found in the venom of marine snails of the genus Conus. They are usually cysteine rich and frequently contain a high degree of post-translational modifications such as C-terminal amidation, hydroxylation, carboxylation, bromination, epimerisation and glycosylation. Here we review the role of NMR in determining the three-dimensional structures of conotoxins and also provide a compilation and analysis of H-1 and C-13 chemical shifts of post-translationally modified amino acids and compare them with data from common amino acids. This analysis provides a reference source for chemical shifts of post-translationally modified amino acids. Copyright (C) 2006 John Wiley & Sons, Ltd.
Abstract:
The albA gene from Klebsiella oxytoca encodes a protein that binds albicidin phytotoxins and antibiotics with high affinity. Previously, it has been shown that shifting pH from 6 to 4 reduces binding activity of AlbA by about 30%, indicating that histidine residues might be involved in substrate binding. In this study, molecular analysis of the albA coding region revealed sequence discrepancies with the albA sequence reported previously, which were probably due to sequencing errors. The albA gene was subsequently cloned from K. oxytoca ATCC 13182(T) to establish the revised sequence. Biochemical and molecular approaches were used to determine the functional role of four histidine residues (His(78), His(125), His(141) and His(189)) in the corrected sequence for AlbA. Treatment of AlbA with diethyl pyrocarbonate (DEPC), a histidine-specific alkylating reagent, reduced binding activity by about 95%. DEPC treatment increased absorbance at 240-244 nm by an amount indicating conversion to N-carbethoxyhistidine of a single histidine residue per AlbA molecule. Pretreatment with albicidin protected AlbA against modification by DEPC, with a 1:1 molar ratio of albicidin to the protected histidine residues. Based on protein secondary structure and amino acid surface probability indices, it is predicted that His(125) might be the residue required for albicidin binding. Mutation of His(125) to either alanine or leucine resulted in about 32% loss of binding activity, and deletion of His(125) totally abolished binding activity. Mutation of His(125) to arginine or tyrosine had no effect. These results indicate that His(125) plays a key role either in an electrostatic interaction between AlbA and albicidin or in the conformational dynamics of the albicidin-binding site.
Abstract:
The structure of a novel plant defensin isolated from the flowers of Petunia hybrida has been determined by H-1 NMR spectroscopy. P. hybrida defensin 1 (PhD1) is a basic, cysteine-rich, antifungal protein of 47 residues and is the first example of a new subclass of plant defensins with five disulfide bonds whose structure has been determined. PhD1 has the fold of the cysteine-stabilized alpha-beta motif, consisting of an alpha-helix and a triple-stranded antiparallel beta-sheet, except that it contains a fifth disulfide bond from the first loop to the alpha-helix. The additional disulfide bond is accommodated in PhD1 without any alteration of its tertiary structure with respect to other plant defensins. Comparison of its structure with those of classic, four-disulfide defensins has allowed us to identify a previously unrecognized hydrogen bond network that is integral to structure stabilization in the family.
Abstract:
A novel member of the human relaxin subclass of the insulin superfamily was recently discovered during a genomics database search and named relaxin-3. Like human relaxin-1 and relaxin-2, relaxin-3 is predicted to consist of a two-chain structure and three disulfide bonds in a disposition identical to that of insulin. To enable detailed biophysical and biological characterization of the peptide, its chemical synthesis was undertaken. In contrast to human relaxin-1 and relaxin-2, however, relaxin-3 could not be successfully prepared by simple combination of the individual chains, thus necessitating a regioselective disulfide bond formation strategy. Solid phase synthesis of the separate, selectively S-protected A and B chains, followed by their purification and the subsequent stepwise formation of each of the three disulfides, led to the successful acquisition of human relaxin-3. Comprehensive chemical characterization confirmed both the correct chain orientation and the integrity of the synthetic product. Relaxin-3 was found to bind to and activate native relaxin receptors in vitro and stimulate water drinking through central relaxin receptors in vivo. Recent studies have demonstrated that relaxin-3 will bind to and activate human LGR7, but not LGR8, in vitro. Secondary structural analysis showed it to adopt a less ordered conformation than either relaxin-1 or relaxin-2, reflecting the presence in the former of a greater percentage of non-helix-forming amino acids. NMR spectroscopy and simulated annealing calculations were used to determine the three-dimensional structure of relaxin-3 and to identify key structural differences between the human relaxins.
Abstract:
Recently, we identified a large number of ultraconserved (uc) sequences in noncoding regions of human, mouse, and rat genomes that appear to be essential for vertebrate and amniote ontogeny. Here, we used similar methods to identify ultraconserved genomic regions between the insect species Drosophila melanogaster and Drosophila pseudoobscura, as well as the more distantly related Anopheles gambiae. As with vertebrates, ultraconserved sequences in insects appear to occur primarily in intergenic and intronic sequences, and at intron-exon junctions. The sequences are significantly associated with genes encoding developmental regulators and transcription factors, but are less frequent and are smaller in size than in vertebrates. The longest identical, nongapped orthologous match between the three genomes was found within the homothorax (hth) gene. This sequence spans an internal exon-intron junction, with the majority located within the intron, and is predicted to form a highly stable stem-loop RNA structure. Real-time quantitative PCR analysis of different hth splice isoforms and Northern blotting showed that the conserved element is associated with a high incidence of intron retention in hth pre-mRNA, suggesting that the conserved intronic element is critically important in the post-transcriptional regulation of hth expression in Diptera.
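The notion of a longest identical, nongapped match can be illustrated on two short sequences with a longest-common-substring search. This is a toy sketch under stated assumptions, not the authors' pipeline: the function name is hypothetical, it handles two sequences rather than three genomes, and genome-scale comparisons would normally rely on alignment tools or suffix-based indexes rather than this quadratic dynamic programme.

```python
def longest_common_substring(a, b):
    """Return the longest contiguous, gap-free stretch shared by a and b.

    If several matches share the maximum length, the first one found
    while scanning `a` from left to right is returned.
    """
    best_len = 0
    best_end = 0  # end index (exclusive) of the best match in `a`
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the match that ended at the previous characters.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len = curr[j]
                    best_end = i
        prev = curr
    return a[best_end - best_len:best_end]
```

For three sequences, one simple extension is to take the match between the first two and search for its longest substring that also occurs in the third, although that greedy step is not guaranteed to find the overall optimum.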
Abstract:
We discuss recent progress towards the establishment of important structure-property-function relationships in eumelanins, key functional bio-macromolecular systems responsible for photoprotection and immune response in humans, and implicated in the development of melanoma skin cancer. We focus on the link between eumelanin's secondary structure and optical properties such as broadband UV-visible absorption and strong non-radiative relaxation, both key features of the photoprotective function. We emphasise the insights gained through a holistic approach combining optical spectroscopy with first-principles quantum chemical calculations, and advance the hypothesis that the robust functionality characteristic of eumelanin is related to extreme chemical and structural disorder at the secondary level. This inherent disorder is a low-cost natural resource, and it is interesting to speculate as to whether it may play a role in other functional bio-macromolecular systems.
Abstract:
In this study, we propose a novel method to predict the solvent accessible surface areas of transmembrane residues. For both transmembrane alpha-helix and beta-barrel residues, the correlation coefficients between the predicted and observed accessible surface areas are around 0.65. On the basis of predicted accessible surface areas, residues exposed to the lipid environment or buried inside a protein can be identified by using certain cutoff thresholds. We have extensively examined our approach based on different definitions of accessible surface areas and a variety of sets of control parameters. Given that experimentally determining the structures of membrane proteins is very difficult and membrane proteins are actually abundant in nature, our approach is useful for theoretically modeling membrane protein tertiary structures, particularly for modeling the assembly of transmembrane domains. This approach can be used to annotate the membrane proteins in proteomes to provide extra structural and functional information.
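Two evaluation steps mentioned above, the correlation between predicted and observed accessible surface areas and the cutoff-based split into buried and exposed residues, can be sketched as follows. The function names and the cutoff value used in the usage note are hypothetical; the abstract does not give the thresholds it used.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def classify_exposure(predicted_asa, cutoff):
    """Label each residue 'exposed' if its predicted accessible surface
    area is at or above the cutoff, otherwise 'buried'.

    cutoff -- threshold in the same units as predicted_asa
              (hypothetical value chosen by the user)
    """
    return ["exposed" if asa >= cutoff else "buried" for asa in predicted_asa]
```

For example, `classify_exposure([5.0, 40.0], cutoff=20.0)` labels the first residue buried and the second exposed; for transmembrane residues, exposed here would mean exposed to the lipid environment.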