42 resultados para Secondary Structure Prediction

em University of Queensland eSpace - Australia


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Background: The structure of proteins may change as a result of the inherent flexibility of some protein regions. We develop and explore probabilistic machine learning methods for predicting a continuum secondary structure, i.e. assigning probabilities to the conformational states of a residue. We train our methods using data derived from high-quality NMR models. Results: Several probabilistic models not only successfully estimate the continuum secondary structure, but also provide a categorical output on par with models directly trained on categorical data. Importantly, models trained on the continuum secondary structure are also better than their categorical counterparts at identifying the conformational state for structurally ambivalent residues. Conclusion: Cascaded probabilistic neural networks trained on the continuum secondary structure exhibit better accuracy in structurally ambivalent regions of proteins, while sustaining an overall classification accuracy on par with standard, categorical prediction methods.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Motivation: Conformational flexibility is essential to the function of many proteins, e.g. catalytic activity. To assist efforts in determining and exploring the functional properties of a protein, it is desirable to automatically identify regions that are prone to undergo conformational changes. It was recently shown that a probabilistic predictor of continuum secondary structure is more accurate than categorical predictors for structurally ambivalent sequence regions, suggesting that such models are suited to characterize protein flexibility. Results: We develop a computational method for identifying regions that are prone to conformational change directly from the amino acid sequence. The method uses the entropy of the probabilistic output of an 8-class continuum secondary structure predictor. Results for 171 unique amino acid sequences with well-characterized variable structure (identified in the 'Macromolecular movements database') indicate that the method is highly sensitive at identifying flexible protein regions, but false positives remain a problem. The method can be used to explore conformational flexibility of proteins (including hypothetical or synthetic ones) whose structure is yet to be determined experimentally.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Our previous studies using trans-complementation analysis of Kunjin virus (KUN) full-length cDNA clones harboring in-frame deletions in the NS3 gene demonstrated the inability of these defective complemented RNAs to be packaged into virus particles (W. J. Liu, P. L. Sedlak, N. Kondratieva, and A. A. Khromykh, J. Virol. 76:10766-10775). In this study we aimed to establish whether this requirement for NS3 in RNA packaging is determined by the secondary RNA structure of the NS3 gene or by the essential role of the translated NS3 gene product. Multiple silent mutations of three computer-predicted stable RNA structures in the NS3 coding region of KUN replicon RNA aimed at disrupting RNA secondary structure without affecting amino acid sequence did not affect RNA replication and packaging into virus-like particles in the packaging cell line, thus demonstrating that the predicted conserved RNA structures in the NS3 gene do not play a role in RNA replication and/or packaging. In contrast, double frameshift mutations in the NS3 coding region of full-length KUN RNA, producing scrambled NS3 protein but retaining secondary RNA structure, resulted in the loss of ability of these defective RNAs to be packaged into virus particles in complementation experiments in KUN replicon-expressing cells. Furthermore, the more robust complementation-packaging system based on established stable cell lines producing large amounts of complemented replicating NS3-deficient replicon RNAs and infection with KUN virus to provide structural proteins also failed to detect any secreted virus-like particles containing packaged NS3-deficient replicon RNAs. These results have now firmly established the requirement of KUN NS3 protein translated in cis for genome packaging into virus particles.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Purple acid phosphatases are a family of binuclear metallohydrolases that have been identified in plants, animals and fungi. Only one isoform of similar to 35 kDa has been isolated from animals, where it is associated with bone resorption and microbial killing through its phosphatase activity, and hydroxyl radical production, respectively. Using the sensitive PSI-BLAST search method, sequences representing new purple acid phosphatase-like proteins have been identified in mammals, insects and nematodes. These new putative isoforms are closely related to the similar to 55 kDa purple acid phosphatase characterized from plants. Secondary structure prediction of the new human isoform further confirms its similarity to a purple acid phosphatase from the red kidney bean. A structural model for the human enzyme was constructed based on the red kidney bean purple acid phosphatase structure. This model shows that the catalytic centre observed in other purple acid phosphatases is also present in this new isoform. These observations suggest that the sequences identified in this study represent a novel subfamily of plant-like purple acid phosphatases in animals and humans. (c) 2006 Elsevier B.V. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Background: The residue-wise contact order (RWCO) describes the sequence separations between the residues of interest and its contacting residues in a protein sequence. It is a new kind of one-dimensional protein structure that represents the extent of long-range contacts and is considered as a generalization of contact order. Together with secondary structure, accessible surface area, the B factor, and contact number, RWCO provides comprehensive and indispensable important information to reconstructing the protein three-dimensional structure from a set of one-dimensional structural properties. Accurately predicting RWCO values could have many important applications in protein three-dimensional structure prediction and protein folding rate prediction, and give deep insights into protein sequence-structure relationships. Results: We developed a novel approach to predict residue-wise contact order values in proteins based on support vector regression (SVR), starting from primary amino acid sequences. We explored seven different sequence encoding schemes to examine their effects on the prediction performance, including local sequence in the form of PSI-BLAST profiles, local sequence plus amino acid composition, local sequence plus molecular weight, local sequence plus secondary structure predicted by PSIPRED, local sequence plus molecular weight and amino acid composition, local sequence plus molecular weight and predicted secondary structure, and local sequence plus molecular weight, amino acid composition and predicted secondary structure. When using local sequences with multiple sequence alignments in the form of PSI-BLAST profiles, we could predict the RWCO distribution with a Pearson correlation coefficient (CC) between the predicted and observed RWCO values of 0.55, and root mean square error (RMSE) of 0.82, based on a well-defined dataset with 680 protein sequences. Moreover, by incorporating global features such as molecular weight and amino acid composition we could further improve the prediction performance with the CC to 0.57 and an RMSE of 0.79. In addition, combining the predicted secondary structure by PSIPRED was found to significantly improve the prediction performance and could yield the best prediction accuracy with a CC of 0.60 and RMSE of 0.78, which provided at least comparable performance compared with the other existing methods. Conclusion: The SVR method shows a prediction performance competitive with or at least comparable to the previously developed linear regression-based methods for predicting RWCO values. In contrast to support vector classification (SVC), SVR is very good at estimating the raw value profiles of the samples. The successful application of the SVR approach in this study reinforces the fact that support vector regression is a powerful tool in extracting the protein sequence-structure relationship and in estimating the protein structural profiles from amino acid sequences.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this study, we propose a novel method to predict the solvent accessible surface areas of transmembrane residues. For both transmembrane alpha-helix and beta-barrel residues, the correlation coefficients between the predicted and observed accessible surface areas are around 0.65. On the basis of predicted accessible surface areas, residues exposed to the lipid environment or buried inside a protein can be identified by using certain cutoff thresholds. We have extensively examined our approach based on different definitions of accessible surface areas and a variety of sets of control parameters. Given that experimentally determining the structures of membrane proteins is very difficult and membrane proteins are actually abundant in nature, our approach is useful for theoretically modeling membrane protein tertiary structures, particularly for modeling the assembly of transmembrane domains. This approach can be used to annotate the membrane proteins in proteomes to provide extra structural and functional information.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Background: Designing novel proteins with site-directed recombination has enormous prospects. By locating effective recombination sites for swapping sequence parts, the probability that hybrid sequences have the desired properties is increased dramatically. The prohibitive requirements for applying current tools led us to investigate machine learning to assist in finding useful recombination sites from amino acid sequence alone. Results: We present STAR, Site Targeted Amino acid Recombination predictor, which produces a score indicating the structural disruption caused by recombination, for each position in an amino acid sequence. Example predictions contrasted with those of alternative tools, illustrate STAR'S utility to assist in determining useful recombination sites. Overall, the correlation coefficient between the output of the experimentally validated protein design algorithm SCHEMA and the prediction of STAR is very high (0.89). Conclusion: STAR allows the user to explore useful recombination sites in amino acid sequences with unknown structure and unknown evolutionary origin. The predictor service is available from http://pprowler.itee.uq.edu.au/star.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We describe a new method for using neural networks to predict residue contact pairs in a protein. The main inputs to the neural network are a set of 25 measures of correlated mutation between all pairs of residues in two windows of size 5 centered on the residues of interest. While the individual pair-wise correlations are a relatively weak predictor of contact, by training the network on windows of correlation the accuracy of prediction is significantly improved. The neural network is trained on a set of 100 proteins and then tested on a disjoint set of 1033 proteins of known structure. An average predictive accuracy of 21.7% is obtained taking the best L/2 predictions for each protein, where L is the sequence length. Taking the best L/10 predictions gives an average accuracy of 30.7%. The predictor is also tested on a set of 59 proteins from the CASP5 experiment. The accuracy is found to be relatively consistent across different sequence lengths, but to vary widely according to the secondary structure. Predictive accuracy is also found to improve by using multiple sequence alignments containing many sequences to calculate the correlations. (C) 2004 Wiley-Liss, Inc.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Background: Protein tertiary structure can be partly characterized via each amino acid's contact number measuring how residues are spatially arranged. The contact number of a residue in a folded protein is a measure of its exposure to the local environment, and is defined as the number of C-beta atoms in other residues within a sphere around the C-beta atom of the residue of interest. Contact number is partly conserved between protein folds and thus is useful for protein fold and structure prediction. In turn, each residue's contact number can be partially predicted from primary amino acid sequence, assisting tertiary fold analysis from sequence data. In this study, we provide a more accurate contact number prediction method from protein primary sequence. Results: We predict contact number from protein sequence using a novel support vector regression algorithm. Using protein local sequences with multiple sequence alignments (PSI-BLAST profiles), we demonstrate a correlation coefficient between predicted and observed contact numbers of 0.70, which outperforms previously achieved accuracies. Including additional information about sequence weight and amino acid composition further improves prediction accuracies significantly with the correlation coefficient reaching 0.73. If residues are classified as being either contacted or non-contacted, the prediction accuracies are all greater than 77%, regardless of the choice of classification thresholds. Conclusion: The successful application of support vector regression to the prediction of protein contact number reported here, together with previous applications of this approach to the prediction of protein accessible surface area and B-factor profile, suggests that a support vector regression approach may be very useful for determining the structure-function relation between primary sequence and higher order consecutive protein structural and functional properties.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Motivation: Targeting peptides direct nascent proteins to their specific subcellular compartment. Knowledge of targeting signals enables informed drug design and reliable annotation of gene products. However, due to the low similarity of such sequences and the dynamical nature of the sorting process, the computational prediction of subcellular localization of proteins is challenging. Results: We contrast the use of feed forward models as employed by the popular TargetP/SignalP predictors with a sequence-biased recurrent network model. The models are evaluated in terms of performance at the residue level and at the sequence level, and demonstrate that recurrent networks improve the overall prediction performance. Compared to the original results reported for TargetP, an ensemble of the tested models increases the accuracy by 6 and 5% on non-plant and plant data, respectively.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The structure of a novel plant defensin isolated from the flowers of Petunia hybrida has been determined by H-1 NMR spectroscopy. P. hybrida defensin 1 (PhD1) is a basic, cysteine-rich, antifungal protein of 47 residues and is the first example of a new subclass of plant defensins with five disulfide bonds whose structure has been determined. PhD1 has the fold of the cysteine-stabilized alphabeta motif, consisting of an alpha-helix and a triple-stranded antiparallel beta-sheet, except that it contains a fifth disulfide bond from the first loop to the alpha-helix. The additional disulfide bond is accommodated in PhD1 without any alteration of its tertiary structure with respect to other plant defensins. Comparison of its structure with those of classic, four-disulfide defensins has allowed us to identify a previously unrecognized hydrogen bond network that is integral to structure stabilization in the family.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Limited but significant sequence similarity has been observed between an uncharacterized human protein, SIN1, and the S. pombe SIN1, Dictyostelium RIP3 and S. cerevisiae AVO1 proteins. The human Sin1 gene has been automatically predicted (MAPKAP1; GenBank accession number NM_024117); however, this sequence appears to be incomplete. In this study, we have cloned and characterized the full-length human Sin1 mRNA and identified a highly conserved domain that defines the family of SIN1 orthologues, members of which are widely distributed in the fungal and metazoan kingdoms. We demonstrate that Sin1 transcripts can use alternative polyadenylation signals and describe a number of Sin1 splice variants that potentially encode functionally different isoforms. (C) 2004 Elsevier B.V. All rights reserved.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Selection of machine learning techniques requires a certain sensitivity to the requirements of the problem. In particular, the problem can be made more tractable by deliberately using algorithms that are biased toward solutions of the requisite kind. In this paper, we argue that recurrent neural networks have a natural bias toward a problem domain of which biological sequence analysis tasks are a subset. We use experiments with synthetic data to illustrate this bias. We then demonstrate that this bias can be exploitable using a data set of protein sequences containing several classes of subcellular localization targeting peptides. The results show that, compared with feed forward, recurrent neural networks will generally perform better on sequence analysis tasks. Furthermore, as the patterns within the sequence become more ambiguous, the choice of specific recurrent architecture becomes more critical.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

We discuss recent progress towards the establishment of important structure-property-function relationships in eumelanins-key functional bio-macromolecular systems responsible for photoprotection and immune response in humans, and implicated in the development of melanoma skin cancer. We focus on the link between eumelanin's secondary structure and optical properties such as broad band UV-visible absorption and strong non-radiative relaxation; both key features of the photo-protective function. We emphasise the insights gained through a holistic approach combining optical spectroscopy with first principles quantum chemical calculations, and advance the hypothesis that the robust functionality characteristic of eumelanin is related to extreme chemical and structural disorder at the secondary level. This inherent disorder is a low cost natural resource, and it is interesting to speculate as to whether it may play a role in other functional bio-macromolecular systems.