152 results for Protein structure prediction
in University of Queensland eSpace - Australia
Abstract:
We describe a new method for using neural networks to predict residue contact pairs in a protein. The main inputs to the neural network are a set of 25 measures of correlated mutation between all pairs of residues in two windows of size 5 centered on the residues of interest. While the individual pair-wise correlations are a relatively weak predictor of contact, by training the network on windows of correlation the accuracy of prediction is significantly improved. The neural network is trained on a set of 100 proteins and then tested on a disjoint set of 1033 proteins of known structure. An average predictive accuracy of 21.7% is obtained taking the best L/2 predictions for each protein, where L is the sequence length. Taking the best L/10 predictions gives an average accuracy of 30.7%. The predictor is also tested on a set of 59 proteins from the CASP5 experiment. The accuracy is found to be relatively consistent across different sequence lengths, but to vary widely according to the secondary structure. Predictive accuracy is also found to improve by using multiple sequence alignments containing many sequences to calculate the correlations. (C) 2004 Wiley-Liss, Inc.
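The "best L/2" and "best L/10" accuracy figures quoted above can be illustrated with a minimal sketch. It assumes predictions arrive as a dictionary of scored residue pairs and that true contacts are given as a set; the function name and data layout are hypothetical and are not taken from the paper.

```python
def top_fraction_accuracy(scores, contacts, seq_len, divisor):
    """Fraction of the best seq_len // divisor predicted pairs that are true contacts.

    scores:   dict mapping (i, j) residue pairs to a predicted contact score
              (assumed layout, higher means more likely to be in contact).
    contacts: set of (i, j) pairs in contact in the known structure.
    """
    n_top = max(1, seq_len // divisor)
    # Rank pairs from highest to lowest score and keep the best n_top.
    ranked = sorted(scores, key=scores.get, reverse=True)[:n_top]
    if not ranked:
        return 0.0
    hits = sum(1 for pair in ranked if pair in contacts)
    return hits / len(ranked)
```

Calling it with `divisor=2` and `divisor=10` on the same protein gives the two kinds of figure reported in the abstract; averaging over a test set is left to the caller.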
Abstract:
We describe two ways of optimizing score functions for protein sequence to structure threading. The first method adjusts parameters to improve sequence to structure alignment. The second adjusts parameters so as to improve a score function's ability to rank alignments calculated with the first score function. Unlike those functions known as knowledge-based force fields, the resulting parameter sets do not rely on Boltzmann statistics, have no claim to representing free energies and are purely constructions for recognizing protein folds. The methods give a small improvement, but suggest that functions can be profitably optimized for very specific aspects of protein fold recognition. Proteins 1999;36:454-461. (C) 1999 Wiley-Liss, Inc.
Abstract:
Conventionally, protein structure prediction via threading relies on some nonoptimal method to align a protein sequence to each member of a library of known structures. We show how a score function (force field) can be modified so as to allow the direct application of a dynamic programming algorithm to the problem. This involves an approximation whose damage can be minimized by an optimization process during score function parameter determination. The method is compared to sequence to structure alignments using a more conventional pair-wise score function and the frozen approximation. The new method produces results comparable to the frozen approximation, but is faster and has fewer adjustable parameters. It is also free of memory of the template's original amino acid sequence, and does not suffer from a problem of nonconvergence, which can be shown to occur with the frozen approximation. Alignments generated by the simplified score function can then be ranked using a second score function with the approximations removed. (C) 1999 John Wiley & Sons, Inc.
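Dynamic programming applies directly once the score for placing a residue at a template site no longer depends on the rest of the alignment. The sketch below is a generic Needleman-Wunsch style global alignment over such a site-score matrix, not the authors' score function. The matrix layout, the linear gap penalty and the "higher is better" convention are all assumptions made for illustration.

```python
def align_score(site_scores, gap=-1.0):
    """Best global alignment score by dynamic programming.

    site_scores[i][j] is an assumed score for placing sequence residue i
    at template site j; it must not depend on other aligned pairs.
    gap is an assumed linear penalty added for each unaligned position.
    """
    n = len(site_scores)
    m = len(site_scores[0]) if n else 0
    # best[i][j]: best score aligning the first i residues to the first j sites.
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        best[i][0] = i * gap
    for j in range(1, m + 1):
        best[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best[i][j] = max(
                best[i - 1][j - 1] + site_scores[i - 1][j - 1],  # align i with j
                best[i - 1][j] + gap,  # residue i left unaligned
                best[i][j - 1] + gap,  # site j left unaligned
            )
    return best[n][m]
```

A full pair-wise score function cannot be used this way because each cell would depend on choices made elsewhere in the alignment, which is the problem the approximation in the abstract is meant to sidestep.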
Abstract:
Bacterial chaperonin, GroEL, together with its co-chaperonin, GroES, facilitates the folding of a variety of polypeptides. Experiments suggest that GroEL stimulates protein folding by multiple cycles of binding and release. Misfolded proteins first bind to an exposed hydrophobic surface on GroEL. GroES then encapsulates the substrate and triggers its release into the central cavity of the GroEL/ES complex for folding. In this work, we investigate whether protein folding in molecular dynamics simulations can be facilitated by mimicking the effects of GroEL/ES, namely repeated binding and release together with spatial confinement. During the binding stage, the (metastable) partially folded proteins are allowed to attach spontaneously to a hydrophobic surface within the simulation box. This destabilizes the structures, which are then transferred into a spatially confined cavity for folding. The approach has been tested by attempting to refine protein structural models generated using the ROSETTA procedure for ab initio structure prediction. Dramatic improvements in regard to the deviation of protein models from the corresponding experimental structures were observed. The results suggest that the primary effects of the GroEL/ES system can be mimicked in a simple coarse-grained manner and be used to facilitate protein folding in molecular dynamics simulations. Furthermore, the results support the assumption that the spatial confinement in GroEL/ES assists the folding of encapsulated proteins.
Abstract:
For determining functionality dependencies between two proteins, both represented as 3D structures, it is an essential condition that they have one or more matching structural regions called patches. As 3D structures for proteins are large, complex and constantly evolving, it is computationally expensive and very time-consuming to identify possible locations and sizes of patches for a given protein against a large protein database. In this paper, we address a vector space based representation for protein structures, where a patch is formed by the vectors within the region. Based on our previous work, a compact representation of the patch, named the patch signature, is applied here. A similarity measure of two patches is then derived based on their signatures. To achieve fast patch matching in large protein databases, a match-and-expand strategy is proposed. Given a query patch, a set of small k-sized matching patches, called candidate patches, is generated in the match stage. The candidate patches are further filtered by enlarging k in the expand stage. Our extensive experimental results demonstrate encouraging performance with respect to this biologically critical but previously computationally prohibitive problem.
Abstract:
Background: Protein tertiary structure can be partly characterized via each amino acid's contact number measuring how residues are spatially arranged. The contact number of a residue in a folded protein is a measure of its exposure to the local environment, and is defined as the number of C-beta atoms in other residues within a sphere around the C-beta atom of the residue of interest. Contact number is partly conserved between protein folds and thus is useful for protein fold and structure prediction. In turn, each residue's contact number can be partially predicted from primary amino acid sequence, assisting tertiary fold analysis from sequence data. In this study, we provide a more accurate contact number prediction method from protein primary sequence. Results: We predict contact number from protein sequence using a novel support vector regression algorithm. Using protein local sequences with multiple sequence alignments (PSI-BLAST profiles), we demonstrate a correlation coefficient between predicted and observed contact numbers of 0.70, which outperforms previously achieved accuracies. Including additional information about sequence weight and amino acid composition further improves prediction accuracies significantly with the correlation coefficient reaching 0.73. If residues are classified as being either contacted or non-contacted, the prediction accuracies are all greater than 77%, regardless of the choice of classification thresholds. Conclusion: The successful application of support vector regression to the prediction of protein contact number reported here, together with previous applications of this approach to the prediction of protein accessible surface area and B-factor profile, suggests that a support vector regression approach may be very useful for determining the structure-function relation between primary sequence and higher order consecutive protein structural and functional properties.
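The contact number definition given above (the count of C-beta atoms from other residues inside a sphere around a residue's own C-beta atom) can be sketched as follows. The abstract does not state the sphere radius, so the default of 10.0 is purely an assumed placeholder, and the coordinate layout is hypothetical.

```python
import math


def contact_numbers(cb_coords, radius=10.0):
    """Contact number of each residue from its C-beta coordinates.

    cb_coords: list of (x, y, z) C-beta positions, one per residue
               (assumed layout).
    radius:    sphere radius in the same units as the coordinates;
               the default is an assumption, not a value from the paper.
    """
    counts = []
    for i, centre in enumerate(cb_coords):
        # Count C-beta atoms of all other residues inside the sphere.
        inside = sum(
            1
            for j, other in enumerate(cb_coords)
            if j != i and math.dist(centre, other) <= radius
        )
        counts.append(inside)
    return counts
```

This all-pairs loop is quadratic in the number of residues, which is adequate for single proteins; it computes observed contact numbers from a structure and says nothing about the support vector regression used to predict them from sequence.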
Abstract:
To ensure signalling fidelity, kinases must act only on a defined subset of cellular targets. Appreciating the basis for this substrate specificity is essential for understanding the role of an individual protein kinase in a particular cellular process. The specificity in the cell is determined by a combination of peptide specificity of the kinase (the molecular recognition of the sequence surrounding the phosphorylation site), substrate recruitment and phosphatase activity. Peptide specificity plays a crucial role and depends on the complementarity between the kinase and the substrate and therefore on their three-dimensional structures. Methods for experimental identification of kinase substrates and characterization of specificity are expensive and laborious; computational approaches are therefore being developed to reduce the amount of experimental work required in substrate identification. We discuss the structural basis of substrate specificity of protein kinases and review the experimental and computational methods used to obtain specificity information. (c) 2005 Elsevier B.V. All rights reserved.
Abstract:
We present a fast method for finding optimal parameters for a low-resolution (threading) force field intended to distinguish correct from incorrect folds for a given protein sequence. In contrast to other methods, the parameterization uses information from >10^7 misfolded structures as well as a set of native sequence-structure pairs. In addition to testing the resulting force field's performance on the protein sequence threading problem, results are shown that characterize the number of parameters necessary for effective structure recognition.
Abstract:
Sausage is a protein sequence threading program, but with remarkable run-time flexibility. Using different scripts, it can calculate protein sequence-structure alignments, search structure libraries, swap force fields, create models from alignments, convert file formats and analyse results. There are several different force fields which might be classed as knowledge-based, although they do not rely on Boltzmann statistics. Different force fields are used for alignment calculations and subsequent ranking of calculated models.
Abstract:
The efficient and correct folding of bacterial disulfide bonded proteins in vivo is dependent upon a class of periplasmic oxidoreductase proteins called DsbA, after the Escherichia coli enzyme. In the pathogenic bacterium Vibrio cholerae, the DsbA homolog (TcpG) is responsible for the folding, maturation and secretion of virulence factors. Mutants in which the tcpG gene has been inactivated are avirulent; they no longer produce functional colonisation pili and they no longer secrete cholera toxin. TcpG is thus a suitable target for inhibitors that could counteract the virulence of this organism, thereby preventing the symptoms of cholera. The crystal structure of oxidized TcpG (refined at a resolution of 2.1 Å) serves as a starting point for the rational design of such inhibitors. As expected, TcpG has the same fold as E. coli DsbA, with which it shares approximately 40% sequence identity. In addition, the characteristic surface features of DsbA are present in TcpG, supporting the notion that these features play a functional role. While the overall architecture of TcpG and DsbA is similar and the surface features are retained in TcpG, there are significant differences. For example, the kinked active site helix results from a three-residue loop in DsbA, but is caused by a proline in TcpG (making TcpG more similar to thioredoxin in this respect). Furthermore, the proposed peptide binding groove of TcpG is substantially shortened compared with that of DsbA due to a six-residue deletion. Also, the hydrophobic pocket of TcpG is shallower and the acidic patch is much less extensive than that of E. coli DsbA. The identification of the structural and surface features that are retained or are divergent in TcpG provides a useful assessment of their functional importance in these protein folding catalysts and is an important prerequisite for the design of TcpG inhibitors. (C) 1997 Academic Press Limited.
Abstract:
A two-domain portion of the proteinase inhibitor precursor from Nicotiana alata (NaProPI) has been expressed and its structure determined by NMR spectroscopy. NaProPI contains six almost identical 53 amino acid repeats that fold into six highly similar domains; however, the sequence repeats do not coincide with the structural domains. Five of the structural domains comprise the C-terminal portion of one repeat and the N-terminal portion of the next. The sixth domain contains the C-terminal portion of the sixth repeat and the N-terminal portion of the first repeat. Disulphide bonds link these C and N-terminal fragments to generate the clasped-bracelet fold of NaProPI. The three-dimensional structure of NaProPI is not known, but it is conceivable that adjacent domains in NaProPI interact to generate the circular bracelet with the N and C termini in close enough proximity to facilitate formation of the disulphide bonds that form the clasp. The expressed protein, examined in the current study, comprises residues 25-135 of NaProPI and encompasses the first two contiguous structural domains, namely the chymotrypsin inhibitor C1 and the trypsin inhibitor T1, joined by a five-residue linker, and is referred to as C1-T1. The tertiary structure of each domain in C1-T1 is identical to that found in the isolated inhibitors. However, no nuclear Overhauser effect contacts are observed between the two domains and the five-residue linker adopts an extended conformation. The absence of interactions between the domains indicates that adjacent domains do not specifically interact to drive the circularisation of NaProPI. These results are in agreement with recent data which describe similar PI precursors from other members of the Solanaceae having two, three, or four repeats. The lack of strong interdomain association is likely to be important for the function of individual inhibitors by ensuring that there is no masking of reactive sites upon release from the precursor. (C) 2001 Academic Press.
Abstract:
NMR spectroscopy and simulated annealing calculations have been used to determine the three-dimensional structure of NaD1, a novel antifungal and insecticidal protein isolated from the flowers of Nicotiana alata. NaD1 is a basic, cysteine-rich protein of 47 residues and is the first example of a plant defensin from flowers to be characterized structurally. Its three-dimensional structure consists of an α-helix and a triple-stranded anti-parallel β-sheet that are stabilized by four intramolecular disulfide bonds. NaD1 features all the characteristics of the cysteine-stabilized αβ motif that has been described for a variety of proteins of differing functions ranging from antibacterial insect defensins and ion channel-perturbing scorpion toxins to an elicitor of the sweet taste response. The protein is biologically active against insect pests, which makes it a potential candidate for use in crop protection. NaD1 shares 31% sequence identity with alfAFP, an antifungal protein from alfalfa that confers resistance to a fungal pathogen in transgenic potatoes. The structure of NaD1 was used to obtain a homology model of alfAFP, since NaD1 has the highest level of sequence identity with alfAFP of any structurally characterized antifungal defensin. The structures of NaD1 and alfAFP were used in conjunction with structure-activity data for the radish defensin Rs-AFP2 to provide an insight into structure-function relationships. In particular, a putative effector site was identified in the structure of NaD1 and in the corresponding homology model of alfAFP. (C) 2002 Elsevier Science Ltd. All rights reserved.