54 results for Protein structure prediction
in University of Queensland eSpace - Australia
Abstract:
We describe a new method for using neural networks to predict residue contact pairs in a protein. The main inputs to the neural network are a set of 25 measures of correlated mutation between all pairs of residues in two windows of size 5 centered on the residues of interest. While the individual pair-wise correlations are a relatively weak predictor of contact, by training the network on windows of correlation the accuracy of prediction is significantly improved. The neural network is trained on a set of 100 proteins and then tested on a disjoint set of 1033 proteins of known structure. An average predictive accuracy of 21.7% is obtained taking the best L/2 predictions for each protein, where L is the sequence length. Taking the best L/10 predictions gives an average accuracy of 30.7%. The predictor is also tested on a set of 59 proteins from the CASP5 experiment. The accuracy is found to be relatively consistent across different sequence lengths, but to vary widely according to the secondary structure. Predictive accuracy is also found to improve by using multiple sequence alignments containing many sequences to calculate the correlations. (C) 2004 Wiley-Liss, Inc.
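The input encoding and the evaluation measure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the `corr` matrix layout, the dictionary of scores and the zero-padding at the sequence ends are all assumptions made for the example.

```python
import numpy as np

def window_features(corr, i, j, half=2):
    """Collect the 5 x 5 = 25 correlation values around the residue pair (i, j).

    `corr` is a hypothetical L x L matrix of correlated-mutation scores.
    Window positions that fall off either end of the sequence are
    zero-padded, which is an assumption of this sketch.
    """
    length = corr.shape[0]
    feats = []
    for a in range(i - half, i + half + 1):
        for b in range(j - half, j + half + 1):
            inside = 0 <= a < length and 0 <= b < length
            feats.append(corr[a, b] if inside else 0.0)
    return np.array(feats)

def top_fraction_accuracy(scores, contacts, seq_len, divisor):
    """Fraction of true contacts among the best L/divisor predictions.

    scores   -- dict mapping a residue pair (i, j) to its predicted score
    contacts -- set of residue pairs that are in contact in the known structure
    """
    n_best = max(1, seq_len // divisor)
    ranked = sorted(scores, key=scores.get, reverse=True)[:n_best]
    return sum(pair in contacts for pair in ranked) / len(ranked)
```

With `divisor=2` and `divisor=10`, the second function corresponds to the L/2 and L/10 accuracies quoted in the abstract.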
Abstract:
The bacterial chaperonin GroEL, together with its co-chaperonin GroES, facilitates the folding of a variety of polypeptides. Experiments suggest that GroEL stimulates protein folding by multiple cycles of binding and release. Misfolded proteins first bind to an exposed hydrophobic surface on GroEL. GroES then encapsulates the substrate and triggers its release into the central cavity of the GroEL/ES complex for folding. In this work, we investigate whether protein folding in molecular dynamics simulations can be facilitated by mimicking the effects of GroEL/ES, namely repeated binding and release together with spatial confinement. During the binding stage, the (metastable) partially folded proteins are allowed to attach spontaneously to a hydrophobic surface within the simulation box. This destabilizes the structures, which are then transferred into a spatially confined cavity for folding. The approach has been tested by attempting to refine protein structural models generated using the ROSETTA procedure for ab initio structure prediction. Dramatic improvements in the deviation of protein models from the corresponding experimental structures were observed. The results suggest that the primary effects of the GroEL/ES system can be mimicked in a simple coarse-grained manner and be used to facilitate protein folding in molecular dynamics simulations. Furthermore, the results support the assumption that the spatial confinement in GroEL/ES assists the folding of encapsulated proteins.
Abstract:
To determine functional dependencies between two proteins, both represented as 3D structures, it is essential that they have one or more matching structural regions, called patches. As 3D structures for proteins are large, complex and constantly evolving, it is computationally expensive and very time-consuming to identify possible locations and sizes of patches for a given protein against a large protein database. In this paper, we consider a vector-space-based representation for protein structures, where a patch is formed by the vectors within the region. Based on our previous work, a compact representation of the patch, named the patch signature, is applied here. A similarity measure for two patches is then derived from their signatures. To achieve fast patch matching in large protein databases, a match-and-expand strategy is proposed. Given a query patch, a set of small k-sized matching patches, called candidate patches, is generated in the match stage. The candidate patches are further filtered by enlarging k in the expand stage. Our extensive experimental results demonstrate encouraging performance on this biologically critical but previously computationally prohibitive problem.
Abstract:
Background: Protein tertiary structure can be partly characterized via each amino acid's contact number measuring how residues are spatially arranged. The contact number of a residue in a folded protein is a measure of its exposure to the local environment, and is defined as the number of C-beta atoms in other residues within a sphere around the C-beta atom of the residue of interest. Contact number is partly conserved between protein folds and thus is useful for protein fold and structure prediction. In turn, each residue's contact number can be partially predicted from primary amino acid sequence, assisting tertiary fold analysis from sequence data. In this study, we provide a more accurate contact number prediction method from protein primary sequence. Results: We predict contact number from protein sequence using a novel support vector regression algorithm. Using protein local sequences with multiple sequence alignments (PSI-BLAST profiles), we demonstrate a correlation coefficient between predicted and observed contact numbers of 0.70, which outperforms previously achieved accuracies. Including additional information about sequence weight and amino acid composition further improves prediction accuracies significantly with the correlation coefficient reaching 0.73. If residues are classified as being either contacted or non-contacted, the prediction accuracies are all greater than 77%, regardless of the choice of classification thresholds. Conclusion: The successful application of support vector regression to the prediction of protein contact number reported here, together with previous applications of this approach to the prediction of protein accessible surface area and B-factor profile, suggests that a support vector regression approach may be very useful for determining the structure-function relation between primary sequence and higher order consecutive protein structural and functional properties.
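The definition of contact number given in the Background can be written out directly. The sketch below is only an illustration: the function name is hypothetical, and the sphere radius of 12 Å is an assumed default, since the abstract does not state the radius used.

```python
import numpy as np

def contact_numbers(cb_coords, radius=12.0):
    """Count, for each residue, the C-beta atoms of other residues within `radius`.

    cb_coords -- array-like of shape (n_residues, 3) holding C-beta coordinates
    radius    -- sphere radius in angstroms (an assumed value, not from the source)
    """
    coords = np.asarray(cb_coords, dtype=float)
    # Pairwise Euclidean distances between all C-beta atoms.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    within = dist <= radius
    # A residue does not count as its own contact.
    np.fill_diagonal(within, False)
    return within.sum(axis=1)
```

Observed contact numbers computed this way from a known structure are the quantity against which predicted values would be correlated.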
Abstract:
To ensure signalling fidelity, kinases must act only on a defined subset of cellular targets. Appreciating the basis for this substrate specificity is essential for understanding the role of an individual protein kinase in a particular cellular process. The specificity in the cell is determined by a combination of peptide specificity of the kinase (the molecular recognition of the sequence surrounding the phosphorylation site), substrate recruitment and phosphatase activity. Peptide specificity plays a crucial role and depends on the complementarity between the kinase and the substrate, and therefore on their three-dimensional structures. Methods for experimental identification of kinase substrates and characterization of specificity are expensive and laborious; therefore, computational approaches are being developed to reduce the amount of experimental work required in substrate identification. We discuss the structural basis of substrate specificity of protein kinases and review the experimental and computational methods used to obtain specificity information. (c) 2005 Elsevier B.V. All rights reserved.
Abstract:
We have determined the crystal structure of the core (C) protein from the Kunjin subtype of West Nile virus (WNV), closely related to the NY99 strain of WNV, currently a major health threat in the U.S. WNV is a member of the Flaviviridae family of enveloped RNA viruses that contains many important human pathogens. The C protein is associated with the RNA genome and forms the internal core, which is surrounded by the envelope in the virion. The C protein structure contains four alpha helices and forms dimers that are organized into tetramers. The tetramers form extended filamentous ribbons resembling the stacked alpha helices seen in HEAT protein structures.
Abstract:
Wurst is a protein threading program with an emphasis on high quality sequence to structure alignments (http://www.zbh.uni-hamburg.de/wurst). Submitted sequences are aligned to each of about 3000 templates with a conventional dynamic programming algorithm, but using a score function with sophisticated structure and sequence terms. The structure terms are a log-odds probability of sequence to structure fragment compatibility, obtained from a Bayesian classification procedure. A simplex optimization was used to optimize the sequence-based terms for the goal of alignment and model quality and to balance the sequence and structural contributions against each other. Both sequence and structural terms operate with sequence profiles.
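The idea of aligning a sequence to a template by dynamic programming with a pluggable score function can be sketched as below. This is a generic global alignment with a linear gap penalty, shown only to illustrate the technique; it is not Wurst's algorithm, and the function name, the gap value and the toy score function are assumptions.

```python
def global_alignment_score(seq, template, score, gap=-1.0):
    """Best global alignment score of `seq` against `template`.

    score -- callable taking one sequence position and one template position
             and returning their compatibility (stands in for any combined
             sequence and structure term)
    gap   -- linear penalty per gapped position (an assumed simplification)
    """
    n, m = len(seq), len(template)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score(seq[i - 1], template[j - 1]),  # align pair
                dp[i - 1][j] + gap,  # gap in the template
                dp[i][j - 1] + gap,  # gap in the sequence
            )
    return dp[n][m]

def identity_score(a, b):
    """Toy score: +1 for identical symbols, -1 otherwise."""
    return 1.0 if a == b else -1.0
```

A real threading score would replace `identity_score` with terms computed from profiles and structural fragments, as the abstract describes.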
Abstract:
We have determined the crystal structure of HcRed, a far-red fluorescent protein isolated from Heteractis crispa, to 2.1 Å resolution. HcRed was observed to form a dimer, in contrast to the monomeric form of green fluorescent protein (GFP) or the tetrameric forms of the GFP-like proteins (eqFP611, Rtms5 and DsRed). Unlike the well-defined chromophore conformation observed in GFP and the GFP-like proteins, the HcRed chromophore was observed to be considerably mobile. Within the HcRed structure, the cyclic tripeptide chromophore, Glu64-Tyr65-Gly66, was observed to adopt both a cis coplanar and a trans non-coplanar conformation. As a result of these two conformations, the hydroxyphenyl moiety of the chromophore makes distinct interactions within the interior of the beta-can. These data, together with a quantum chemical model of the chromophore, suggest the cis coplanar conformation to be consistent with the fluorescent properties of HcRed, and the trans non-coplanar conformation to be consistent with the non-fluorescent properties of hcCP, the chromoprotein parent of HcRed. Moreover, within the GFP-like family, it appears that where conformational freedom is permissible, flexibility in the chromophore conformation is possible. (C) 2005 Elsevier Ltd. All rights reserved.
Abstract:
In this study, we propose a novel method to predict the solvent accessible surface areas of transmembrane residues. For both transmembrane alpha-helix and beta-barrel residues, the correlation coefficients between the predicted and observed accessible surface areas are around 0.65. On the basis of predicted accessible surface areas, residues exposed to the lipid environment or buried inside a protein can be identified by using certain cutoff thresholds. We have extensively examined our approach based on different definitions of accessible surface areas and a variety of sets of control parameters. Given that experimentally determining the structures of membrane proteins is very difficult and membrane proteins are actually abundant in nature, our approach is useful for theoretically modeling membrane protein tertiary structures, particularly for modeling the assembly of transmembrane domains. This approach can be used to annotate the membrane proteins in proteomes to provide extra structural and functional information.
Abstract:
A new method has been developed for prediction of transmembrane helices using support vector machines. Different coding schemes of protein sequences were explored, and their performances were assessed by cross-validation tests. The best-performing method can predict the transmembrane helices with a sensitivity of 93.4% and a precision of 92.0%. For each predicted transmembrane segment, a score is given to show the strength of the transmembrane signal and the prediction reliability. In particular, this method can distinguish transmembrane proteins from soluble proteins with an accuracy of approximately 99%. This method can be used to complement current transmembrane helix prediction methods and can be used for consensus analysis of entire proteomes. The predictor is located at http://genet.imb.uq.edu.au/predictors/SVMtm. (C) 2004 Wiley Periodicals, Inc.
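The two reported measures follow the standard definitions: sensitivity is TP / (TP + FN) and precision is TP / (TP + FP). The sketch below computes them from paired binary labels. Whether the abstract's figures are counted per residue or per helix segment is not stated, so the per-item labelling here is an assumption, as is the function name.

```python
def sensitivity_precision(observed, predicted):
    """Return (sensitivity, precision) for paired binary labels.

    observed, predicted -- equal-length sequences of truthy/falsy values,
    where a truthy value marks an item labelled as transmembrane.
    """
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o and p)      # correctly predicted
    fn = sum(1 for o, p in pairs if o and not p)  # missed
    fp = sum(1 for o, p in pairs if p and not o)  # over-predicted
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return sensitivity, precision
```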
Abstract:
Potato type II serine proteinase inhibitors are proteins that consist of multiple sequence repeats and exhibit a multidomain structure. The structural domains are circular permutations of the repeat sequence, as a result of intramolecular domain swapping. Structural studies give indications of the origins of this folding behaviour and of the evolution of the inhibitor family.
Abstract:
The solution structure of circulin B, one of the first members of the cyclotide family of macrocyclic peptides to be discovered, has been determined and compared with that of circulin A and related cyclotides. Cyclotides are mini-proteins derived from plants that have the characteristic features of a head-to-tail cyclised peptide backbone and a knotted arrangement of their three disulfide bonds. First discovered because of their uterotonic or anti-HIV activity, they have also been reported to have activity against a range of Gram-positive and Gram-negative bacteria as well as fungi. The aim of the current study was to develop structure-activity relationships to rationalise this antimicrobial activity. Comparison of cyclotide structures and activities suggests that the presence and location of cationic residues may be a requirement for activity against Gram-negative bacteria. Understanding the topological differences associated with the antimicrobial activity of the cyclotides is of significant interest and may potentially be harnessed for pharmaceutical applications.
Abstract:
Ketol-acid reductoisomerase (KARI; EC 1.1.1.86) catalyzes two steps in the biosynthesis of branched-chain amino acids. Amino acid sequence comparisons across species reveal that there are two types of this enzyme: a short form (Class I) found in fungi and most bacteria, and a long form (Class II) typical of plants. Crystal structures of each have been reported previously. However, some bacteria such as Escherichia coli possess a long form, where the amino acid sequence differs appreciably from that found in plants. Here, we report the crystal structure of the E. coli enzyme at 2.6 Å resolution, the first three-dimensional structure of any bacterial Class II KARI. The enzyme consists of two domains, one with mixed alpha/beta structure, which is similar to that found in other pyridine nucleotide-dependent dehydrogenases. The second domain is mainly alpha-helical and shows strong evidence of internal duplication. Comparison of the active sites between KARI of E. coli, Pseudomonas aeruginosa, and spinach shows that most residues occupy conserved positions in the active site. E. coli KARI was crystallized as a tetramer, the likely biologically active unit. This contrasts with P. aeruginosa KARI, which forms a dodecamer, and spinach KARI, a dimer. In the E. coli KARI tetramer, a novel subunit-to-subunit interacting surface is formed by a symmetrical pair of bulbous protrusions.