972 results for Loop structure prediction
Abstract:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Abstract:
The vast majority of known proteins have not yet been experimentally characterized and little is known about their function. The design and implementation of computational tools can provide insight into the function of proteins based on their sequence, their structure, their evolutionary history and their association with other proteins. Knowledge of the three-dimensional (3D) structure of a protein can lead to a deep understanding of its mode of action and interaction, but currently the structures of <1% of sequences have been experimentally solved. For this reason, it has become urgent to develop new methods that are able to computationally extract relevant information from protein sequence and structure. The starting point of my work has been the study of the properties of contacts between protein residues, since they constrain protein folding and characterize different protein structures. Prediction of residue contacts in proteins is an interesting problem whose solution may be useful in protein fold recognition and de novo design. The prediction of these contacts requires the study of the protein inter-residue distances related to the specific type of amino acid pair that are encoded in the so-called contact map. An interesting new way of analyzing those structures emerged when network studies were introduced, with pivotal papers demonstrating that protein contact networks also exhibit small-world behavior. In order to highlight constraints for the prediction of protein contact maps and for applications in the field of protein structure prediction and/or reconstruction from experimentally determined contact maps, I studied to what extent the characteristic path length and clustering coefficient of the protein contact network are values that reveal characteristic features of protein contact maps.
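The two network measures named above can be illustrated with a minimal sketch. It assumes that each residue is represented by one 3D point and that two residues are in contact when those points lie closer than a distance cutoff. The 8 Å default, the function names, and the toy coordinates are illustrative assumptions and are not taken from the thesis.

```python
import math
from collections import deque
from itertools import combinations


def contact_map(coords, cutoff=8.0):
    """Build the contact network as adjacency sets.

    Residues i and j are linked when their representative points are
    closer than `cutoff` (8.0 is an assumed value, in angstroms).
    """
    adj = {i: set() for i in range(len(coords))}
    for i, j in combinations(range(len(coords)), 2):
        if math.dist(coords[i], coords[j]) < cutoff:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def clustering_coefficient(adj):
    """Average, over all nodes, of the fraction of neighbor pairs that are linked."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue  # nodes with fewer than two neighbors contribute zero
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def characteristic_path_length(adj):
    """Mean shortest-path length over all connected node pairs (breadth-first search)."""
    total, pairs = 0, 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


# Toy example: four points spaced 3.8 apart along a line.
toy = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
network = contact_map(toy)
C = clustering_coefficient(network)
L = characteristic_path_length(network)
```

A small-world network in the usual sense combines a high clustering coefficient with a short characteristic path length, so the two values are typically compared against those of random graphs of the same size and density.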
Provided that residue contacts are known for a protein sequence, the major features of its 3D structure could be deduced by combining this knowledge with correctly predicted motifs of secondary structure. In the second part of my work I focused on a particular protein structural motif, the coiled-coil, known to mediate a variety of fundamental biological interactions. Coiled-coils are found in a variety of structural forms and in a wide range of proteins including, for example, small units such as leucine zippers that drive the dimerization of many transcription factors or more complex structures such as the family of viral proteins responsible for virus-host membrane fusion. The coiled-coil structural motif is estimated to account for 5-10% of the protein sequences in the various genomes. Given their biological importance, in my work I introduced a Hidden Markov Model (HMM) that exploits the evolutionary information derived from multiple sequence alignments, to predict coiled-coil regions and to discriminate coiled-coil sequences. The results indicate that the new HMM outperforms all the existing programs and can be adopted for the coiled-coil prediction and for large-scale genome annotation. Genome annotation is a key issue in modern computational biology, being the starting point towards the understanding of the complex processes involved in biological networks. The rapid growth in the number of protein sequences and structures available poses new fundamental problems that still deserve an interpretation. Nevertheless, these data are at the basis of the design of new strategies for tackling problems such as the prediction of protein structure and function. Experimental determination of the functions of all these proteins would be a hugely time-consuming and costly task and, in most instances, has not been carried out. As an example, currently, approximately only 20% of annotated proteins in the Homo sapiens genome have been experimentally characterized. 
A commonly adopted procedure for annotating protein sequences relies on "inheritance through homology", based on the notion that similar sequences share similar functions and structures. This procedure consists of assigning sequences to a specific group of functionally related sequences that have been grouped through clustering techniques. The clustering procedure is based on suitable similarity rules, since predicting protein structure and function from sequence largely depends on the value of sequence identity. However, additional levels of complexity are due to multi-domain proteins, to proteins that share common domains but do not necessarily share the same function, and to the finding that different combinations of shared domains can lead to different biological roles. In the last part of this study I developed and validated a system that contributes to sequence annotation by taking advantage of a validated transfer-through-inheritance procedure for molecular functions and structural templates. After a cross-genome comparison with the BLAST program, clusters were built on the basis of two stringent constraints on sequence identity and coverage of the alignment. The adopted measure explicitly addresses the problem of multi-domain protein annotation and allows a fine-grained division of the whole set of proteomes used, which ensures cluster homogeneity in terms of sequence length. A high level of coverage of structure templates on the length of protein sequences within clusters ensures that multi-domain proteins, when present, can be templates for sequences of similar length. This annotation procedure includes the possibility of reliably transferring statistically validated functions and structures to sequences, considering the information available in the present databases of molecular functions and structures.
Abstract:
The goal of this thesis work is to develop a computational method based on machine learning techniques for predicting the disulfide-bonding states of cysteine residues in proteins, which is a sub-problem of the larger and still unsolved problem of protein structure prediction. Improved prediction of the disulfide-bonding states of cysteine residues will help to constrain the three-dimensional (3D) space of the respective protein structure, and thus will eventually help in the prediction of the 3D structure of proteins. The results of this work will have direct implications for site-directed mutational studies of proteins, protein engineering and the problem of protein folding. We have used a combination of Artificial Neural Network (ANN) and Hidden Markov Model (HMM), the so-called Hidden Neural Network (HNN), as the machine learning technique to develop our prediction method. By using different global and local features of proteins (specifically profiles, parity of cysteine residues, average cysteine conservation, correlated mutation, sub-cellular localization, and signal peptide) as inputs and considering Eukaryotes and Prokaryotes separately, we have reached an accuracy of 94% on a cysteine basis for both the Eukaryotic and Prokaryotic datasets, and an accuracy of 90% and 93% on a protein basis for the Eukaryotic and Prokaryotic datasets, respectively. These accuracies are the best reached so far by any existing prediction method; our method has thus outperformed all the previously developed approaches. The most interesting part of this thesis work is the difference in prediction performance between Eukaryotes and Prokaryotes at the basic level of input coding, when ‘profile’ information was given as input to our prediction method.
One of the reasons for this, we discovered, is the difference in the amino acid composition of the local environment of bonded and free cysteine residues in Eukaryotes and Prokaryotes. Eukaryotic bonded cysteine examples have a ‘symmetric-cysteine-rich’ environment, whereas Prokaryotic bonded examples lack it.
Abstract:
One of the most important challenges in chemistry and material science is the connection between the composition of a compound and its chemical and physical properties. In solids, these are greatly influenced by the crystal structure.
The prediction of hitherto unknown crystal structures with regard to external conditions like pressure and temperature is therefore one of the most important goals to achieve in theoretical chemistry. The stable structure of a compound is the global minimum of the potential energy surface, which is the high-dimensional representation of the enthalpy of the investigated system with respect to its structural parameters. The fact that the complexity of the problem grows exponentially with the system size is the reason why it can only be solved via heuristic strategies.
Improvements to the artificial bee colony method, where the local exploration of the potential energy surface is done by a high number of independent walkers, are developed and implemented. This results in an improved communication scheme between these walkers, which directs the search towards the most promising areas of the potential energy surface.
The minima hopping method uses short molecular dynamics simulations at elevated temperatures to direct the structure search from one local minimum of the potential energy surface to the next. A modification, where the local information around each minimum is extracted and used in an optimization of the search direction, is developed and implemented. Our method uses this local information to increase the probability of finding new, lower local minima. This leads to an enhanced performance in the global optimization algorithm.
Hydrogen is a highly relevant system, due to the possibility of finding a metallic phase and even a superconductor with a high critical temperature. An application of a structure prediction method on SiH12 finds stable crystal structures in this material. Additionally, it becomes metallic at relatively low pressures.
Abstract:
Metazoan replication-dependent histone mRNAs do not have a poly(A) tail but end instead in a conserved stem-loop structure. Efficient translation of these mRNAs is dependent on the stem-loop binding protein (SLBP). Here we explore the mechanism by which SLBP stimulates translation in vertebrate cells, using the tethered function assay and analyzing protein-protein interactions. We show for the first time that translational stimulation by SLBP increases during oocyte maturation and that SLBP stimulates translation at the level of initiation. We demonstrate that SLBP can interact directly with subunit h of eIF3 and with Paip1; however, neither of these interactions is sufficient to mediate its effects on translation. We find that Xenopus SLBP1 functions primarily at an early stage in the cap-dependent initiation pathway, targeting small ribosomal subunit recruitment. Analysis of IRES-driven translation in Xenopus oocytes suggests that SLBP activity requires eIF4E. We propose a model in which a novel factor contacts eIF4E bound to the 5' cap and SLBP bound to the 3' end simultaneously, mediating formation of an alternative end-to-end complex.
Crystal structure of 2,5-diketo-d-gluconic acid reductase A complexed with NADPH at 2.1-Å resolution
Abstract:
The three-dimensional structure of Corynebacterium 2,5-diketo-d-gluconic acid reductase A (2,5-DKGR A; EC 1.1.1.-), in complex with cofactor NADPH, has been solved by using x-ray crystallographic data to 2.1-Å resolution. This enzyme catalyzes stereospecific reduction of 2,5-diketo-d-gluconate (2,5-DKG) to 2-keto-l-gulonate. Thus the three-dimensional structure has now been solved for a prokaryotic example of the aldo–keto reductase superfamily. The details of the binding of the NADPH cofactor help to explain why 2,5-DKGR exhibits lower binding affinity for cofactor than the related human aldose reductase does. Furthermore, changes in the local loop structure near the cofactor suggest that 2,5-DKGR will not exhibit the biphasic cofactor binding characteristics observed in aldose reductase. Although the crystal structure does not include substrate, the two ordered water molecules present within the substrate-binding pocket are postulated to provide positional landmarks for the substrate 5-keto and 4-hydroxyl groups. The structural basis for several previously described active-site mutants of 2,5-DKGR A is also proposed. Recent research efforts have described a novel approach to the synthesis of l-ascorbate (vitamin C) by using a genetically engineered microorganism that is capable of synthesizing 2,5-DKG from glucose and subsequently is transformed with the gene for 2,5-DKGR. These modifications create a microorganism capable of direct production of 2-keto-l-gulonate from d-glucose, and the gulonate can subsequently be converted into vitamin C. In economic terms, vitamin C is the single most important specialty chemical manufactured in the world. Understanding the structural determinants of specificity, catalysis, and stability for 2,5-DKGR A is of substantial commercial interest.
Abstract:
Local protein structure prediction efforts have consistently failed to exceed approximately 70% accuracy. We characterize the degeneracy of the mapping from local sequence to local structure responsible for this failure by investigating the extent to which similar sequence segments found in different proteins adopt similar three-dimensional structures. Sequence segments 3-15 residues in length from 154 different protein families are partitioned into neighborhoods containing segments with similar sequences using cluster analysis. The consistency of the sequence-to-structure mapping is assessed by comparing the local structures adopted by sequence segments in the same neighborhood in proteins of known structure. In the 154 families, 45% and 28% of the positions occur in neighborhoods in which one and two local structures predominate, respectively. The sequence patterns that characterize the neighborhoods in the first class probably include virtually all of the short sequence motifs in proteins that consistently occur in a particular local structure. These patterns, many of which occur in transitions between secondary structural elements, are an interesting combination of previously studied and novel motifs. The identification of sequence patterns that consistently occur in one or a small number of local structures in proteins should contribute to the prediction of protein structure from sequence.
Abstract:
In PCR, DNA polymerases from thermophilic bacteria catalyze the extension of primers annealed to templates as well as the structure-specific cleavage of the products of primer extension. Here we show that cleavage by Thermus aquaticus and Thermus thermophilus DNA polymerases can be precise and substantial: it occurs at the base of the stem-loop structure assumed by the single-strand products of primer extension using as template a common genetic element, the promoter-operator of the Escherichia coli lactose operon, and may involve up to 30% of the products. The cleavage is independent of primer, template, and triphosphates, is dependent on substrate length and temperature, requires free ends and Mg2+, and is absent in DNA polymerases lacking the 5'→3' exonuclease, such as the Stoffel fragment and the T7 DNA polymerase. Heterogeneity of the extension products also results from premature detachment of the enzyme approaching the 5' end of the template.
Abstract:
Linkage disequilibrium between polymorphisms in a natural population may result from various evolutionary forces, including random genetic drift due to sampling of gametes during reproduction, restricted migration between subpopulations in a subdivided population, or epistatic selection. In this report, we present evidence that the majority of significant linkage disequilibria observed in introns of the alcohol dehydrogenase locus (Adh) of Drosophila pseudoobscura are due to epistatic selection maintaining secondary structure of precursor mRNA (pre-mRNA). Based on phylogenetic-comparative analysis and a likelihood approach, we propose secondary structure models of Adh pre-mRNA for the regions of the adult intron and intron 2 where clustering of linkage disequilibria has been observed. Furthermore, we applied the likelihood ratio test to the phylogenetically predicted secondary structure in intron 1. In contrast to the other two structures, polymorphisms associated with the more conserved stem-loop structure of intron 1 are in low frequency, and linkage disequilibria have not been observed. These findings are qualitatively consistent with a model of compensatory fitness interactions. This model assumes that mutations disrupting pairing in a secondary structural element are individually deleterious if they destabilize a functionally important structure; a second "compensatory" mutation, however, may restabilize the structure and restore fitness.
Abstract:
We describe a new method for using neural networks to predict residue contact pairs in a protein. The main inputs to the neural network are a set of 25 measures of correlated mutation between all pairs of residues in two windows of size 5 centered on the residues of interest. While the individual pair-wise correlations are a relatively weak predictor of contact, by training the network on windows of correlation the accuracy of prediction is significantly improved. The neural network is trained on a set of 100 proteins and then tested on a disjoint set of 1033 proteins of known structure. An average predictive accuracy of 21.7% is obtained taking the best L/2 predictions for each protein, where L is the sequence length. Taking the best L/10 predictions gives an average accuracy of 30.7%. The predictor is also tested on a set of 59 proteins from the CASP5 experiment. The accuracy is found to be relatively consistent across different sequence lengths, but to vary widely according to the secondary structure. Predictive accuracy is also found to improve by using multiple sequence alignments containing many sequences to calculate the correlations. (C) 2004 Wiley-Liss, Inc.
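The "best L/2" and "best L/10" accuracies quoted above can be sketched as follows, under stated assumptions: predictions arrive as a score per residue pair, the highest-scoring L/k pairs are kept, and accuracy is the fraction of kept pairs that are true contacts. The function name, the data layout, and the toy numbers are hypothetical and do not reproduce the paper's evaluation code.

```python
def top_fraction_accuracy(scores, true_contacts, seq_len, divisor):
    """Fraction of the top seq_len // divisor predicted pairs that are true contacts.

    scores        : dict mapping (i, j) residue-index pairs to predicted scores
    true_contacts : set of (i, j) pairs observed in contact
    seq_len       : sequence length L
    divisor       : 2 for "best L/2", 10 for "best L/10"
    """
    n_keep = max(1, seq_len // divisor)
    # Rank pairs from highest to lowest score and keep the first n_keep.
    ranked = sorted(scores, key=scores.get, reverse=True)[:n_keep]
    if not ranked:
        return 0.0
    hits = sum(1 for pair in ranked if pair in true_contacts)
    return hits / len(ranked)


# Toy example for a 10-residue sequence (values are made up).
toy_scores = {
    (0, 5): 0.9, (1, 7): 0.8, (2, 9): 0.7,
    (3, 8): 0.6, (0, 9): 0.5, (4, 9): 0.4,
}
toy_true = {(0, 5), (2, 9), (4, 9)}
acc_half = top_fraction_accuracy(toy_scores, toy_true, seq_len=10, divisor=2)
acc_tenth = top_fraction_accuracy(toy_scores, toy_true, seq_len=10, divisor=10)
```

Keeping fewer, higher-confidence pairs usually raises the measured accuracy, which is consistent with the L/10 figure in the abstract exceeding the L/2 figure.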
Abstract:
Background: Protein tertiary structure can be partly characterized via each amino acid's contact number measuring how residues are spatially arranged. The contact number of a residue in a folded protein is a measure of its exposure to the local environment, and is defined as the number of C-beta atoms in other residues within a sphere around the C-beta atom of the residue of interest. Contact number is partly conserved between protein folds and thus is useful for protein fold and structure prediction. In turn, each residue's contact number can be partially predicted from primary amino acid sequence, assisting tertiary fold analysis from sequence data. In this study, we provide a more accurate contact number prediction method from protein primary sequence. Results: We predict contact number from protein sequence using a novel support vector regression algorithm. Using protein local sequences with multiple sequence alignments (PSI-BLAST profiles), we demonstrate a correlation coefficient between predicted and observed contact numbers of 0.70, which outperforms previously achieved accuracies. Including additional information about sequence weight and amino acid composition further improves prediction accuracies significantly with the correlation coefficient reaching 0.73. If residues are classified as being either contacted or non-contacted, the prediction accuracies are all greater than 77%, regardless of the choice of classification thresholds. Conclusion: The successful application of support vector regression to the prediction of protein contact number reported here, together with previous applications of this approach to the prediction of protein accessible surface area and B-factor profile, suggests that a support vector regression approach may be very useful for determining the structure-function relation between primary sequence and higher order consecutive protein structural and functional properties.
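The contact number definition given in the Background can be written down directly. The sketch below counts, for each residue, the C-beta atoms of other residues that fall within a sphere around its own C-beta atom. The abstract does not state the sphere radius, so the 12 Å default is an assumption, as are the function name and the toy coordinates.

```python
import math


def contact_numbers(cb_coords, radius=12.0):
    """Return one contact number per residue.

    cb_coords : list of (x, y, z) C-beta positions, one per residue
    radius    : sphere radius in angstroms (12.0 is an assumed value)
    """
    counts = []
    for i, center in enumerate(cb_coords):
        # Count every other residue whose C-beta lies inside the sphere.
        n = sum(
            1
            for j, other in enumerate(cb_coords)
            if j != i and math.dist(center, other) <= radius
        )
        counts.append(n)
    return counts


# Toy example: three nearby points and one distant point.
toy_cb = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0), (30.0, 0.0, 0.0)]
toy_counts = contact_numbers(toy_cb)
```

Buried residues tend to have high contact numbers and exposed residues low ones, which is why the quantity serves as a measure of exposure to the local environment.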
Abstract:
Hydrophobins are small (~100 aa) proteins that have an important role in the growth and development of mycelial fungi. They are surface active and, after secretion by the fungi, self-assemble into amphipathic membranes at hydrophobic/hydrophilic interfaces, reversing the hydrophobicity of the surface. In this study, molecular dynamics simulation techniques have been used to model the process by which a specific class I hydrophobin, SC3, binds to a range of hydrophobic/hydrophilic interfaces. The structure of SC3 used in this investigation was modeled based on the crystal structure of the class II hydrophobin HFBII using the assumption that the disulfide pairings of the eight conserved cysteine residues are maintained. The proposed model for SC3 in aqueous solution is compact and globular, containing primarily β-strand and coil structures. The behavior of this model of SC3 was investigated at an air/water, an oil/water, and a hydrophobic solid/water interface. It was found that SC3 preferentially binds to the interfaces via the loop region between the third and fourth cysteine residues and that binding is associated with an increase in α-helix formation, in qualitative agreement with experiment. Based on a combination of the available experimental data and the current simulation studies, we propose a possible model for SC3 self-assembly on a hydrophobic solid/water interface.
Abstract:
Protein structure prediction is a cornerstone of bioinformatics research. Membrane proteins require their own prediction methods due to their intrinsically different composition. A variety of tools exist for topology prediction of membrane proteins, many of them available on the Internet. The server described in this paper, BPROMPT (Bayesian PRediction Of Membrane Protein Topology), uses a Bayesian Belief Network to combine the results of other prediction methods, providing a more accurate consensus prediction. Topology predictions with accuracies of 70% for prokaryotes and 53% for eukaryotes were achieved. BPROMPT can be accessed at http://www.jenner.ac.uk/BPROMPT.
Abstract:
Adenine phosphoribosyltransferase (APRT) is an important enzyme component of the purine recycling pathway. Parasitic protozoa of the order Kinetoplastida are unable to synthesize purines de novo and use the salvage pathway for the synthesis of purine bases, rendering this biosynthetic pathway an attractive target for antiparasitic drug design. The recombinant human adenine phosphoribosyltransferase (hAPRT) structure was resolved in the presence of AMP in the active site to 1.76 Å resolution and with the substrates PRPP and adenine simultaneously bound to the catalytic site to 1.83 Å resolution. An additional structure was solved containing one subunit of the dimer in the apo-form to 2.10 Å resolution. Comparisons of these three hAPRT structures with other 'type I' PRTases revealed several important features of this class of enzymes. Our data indicate that the flexible loop structure adopts an open conformation before and after binding of both substrates, adenine and PRPP. Comparative analyses presented here provide structural evidence to propose the role of Glu 104 as the residue that abstracts the proton of the adenine N9 atom before its nucleophilic attack on the PRPP anomeric carbon. This work leads to new insights into the APRT catalytic mechanism.
Abstract:
Phospholipases A(2) (PLA(2)) are enzymes that are major components of snake venoms from the Viperidae and Elapidae families. Many plants are used in traditional medicine as active agents against various effects induced by snakebite. This article presents the PLA(2) BthTX-I structure prediction based on homology modeling. In addition, we have performed virtual screening in a large database, yielding a set of potential bioactive inhibitors. A flexible docking program was used to investigate the interactions between the receptor and the new ligands. We have performed molecular interaction fields (MIFs) calculations with the phospholipase model. Results confirm the important role of Lys49 for binding ligands and suggest three additional residues as well. We have proposed a theoretically nontoxic, drug-like, and potential novel BthTX-I inhibitor. These calculations have been used to guide the design of novel phospholipase inhibitors as potential lead compounds that may be optimized for future treatment of snakebite victims as well as other human diseases in which PLA(2) enzymes are involved.