945 results for Protein structures
Abstract:
As the expression of the genetic blueprint, proteins are at the heart of all biological systems. The ever increasing set of available protein structures has taught us that diversity is the hallmark of their architecture, a fundamental characteristic that enables them to perform the vast array of functionality upon which all of life depends. This diversity, however, is central to one of the most challenging problems in molecular biology: how does a folding polypeptide chain navigate its way through all of the myriad of possible conformations to find its own particular biologically active form? With few overarching structural principles to draw upon that can be applied to all protein architecture, the search for a solution to the protein folding problem has yet to produce an algorithm that can explain and duplicate this fundamental biological process. In this thesis, we take a two-pronged approach for investigating the protein folding process. Our initial statistical studies of the distributions of hydrophobic and hydrophilic residues within α-helices and β-sheets suggest (i) that hydrophobicity plays a critical role in helix and sheet formation; and (ii) that the nucleation of these motifs may result in largely unidirectional growth. Most tellingly, from an examination of the amino acids found in the smallest β-sheets, we do not find any evidence of a β-nucleating code in the primary protein sequence. Complementing these statistical analyses, we have analyzed the structural environments of several ever-widening aspects of protein topology. Our examination of the gaps between strands in the smallest β-sheets reveals a common organizational principle underlying β-formation involving strands separated by large sequential gaps: with very few exceptions, these large gaps fold into single, compact structural modules, bringing the β-strands that are otherwise far apart in the sequence close together in space. 
We conclude, therefore, that β-nucleation in the smallest sheets results from the co-location of two strands that are either local in sequence, or local in space following prior folding events. A second study of larger β-sheets both corroborates and extends these findings: virtually all large sequential gaps between pairs of β-strands organize themselves into a hierarchical arrangement, creating a bread-crumb model of go-and-come-back structural organization that ultimately juxtaposes two strands of a parental β-structure that are far apart in the sequence in close spatial proximity. In a final study, we have formalized this go-and-come-back notion into the concept of anti-parallel double-strandedness (DS), and measured this property across protein architecture in general. With over 90% of all residues in a large, non-redundant set of protein structures classified as DS, we conclude that DS is a unifying structural principle that underpins all globular proteins. We postulate, moreover, that this one simple principle, anti-parallel double-strandedness, unites protein structure, protein folding and protein evolution.
Abstract:
The results of applying a fragment-based protein tertiary structure prediction method to the prediction of 14 CASP5 target domains are described. The method is based on the assembly of supersecondary structural fragments taken from highly resolved protein structures using a simulated annealing algorithm. A number of good predictions for proteins with novel folds were produced, although not always as the first model. For two fold recognition targets, FRAGFOLD produced the most accurate model in both cases, despite the fact that the predictions were not based on a template structure. Although clear progress has been made in improving FRAGFOLD since CASP4, the ranking of final models still seems to be the main problem that needs to be addressed before the next CASP experiment.
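As a rough illustration of fragment assembly by simulated annealing, the sketch below runs a generic Metropolis search over a tuple of fragment choices. It is not FRAGFOLD: the names `anneal`, `toy_energy` and `swap_fragment`, the geometric cooling schedule, and the toy energy function are all assumptions made for this example.

```python
import math
import random


def toy_energy(state):
    """Hypothetical energy: lowest when every position holds fragment index 3."""
    return sum((x - 3) ** 2 for x in state)


def swap_fragment(state, rng):
    """Hypothetical move: replace the fragment at one random position."""
    i = rng.randrange(len(state))
    new_state = list(state)
    new_state[i] = rng.randrange(10)
    return tuple(new_state)


def anneal(initial, energy, propose, steps=2000, t_start=1.0, t_end=0.01, seed=0):
    """Generic simulated annealing; returns the best state seen and its energy."""
    rng = random.Random(seed)
    current, e_cur = initial, energy(initial)
    best, e_best = current, e_cur
    for step in range(steps):
        # Geometric cooling from t_start down to t_end (an assumed schedule).
        t = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        cand = propose(current, rng)
        e_cand = energy(cand)
        # Metropolis criterion: always accept downhill moves, and accept
        # uphill moves with probability exp(-delta / t).
        if e_cand <= e_cur or rng.random() < math.exp((e_cur - e_cand) / t):
            current, e_cur = cand, e_cand
            if e_cur < e_best:
                best, e_best = current, e_cur
    return best, e_best
```

Because the function returns the best state visited rather than the final one, the result is never worse than the starting conformation; ranking several such low-energy models, the difficulty noted in the abstract, is a separate step that this sketch does not address.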
Abstract:
The estimation of prediction quality is important because without quality measures, it is difficult to determine the usefulness of a prediction. Currently, methods for ligand binding site residue predictions are assessed in the function prediction category of the biennial Critical Assessment of Techniques for Protein Structure Prediction (CASP) experiment, utilizing the Matthews Correlation Coefficient (MCC) and Binding-site Distance Test (BDT) metrics. However, the assessment of ligand binding site predictions using such metrics requires the availability of solved structures with bound ligands. Thus, we have developed a ligand binding site quality assessment tool, FunFOLDQA, which utilizes protein feature analysis to predict ligand binding site quality prior to the experimental solution of the protein structures and their ligand interactions. The FunFOLDQA feature scores were combined using simple linear combinations, multiple linear regression and a neural network. The neural network produced significantly better results for correlations to both the MCC and BDT scores, according to Kendall’s τ, Spearman’s ρ and Pearson’s r correlation coefficients, when tested on both the CASP8 and CASP9 datasets. The neural network also produced the largest Area Under the Curve score (AUC) when Receiver Operating Characteristic (ROC) analysis was undertaken for the CASP8 dataset. Furthermore, the FunFOLDQA algorithm incorporating the neural network is shown to add value to FunFOLD when both methods are employed in combination. This results in a statistically significant improvement over all of the best server methods, the FunFOLD method (6.43%), and one of the top manual groups (FN293) tested on the CASP8 dataset. The FunFOLDQA method was also found to be competitive with the top server methods when tested on the CASP9 dataset.
To the best of our knowledge, FunFOLDQA is the first attempt to develop a method that can be used to assess ligand binding site prediction quality, in the absence of experimental data.
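For readers unfamiliar with the MCC metric mentioned above, here is a minimal sketch of how it can be computed for a binding-site residue prediction. The function name `matthews_cc` and the representation of predictions as sets of residue numbers are assumptions for illustration; this is the standard MCC formula, not code from FunFOLDQA or the CASP assessors.

```python
import math


def matthews_cc(predicted, observed, all_residues):
    """MCC for one target.

    predicted    -- residue numbers predicted to bind the ligand
    observed     -- residue numbers that bind in the solved structure
    all_residues -- every residue number in the protein chain
    """
    predicted, observed = set(predicted), set(observed)
    tp = len(predicted & observed)                       # correctly predicted
    fp = len(predicted - observed)                       # over-predicted
    fn = len(observed - predicted)                       # missed
    tn = len(set(all_residues) - predicted - observed)   # correctly excluded
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention the score is 0 when any marginal total is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom
```

For example, with a ten-residue chain, predicted residues {1, 2, 3} and observed residues {2, 3, 4}, the counts are TP = 2, FP = 1, FN = 1 and TN = 6, which gives an MCC of 11/21. Computing the score needs the observed set, which is why this kind of assessment requires a solved structure with a bound ligand.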
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
A scheme is presented in which an organic solvent environment in combination with surfactants is used to confine a natively unfolded protein inside an inverse microemulsion droplet. This type of confinement allows a study that provides unique insight into the dynamic structure of an unfolded, flexible protein which is still solvated and thus under near-physiological conditions. In a model system, the protein osteopontin (OPN) is used. It is a highly phosphorylated glycoprotein that is expressed in a wide range of cells and tissues for which limited structural analysis exists due to the high degree of flexibility and large number of post-translational modifications. OPN is implicated in tissue functions, such as inflammation and mineralisation. It also has a key function in tumour metastasis and progression. Circular dichroism measurements show that confinement enhances the secondary structural features of the protein. Small-angle X-ray scattering and dynamic light scattering show that OPN changes from being a flexible protein in aqueous solution to adopting a less flexible and more compact structure inside the microemulsion droplets. This novel approach for confining proteins while they are still hydrated may aid in studying the structure of a wide range of natively unfolded proteins.
Abstract:
The vast majority of known proteins have not yet been experimentally characterized and little is known about their function. The design and implementation of computational tools can provide insight into the function of proteins based on their sequence, their structure, their evolutionary history and their association with other proteins. Knowledge of the three-dimensional (3D) structure of a protein can lead to a deep understanding of its mode of action and interaction, but currently the structures of <1% of sequences have been experimentally solved. For this reason, it has become urgent to develop new methods that are able to computationally extract relevant information from protein sequence and structure. The starting point of my work has been the study of the properties of contacts between protein residues, since they constrain protein folding and characterize different protein structures. Prediction of residue contacts in proteins is an interesting problem whose solution may be useful in protein folding recognition and de novo design. The prediction of these contacts requires the study of the protein inter-residue distances related to the specific type of amino acid pair that are encoded in the so-called contact map. An interesting new way of analyzing those structures emerged when network studies were introduced, with pivotal papers demonstrating that protein contact networks also exhibit small-world behavior. In order to highlight constraints for the prediction of protein contact maps and for applications in the field of protein structure prediction and/or reconstruction from experimentally determined contact maps, I studied the extent to which the characteristic path length and clustering coefficient of the protein contact network are values that reveal characteristic features of protein contact maps.
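The two network measures named above can be sketched as follows. The example assumes one representative coordinate per residue (for instance the Cα atom) and a contact cutoff of 8 Å; both choices, and all function names, are assumptions for illustration rather than details given in the abstract.

```python
import math
from collections import deque
from itertools import combinations


def contact_graph(coords, cutoff=8.0):
    """Adjacency sets linking residues whose coordinates lie within `cutoff`."""
    adj = {i: set() for i in range(len(coords))}
    for i, j in combinations(range(len(coords)), 2):
        if math.dist(coords[i], coords[j]) <= cutoff:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def clustering_coefficient(adj):
    """Mean, over all nodes, of the fraction of neighbour pairs in contact."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue  # nodes with fewer than two neighbours contribute 0
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def characteristic_path_length(adj):
    """Mean shortest-path length over all connected pairs of nodes (BFS)."""
    total, pairs = 0, 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else float("inf")
```

A small-world network combines a high clustering coefficient with a short characteristic path length, so comparing the two values against those of a random graph of the same size and density is one way to examine the behavior described here.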
Provided that residue contacts are known for a protein sequence, the major features of its 3D structure could be deduced by combining this knowledge with correctly predicted motifs of secondary structure. In the second part of my work I focused on a particular protein structural motif, the coiled-coil, known to mediate a variety of fundamental biological interactions. Coiled-coils are found in a variety of structural forms and in a wide range of proteins including, for example, small units such as leucine zippers that drive the dimerization of many transcription factors or more complex structures such as the family of viral proteins responsible for virus-host membrane fusion. The coiled-coil structural motif is estimated to account for 5-10% of the protein sequences in the various genomes. Given their biological importance, in my work I introduced a Hidden Markov Model (HMM) that exploits the evolutionary information derived from multiple sequence alignments, to predict coiled-coil regions and to discriminate coiled-coil sequences. The results indicate that the new HMM outperforms all the existing programs and can be adopted for coiled-coil prediction and for large-scale genome annotation. Genome annotation is a key issue in modern computational biology, being the starting point towards the understanding of the complex processes involved in biological networks. The rapid growth in the number of protein sequences and structures available poses new fundamental problems that still deserve an interpretation. Nevertheless, these data are at the basis of the design of new strategies for tackling problems such as the prediction of protein structure and function. Experimental determination of the functions of all these proteins would be a hugely time-consuming and costly task and, in most instances, has not been carried out. As an example, currently, only approximately 20% of annotated proteins in the Homo sapiens genome have been experimentally characterized.
A commonly adopted procedure for annotating protein sequences relies on "inheritance through homology", based on the notion that similar sequences share similar functions and structures. This procedure consists of assigning sequences to a specific group of functionally related sequences which had been grouped through clustering techniques. The clustering procedure is based on suitable similarity rules, since predicting protein structure and function from sequence largely depends on the value of sequence identity. However, additional levels of complexity are due to multi-domain proteins, to proteins that share common domains but that do not necessarily share the same function, and to the finding that different combinations of shared domains can lead to different biological roles. In the last part of this study I developed and validated a system that contributes to sequence annotation by taking advantage of a validated transfer through inheritance procedure of the molecular functions and of the structural templates. After a cross-genome comparison with the BLAST program, clusters were built on the basis of two stringent constraints on sequence identity and coverage of the alignment. The adopted measure explicitly addresses the problem of multi-domain protein annotation and allows a fine-grained division of the whole set of proteomes used, which ensures cluster homogeneity in terms of sequence length. A high level of coverage of structure templates on the length of protein sequences within clusters ensures that multi-domain proteins when present can be templates for sequences of similar length. This annotation procedure includes the possibility of reliably transferring statistically validated functions and structures to sequences considering information available in the present databases of molecular functions and structures.
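The clustering step described above can be sketched as a single-linkage grouping of pairwise alignment hits that pass both an identity and a coverage threshold. The abstract does not state the thresholds or the linkage rule, so the default values (40% identity, 90% coverage), the union-find approach, and the function name `cluster_sequences` are all assumptions for illustration.

```python
def cluster_sequences(ids, hits, min_identity=40.0, min_coverage=90.0):
    """Group sequences linked by hits that satisfy both constraints.

    ids  -- iterable of sequence identifiers
    hits -- iterable of (id_a, id_b, percent_identity, percent_coverage)
    The thresholds are hypothetical placeholders, not the study's values.
    """
    ids = list(ids)
    parent = {s: s for s in ids}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, identity, coverage in hits:
        # A hit joins two clusters only if BOTH constraints hold.
        if identity >= min_identity and coverage >= min_coverage:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    clusters = {}
    for s in ids:
        clusters.setdefault(find(s), set()).add(s)
    return sorted(clusters.values(), key=sorted)
```

Requiring high alignment coverage alongside identity keeps a short single-domain sequence from being merged with a long multi-domain protein that merely shares one domain with it, which is the length-homogeneity property the abstract emphasizes.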
Abstract:
By using a protein-design algorithm that quantitatively considers side-chain packing, the effect of specific steric constraints on protein design was assessed in the core of the streptococcal protein G β1 domain. The strength of packing constraints used in the design was varied, resulting in core sequences that reflected differing amounts of packing specificity. The structural flexibility and stability of several of the designed proteins were experimentally determined and showed a trend from well-ordered to highly mobile structures as the degree of packing specificity in the design decreased. This trend both demonstrates that the inclusion of specific packing interactions is necessary for the design of native-like proteins and defines a useful range of packing specificity for the design algorithm. In addition, an analysis of the modeled protein structures suggested that penalizing for exposed hydrophobic surface area can improve design performance.
Abstract:
Structural genomics aims to solve a large number of protein structures that represent the protein space. Currently an exhaustive solution for all structures seems prohibitively expensive, so the challenge is to define a relatively small set of proteins with new, currently unknown folds. This paper presents a method that assigns each protein with a probability of having an unsolved fold. The method makes extensive use of protomap, a sequence-based classification, and scop, a structure-based classification. According to protomap, the protein space encodes the relationship among proteins as a graph whose vertices correspond to 13,354 clusters of proteins. A representative fold for a cluster with at least one solved protein is determined after superposition of all scop (release 1.37) folds onto protomap clusters. Distances within the protomap graph are computed from each representative fold to the neighboring folds. The distribution of these distances is used to create a statistical model for distances among those folds that are already known and those that have yet to be discovered. The distribution of distances for solved/unsolved proteins is significantly different. This difference makes it possible to use Bayes' rule to derive a statistical estimate that any protein has a yet undetermined fold. Proteins that score the highest probability to represent a new fold constitute the target list for structural determination. Our predicted probabilities for unsolved proteins correlate very well with the proportion of new folds among recently solved structures (new scop 1.39 records) that are disjoint from our original training set.
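The Bayesian step described above can be illustrated with a short sketch. It assumes that the likelihood of an observed graph distance under each hypothesis (new fold versus already-known fold) has been read off the two empirical distance distributions, and that a prior probability of a new fold is available; the function name and the numbers in the usage note are hypothetical.

```python
def posterior_new_fold(lik_new, lik_known, prior_new):
    """Bayes' rule: P(new fold | observed distance).

    lik_new   -- P(distance | protein has a new fold)
    lik_known -- P(distance | protein has an already-known fold)
    prior_new -- prior probability that a protein has a new fold
    """
    evidence = lik_new * prior_new + lik_known * (1.0 - prior_new)
    if evidence == 0.0:
        raise ValueError("observed distance has zero probability under both models")
    return lik_new * prior_new / evidence
```

With hypothetical inputs `lik_new=0.6`, `lik_known=0.2` and `prior_new=0.25`, the posterior is 0.15 / (0.15 + 0.15) = 0.5. Sorting proteins by this posterior would yield a ranked target list of the kind the abstract describes.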
Abstract:
The conformational space annealing (CSA) method for global optimization has been applied to the 10-55 fragment of the B-domain of staphylococcal protein A (protein A) and to a 75-residue protein, apo calbindin D9K (PDB ID code 1CLB), by using the UNRES off-lattice united-residue force field. Although the potential was not calibrated with these two proteins, the native-like structures were found among the low-energy conformations, without the use of threading or secondary-structure predictions. This is because the CSA method can find many distinct families of low-energy conformations. Starting from random conformations, the CSA method found that there are two families of low-energy conformations for each of the two proteins, the native-like fold and its mirror image. The CSA method converged to the same low-energy folds in all cases studied, as opposed to other optimization methods. It appears that the CSA method with the UNRES force field, which is based on the thermodynamic hypothesis, can be used in prediction of protein structures in real time.
Abstract:
The database reported here is derived using the Combinatorial Extension (CE) algorithm which compares pairs of protein polypeptide chains and provides a list of structurally similar proteins along with their structure alignments. Using CE, structure–structure alignments can provide insights into biological function. When a protein of known function is shown to be structurally similar to a protein of unknown function, a relationship might be inferred; a relationship not necessarily detectable from sequence comparison alone. Establishing structure–structure relationships in this way is of great importance as we enter an era of structural genomics where there is a likelihood of an increasing number of structures with unknown functions being determined. Thus the CE database is an example of a useful tool in the annotation of protein structures of unknown function. Comparisons can be performed on the complete PDB or on a structurally representative subset of proteins. The source protein(s) can be from the PDB (updated monthly) or uploaded by the user. CE provides sequence alignments resulting from structural alignments and Cartesian coordinates for the aligned structures, which may be analyzed using the supplied Compare3D Java applet, or downloaded for further local analysis. Searches can be run from the CE web site, http://cl.sdsc.edu/ce.html, or the database and software downloaded from the site for local use.
Abstract:
Isolated immature maize (Zea mays L.) embryos have been shown to acquire tolerance to rapid drying between 22 and 25 d after pollination (DAP) and to slow drying from 18 DAP onward. To investigate adaptations in protein profile in association with the acquisition of desiccation tolerance in isolated, immature maize embryos, we applied in situ Fourier transform infrared microspectroscopy. In fresh, viable, 20- and 25-DAP embryo axes, the shapes of the different amide-I bands were identical, and this was maintained after flash drying. On rapid drying, the 20-DAP axes had a reduced relative proportion of α-helical protein structure and lost viability. Rapidly dried 25-DAP embryos germinated (74%) and had a protein profile similar to the fresh control axes. On slow drying, the α-helical contribution in both the 20- and 25-DAP embryo axes increased compared with that in the fresh control axes, and survival of desiccation was high. The protein profile in dry, mature axes resembled that after slow drying of the immature axes. Rapid drying resulted in an almost complete loss of membrane integrity in the 20-DAP embryo axes and much less so in the 25-DAP axes. After slow drying, low plasma membrane permeability ensued in both the 20- and 25-DAP axes. We conclude that slow drying of excised, immature embryos leads to an increased proportion of α-helical protein structures in their axes, which coincides with additional tolerance of desiccation stress.
Abstract:
A diverse group of GPI-anchored protein structures are ubiquitously expressed on the external cell membranes of eukaryotes. Whereas the physiological role for these structures is usually defined by their protein component, the precise biological significance of the glycolipid anchors remains vague. In the course of producing a HeLa cell line (JM88) that contained a recombinant adeno-associated virus genome expressing a GPI-anchored CD4-GPI fusion protein on the surface of the cells, we noted the transfer of CD4-GPI to native HeLa cells. Transfer occurred after direct cell contact or exposure to JM88 cell supernatants. The magnitude of contact-mediated CD4-GPI transfer correlated with temperature. Supernatant CD4-GPI also attached to human red blood cells and could be cleaved with phosphatidylinositol-specific phospholipase C. The attached CD4-GPI remained biologically active after transfer and permitted the formation of syncytium when coated HeLa cells were incubated with glycoprotein 160 expressing H9 cells. JM88 cells provide a model for the production, release, and reattachment of CD4-GPI and may furnish insight into a physiologic role of naturally occurring GPI-anchored proteins. This approach may also allow the production of other recombinant GPI-anchored proteins for laboratory and clinical investigation.
Abstract:
Barnase and barstar are trivial names of the extracellular RNase and its intracellular inhibitor produced by Bacillus amyloliquefaciens. Inhibition involves the formation of a very tight one-to-one complex of the two proteins. With the crystallographic solution of the structure of the barnase-barstar complex and the development of methods for measuring the free energy of binding, the pair can be used to study protein-protein recognition in detail. In this report, we describe the isolation of suppressor mutations in barstar that compensate for the loss in interaction energy caused by a mutation in barnase. Our suppressor search is based on in vivo selection for barstar variants that are able to protect host cells against the RNase activity of those barnase mutants not properly inhibited by wild-type barstar. This approach utilizes a plasmid system in which barnase expression is tightly controlled to keep the mutant barnase gene silent. When expression of barnase is turned on, failure to form a complex between the mutant barnase and barstar has a lethal effect on host cells unless overcome by substitution of the wild-type barstar by a functional suppressor derivative. A set of barstar suppressors has been identified for barnase mutants with substitutions in two amino acid positions (residues 102 and 59), which are critically involved in both RNase activity and barstar binding. The mutations selected as suppressors could not have been predicted on the basis of the known protein structures. The single barstar mutation with the highest information content for inhibition of barnase (H102K) has the substitution Y30W. The reduction in binding caused by the R59E mutation in barnase can be partly reversed by changing Glu-76 of barstar, which forms a salt bridge with the Arg-59 in the wild-type complex, to arginine, thus completing an interchange of the two charges.
Abstract:
The Schizosaccharomyces pombe cell cycle-regulatory protein suc1, named as the suppressor of cdc2 temperature-sensitive mutations, is essential for cell cycle progression. To understand suc1 structure-function relationships and to help resolve conflicting interpretations of suc1 function based on genetic studies of suc1 and its functional homologs in both lower and higher eukaryotes, we have determined the crystal structure of the beta-interchanged suc1 dimer. Each domain consists of three alpha-helices and a four-stranded beta-sheet, completed by the interchange of terminal beta-strands between the two subunits. This beta-interchanged suc1 dimer, when compared with the beta-hairpin single-domain folds of suc1, reveals a beta-hinge motif formed by the conserved amino acid sequence HVPEPH. This beta-hinge mediates the subunit conformation and assembly of suc1: closing produces the intrasubunit beta-hairpin and single-domain fold, whereas opening leads to the intersubunit beta-strand interchange and interlocked dimer assembly reported here. This conformational switch markedly changes the surface accessibility of sequence-conserved residues available for recognition of cyclin-dependent kinase, suggesting a structural mechanism for beta-hinge-mediated regulation of suc1 biological function. Thus, suc1 belongs to the family of domain-swapping proteins, consisting of intertwined and dimeric protein structures in which the dual assembly modes regulate their function.
Abstract:
An immunoglobulin light chain protein was isolated from the urine of an individual (BRE) with systemic amyloidosis. Complete amino acid sequence of the variable region of the light chain (VL) protein established it as a kappa I, which when compared with other kappa I amyloid associated proteins had unique residues, including Ile-34, Leu-40, and Tyr-71. To study the tertiary structure, BRE VL was expressed in Escherichia coli by using a PCR product amplified from the patient BRE's bone marrow DNA. The PCR product was ligated into pCZ11, a thermal-inducible replication vector. Recombinant BRE VL was isolated, purified to homogeneity, and crystallized by using ammonium sulfate as the precipitant. Two crystal forms were obtained. In crystal form I the BRE VL kappa domain crystallizes as a dimer with unit cell constants isomorphous to previously published kappa protein structures. Comparison with a nonamyloid VL kappa domain from patient REI identified significant differences in position of residues in the hypervariable segments plus variations in framework region (FR) segments 40-46 (FR2) and 66-67 (FR3). In addition, positional differences can be seen along the two types of local diads, corresponding to the monomer-monomer and dimer-dimer interfaces. From the packing diagram, a model for the amyloid light chain (AL) fibril is proposed based on a pseudohexagonal spiral structure with a rise of approximately the width of two dimers per 360 degree turn. This spiral structure could be consistent with the dimensions of amyloid fibrils as determined by electron microscopy.