154 results for Protein structures
in National Center for Biotechnology Information - NCBI
Abstract:
A hierarchy of residue density assessments and packing properties in protein structures is contrasted, including a regular density, a variety of charge densities, a hydrophobic density, a polar density, and an aromatic density. These densities are investigated by alternative distance measures and also at the interface of multiunit structures. Amino acids are divided into nine structural categories according to three secondary structure states and three solvent accessibility levels. To take account of amino acid abundance differences across protein structures, we normalize the observed density by the expected density defining a density index. Solvent accessibility levels exert the predominant influence in determinations of the regular residue density. Explicitly, the regular density values vary approximately linearly with respect to solvent accessibility levels, the linearity parameters depending on the amino acid. The charge index reveals pronounced inequalities between lysine and arginine in their interactions with acidic residues. The aromatic density calculations in all structural categories parallel the regular density calculations, indicating that the aromatic residues are distributed as a random sample of all residues. Moreover, aromatic residues are found to be over-represented in the neighborhood of all amino acids. This result might be attributed to nucleation sites and protein stability being substantially associated with aromatic residues.
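The normalization of observed by expected density can be illustrated with a minimal sketch. It assumes the expected density of a residue type is simply its background frequency across the data set; the function name, inputs, and frequencies are hypothetical and are not taken from the paper.

```python
from collections import Counter

def density_index(neighbor_types, background_freq, target):
    """Observed fraction of `target` among neighbors divided by its
    background frequency (assumed here to stand in for the expected density)."""
    if not neighbor_types:
        raise ValueError("at least one neighbor is required")
    observed = Counter(neighbor_types)[target] / len(neighbor_types)
    expected = background_freq[target]
    return observed / expected

# Hypothetical background frequencies and neighbor list, for illustration only.
background = {"PHE": 0.25, "LEU": 0.5, "LYS": 0.25}
index = density_index(["PHE", "PHE", "LEU", "LYS"], background, "PHE")
```

Under this convention an index above 1 would indicate over-representation of the residue type in the neighborhood, and an index below 1 under-representation.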
Abstract:
The residue environment in protein structures is studied with respect to the density of carbon (C), oxygen (O), and nitrogen (N) atoms within a certain distance (say 5 Å) of each residue. Two types of environments are evaluated: one based on side-chain atom contacts (abbreviated S-S) and the other based on all atom (side-chain + backbone) contacts (abbreviated A-A). Different atom counts are observed about nine residue structural categories defined by three solvent accessibility levels and three secondary structure states. Among the structural categories, the S-S atom count ratios generally vary more than the A-A atom count ratios because the backbone (O) and (N) atoms contribute equal counts. Secondary structure affects the (C) density for the A-A contacts whereas secondary structure has little influence on the (C) density for the S-S contacts. For S-S contacts, a greater density of (O) over (N) atom neighbors stands out in the environment of most amino acid types. By contrast, for A-A contacts, independent of the solvent accessibility levels, the ratio (O)/(N) is ≈1 in helical states, consistent with the geometry of α-helical residues whose side-chains tilt oppositely to the amino to carboxy α-helical axis. The highest ratio of neighbor (O)/(N) is achieved under solvent exposed conditions. This (O) vs. (N) prevalence is advantageous at the protein surface that generally exhibits an acid excess that helps to enhance protein solubility in the cell and to avoid nonspecific interactions with phosphate groups of DNA, RNA, and other plasma constituents.
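A minimal sketch of the atom-count idea follows. It assumes that an atom counts as a neighbor when it lies within the cutoff of any atom of the residue of interest; that criterion, the function name, and the toy coordinates are illustrative assumptions rather than the paper's exact procedure.

```python
import math
from collections import Counter

def neighbor_element_counts(residue_atoms, other_atoms, cutoff=5.0):
    """Count neighboring atoms by element.

    residue_atoms: list of (x, y, z) positions for the residue of interest.
    other_atoms:   list of (element, (x, y, z)) for atoms of other residues.
    An atom is counted if it is within `cutoff` Å of any residue atom (assumed).
    """
    counts = Counter()
    for element, position in other_atoms:
        if any(math.dist(position, r) <= cutoff for r in residue_atoms):
            counts[element] += 1
    return counts

# Toy environment: two O atoms and one N atom fall within 5 Å, the C atom does not.
environment = [("O", (1.0, 0.0, 0.0)), ("N", (4.0, 0.0, 0.0)),
               ("C", (6.0, 0.0, 0.0)), ("O", (0.0, 3.0, 0.0))]
counts = neighbor_element_counts([(0.0, 0.0, 0.0)], environment)
o_to_n_ratio = counts["O"] / counts["N"]
```

Restricting `residue_atoms` and `other_atoms` to side-chain atoms would correspond to the S-S environment, and passing all atoms to the A-A environment.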
Abstract:
The objectives of this and the following paper are to identify commonalities and disparities of the extended environment of mononuclear metal sites centering on Cu, Fe, Mn, and Zn. The extended environment of a metal site within a protein embodies at least three layers: the metal core, the ligand group, and the second shell, which is defined here to consist of all residues distant less than 3.5 Å from some ligand of the metal core. The ligands and second-shell residues can be characterized in terms of polarity, hydrophobicity, secondary structures, solvent accessibility, hydrogen-bonding interactions, and membership in statistically significant residue clusters of different kinds. Findings include the following: (i) Both histidine ligands of type I copper ions exclusively attach the Nδ1 nitrogen of the histidine imidazole ring to the metal, whereas histidine ligands for all mononuclear iron ions and nearly all type II copper ions are ligated via the Nɛ2 nitrogen. By contrast, multinuclear copper centers are coordinated predominantly by histidine Nɛ2, whereas diiron histidine contacts are predominantly Nδ1. Explanations in terms of steric differences between Nδ1 and Nɛ2 are considered. (ii) Except for blue copper (type I), the second-shell composition favors polar residues. (iii) For blue copper, the second shell generally contains multiple methionine residues, which are elements of a statistically significant histidine–cysteine–methionine cluster. Almost half of the second shell of blue copper consists of solvent-accessible residues, putatively facilitating electron transfer. (iv) Mononuclear copper atoms are never found with acidic carboxylate ligands, whereas single Mn2+ ion ligands are predominantly acidic and the second shell tends to be mostly buried. (v) The extended environment of mononuclear Fe sites often is associated with histidine–tyrosine or histidine–acidic clusters.
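The second-shell definition (all residues less than 3.5 Å from some ligand of the metal core) can be sketched as below. The data layout, the residue identifiers, and the use of any-atom-to-any-atom distances are assumptions made for illustration.

```python
import math

def second_shell(residues, ligand_ids, cutoff=3.5):
    """Return ids of non-ligand residues with any atom closer than `cutoff` Å
    to any atom of a ligand residue.

    residues:   dict mapping residue id -> list of (x, y, z) atom positions.
    ligand_ids: set of residue ids that ligate the metal.
    """
    ligand_atoms = [atom for rid in ligand_ids for atom in residues[rid]]
    shell = set()
    for rid, atoms in residues.items():
        if rid in ligand_ids:
            continue  # ligands belong to the first shell, not the second
        if any(math.dist(a, b) < cutoff for a in atoms for b in ligand_atoms):
            shell.add(rid)
    return shell

# Hypothetical coordinates: ASN47 sits 3 Å from the ligand, LEU90 is far away.
toy = {"HIS46": [(0.0, 0.0, 0.0)],
       "ASN47": [(3.0, 0.0, 0.0)],
       "LEU90": [(10.0, 0.0, 0.0)]}
shell = second_shell(toy, {"HIS46"})
```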
Abstract:
Our study of the extended metal environment, particularly of the second shell, focuses in this paper on zinc sites. Key findings include: (i) The second shell of mononuclear zinc centers is generally more polar than hydrophobic and prominently features charged residues engaged in an abundance of hydrogen bonding with histidine ligands. Histidine–acidic or histidine–tyrosine clusters commonly overlap the environment of zinc ions. (ii) Histidine tautomeric metal bonding patterns in ligating zinc ions are mixed. For example, carboxypeptidase A, thermolysin, and sonic hedgehog possess the same ligand group (two histidines, one unibidentate acidic ligand, and a bound water), but their histidine tautomeric geometries markedly differ such that the carboxypeptidase A makes only Nδ1 contacts, thermolysin makes only Nɛ2 contacts, and sonic hedgehog uses one of each. Thus the presence of a similar ligand cohort does not necessarily imply the same topology or function at the active site. (iii) Two close histidine ligands HXmH, m ≤ 5, rarely both coordinate a single metal ion in the Nδ1 tautomeric conformation, presumably to avoid steric conflicts. Mononuclear zinc sites can be classified into six types depending on the ligand composition and geometry. Implications of the results are discussed in terms of divergent and convergent evolution.
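Candidate HXmH pairs (two histidines separated by at most five residues) can be located in a sequence with a short scan. This sketch assumes that m counts the intervening residues, that m = 0 is allowed, and that intervening residues may themselves be histidines; it says nothing about whether the pair actually coordinates a metal.

```python
def close_histidine_pairs(sequence, max_gap=5):
    """Return 0-based index pairs (i, j) of histidines with at most
    `max_gap` residues between them (the HXmH pattern with m <= max_gap)."""
    pairs = []
    for i, residue in enumerate(sequence):
        if residue != "H":
            continue
        # j - i - 1 is the number of intervening residues, so j <= i + max_gap + 1
        for j in range(i + 1, min(i + max_gap + 2, len(sequence))):
            if sequence[j] == "H":
                pairs.append((i, j))
    return pairs

# Toy sequence: histidines at positions 1, 4 and 12 (0-based).
toy_sequence = "AHAAH" + "A" * 7 + "H"
pairs = close_histidine_pairs(toy_sequence)
```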
Abstract:
PALI (release 1.2) contains three-dimensional (3-D) structure-dependent sequence alignments as well as structure-based phylogenetic trees of homologous protein domains in various families. The data set of homologous protein structures has been derived by consulting the SCOP database (release 1.50) and the data set comprises 604 families of homologous proteins involving 2739 protein domain structures with each family made up of at least two members. Each member in a family has been structurally aligned with every other member in the same family (pairwise alignment) and all the members in the family are also aligned using simultaneous superposition (multiple alignment). The structural alignments are performed largely automatically, with manual interventions especially in the cases of distantly related proteins, using the program STAMP (version 4.2). Every family is also associated with two dendrograms, calculated using PHYLIP (version 3.5), one based on a structural dissimilarity metric defined for every pairwise alignment and the other based on similarity of topologically equivalent residues. These dendrograms enable easy comparison of sequence and structure-based relationships among the members in a family. Structure-based alignments with the details of structural and sequence similarities, superposed coordinate sets and dendrograms can be accessed conveniently using a web interface. The database can be queried for protein pairs with sequence or structural similarities falling within a specified range. Thus PALI forms a useful resource to help in analysing the relationship between sequence and structure variation at a given level of sequence similarity. PALI also contains over 653 ‘orphans’ (single member families). Using the web interface involving PSI_BLAST and PHYLIP it is possible to associate the sequence of a new protein with one of the families in PALI and generate a phylogenetic tree combining the query sequence and proteins of known 3-D structure. 
The database with the web interfaced search and dendrogram generation tools can be accessed at http://pauling.mbu.iisc.ernet.in/~pali.
Abstract:
It is generally accepted that globular proteins fold with a hydrophobic core and a hydrophilic exterior. Might the spatial distribution of amino acid hydrophobicity exhibit common features? The hydrophobic profile detailing this distribution from the protein interior to exterior has been examined for 30 relatively diverse structures obtained from the Protein Data Bank, for 3 proteins of the 30S ribosomal subunit, and for a simple set of 14 decoys. A second-order hydrophobic moment has provided a simple measure of the spatial variation. Shapes of the calculated spatial profiles of all native structures have been found to be comparable. Consequently, profile shapes as well as particular profile features should assist in validating predicted protein structures and in discriminating between different protein-folding pathways. The spatial profiles of the 14 decoys are clearly distinguished from the profiles of their native structures.
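As a rough illustration of a second-order moment, the sketch below weights each residue's squared distance from the structure centroid by its hydrophobicity. This spherical form, the function name, and the toy values are assumptions; the abstract does not give the exact definition used in the paper.

```python
def second_order_hydrophobic_moment(coords, hydrophobicity):
    """Sum of h_i * |r_i - r_centroid|^2 over residues (simplified, assumed form).

    coords:         list of (x, y, z) residue positions.
    hydrophobicity: list of per-residue hydrophobicity values, same order.
    """
    n = len(coords)
    centroid = tuple(sum(c[k] for c in coords) / n for k in range(3))
    return sum(
        h * sum((c[k] - centroid[k]) ** 2 for k in range(3))
        for c, h in zip(coords, hydrophobicity)
    )

# Toy structure: hydrophobic residues (h = +1) near the center,
# hydrophilic residues (h = -1) farther out.
toy_coords = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, -2.0, 0.0)]
moment = second_order_hydrophobic_moment(toy_coords, [1.0, 1.0, -1.0, -1.0])
```

With this sign convention a negative value means that hydrophobic residues lie closer to the centroid than hydrophilic ones, as expected for a hydrophobic core.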
Abstract:
We present a method (ENERGI) for extracting energy-like quantities from a data base of protein structures. In this paper, we use the method to generate pairwise additive amino acid "energy" scores. These scores are obtained by iteration until they correctly discriminate a set of known protein folds from decoy conformations. The method succeeds in lattice model tests and in the gapless threading problem as defined by Maiorov and Crippen [Maiorov, V. N. & Crippen, G. M. (1992) J. Mol. Biol. 227, 876-888]. A more challenging test of threading a larger set of test proteins derived from the representative set of Hobohm and Sander [Hobohm, U. & Sander, C. (1994) Protein Sci. 3, 522-524] is used as a "workbench" for exploring how the ENERGI scores depend on their parameter sets.
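The idea of iterating pairwise scores until native folds are discriminated from decoys can be illustrated with a perceptron-style update. This is a stand-in technique chosen for illustration, not the ENERGI update rule, which the abstract does not specify; the contact-count representation and all names are hypothetical.

```python
from collections import defaultdict

def energy(weights, contacts):
    """Pairwise additive score: sum of weight * contact count per residue pair."""
    return sum(weights[pair] * count for pair, count in contacts.items())

def fit_pair_scores(training, step=0.1, max_rounds=100):
    """Adjust pair weights until every native scores below all of its decoys.

    training: list of (native_contacts, [decoy_contacts, ...]) where each
    contacts object maps a residue-type pair to its contact count.
    """
    weights = defaultdict(float)
    for _ in range(max_rounds):
        violations = 0
        for native, decoys in training:
            for decoy in decoys:
                if energy(weights, native) >= energy(weights, decoy):
                    violations += 1
                    for pair, count in native.items():
                        weights[pair] -= step * count  # favor native contacts
                    for pair, count in decoy.items():
                        weights[pair] += step * count  # penalize decoy contacts
        if violations == 0:
            break  # all natives are now ranked below their decoys
    return weights

# Toy training set: one native with Leu-Leu contacts, one decoy with Leu-Lys contacts.
native = {("L", "L"): 2}
decoy = {("K", "L"): 2}
weights = fit_pair_scores([(native, [decoy])])
```

The loop stops after `max_rounds` even if no separating set of scores exists, so the result should be checked against the training set before use.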
Abstract:
Structurally neighboring residues are categorized according to their separation in the primary sequence as proximal (1-4 positions apart) and otherwise distal, which in turn is divided into near (5-20 positions), far (21-50 positions), very far ( > 50 positions), and interchain (from different chains of the same structure). These categories describe the linear distance histogram (LDH) for three-dimensional neighboring residue types. Among the main results are the following: (i) nearest-neighbor hydrophobic residues tend to be increasingly distally separated in the linear sequence, thus most often connecting distinct secondary structure units. (ii) The LDHs of oppositely charged nearest-neighbors emphasize proximal positions with a subsidiary maximum for very far positions. (iii) Cysteine-cysteine structural interactions rarely involve proximal positions. (iv) The greatest numbers of interchain specific nearest-neighbors in protein structures are composed of oppositely charged residues. (v) The largest fraction of side-chain neighboring residues from beta-strands involves near positions, emphasizing associations between consecutive strands. (vi) Exposed residue pairs are predominantly located in proximal linear positions, while buried residue pairs principally correspond to far or very far distal positions. The results are principally invariant to protein sizes, amino acid usages, linear distance normalizations, and over- and underrepresentations among nearest-neighbor types. Interpretations and hypotheses concerning the LDHs, particularly those of hydrophobic and charged pairings, are discussed with respect to protein stability and functionality. The pronounced occurrence of oppositely charged interchain contacts is consistent with many observations on protein complexes where multichain stabilization is facilitated by electrostatic interactions.
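The separation categories defined at the start of this abstract translate directly into a small classifier, and counting the categories over a set of neighboring pairs gives a linear distance histogram. The function names and the pair representation are illustrative assumptions.

```python
from collections import Counter

def linear_distance_category(chain_a, pos_a, chain_b, pos_b):
    """Classify a structurally neighboring residue pair by sequence separation."""
    if chain_a != chain_b:
        return "interchain"
    separation = abs(pos_a - pos_b)
    if separation == 0:
        raise ValueError("a residue cannot be paired with itself")
    if separation <= 4:
        return "proximal"
    if separation <= 20:
        return "near"
    if separation <= 50:
        return "far"
    return "very far"

def linear_distance_histogram(pairs):
    """Count categories over (chain_a, pos_a, chain_b, pos_b) tuples."""
    return Counter(linear_distance_category(*pair) for pair in pairs)

# Hypothetical neighbor pairs for illustration.
histogram = linear_distance_histogram([
    ("A", 10, "A", 13),   # proximal
    ("A", 10, "A", 25),   # near
    ("A", 10, "A", 100),  # very far
    ("A", 10, "B", 12),   # interchain
])
```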
Abstract:
We present new methods for identifying and analyzing statistically significant residue clusters that occur in three-dimensional (3D) protein structures. Residue clusters of different kinds occur in many contexts. They often feature the active site (e.g., in substrate binding), the interface between polypeptide units of protein complexes, regions of protein-protein and protein-nucleic acid interactions, or regions of metal ion coordination. The methods are illustrated with 3D clusters centering on four themes. (i) Acidic or histidine-acidic clusters associated with metal ions. (ii) Cysteine clusters including coordination of metals such as zinc or iron-sulfur structures, cysteine knots prominent in growth factors, multiple sets of buried disulfide pairings that putatively nucleate the hydrophobic core, or cysteine clusters of mostly exposed disulfide bridges. (iii) Iron-sulfur proteins and charge clusters. (iv) 3D environments of multiple histidine residues. Study of diverse 3D residue clusters offers a new perspective on protein structure and function. The algorithms can aid in rapid identification of distinctive sites, suggest correlations among protein structures, and serve as a tool in the analysis of new structures.
Abstract:
Statistically significant charge clusters (basic, acidic, or of mixed charge) in tertiary protein structures are identified by new methods from a large representative collection of protein structures. About 10% of protein structures show at least one charge cluster, mostly of mixed type involving about equally anionic and cationic residues. Positive charge clusters are very rare. Negative (or histidine-acidic) charge clusters often coordinate calcium, or magnesium or zinc ions [e.g., thermolysin (PDB code: 3tln), mannose-binding protein (2msb), aminopeptidase (1amp)]. Mixed-charge clusters are prominent at interchain contacts where they stabilize quaternary protein formation [e.g., glutathione S-transferase (2gst), catalase (8act), and fructose-1,6-bisphosphate aldolase (1fba)]. They are also involved in protein-protein interaction and in substrate binding. For example, the mixed-charge cluster of aspartate carbamoyl-transferase (8atc) envelops the aspartate carbonyl substrate in a flexible manner (alternating tense and relaxed states) where charge associations can vary from weak to strong. Other proteins with charge clusters include the P450 cytochrome family (BM-3, Terp, Cam), several flavocytochromes, neuraminidase, hemagglutinin, the photosynthetic reaction center, and annexin. In each case in Table 2 we discuss the possible role of the charge clusters with respect to protein structure and function.
Abstract:
By using a protein-design algorithm that quantitatively considers side-chain packing, the effect of specific steric constraints on protein design was assessed in the core of the streptococcal protein G β1 domain. The strength of packing constraints used in the design was varied, resulting in core sequences that reflected differing amounts of packing specificity. The structural flexibility and stability of several of the designed proteins were experimentally determined and showed a trend from well-ordered to highly mobile structures as the degree of packing specificity in the design decreased. This trend both demonstrates that the inclusion of specific packing interactions is necessary for the design of native-like proteins and defines a useful range of packing specificity for the design algorithm. In addition, an analysis of the modeled protein structures suggested that penalizing for exposed hydrophobic surface area can improve design performance.
Abstract:
Structural genomics aims to solve a large number of protein structures that represent the protein space. Currently an exhaustive solution for all structures seems prohibitively expensive, so the challenge is to define a relatively small set of proteins with new, currently unknown folds. This paper presents a method that assigns each protein with a probability of having an unsolved fold. The method makes extensive use of protomap, a sequence-based classification, and scop, a structure-based classification. According to protomap, the protein space encodes the relationship among proteins as a graph whose vertices correspond to 13,354 clusters of proteins. A representative fold for a cluster with at least one solved protein is determined after superposition of all scop (release 1.37) folds onto protomap clusters. Distances within the protomap graph are computed from each representative fold to the neighboring folds. The distribution of these distances is used to create a statistical model for distances among those folds that are already known and those that have yet to be discovered. The distribution of distances for solved/unsolved proteins is significantly different. This difference makes it possible to use Bayes' rule to derive a statistical estimate that any protein has a yet undetermined fold. Proteins that score the highest probability to represent a new fold constitute the target list for structural determination. Our predicted probabilities for unsolved proteins correlate very well with the proportion of new folds among recently solved structures (new scop 1.39 records) that are disjoint from our original training set.
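The Bayes' rule step can be sketched as follows, assuming the two distance distributions have already been turned into likelihoods for an observed graph distance and that a prior probability of a new fold is available. The function name and the numbers are hypothetical; they are not values from the paper.

```python
def posterior_new_fold(likelihood_new, likelihood_known, prior_new):
    """P(new fold | distance) from Bayes' rule.

    likelihood_new:   P(distance | protein has a new fold)
    likelihood_known: P(distance | protein has a known fold)
    prior_new:        prior probability that a protein has a new fold
    """
    numerator = likelihood_new * prior_new
    evidence = numerator + likelihood_known * (1.0 - prior_new)
    if evidence == 0.0:
        raise ValueError("observed distance has zero probability under both models")
    return numerator / evidence

# Illustrative numbers only: a distance three times more likely under the
# new-fold model, combined with a 25% prior, gives an even posterior.
posterior = posterior_new_fold(0.6, 0.2, 0.25)
```

Ranking proteins by this posterior would yield a target list of the kind the abstract describes.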
Abstract:
The conformational space annealing (CSA) method for global optimization has been applied to the 10-55 fragment of the B-domain of staphylococcal protein A (protein A) and to a 75-residue protein, apo calbindin D9K (PDB ID code 1CLB), by using the UNRES off-lattice united-residue force field. Although the potential was not calibrated with these two proteins, the native-like structures were found among the low-energy conformations, without the use of threading or secondary-structure predictions. This is because the CSA method can find many distinct families of low-energy conformations. Starting from random conformations, the CSA method found that there are two families of low-energy conformations for each of the two proteins, the native-like fold and its mirror image. The CSA method converged to the same low-energy folds in all cases studied, as opposed to other optimization methods. It appears that the CSA method with the UNRES force field, which is based on the thermodynamic hypothesis, can be used in prediction of protein structures in real time.
Abstract:
The database reported here is derived using the Combinatorial Extension (CE) algorithm which compares pairs of protein polypeptide chains and provides a list of structurally similar proteins along with their structure alignments. Using CE, structure–structure alignments can provide insights into biological function. When a protein of known function is shown to be structurally similar to a protein of unknown function, a relationship might be inferred; a relationship not necessarily detectable from sequence comparison alone. Establishing structure–structure relationships in this way is of great importance as we enter an era of structural genomics where there is a likelihood of an increasing number of structures with unknown functions being determined. Thus the CE database is an example of a useful tool in the annotation of protein structures of unknown function. Comparisons can be performed on the complete PDB or on a structurally representative subset of proteins. The source protein(s) can be from the PDB (updated monthly) or uploaded by the user. CE provides sequence alignments resulting from structural alignments and Cartesian coordinates for the aligned structures, which may be analyzed using the supplied Compare3D Java applet, or downloaded for further local analysis. Searches can be run from the CE web site, http://cl.sdsc.edu/ce.html, or the database and software downloaded from the site for local use.