926 results for Protein Sequence
Abstract:
In recent years, the identification of sequence patterns has received considerable attention as a way to better understand their significance for genomic organization and evolutionary processes. To this end, an algorithm has been derived to identify all similar sequence repeats present in a protein sequence. The proposed algorithm is useful for correlating the three-dimensional structures of similar sequence repeats available in the Protein Data Bank with the same sequence repeats present in other databases such as SWISS-PROT, PIR and genome databases.
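The abstract does not describe how the algorithm works, so the following is only a minimal sketch of the general idea of locating similar repeats within one sequence. It assumes a fixed window length and a Hamming-distance threshold; the function names and both parameters (`window`, `max_mismatches`) are hypothetical and are not taken from the paper.

```python
def hamming(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_similar_repeats(sequence, window=5, max_mismatches=1):
    """Return (i, j, segment_i, segment_j) for every pair of non-overlapping
    windows that differ in at most `max_mismatches` positions.

    Brute-force O(n^2) comparison; an illustrative assumption, not the
    published algorithm.
    """
    hits = []
    last_start = len(sequence) - window
    for i in range(last_start + 1):
        # Start j one full window after i so the two segments never overlap.
        for j in range(i + window, last_start + 1):
            seg_i = sequence[i:i + window]
            seg_j = sequence[j:j + window]
            if hamming(seg_i, seg_j) <= max_mismatches:
                hits.append((i, j, seg_i, seg_j))
    return hits
```

For example, `find_similar_repeats("GAVLKGAVIK")` reports the pair `GAVLK` / `GAVIK`, which differ at a single position. A practical tool would probably score similarity with a substitution matrix rather than counting mismatches.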
Abstract:
Background: Thermophilic proteins sustain themselves and function at higher temperatures. Despite their structural and functional similarities with their mesophilic homologues, they show enhanced stability. Various comparative studies at the genomic, protein sequence and structure levels, as well as experimental work, highlight the different factors and dominant interacting forces contributing to this increased stability. Methods: In this comparative structure-based study, we used interaction energies between amino acids to generate structure networks called Protein Energy Networks (PENs). These PENs are used to compute network, sub-graph, and node-specific parameters. These parameters are then compared between the thermophile-mesophile homologues. Results: The results show an increased number of clusters and low-energy cliques in thermophiles as the main contributing factors for their enhanced stability. Furthermore, we see an increase in the number of hubs in thermophiles. We also observe no community of electrostatic cliques forming in PENs. Conclusion: In this study we took an energy-based network approach to identify, by comparative analysis, the factors responsible for the enhanced stability of thermophiles. We were able to point out that the sub-graph parameters are the prominent contributing factors. The thermophiles have a better-packed hydrophobic core. We have also discussed, from a cliques and communities perspective, how thermophiles retain conformational flexibility while increasing stability through higher connectivity.
Abstract:
Background: The correlation of genetic distances between pairs of protein sequence alignments has been used to infer protein-protein interactions. It has been suggested that these correlations are based on the signal of co-evolution between interacting proteins. However, although mutations in different proteins associated with maintaining an interaction clearly occur (particularly in binding interfaces and neighbourhoods), many other factors contribute to correlated rates of sequence evolution. Proteins in the same genome are usually linked by shared evolutionary history, so topological similarities in their phylogenetic trees would be expected whether they interact or not. For this reason the underlying species tree is often corrected for. Moreover, factors such as expression level are known to affect evolutionary rates. However, it has been argued that the correlated rates of evolution used to predict protein interaction explicitly include shared evolutionary history; here we test this hypothesis. Results: In order to identify the evolutionary mechanisms giving rise to the correlations between interacting proteins, we use phylogenetic methods to distinguish similarities in tree topologies from similarities in genetic distances. We use a range of datasets of interacting and non-interacting proteins from Saccharomyces cerevisiae. We find that the signal of correlated evolution between interacting proteins is predominantly a result of shared evolutionary rates, rather than similarities in tree topology, independent of evolutionary divergence. Conclusions: Since interacting proteins do not have tree topologies that are more similar than those of the control group of non-interacting proteins, it is likely that coevolution contributes little, if anything, to the observed correlations.
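The quantity this abstract starts from, the correlation of genetic distances between two protein families, can be sketched as a Pearson correlation over the matching off-diagonal entries of two pairwise distance matrices. The code below is an illustration under stated assumptions: both matrices are square, symmetric, and list the same species in the same order, and all function names are hypothetical. It computes only the raw correlation and does not reproduce the phylogenetic analysis the authors use to separate tree topology from evolutionary rate.

```python
from math import sqrt


def upper_triangle(matrix):
    """Flatten the strictly upper triangle of a square matrix, row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant distances")
    return cov / sqrt(var_x * var_y)


def distance_matrix_correlation(dist_a, dist_b):
    """Correlate the pairwise distances of two protein families.

    Assumes both matrices index the same species in the same order.
    """
    return pearson(upper_triangle(dist_a), upper_triangle(dist_b))
```

A high value from `distance_matrix_correlation` does not by itself indicate co-evolution: as the abstract argues, shared evolutionary rates can produce the same signal.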
Abstract:
Protein structure space is believed to consist of a finite set of discrete folds, unlike the protein sequence space, which is astronomically large, indicating that proteins from the available sequence space are likely to adopt one of the many folds already observed. In spite of extensive sequence-structure correlation data, protein structure prediction still remains an open question, with researchers having tried different approaches (experimental as well as computational). One of the challenges of protein structure prediction is to identify the native protein structures from a milieu of decoys/models. In this work, a rigorous investigation of Protein Structure Networks (PSNs) has been performed to detect native structures from decoys/models. Ninety-four parameters obtained from network studies have been optimally combined with Support Vector Machines (SVM) to derive a general metric to distinguish decoys/models from the native protein structures with an accuracy of 94.11%. We recently showed, for the first time in the literature, that PSNs can distinguish native proteins from decoys. A major difference between the present work and the previous study is the exploration of the transition profiles at different strengths of non-covalent interactions, and SVM has indeed identified this as an important parameter. Additionally, the SVM-trained algorithm is also applied to the recent CASP10 predicted models. The novelty of the network approach is that it is based on general network properties of native protein structures and that a given model can be assessed independently of any reference structure. Thus, the approach presented in this paper can be valuable in validating predicted structures. A web server has been developed for this purpose and is freely available at http://vishgraph.mbu.iisc.ernet.in/GraProStr/PSN-QA.html.
Abstract:
Calcium plays a crucial role as a secondary messenger in all aspects of plant growth, development and survival. Calcium dependent protein kinases (CDPKs) are the major calcium decoders, which couple the changes in calcium level to an appropriate physiological response. The mechanism by which calcium regulates CDPK protein is not well understood. In this study, we investigated the interactions of Ca2+ ions with the CDPK1 isoform of Cicer arietinum (CaCDPK1) using a combination of biophysical tools. CaCDPK1 has four different EF hands as predicted by protein sequence analysis. The fluorescence emission spectrum of CaCDPK1 showed quenching with a 5 nm red shift upon addition of calcium, indicating conformational changes in the tertiary structure. The plot of changes in intensity against calcium concentrations showed a biphasic curve with binding constants of 1.29 μM and 120 μM, indicating two kinds of binding sites. Isothermal titration calorimetry (ITC) with CaCl2 also showed a biphasic curve with two binding constants of 0.027 μM and 1.7 μM. Circular dichroism (CD) spectra showed two prominent peaks at 208 and 222 nm, indicating that CaCDPK1 is an alpha-helix-rich protein. Calcium binding further increased the alpha-helical content of CaCDPK1 from 75 to 81%. Addition of calcium to CaCDPK1 also increased fluorescence of 8-anilinonaphthalene-1-sulfonic acid (ANS), indicating exposure of hydrophobic surfaces. Thus, on the whole this study provides evidence for calcium-induced conformational changes, exposure of hydrophobic surfaces and heterogeneity of EF hands in CaCDPK1. (C) 2015 Elsevier GmbH. All rights reserved.
Abstract:
Amino acid substitution matrices play an essential role in protein sequence alignment, a fundamental task in bioinformatics. Most widely used matrices, such as PAM matrices derived from homologous sequences and BLOSUM matrices derived from aligned segments of PROSITE, did not integrate conformation information in their construction. There are a few structure-based matrices, which are derived from limited structure alignment data. Using the PDB_SELECT and DSSP databases, we create a database of sequence-conformation blocks which explicitly represent the sequence-structure relationship. Members of a block are identical in conformation and highly similar in sequence. From this block database, we derive a conformation-specific amino acid substitution matrix, CBSM60. The matrix shows improved performance in conformational segment search and homolog detection.
Abstract:
A novel Ca^(2+)-binding protein with Mr of 23 K (designated p23) has been identified in avian erythrocytes and thrombocytes. p23 localizes to the marginal bands (MBs), centrosomes and discrete sites around the nuclear membrane in mature avian erythrocytes. p23 appears to bind Ca^(2+) directly and its interaction with subcellular organelles seems to be modulated by intracellular [Ca^(2+)]. However, its unique protein sequence lacks any known Ca^(2+)-binding motif. Developmental analysis reveals that p23 association with its target structures occurs only at very late stages of bone marrow definitive erythropoiesis. In primitive erythroid cells, p23 distributes diffusely in the cytoplasm and lacks any distinct localization. It is postulated that p23 association with subcellular structures may be induced in part by decreased intracellular [Ca^(2+)]. In vitro and in vivo experiments indicate that p23 does not appear to act as a classical microtubule-associated protein (MAP), but p23 homologues appear to be expressed in MB-containing cells of a variety of species from different vertebrate classes. It has been hypothesized that p23 may play a regulatory role in MB stabilization in a Ca^(2+)-dependent manner.
Binucleated (bnbn) turkey erythrocytes were found to express a truncated p23 variant (designated p21) with the same subcellular localization as p23, except that immunostaining reveals the presence of multiple centrosomes in bnbn cells. The p21 sequence has a 62 amino acid deletion at the C-terminus and must therefore have an additional ~40 amino acids at the N-terminus. In addition, p21 seems to have lost the ability to bind Ca^(2+) and its supramolecular interactions are not modulated by intracellular [Ca^(2+)]. These apparent differences between p23 and p21 raised the possibility that the p23/p21 allelism could be the Bn/bn genotype. However, genetic analysis suggested that p23/p21 allelism had no absolute correlation with the Bn/bn genotype.
Abstract:
In protein sequence alignment, residue similarity is usually evaluated by a substitution matrix, which scores all possible exchanges of one amino acid with another. Several matrices are widely used in sequence alignment, including PAM matrices derived from homologous sequences and BLOSUM matrices derived from aligned segments of BLOCKS. However, most matrices have not addressed the high-order residue-residue interactions that are vital to the bioproperties of proteins. With consideration for the inherent correlation within residue triplets, we present a new scoring scheme for sequence alignment. A protein sequence is treated as a series of overlapping, successive 3-residue segments. The two edge residues of each triplet are each classified as hydrophobic or polar. The protein sequence is then rewritten as a triplet sequence over an alphabet of 2 · 20 · 2 = 80 symbols. Using a traditional approach, we construct a new scoring scheme named TLESUMhp (TripLEt SUbstitution Matrices with hydrophobic and polar information) for pairwise substitution of triplets, which characterizes the similarity of residue triplets. Applying this matrix led to marked improvements in multiple sequence alignment and in searching for structurally similar residue segments. The reason for the occurrence of the "twilight zone," i.e., structure explosion of low-identity sequences, is also discussed.
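The triplet rewriting step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the abstract does not say which residues count as hydrophobic, so the `HYDROPHOBIC` set below is an assumed grouping chosen only for demonstration, and the function names and the `h`/`p` symbol layout are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumption: an illustrative hydrophobic/polar split; the paper's actual
# grouping is not given in the abstract.
HYDROPHOBIC = set("ACFILMVW")


def edge_class(residue):
    """Reduce an edge residue to 'h' (hydrophobic) or 'p' (polar)."""
    return "h" if residue in HYDROPHOBIC else "p"


def to_triplets(sequence):
    """Rewrite a sequence as overlapping triplet symbols.

    Each symbol keeps the central residue and reduces the two edge
    residues to their h/p class, e.g. 'MKV' becomes 'hKh'.
    """
    return [
        edge_class(sequence[i]) + sequence[i + 1] + edge_class(sequence[i + 2])
        for i in range(len(sequence) - 2)
    ]


def triplet_alphabet():
    """Enumerate all 2 * 20 * 2 = 80 possible triplet symbols."""
    return [left + mid + right
            for left in "hp" for mid in AMINO_ACIDS for right in "hp"]
```

With this grouping, `to_triplets("MKVLA")` yields `['hKh', 'pVh', 'hLh']`, and `triplet_alphabet()` contains 80 symbols, matching the count in the abstract. A substitution matrix over these symbols would then be 80 × 80 rather than 20 × 20.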
Abstract:
Proteins are essential components of cells and are crucial for catalyzing reactions, signaling, recognition, motility, recycling, and structural stability. This diversity of function suggests that nature is only scratching the surface of protein functional space. Protein function is determined by structure, which in turn is determined predominantly by amino acid sequence. Protein design aims to explore protein sequence and conformational space to design novel proteins with new or improved function. The vast number of possible protein sequences makes exploring the space a challenging problem.
Computational structure-based protein design (CSPD) allows for the rational design of proteins. Because of the large search space, CSPD methods must balance search accuracy and modeling simplifications. We have developed algorithms that allow for the accurate and efficient search of protein conformational space. Specifically, we focus on algorithms that maintain provability, account for protein flexibility, and use ensemble-based rankings. We present several novel algorithms for incorporating improved flexibility into CSPD with continuous rotamers. We applied these algorithms to two biomedically important design problems. We designed peptide inhibitors of the cystic fibrosis agonist CAL that were able to restore function of the vital cystic fibrosis protein CFTR. We also designed improved HIV antibodies and nanobodies to combat HIV infections.
Abstract:
The pKa values of ionizable groups in proteins report the free energy of site-specific proton binding and provide a direct means of studying pH-dependent stability. We measured histidine pKa values (H3, H22, and H105) in the unfolded (U), intermediate (I), and sulfate-bound folded (F) states of RNase P protein, using an efficient and accurate nuclear magnetic resonance-monitored titration approach that utilizes internal reference compounds and a parametric fitting method. The three histidines in the sulfate-bound folded protein have pKa values depressed by 0.21 ± 0.01, 0.49 ± 0.01, and 1.00 ± 0.01 units, respectively, relative to that of the model compound N-acetyl-l-histidine methylamide. In the unliganded and unfolded protein, the pKa values are depressed relative to that of the model compound by 0.73 ± 0.02, 0.45 ± 0.02, and 0.68 ± 0.02 units, respectively. Above pH 5.5, H22 displays a separate resonance, which we have assigned to I, whose apparent pKa value is depressed by 1.03 ± 0.25 units, which is ∼0.5 units more than in either U or F. The depressed pKa values we observe are consistent with repulsive interactions between protonated histidine side chains and the net positive charge of the protein. However, the pKa differences between F and U are small for all three histidines, and they have little ionic strength dependence in F. Taken together, these observations suggest that unfavorable electrostatics alone do not account for the fact that RNase P protein is intrinsically unfolded in the absence of ligand. Multiple factors encoded in the P protein sequence account for its IUP property, which may play an important role in its function.
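To give a sense of scale for the pKa shifts reported above, a shift can be converted to a free energy through the standard relation ΔΔG = ln(10) · R · T · ΔpKa. The sketch below assumes a temperature of 298.15 K, which the abstract does not state, and returns a value with the same sign as the input shift. The function name is hypothetical and the calculation is not part of the authors' fitting method.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def pka_shift_to_free_energy(delta_pka, temperature_k=298.15):
    """Convert a pKa shift into a free energy change in kcal/mol.

    Uses ddG = ln(10) * R * T * delta_pKa. The default temperature is an
    assumption, not a value reported in the abstract.
    """
    return math.log(10) * R_KCAL * temperature_k * delta_pka
```

Under this assumption, a shift of 1.00 pKa unit corresponds to roughly 1.36 kcal/mol, so the F-versus-U differences of a few tenths of a unit amount to well under 1 kcal/mol per histidine.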
Abstract:
Background: The amino terminal half of the cellular prion protein PrPc is implicated in both the binding of copper ions and the conformational changes that lead to disease but has no defined structure. However, as some structure is likely to exist, we have investigated the use of an established protein refolding technology, fusion to green fluorescent protein (GFP), as a method to examine the refolding of the amino terminal domain of mouse prion protein. Results: Fusion proteins of PrPc and GFP were expressed at high levels in E. coli and could be purified to near homogeneity as insoluble inclusion bodies. Following denaturation, proteins were diluted into a refolding buffer, whereupon GFP fluorescence recovered with time. Using several truncations of PrPc, the rate of refolding was shown to depend on the prion sequence expressed. In a variation of the format, direct observation in E. coli, mutations introduced randomly in the PrPc protein sequence that affected folding could be selected directly by recovery of GFP fluorescence. Conclusion: Use of GFP as a measure of refolding of PrPc fusion proteins in vitro and in vivo proved informative. Refolding in vitro suggested a local structure within the amino terminal domain, while direct selection via fluorescence showed that as little as one amino acid change could significantly alter folding. These assay formats, not previously used to study PrP folding, may be generally useful for investigating PrPc structure and PrPc-ligand interaction.
Abstract:
We show that most isolates of influenza A induce filamentous changes in infected cells, in contrast to the A/WSN/33 and A/PR8/34 strains, which have undergone extensive laboratory passage and are mouse-adapted. Using reverse genetics, we created recombinant viruses in the naturally filamentous genetic background of A/Victoria/3/75 and established that this property is regulated by the M1 protein sequence, but that the phenotype is complex and several residues are involved. The filamentous phenotype was lost when the amino acid at position 41 was switched from A to V; at the same time, this recombinant virus also became insensitive to the antibody 14C2. On the other hand, the filamentous phenotype could be fully transferred to a virus containing RNA segment 7 of the A/WSN/33 virus by a combination of three mutations in both the amino and carboxy regions of the M1 protein. This observation suggests that an interaction among these regions of M1 may occur during assembly. (C) 2004 Elsevier Inc. All rights reserved.
Abstract:
Dynamically disordered regions appear to be relatively abundant in eukaryotic proteomes. The DISOPRED server allows users to submit a protein sequence, and returns a probability estimate of each residue in the sequence being disordered. The results are sent in both plain text and graphical formats, and the server can also supply predictions of secondary structure to provide further structural information.
Abstract:
The PSIPRED protein structure prediction server allows users to submit a protein sequence, perform a prediction of their choice and receive the results of the prediction both textually via e-mail and graphically via the web. The user may select one of three prediction methods to apply to their sequence: PSIPRED, a highly accurate secondary structure prediction method; MEMSAT 2, a new version of a widely used transmembrane topology prediction method; or GenTHREADER, a sequence profile based fold recognition method.