71 results for T-Box Domain Proteins
at Indian Institute of Science - Bangalore - India
Abstract:
We explore the use of information on the co-occurrence of domains in multi-domain proteins to predict protein-protein interactions. The basic premise of our work is that domains co-occurring in a polypeptide chain interact with one another, either structurally or functionally. In this study, we use a template dataset of domains in multi-domain proteins to predict protein-protein interactions in a target organism. The maximum number of correct predictions of interacting protein domain families (158) is made in S. cerevisiae when a dataset of closely related organisms is used as the template, followed by the more diverse dataset of bacterial proteins (48) and a dataset of randomly chosen proteins (23). We conclude that multi-domain information from organisms closely related to the target can aid the prediction of interacting protein families.
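A minimal Python sketch of the co-occurrence premise, assuming each protein is given as a list of domain-family labels (all labels below are hypothetical). It applies a simple hypothetical rule: two target proteins are predicted to interact if one carries family A and the other family B, where A and B co-occur in some template chain. This illustrates the idea only. It is not the authors' scoring procedure, and it predicts at the protein level rather than the family level.

```python
from itertools import combinations


def cooccurring_pairs(template_proteins):
    """Collect unordered domain-family pairs that co-occur in one template chain."""
    pairs = set()
    for domains in template_proteins:
        # sorted(set(...)) drops repeats and gives each pair one canonical order
        for a, b in combinations(sorted(set(domains)), 2):
            pairs.add((a, b))
    return pairs


def predict_interactions(target_proteins, pairs):
    """Predict protein pairs whose domain families form a co-occurring template pair.

    Hypothetical rule for illustration: proteins p and q are predicted to interact
    if p has family a, q has family b (a != b), and (a, b) co-occurs in a template.
    """
    predictions = set()
    for p, q in combinations(sorted(target_proteins), 2):
        for a in set(target_proteins[p]):
            for b in set(target_proteins[q]):
                if a != b and tuple(sorted((a, b))) in pairs:
                    predictions.add((p, q))
    return predictions


# Toy data; family and protein names are hypothetical
template = [["PF_A", "PF_B"], ["PF_B", "PF_C", "PF_D"]]
target = {"prot1": ["PF_A"], "prot2": ["PF_B"], "prot3": ["PF_E"]}
print(predict_interactions(target, cooccurring_pairs(template)))  # {('prot1', 'prot2')}
```

A closely related template set would simply supply a larger and more relevant `pairs` set. The matching rule itself stays the same.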
Abstract:
Multi-domain proteins have many advantages with respect to stability and folding inside cells. Here we attempt to understand the intricate relationship between domain-domain interactions and the stability of domains in isolation. We provide a quantitative treatment of, and proof for, prevailing intuitive ideas on the strategies employed by nature to stabilize otherwise unstable domains. We find that domains incapable of independent stability are stabilized by favourable interactions with tethered domains in the multi-domain context. The ability of such folds to exist independently is optimized by evolution. Specific residue mutations at sites equivalent to the inter-domain interface enhance overall solvation, thereby stabilizing these domain folds independently. A few naturally occurring variants at these sites alter communication between domains and affect stability, leading to disease. Our analysis provides guidelines for safe mutagenesis, with attractive applications in obtaining stable fragments and domain constructs essential for structural studies by crystallography and NMR.
Abstract:
Genomic data from several organisms have revealed a vast repertoire of multi-domain proteins. The role played by individual domains in a multi-domain protein has a profound influence on the overall function of the protein. In the present analysis, we attempt to better understand the tethering preferences of domain families that occur in multi-domain proteins. The analysis was carried out on an exhaustive dataset of 2,961,898 protein sequences from 930 organisms, of which 741,274 proteins comprise at least two domain families. For every domain family, the number of other domain families with which it co-occurs within a protein in this dataset was enumerated. This count is referred to as the tethering number of the domain family. In the general dataset, the AAA ATPase family and the Ser/Thr kinase family have the highest tethering numbers, 450 and 444 respectively. Further analysis reveals a significant correlation between the number of members in a family and its tethering number. A positive correlation was also observed between the extent of sequence and functional diversity within a family and its tethering number. Domain families present ubiquitously in diverse organisms tend to have large tethering numbers, while organism- or kingdom-specific families have low tethering numbers. Thus, the analysis uncovers how domain families recombine and evolve to give rise to multi-domain proteins.
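The tethering number, as defined above, is the count of distinct other domain families that co-occur with a family in at least one protein. A minimal Python sketch follows, assuming each protein is a list of domain-family labels. The toy labels are hypothetical. It also assumes that repeats of the same family within one chain do not count as partners, since the definition refers to "other" families.

```python
from collections import defaultdict


def tethering_numbers(proteins):
    """Map each domain family to the number of *other* families it co-occurs with.

    `proteins` is an iterable of domain-family label lists, one list per chain.
    """
    partners = defaultdict(set)
    for domains in proteins:
        families = set(domains)  # repeats of one family in a chain are not partners
        for fam in families:
            # Single-family chains still register the family (with no partners)
            partners[fam].update(families - {fam})
    return {fam: len(others) for fam, others in partners.items()}


# Toy dataset with hypothetical family labels
chains = [["A", "B"], ["A", "C"], ["A", "D", "A"], ["B"]]
print(sorted(tethering_numbers(chains).items()))  # [('A', 3), ('B', 1), ('C', 1), ('D', 1)]
```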
Abstract:
Inter-domain linkers (IDLs) bridge flanking domains and support inter-domain communication in multi-domain proteins. Their sequence and conformational preferences enable them to carry out varied functions. They provide sufficient flexibility to facilitate domain motions and, in conjunction with the interacting interfaces, regulate the inter-domain geometry (IDG). Despite a basic intuitive understanding of how inter-domain orientation relates to linker conformation and interfaces, the precise relationship among the three is still not entirely understood. We show that IDG is evolutionarily well conserved and is constrained by domain-domain interface interactions. IDLs modulate these interactions by varying their lengths, conformations and local structure, thereby affecting the overall IDG. The results of our analysis provide guidelines for modelling multi-domain proteins from the tertiary structures of their constituent domains.
Abstract:
Establishing functional relationships between multi-domain protein sequences is a non-trivial task. Traditionally, assigning functions to proteins and delineating their relationships requires domain assignment as a prerequisite. This process is sensitive to alignment quality and domain definitions, and for several reasons, alignment quality is often poor for multi-domain proteins. We report the correspondence between the classification of proteins, represented as full-length gene products, and their functions. Our approach differs fundamentally from traditional methods in that it does not perform the classification at the level of domains. It is based on an alignment-free computation of local matching scores (LMS) at the amino acid sequence level, followed by hierarchical clustering. As there are no gold standards for full-length protein sequence classification, we used Gene Ontology and domain architecture-based similarity measures to assess our classification. The final clusters obtained using LMS show high functional and domain architectural similarity. Comparison with alignment-based approaches, at both the domain and the full-length protein level, showed that LMS scores perform better. Using this method, we have recreated objective relationships among different protein kinase sub-families. We have also classified immunoglobulin-containing proteins, for which sub-family definitions do not currently exist. The method can be applied to any set of protein sequences and hence should be instrumental in analysing large numbers of full-length protein sequences.
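The abstract does not specify the linkage used in the hierarchical clustering step. The following Python sketch assumes single linkage, which has a convenient property: cutting a single-linkage dendrogram at a distance t gives exactly the connected components of the graph whose edges join items at distance ≤ t. The distances are assumed to come from some pairwise score, for example 1 minus a normalized similarity. That conversion, and the toy values below, are hypothetical.

```python
def single_linkage_clusters(n, distances, cutoff):
    """Flat clusters from cutting a single-linkage dendrogram at `cutoff`.

    `distances` maps index pairs (i, j) to a distance between items i and j.
    Cutting single linkage at t is equivalent to taking connected components
    of the graph whose edges join items at distance <= t.
    """
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), d in distances.items():
        if d <= cutoff:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[ri] = rj  # merge the two components
    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Hypothetical distances, e.g. 1 minus a normalized pairwise similarity score
d = {(0, 1): 0.1, (0, 2): 0.7, (0, 3): 0.8, (1, 2): 0.9, (1, 3): 0.95, (2, 3): 0.2}
print(single_linkage_clusters(4, d, cutoff=0.3))  # [[0, 1], [2, 3]]
```

Raising the cutoff merges clusters. With `cutoff=0.75`, all four items join one cluster through the 0.7 edge.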
Abstract:
With the preponderance of multi-domain proteins in eukaryotic genomes, it is essential to recognize their constituent domains and the functions of those domains. Function often involves communication across domain interfaces, and knowledge of the interacting sites is essential to our understanding of the structure-function relationship. We use evolutionary information extracted from homologous domains in at least two diverse domain architectures (single- and multi-domain). With it, we predict the interface residues of domains from two-domain proteins. We also use information from the three-dimensional structures of the individual domains of two-domain proteins to train a naive Bayes classifier model that predicts interfacial residues. Our predictions are highly accurate (approximately 85%) and specific (approximately 95%) to the domain-domain interfaces. The method is applicable only to multi-domain proteins whose domains occur in more than one architectural context. When the predicted residues were used to constrain domain-domain interaction, rigid-body docking produced accurate full-length protein structures with correctly oriented domains. We believe that these results can be of considerable interest for rational protein and interaction design, apart from providing valuable information on the nature of interactions. Proteins 2014; 82:1219-1234. (c) 2013 Wiley Periodicals, Inc.
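A minimal Python sketch of a naive Bayes classifier of the kind described, assuming binary per-residue features. The feature names and training data below are hypothetical. The Bernoulli variant with add-one smoothing is an illustrative choice, not necessarily the model variant or feature set the authors used.

```python
import math


class BernoulliNaiveBayes:
    """Naive Bayes over binary features with add-one (Laplace) smoothing."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        n_features = len(X[0])
        self.log_prior = {}
        self.p = {}  # p[c][k] = P(feature k is 1 | class c)
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior[c] = math.log(len(rows) / len(X))
            self.p[c] = [(sum(r[k] for r in rows) + 1) / (len(rows) + 2)
                         for k in range(n_features)]
        return self

    def predict(self, x):
        def log_posterior(c):
            # Features are assumed conditionally independent given the class
            s = self.log_prior[c]
            for xk, pk in zip(x, self.p[c]):
                s += math.log(pk if xk else 1.0 - pk)
            return s
        return max(self.classes, key=log_posterior)


# Hypothetical binary residue features: [conserved, surface_exposed]
# Label 1 = interface residue, 0 = non-interface (toy data)
X = [[1, 1], [1, 0], [1, 1], [0, 0], [0, 1], [0, 0]]
y = [1, 1, 1, 0, 0, 0]
model = BernoulliNaiveBayes().fit(X, y)
print(model.predict([1, 1]), model.predict([0, 0]))  # 1 0
```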
Abstract:
Background: The function of a protein can be deciphered with higher accuracy from its structure than from its amino acid sequence. Given the huge gap between the available protein sequence and structure spaces, tools that can generate functionally homogeneous clusters using only sequence information are of great importance. Traditional alignment-based tools work well in most cases, with clustering performed on the basis of sequence similarity. However, for multi-domain proteins, alignment quality can be poor because of varied protein lengths, domain shuffling or circular permutations. Multi-domain proteins are ubiquitous in nature, so alignment-free tools that overcome the shortcomings of alignment-based protein comparison methods are required. Further, existing tools classify proteins using only domain-level information and hence miss the information encoded in the tethered regions or accessory domains. Our method, on the other hand, takes into account the full-length sequence of a protein, consolidating the complete sequence information to understand a given protein better. Results: Our web server, CLAP (Classification of Proteins), is one such alignment-free software for automatic classification of protein sequences. It uses a pattern-matching algorithm that assigns local matching scores (LMS) to residues that are part of the matched patterns between two sequences being compared. CLAP works on full-length sequences and does not require prior domain definitions. Pilot studies undertaken previously on protein kinases and immunoglobulins have shown that CLAP yields clusters with high functional and domain architectural similarity. Moreover, parsing at a statistically determined cut-off resulted in clusters that corresponded to the sub-family level classification of that particular domain family.
Conclusions: CLAP is a useful protein-clustering tool, independent of domain assignment, domain order, sequence length and domain diversity. Our method can be used for any set of protein sequences, yielding functionally relevant clusters with high domain architectural homogeneity. The CLAP web server is freely available for academic use at http://nslab.mbu.iisc.ernet.in/clap/.
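CLAP's pattern-matching algorithm is not described here in enough detail to reproduce. The following is therefore a deliberately simplified stand-in and not CLAP's LMS method. It scores a pair of sequences by the fraction of residues covered by exact k-mers shared between them, where the k-mer length `k` is an assumed parameter. Like any alignment-free comparison, it does not depend on domain order, which the circular-permutation example illustrates.

```python
def covered_positions(seq, kmers, k):
    """Indices of `seq` covered by at least one occurrence of a k-mer in `kmers`."""
    covered = set()
    for i in range(len(seq) - k + 1):
        if seq[i:i + k] in kmers:
            covered.update(range(i, i + k))
    return covered


def kmer_coverage_similarity(a, b, k=3):
    """Alignment-free similarity: mean fraction of residues covered by shared k-mers."""
    if not a or not b:
        return 0.0
    kmers_a = {a[i:i + k] for i in range(len(a) - k + 1)}
    kmers_b = {b[i:i + k] for i in range(len(b) - k + 1)}
    shared = kmers_a & kmers_b
    cov_a = len(covered_positions(a, shared, k)) / len(a)
    cov_b = len(covered_positions(b, shared, k)) / len(b)
    return (cov_a + cov_b) / 2


# A circular permutation keeps the shared k-mers, so the score stays high
print(kmer_coverage_similarity("ACDEFGHIK", "GHIKACDEF"))  # 1.0
print(kmer_coverage_similarity("ACDXXX", "ACDYYY"))  # 0.5
```

A global alignment of the same permuted pair would need to gap out one of the two rearranged segments.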
Abstract:
Depth measures the extent of atom or residue burial within a protein. It correlates with properties such as protein stability, hydrogen exchange rate, protein-protein interaction hot spots, post-translational modification sites and sequence variability. Our server, DEPTH, accurately computes depth and solvent-accessible surface area (SASA) values. We show that depth can be used to predict small-molecule ligand binding cavities in proteins. Often, some of the residues lining a ligand binding cavity are both deep and solvent exposed. Using the depth-SASA pair of values for a residue, its likelihood of forming part of a small-molecule binding cavity is estimated. The parameters of the method were calibrated over a training set of 900 high-resolution X-ray crystal structures of single-domain proteins bound to small molecules (molecular weight < 1.5 kDa). Over a testing set of 225 single- and multi-chain protein structures, the prediction accuracy of DEPTH is comparable to that of other geometry-based prediction methods, including LIGSITE, SURFNET and Pocket-Finder (all with a Matthews correlation coefficient of ~0.4). The input to the server is a protein 3D structure in PDB format. Users can tune the values of four parameters associated with the computation of residue depth and the prediction of binding cavities. For example, they can detect cavities of different sizes, including geometrically flat binding sites. The computed depths, SASA and binding cavity predictions are displayed in 2D plots and mapped onto 3D representations of the protein structure using Jmol. Links are provided to download the outputs. Our server is useful for any structural analysis based on residue depth and SASA. Examples include guiding site-directed mutagenesis experiments and small-molecule docking exercises, in the context of protein functional annotation and drug discovery.
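The accuracy comparison above is expressed as a Matthews correlation coefficient (MCC). For reference, here is a minimal Python implementation of the standard MCC formula, computed from counts of true and false positives and negatives. The residue-level counts in the usage line are hypothetical, and this abstract does not state how DEPTH itself tallies them.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical residue-level counts for cavity-lining predictions
print(round(matthews_corrcoef(tp=40, tn=800, fp=60, fn=50), 3))
```

MCC ranges from -1 to 1 and stays informative when, as with cavity-lining residues, positives are a small minority of all residues.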
Abstract:
The three-dimensional structure of a protein provides major insights into its function. Protein structure comparison has implications for functional and evolutionary studies. A structural alphabet (SA) is a library of local protein structure prototypes that can abstract every part of the protein main-chain conformation. Protein Blocks (PBs) is a widely used SA composed of 16 prototypes, each representing a pentapeptide backbone conformation defined in terms of dihedral angles. Through this description, 3D structural information can be translated into a 1D sequence of PBs. In a previous study, we used this approach to compare protein structures encoded in terms of PBs. A classical sequence alignment procedure based on dynamic programming was used, with a dedicated PB substitution matrix (SM). The PB-based pairwise structural alignment method performed excellently compared with other established methods for mining. In this study, we have (i) refined the SMs and (ii) improved the Protein Block Alignment methodology (named iPBA). The SM was normalized with regard to sequence and structural similarity. Alignment of protein structures often involves similar structural regions separated by dissimilar stretches. We therefore designed a dynamic programming algorithm that weighs these locally similar stretches. Amino acid substitution scores were also coupled linearly with the PB substitutions. iPBA (i) improves the mining efficiency rate by 6.8% and (ii) yields better-quality alignments in more than 82% of cases. Higher efficiency in aligning multi-domain proteins could also be demonstrated. The quality of alignment is better than that of DALI and MUSTANG in 81.3% of cases. Thus, our study has resulted in an impressive improvement in the quality of protein structural alignment. (C) 2011 Elsevier Masson SAS. All rights reserved.
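Here is a minimal Python sketch of the classical dynamic-programming step mentioned above: global (Needleman-Wunsch) alignment of two PB strings with a linear gap penalty. The substitution function and the gap value are hypothetical placeholders, not the dedicated PB substitution matrix. iPBA's weighting of locally similar stretches and its coupling with amino acid scores are not reproduced.

```python
def global_alignment_score(s, t, sub, gap=-2):
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    n, m = len(s), len(t)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + sub(s[i - 1], t[j - 1]),  # (mis)match
                dp[i - 1][j] + gap,  # gap in t
                dp[i][j - 1] + gap,  # gap in s
            )
    return dp[n][m]


def toy_pb_substitution(x, y):
    """Hypothetical placeholder, NOT the published PB substitution matrix."""
    return 2 if x == y else -1


# PB sequences use the 16 letters a-p; these strings are made up
print(global_alignment_score("mmnop", "mmop", toy_pb_substitution))  # 6
```

Swapping in a real 16 × 16 PB matrix would only require replacing `toy_pb_substitution` with a table lookup. The recurrence stays the same.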
Abstract:
Of the ~4000 ORFs identified through the genome sequence of Mycobacterium tuberculosis (TB) H37Rv, experimentally determined structures are available for 312. Knowledge of protein structures is essential for a high-resolution understanding of the underlying biology, so we sought to obtain a structural annotation of the genome using computational methods. Structural models were obtained and validated for ~2877 ORFs, covering ~70% of the genome. Functional annotation of each protein was based on fold-based functional assignments and a novel binding-site-based ligand association. We used new algorithms for binding site detection and genome-scale binding site comparison at the structural level, recently reported from the laboratory. Besides these, the annotation covers detection of various sequence and sub-structural motifs and quaternary structure predictions based on the corresponding templates. The study provides an opportunity to obtain a global perspective of the fold distribution in the genome. The annotation indicates that cellular metabolism can be achieved with only 219 folds. New insights are also obtained into the folds that predominate in the genome and the fold combinations that make up multi-domain proteins. A total of 1728 binding pockets have been associated with ligands through binding site identification and sub-structure similarity analyses. The resource (http://proline.physics.iisc.ernet.in/Tbstructuralannotation) is one of the first based on structure-derived functional annotations at genome scale. It is expected to be useful for better understanding TB and for application in drug discovery. The reported annotation pipeline is fairly generic and can be applied to other genomes as well.
Abstract:
The relative levels of different sigma factors dictate the expression profile of a bacterium. Extracytoplasmic function sigma factors synchronize the transcriptional profile with environmental conditions. The cellular concentration of free extracytoplasmic function sigma factors is regulated by their localization in a σ/anti-σ complex. Anti-sigma factors are multi-domain proteins with a receptor that senses environmental stimuli and a conserved anti-sigma domain (ASD) that binds a sigma factor. Here we describe the structure of Mycobacterium tuberculosis anti-σD (RsdA) in complex with the -35 promoter binding domain of σD (σD4). We note distinct conformational features that enable the release of σD by selective proteolysis of the ASD in RsdA. The σD/RsdA complex has high overall structural similarity to other σ/anti-σ complexes. Even so, its structural and biochemical features provide a basis to reconcile the diverse regulatory mechanisms that govern σ/anti-σ interactions. Multiple regulatory mechanisms embedded in an ASD scaffold thus provide an elegant route to rapidly re-engineer the expression profile of a bacterium in response to an environmental stimulus.
Abstract:
A fundamental question in protein folding is whether the coil-to-globule collapse transition occurs during the initial stages of folding (burst phase) or simultaneously with the protein folding transition. Single-molecule fluorescence resonance energy transfer (FRET) and small-angle X-ray scattering (SAXS) experiments disagree on whether the Protein L collapse transition occurs during the burst phase of folding. We study Protein L folding using a coarse-grained model and molecular dynamics simulations. The collapse transition in Protein L is found to be concomitant with the folding transition. In the burst phase of folding, we find that FRET experiments overestimate the radius of gyration, Rg, of the protein. This is because a Gaussian polymer chain end-to-end distance distribution is applied to extract Rg from the FRET efficiency. Upon dilution of the guanidinium chloride denaturant from 7.5 to 1 M, FRET experiments estimate an approximately 6 Å decrease in Rg when the actual decrease is approximately 3 Å, thereby suggesting pronounced compaction of the protein in the burst phase. The approximately 3 Å decrease is close to the statistical uncertainties of the Rg data measured in SAXS experiments, which suggest no compaction, leading to the disagreement with the FRET experiments. The transition-state ensemble (TSE) structures in Protein L folding are globular and extensive, in agreement with ψ-analysis experiments. The results support the hypothesis that the TSE of single-domain proteins depends on protein topology and is not stabilized by local interactions alone.
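The Gaussian-chain inference referred to above can be sketched in Python using two standard relations. The first is the Förster efficiency E(r) = 1/(1 + (r/R0)^6). The second is the ideal-chain relation ⟨r²⟩ = 6 Rg² between mean-square end-to-end distance and radius of gyration. The sketch averages E over the Gaussian end-to-end distribution and then inverts ⟨E⟩ for Rg by bisection. It illustrates the kind of extraction the abstract says can bias Rg and is not the authors' simulation analysis. The R0 value is arbitrary, and the integration grid and bracket are assumed defaults.

```python
import math


def gaussian_end_to_end_pdf(r, rg):
    """Ideal (Gaussian) chain end-to-end distance distribution, with <r^2> = 6 Rg^2."""
    msd = 6.0 * rg * rg
    norm = 4.0 * math.pi * r * r * (3.0 / (2.0 * math.pi * msd)) ** 1.5
    return norm * math.exp(-3.0 * r * r / (2.0 * msd))


def mean_fret_efficiency(rg, r0, n=4000):
    """<E> = integral of P(r) / (1 + (r/R0)^6) dr, by the trapezoid rule."""
    r_max = 10.0 * math.sqrt(6.0) * rg  # tail beyond this is negligible
    h = r_max / n
    total = 0.0
    for i in range(n + 1):
        r = i * h
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * gaussian_end_to_end_pdf(r, rg) / (1.0 + (r / r0) ** 6)
    return total * h


def rg_from_efficiency(e, r0, lo=1.0, hi=200.0, tol=1e-6):
    """Invert <E>(Rg) by bisection; <E> decreases monotonically as Rg grows."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_fret_efficiency(mid, r0) > e:
            lo = mid  # predicted efficiency too high, so the chain must be larger
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Illustrative values in angstroms (R0 chosen arbitrarily)
e = mean_fret_efficiency(rg=20.0, r0=54.0)
print(round(rg_from_efficiency(e, r0=54.0), 3))  # 20.0
```

The round trip recovers Rg exactly only because the "measured" efficiency was generated from a Gaussian chain. If the true distance distribution of a compact, non-ideal chain differs, the same inversion returns a biased Rg.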
Abstract:
Guanylyl cyclases (GCs) are enzymes that generate cyclic GMP and regulate different physiological and developmental processes in a number of organisms. GCs share sequence similarity with class III adenylyl cyclases (ACs) and are present either as membrane-bound receptor GCs or as cytosolic soluble GCs. We sought to determine the evolution of GCs using a large-scale bioinformatic analysis and found multiple lineage-specific expansions of GC genes in the genomes of many eukaryotes. Moreover, a few GC-like proteins were identified in prokaryotes. These are fused to a number of different domains, suggesting allosteric regulation of nucleotide cyclase activity. Eukaryotic receptor GCs are associated with a kinase homology domain (KHD). Phylogenetic analysis of these proteins suggests coevolution of the KHD and the associated cyclase domain, as well as conservation of the sequence and size of the linker region between them. Finally, we also report the existence of mimiviral proteins that contain putative active kinase domains associated with a cyclase domain. This could suggest early evolution of the fusion of these two important domains involved in signal transduction.
Abstract:
Single-stranded DNA-binding proteins (SSBs) play an important role in most aspects of DNA metabolism, including DNA replication, repair and recombination. We report here the identification and characterization of the SSB proteins of Mycobacterium smegmatis and Mycobacterium tuberculosis. Sequence comparison revealed that M. smegmatis SSB is homologous to M. tuberculosis SSB, except for a small spacer connecting the larger amino-terminal domain with the extreme carboxyl-terminal tail. The purified mycobacterial SSB proteins bound single-stranded DNA with high affinity, and their association and dissociation constants were similar to those of the prototype SSB. The proteolytic signatures of the free and bound forms of the SSB proteins revealed that DNA binding was associated with structural changes in the carboxyl-terminal domain. Significantly, SSB proteins from mycobacteria displayed high affinity for their cognate RecA, whereas Escherichia coli SSB did not under comparable experimental conditions. Accordingly, SSB and RecA were coimmunoprecipitated from cell lysates, further supporting an interaction between these proteins in vivo. The carboxyl-terminal domain of M. smegmatis SSB is not essential for interaction with ssDNA, but it is the binding site for its cognate RecA. These studies provide the first evidence for a stable association of eubacterial SSB proteins with their cognate RecA. This suggests that the two proteins might function together during DNA repair and/or recombination.
Abstract:
The HORMA domain (for Hop1p, Rev7p and MAD2) was discovered in three chromatin-associated proteins in the budding yeast Saccharomyces cerevisiae. This domain has also been found in proteins with similar functions in organisms including plants, animals and nematodes. HORMA domain-containing proteins are thought to function as adaptors in meiotic checkpoint protein signaling and in the regulation of meiotic recombination. Surprisingly, new work has revealed completely unanticipated and diverse functions for HORMA domain-containing proteins. A. M. Villeneuve and colleagues (Schvarzstein et al., 2013) show that meiosis-specific HORMA domain-containing proteins play a vital role in preventing centriole disengagement during Caenorhabditis elegans spermatocyte meiosis. Another recent study reveals that the S. cerevisiae Atg13 HORMA domain acts as a phosphorylation-dependent conformational switch in the cellular autophagic process. (C) 2014 Elsevier B.V. All rights reserved.