201 results for sequence similarity searches
at Indian Institute of Science - Bangalore - India
Abstract:
Structure comparison tools can be used to align related protein structures to identify structurally conserved and variable regions and to infer functional and evolutionary relationships. While the conserved regions often superimpose well, the variable regions appear non-superimposable. Differences in homologous protein structures are thought to be due to evolutionary plasticity to accommodate diverged sequences during evolution. One kind of difference between the 3-D structures of homologous proteins is rigid-body displacement. A glaring example is equivalent regions of homologous proteins that adopt α-helical conformation with different spatial orientations and therefore do not superimpose well. In a rigid-body superimposition, these regions would appear variable although they may contain local similarity. Also, due to high spatial deviation in the variable region, one-to-one correspondence at the residue level cannot be determined accurately. Another kind of difference is conformational variability; the most common example is topologically equivalent loops of two homologues that adopt different conformations. In the current study, we present a refined view of the "structurally variable" regions, which may contain local similarity obscured in global alignment of homologous protein structures. Because a structural alphabet can describe local structures of proteins precisely through the Protein Blocks approach, conformational similarity has been identified in a substantial number of "variable" regions in a large data set of protein structural alignments; optimal residue-residue equivalences could be achieved on the basis of Protein Blocks, which led to improved local alignments. Also, through an example, we have demonstrated how the additional information on local backbone structures through Protein Blocks can aid in comparative modeling of a loop region. In addition, understanding of sequence-structure relationships can be enhanced through our approach. This has been illustrated through examples where the equivalent regions in homologous protein structures share sequence similarity to varied extent but do not preserve local structure.
Abstract:
The Basic Local Alignment Search Tool (BLAST) is one of the most widely used sequence alignment programs with which similarity searches, for both protein and nucleic acid sequences, can be performed against large databases at high speed. A large number of tools exist for processing BLAST output, but none of them provide three-dimensional structure visualization. This shortcoming has been addressed in the proposed tool BLAST Server for Structural Biologists (BSSB), which maps a BLAST output onto the three-dimensional structure of the subject protein. The three-dimensional structure of the subject protein is represented using a three-color coding scheme (identical: red; similar: yellow; and mismatch: white) based on the pairwise alignment obtained. Thus, the user will be able to visualize a possible three-dimensional structure for the query protein sequence. This information can be used to gain a deeper insight into the sequence-structure correlation. Furthermore, the additional structure-level information enables the user to make coherent and logical decisions regarding the type of input model structure or fragment that can be used for molecular replacement calculations. This tool is freely available to all users at http://bioserver1.physics.iisc.ernet.in/bssb/.
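The three-colour scheme described above can be sketched as follows. This is an illustration only, not the BSSB implementation: it assumes a blastp-style pairwise alignment whose midline shows the residue letter for an identity, `+` for a positively scoring (similar) pair, and a space otherwise. The function name `classify_columns` and the `COLORS` table are hypothetical.

```python
# Hypothetical helper, not part of BSSB: colour each subject residue
# from the three rows of a blastp-style pairwise alignment.
COLORS = {"identical": "red", "similar": "yellow", "mismatch": "white"}


def classify_columns(query, midline, subject):
    """Return one colour per subject residue; subject gap columns are skipped."""
    if not (len(query) == len(midline) == len(subject)):
        raise ValueError("alignment rows must have equal length")
    colors = []
    for q, m, s in zip(query, midline, subject):
        if s == "-":
            continue  # no subject residue exists in this column
        if m == "+":
            colors.append(COLORS["similar"])
        elif m != " " and q.upper() == s.upper():
            colors.append(COLORS["identical"])
        else:
            # Covers substitutions with a blank midline and query gaps.
            colors.append(COLORS["mismatch"])
    return colors


example = classify_columns("MKVLA-G", "MK+L  G", "MKILSTG")
# example == ['red', 'red', 'yellow', 'red', 'white', 'white', 'red']
```

The resulting list could then be applied residue by residue to the subject structure in whatever viewer is used; that mapping step is left out here because it depends on the viewer.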
Abstract:
Rv2118c belongs to the class of conserved hypothetical proteins from Mycobacterium tuberculosis H37Rv. The crystal structure of Rv2118c in complex with S-adenosyl-L-methionine (AdoMet) has been determined at 1.98 Å resolution. The crystallographic asymmetric unit consists of a monomer, but symmetry-related subunits interact extensively, leading to a tetrameric structure. The structure of the monomer can be divided functionally into two domains: the larger catalytic C-terminal domain, which binds the cofactor AdoMet and is involved in the transfer of the methyl group from AdoMet to the substrate, and a smaller N-terminal domain. The structure of the catalytic domain is very similar to that of other AdoMet-dependent methyltransferases. The N-terminal domain is primarily a β-structure with a fold not found in other methyltransferases of known structure. Database searches reveal a conserved family of Rv2118c-like proteins from various organisms. Multiple sequence alignments show several regions of high sequence similarity (motifs) in this family of proteins. Structure analysis and homology to yeast Gcd14p suggest that Rv2118c could be an RNA methyltransferase, but further studies are required to establish its functional role conclusively.
Abstract:
Over the past two decades, many ingenious efforts have been made in protein remote homology detection. Because homologous proteins often diversify extensively in sequence, it is challenging to demonstrate such relatedness through entirely sequence-driven searches. Here, we describe a computational method for the generation of 'protein-like' sequences that serves to bridge gaps in protein sequence space. Sequence profile information, as embodied in a position-specific scoring matrix of multiply aligned sequences of bona fide family members, serves as the starting point in this algorithm. The observed amino acid propensity and the selection of a random number dictate the selection of a residue for each position in the sequence. In a systematic manner, and by applying a 'roulette-wheel' selection approach at each position, we generate parent family-like sequences and thus facilitate an enlargement of sequence space around the family. When generated for a large number of families, we demonstrate that they expand the utility of natural intermediately related sequences in linking distant proteins. In 91% of the assessed examples, inclusion of designed sequences improved fold coverage by 5-10% over searches made in their absence. Furthermore, with several examples from proteins adopting folds such as TIM, globin, lipocalin and others, we demonstrate that including designed sequences in a database positively sensitized methods such as PSI-BLAST and Cascade PSI-BLAST, and that it is a promising opportunity for enormously improved remote homology recognition using sequence information alone.
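The roulette-wheel step described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each alignment position is given as a dictionary of non-negative residue propensities (for example, observed frequencies), and the names `roulette_pick` and `generate_sequence` are hypothetical.

```python
import random


def roulette_pick(propensities, rng):
    """Pick one residue with probability proportional to its propensity."""
    total = sum(propensities.values())
    if total <= 0:
        raise ValueError("column needs at least one positive propensity")
    spin = rng.random() * total  # where the wheel stops, in [0, total)
    cumulative = 0.0
    for residue, weight in propensities.items():
        cumulative += weight
        if spin < cumulative:
            return residue
    return residue  # guard against floating-point round-off


def generate_sequence(profile, rng=None):
    """Draw one residue per profile column and join them into a sequence."""
    rng = rng or random.Random()
    return "".join(roulette_pick(column, rng) for column in profile)


# Toy three-column profile (made-up values, for illustration only).
toy_profile = [{"A": 1.0}, {"G": 0.5, "S": 0.5}, {"L": 1.0}]
sequence = generate_sequence(toy_profile, random.Random(0))
```

Calling `generate_sequence` repeatedly yields different sequences that follow the same per-position preferences, which is the sense in which the generated sequences stay family-like.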
Abstract:
Protein functional annotation relies on the identification of accurate relationships, sequence divergence being a key factor. This is especially evident when distant protein relationships are demonstrated only with three-dimensional structures. To address this challenge, we describe a computational approach to purposefully bridge gaps between related protein families through directed design of protein-like "linker" sequences. For this, we represented SCOP domain families, integrated with sequence homologues, as multiple profiles and performed HMM-HMM alignments between related domain families. Where convincing alignments were achieved, we applied a roulette wheel-based method to design 3,611,010 protein-like sequences corresponding to 374 SCOP folds. To analyze their ability to link proteins in homology searches, we used 3024 queries to search two databases, one containing only natural sequences and another one additionally containing designed sequences. Our results showed that augmented database searches achieved up to 30% improvement in fold coverage for over 74% of the folds, with 52 folds achieving all theoretically possible connections. Although sequences could not be designed between some families, the availability of designed sequences between other families within the fold established the sequence continuum to demonstrate 373 difficult relationships. Ultimately, as a practical and realistic extension, we demonstrate that such protein-like sequences can be "plugged into" routine and generic sequence database searches to empower not only remote homology detection but also fold recognition. Our statistically well-supported findings show that complementary searches in both databases will increase the effectiveness of sequence-based searches in recognizing all homologues sharing a common fold. (C) 2013 Elsevier Ltd. All rights reserved.
Abstract:
Guanylyl cyclases (GCs) are enzymes that generate cyclic GMP and regulate different physiologic and developmental processes in a number of organisms. GCs possess sequence similarity to class III adenylyl cyclases (ACs) and are present as either membrane-bound receptor GCs or cytosolic soluble GCs. We sought to determine the evolution of GCs using a large-scale bioinformatic analysis and found multiple lineage-specific expansions of GC genes in the genomes of many eukaryotes. Moreover, a few GC-like proteins were identified in prokaryotes, which come fused to a number of different domains, suggesting allosteric regulation of nucleotide cyclase activity. Eukaryotic receptor GCs are associated with a kinase homology domain (KHD), and phylogenetic analysis of these proteins suggests coevolution of the KHD and the associated cyclase domain as well as conservation of the sequence and the size of the linker region between the KHD and the associated cyclase domain. Finally, we also report the existence of mimiviral proteins that contain putative active kinase domains associated with a cyclase domain, which could suggest early evolution of the fusion of these two important domains involved in signal transduction.
Abstract:
Human CGI-58 (for comparative gene identification-58) and YLR099c, encoding Ict1p in Saccharomyces cerevisiae, have recently been identified as acyl-CoA-dependent lysophosphatidic acid acyltransferases. Sequence database searches for CGI-58-like proteins in Arabidopsis (Arabidopsis thaliana) revealed 24 proteins, with At4g24160, a member of the alpha/beta-hydrolase family of proteins, being the closest homolog. At4g24160 contains three motifs that are conserved across the plant species: a GXSXG lipase motif, an HX4D acyltransferase motif, and VX3HGF, a probable lipid binding motif. Dendrogram analysis of yeast ICT1, CGI-58, and At4g24160 placed these three polypeptides in the same group. Here, we describe and characterize At4g24160 as, to our knowledge, the first soluble lysophosphatidic acid acyltransferase in plants. A lipidomics approach revealed that At4g24160 has additional triacylglycerol lipase and phosphatidylcholine hydrolyzing enzymatic activities. These data establish At4g24160, a protein with a previously unknown function, as an enzyme that might play a pivotal role in maintaining lipid homeostasis in plants by regulating both phospholipid and neutral lipid levels.
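The three motifs named above translate directly into regular expressions, with X read as any residue and a subscript as a repeat count. The sketch below is illustrative: the function name `find_motifs` is hypothetical and the test sequence is made up, not the At4g24160 sequence.

```python
import re

# Motif patterns taken from the abstract; X means "any residue".
MOTIFS = {
    "GXSXG lipase": re.compile(r"G.S.G"),
    "HX4D acyltransferase": re.compile(r"H.{4}D"),
    "VX3HGF lipid binding": re.compile(r"V.{3}HGF"),
}


def find_motifs(sequence):
    """Return {motif name: [(1-based start, matched text), ...]}."""
    sequence = sequence.upper()
    return {
        name: [(m.start() + 1, m.group()) for m in pattern.finditer(sequence)]
        for name, pattern in MOTIFS.items()
    }


# Made-up toy sequence, used only to exercise the patterns.
hits = find_motifs("MAGHSMGAAHPLLKDTTVAAAHGF")
# hits["GXSXG lipase"] == [(3, "GHSMG")]
```

Note that `finditer` reports non-overlapping matches only, which is usually sufficient for short motifs like these.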
Abstract:
Jacalin [Artocarpus integrifolia (jack fruit) agglutinin] is made up of two types of chains, heavy and light, with M(r) values of 16,200 ± 1200 and 2090 ± 300 respectively (on the basis of gel-permeation chromatography under denaturing conditions). Its complete amino acid sequence was determined by manual degradation using a 4-dimethylaminoazobenzene 4'-isothiocyanate double-coupling method. Peptide fragments for sequence analysis were obtained by chemical cleavages of the heavy chain with CNBr, hydroxylamine hydrochloride and iodosobenzoic acid and enzymic cleavage with Staphylococcus aureus proteinase. The peptides were purified by a combination of gel-permeation and reverse-phase chromatography. The light chains, being only 20 residues long, could be sequenced without fragmentation. Amino acid analyses and carboxypeptidase-Y-digestion C-terminal analyses of the subunits provided supportive evidence for their sequence. Computer-assisted alignment of the jacalin heavy-chain sequence failed to show sequence similarity to that of any lectin for which the complete sequence is known. Analyses of the sequence showed the presence of an internal repeat spanning residues 7-64 and 76-130. The internal repeat was found to be statistically significant.
Abstract:
The genome of the human pathogen Entamoeba histolytica, a primitive protist, contains non-long terminal repeat retrotransposable elements called EhLINEs. These encode the reverse transcriptase and endonuclease required for retrotransposition. The endonuclease shows sequence similarity with bacterial restriction endonucleases. Here we report the salient enzymatic features of one such endonuclease. The kinetics of an EhLINE1-encoded endonuclease-catalyzed reaction, determined under steady-state and single-turnover conditions, revealed a significant burst phase followed by a slower steady-state phase, indicating that release of product could be the slower step in this reaction. For circular supercoiled DNA the Km was 2.6 × 10⁻⁸ M and the kcat was 1.6 × 10⁻² s⁻¹. For linear E. histolytica DNA substrate the Km and kcat values were 1.3 × 10⁻⁸ M and 2.2 × 10⁻⁴ s⁻¹ respectively. Single-turnover reaction kinetics suggested a noncooperative mode of hydrolysis. The enzyme behaved as a monomer. While Mg2+ was required for activity, 60% activity was seen with Mn2+ and none with other divalent metal ions. Substitution of PDX(12-14)D (a metal-binding motif) with PAX(12-14)D caused local conformational change in the protein tertiary structure, which could contribute to reduced enzyme activity in the mutated protein. The protein underwent conformational change upon the addition of DNA, which is consistent with the known behavior of restriction endonucleases. The similarities with bacterial restriction endonucleases suggest that the EhLINE1-encoded endonuclease was possibly acquired from bacteria through horizontal gene transfer. The loss of strict sequence specificity for nicking may have been subsequently selected to facilitate spread of the retrotransposon to intergenic regions of the E. histolytica genome.
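As a worked illustration of the kinetic constants quoted above, the ratio kcat/Km (catalytic efficiency) can be computed for the two substrates. This ratio is not reported in the abstract; the sketch assumes the Km values are in molar units and the kcat values in s⁻¹, and the function name is hypothetical.

```python
def catalytic_efficiency(kcat_per_s, km_molar):
    """Return kcat/Km in M^-1 s^-1."""
    if km_molar <= 0:
        raise ValueError("Km must be positive")
    return kcat_per_s / km_molar


# Values quoted in the abstract (units assumed as stated in the lead-in).
supercoiled = catalytic_efficiency(1.6e-2, 2.6e-8)  # about 6.2e5 M^-1 s^-1
linear = catalytic_efficiency(2.2e-4, 1.3e-8)       # about 1.7e4 M^-1 s^-1
fold_difference = supercoiled / linear              # about 36
```

On these numbers the two Km values differ only twofold, so the roughly 36-fold gap in kcat/Km comes almost entirely from the difference in kcat.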
Abstract:
We have recently implicated heat shock protein 90 from Plasmodium falciparum (PfHsp90) as a potential drug target against malaria. Using inhibitors specific to the nucleotide binding domain of Hsp90, we have shown potent growth inhibitory effects on development of the malarial parasite in human erythrocytes. To gain a better understanding of the vital role played by PfHsp90 in parasite growth, we have modeled its three-dimensional structure using the recently described full-length structure of yeast Hsp90. Sequence similarity found between PfHsp90 and yeast Hsp90 allowed us to model the core structure with high confidence. The superimposition of the predicted structure with that of the template yeast Hsp90 structure reveals an RMSD of 3.31 Å. The N-terminal and middle domains showed the least RMSD (1.76 Å) while the more divergent C-terminus showed a greater RMSD (2.84 Å) with respect to the template. The structure shows overall conservation of domains involved in nucleotide binding, ATPase activity, co-chaperone binding as well as inter-subunit interactions. Important co-chaperones known to modulate Hsp90 function in other eukaryotes are conserved in the malarial parasite as well. An acidic stretch of amino acids found in the linker region, which is uniquely extended in PfHsp90, could not be modeled in this structure, suggesting a flexible conformation. Our results provide a basis to compare the overall structure and functional pathways dependent on PfHsp90 in the malarial parasite. Further analysis of differences found between human and parasite Hsp90 may make it possible to design inhibitors targeted specifically against malaria.
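For readers unfamiliar with the RMSD values quoted above, the quantity can be sketched as follows. This is a minimal illustration that assumes the two coordinate sets are already superimposed and matched atom for atom (for example, equivalent Cα atoms); it does not perform the optimal superposition itself, and the function name `rmsd` is hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Two toy atoms, each displaced by 1 Å along z, give an RMSD of 1.0 Å.
value = rmsd([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
             [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)])
```

Because the deviations are squared before averaging, a few badly placed atoms raise the RMSD disproportionately, so a value for a whole chain need not be a simple average of per-domain values.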
Abstract:
Protein kinases phosphorylating Ser/Thr/Tyr residues in several cellular proteins exert tight control over their biological functions. They constitute the largest protein family in most eukaryotic species. Protein kinases, classified based on sequence similarity in their catalytic domains, cluster into subfamilies that share gross functional properties. Many protein kinases are associated or tethered covalently to domains that serve as adapter or regulatory modules, aiding substrate recruitment and specificity, and that also serve as scaffolds. Hence the modular organisation of the protein kinases serves as a guideline to their functional and molecular properties. Analysis of genomic repertoires of protein kinases in eukaryotes has revealed a wide spectrum of domain organisation across various subfamilies of kinases. Occurrence of organism-specific novel domain combinations suggests functional diversity achieved by protein kinases in order to regulate a variety of biological processes. In addition, the domain architecture of protein kinases revealed the existence of hybrid protein kinase subfamilies and their emerging roles in the signaling of eukaryotic organisms. In this review we discuss the repertoire of non-kinase domains tethered to multi-domain kinases in the metazoans. Similarities and differences in the domain architectures of protein kinases in these organisms indicate conserved and unique features that are critical to functional specialization. (C) 2009 Elsevier Ltd. All rights reserved.
Abstract:
The TCP transcription factors control multiple developmental traits in diverse plant species. Members of this family share an ~60-residue-long TCP domain that binds to DNA. The TCP domain is predicted to form a basic helix-loop-helix (bHLH) structure but shares little sequence similarity with the canonical bHLH domain. This classifies the TCP domain as a novel class of DNA binding domain specific to the plant kingdom. Little is known about how the TCP domain interacts with its target DNA. We report biochemical characterization and DNA binding properties of a TCP member in Arabidopsis thaliana, TCP4. We have shown that the 58-residue domain of TCP4 is essential and sufficient for binding to DNA and possesses DNA binding parameters comparable to canonical bHLH proteins. Using a yeast-based random mutagenesis screen and site-directed mutants, we identified the residues important for DNA binding and dimer formation. Mutants defective in binding and dimerization failed to rescue the phenotype of an Arabidopsis line lacking the endogenous TCP4 activity. By combining structure prediction, functional characterization of the mutants, and molecular modeling, we suggest a possible DNA binding mechanism for this class of transcription factors.
Abstract:
The Mycobacterium tuberculosis transcriptional regulator Rv1364c regulates the activity of the stress response sigma factor σF. This multi-domain protein has several components: a signaling PAS domain and an effector segment comprising a phosphatase, a kinase and an anti-anti-sigma factor domain. Based on Small Angle X-ray Scattering (SAXS) data, Rv1364c was recently shown to be a homo-dimer and to adopt an elongated conformation in solution. The PAS domain could not be modeled into the structural envelope due to poor sequence similarity with known PAS proteins. The crystal structure of the PAS domain described here provides a structural basis for the dimerization of Rv1364c. It thus appears likely that the PAS domain regulates the anti-sigma activity of Rv1364c by oligomerization. A structural comparison with other characterized PAS domains reveals several sequence and conformational features that could facilitate ligand binding, a feature which suggests that the function of Rv1364c could potentially be governed by specific cellular signals or metabolic cues. (C) 2010 Elsevier Inc. All rights reserved.
Abstract:
The Rv1625c Class III adenylyl cyclase from Mycobacterium tuberculosis is a homodimeric enzyme with two catalytic centers at the dimer interface, and shows sequence similarity with the mammalian adenylyl and guanylyl cyclases. Mutation of the substrate-specifying residues in the catalytic domain of Rv1625c, either independently or together, to those present in guanylyl cyclases not only failed to confer guanylyl cyclase activity to the protein, but also severely abrogated the adenylyl cyclase activity of the enzyme. Biochemical analysis revealed alterations in the behavior of the mutants on ion-exchange chromatography, indicating differences in the surface-exposed charge upon mutation of substrate-specifying residues. The mutant proteins showed alterations in oligomeric status as compared to the wild-type enzyme, and differing abilities to heterodimerize with the wild-type protein. The crystal structure of a mutant has been solved to a resolution of 2.7 Å. On the basis of the structure, and additional biochemical studies, we provide possible reasons for the altered properties of the mutant proteins, as well as highlight unique structural features of the Rv1625c adenylyl cyclase. (C) 2005 Elsevier Ltd. All rights reserved.
Abstract:
Sesbania mosaic virus (SeMV) is a single-stranded positive-sense RNA plant virus belonging to the genus Sobemovirus. The movement protein (MP) encoded by SeMV ORF1 showed no significant sequence similarity with MPs of other genera, but showed 32% identity with the MP of Southern bean mosaic virus within the Sobemovirus genus. With a view to understanding the mechanism of cell-to-cell movement in sobemoviruses, the SeMV MP gene was cloned, over-expressed in Escherichia coli and purified. Interaction of the recombinant MP with the native virus (NV) was investigated by ELISA and pull-down assays. It was observed that SeMV MP interacted with NV in a concentration- and pH-dependent manner. Analysis of N- and C-terminal deletion mutants of the MP showed that SeMV MP interacts with the NV through the N-terminal 49 amino acid segment. Yeast two-hybrid assays confirmed the in vitro observations, and suggested that SeMV might belong to the class of viruses that require MP and NV/coat protein for cell-to-cell movement.