135 results for Protein Sequence
Abstract:
In this paper, we present numerical evidence that supports the notion of energy minimization in protein sequence space for a target conformation. We use the conformations of real proteins in the Protein Data Bank (PDB) and present computationally efficient methods to identify the sequences with minimum energy. We use an edge-weighted connectivity graph to rank the residue sites with a reduced amino acid alphabet, and then use continuous optimization to obtain the energy-minimizing sequences. Our methods enable the computation of a lower bound as well as a tight upper bound for the energy of a given conformation. We validate our results using three different inter-residue energy matrices for five proteins from the PDB, and by comparing our energy-minimizing sequences with 80 million diverse sequences generated according to different considerations in each case. When we submitted some of our chosen energy-minimizing sequences to the Basic Local Alignment Search Tool (BLAST), we retrieved sequences from the non-redundant protein sequence database that are similar to ours, with E-values of the order of 10⁻⁷. In summary, proteins show a trend towards minimizing energy in sequence space but do not seem to adopt the global energy-minimizing sequence. This could be either because the existing energy matrices do not accurately represent inter-residue interactions in the context of the protein environment, or because Nature does not push the optimization in sequence space any further once the protein is able to perform its function.
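To make "the energy of a sequence in a target conformation" concrete, here is a minimal Python sketch that scores a sequence threaded onto a fixed contact map as a sum of pairwise contact energies, and then finds the minimum by exhaustive enumeration. The two-letter H/P alphabet, the energy values, the fixed number of H residues (a composition constraint), the contact list and the function names are all hypothetical illustrations. They are not the paper's energy matrices or its graph-ranking and continuous-optimization method, and exhaustive search is only feasible for toy chains.

```python
from itertools import combinations

# Hypothetical two-letter (H/P) contact energies; NOT one of the paper's matrices.
ENERGY = {("H", "H"): -1.0, ("H", "P"): 0.0, ("P", "H"): 0.0, ("P", "P"): 0.0}


def contact_energy(seq, contacts, energy=ENERGY):
    """Sum the pairwise energy of residue types over every contact (i, j)."""
    return sum(energy[(seq[i], seq[j])] for i, j in contacts)


def min_energy_sequence(n_sites, contacts, n_hydrophobic):
    """Exhaustively place n_hydrophobic H residues on n_sites positions.

    Returns the first lowest-energy sequence found and its energy. The fixed
    H count is an assumed composition constraint; exhaustive search is only
    practical for toy chains.
    """
    best_seq, best_e = None, float("inf")
    for h_sites in combinations(range(n_sites), n_hydrophobic):
        seq = ["P"] * n_sites
        for i in h_sites:
            seq[i] = "H"
        e = contact_energy(seq, contacts)
        if e < best_e:
            best_seq, best_e = "".join(seq), e
    return best_seq, best_e


if __name__ == "__main__":
    toy_contacts = [(0, 3), (0, 5), (1, 4), (2, 5)]  # hypothetical contact map
    print(min_energy_sequence(6, toy_contacts, n_hydrophobic=2))  # ('HPPHPP', -1.0)
```

With only two H residues, at most one H-H contact can form in this toy map, so the minimum is -1.0. Without the composition constraint, this toy matrix would trivially favor an all-H sequence (energy -4.0 here).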
Abstract:
The non-oxidative decarboxylation of aromatic acids is a poorly understood reaction. The conversion of 2,3-dihydroxybenzoic acid to catechol in the fungal metabolism of indole is a prototype of such a reaction. 2,3-Dihydroxybenzoic acid decarboxylase (EC 4.1.1.46), which catalyzes this reaction, was purified to homogeneity by affinity chromatography from anthranilate-induced cultures of Aspergillus oryzae. The enzyme did not require cofactors such as NAD⁺, PLP, TPP or metal ions for its activity, and there was no spectral evidence for enzyme-bound cofactors. The preparation, judged homogeneous by SDS-PAGE, sedimentation analysis and N-terminal analysis, was characterized for its physicochemical and kinetic parameters. The enzyme was inactivated by group-specific modifiers such as diethyl pyrocarbonate (DEPC) and N-ethylmaleimide (NEM). The kinetics of inactivation by DEPC suggested a single class of essential histidine residues, with a second-order rate constant of inactivation of 12.5 M⁻¹ min⁻¹. A single class of cysteine residues was modified by NEM with a second-order rate constant of 33 M⁻¹ min⁻¹. Substrate analogues protected the enzyme against inactivation by both DEPC and NEM, suggesting that the essential histidine and cysteine are located at the active site. In a differential labelling experiment, radiolabelled NEM was incorporated at 0.73 mol per mol of subunit, confirming the presence of a single essential cysteine per active site. The differentially labelled enzyme was enzymatically cleaved, and the labelled peptide was purified and sequenced. Neither the active-site peptide LLGLAETCK nor the N-terminal sequence MLGKIALEEAFALPRFEEKT showed similarity to sequences reported in the Swiss-Prot Protein Sequence Databank, probably reflecting the unique primary structure of this novel enzyme.
The sequences reported in this study will appear in the Swiss-Prot Protein Sequence Databank under the accession number P80402.
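The second-order rate constants above translate into observable inactivation rates under the usual pseudo-first-order assumption: with the modifier in large excess, k_obs = k2·[modifier] and remaining activity decays as exp(-k_obs·t). The minimal Python sketch below works through that arithmetic. The 1 mM modifier concentration and the function names are hypothetical choices for illustration, not conditions or code from the study.

```python
import math


def k_obs(k2_per_M_min: float, modifier_M: float) -> float:
    """Pseudo-first-order inactivation constant (min^-1), assuming modifier in excess."""
    return k2_per_M_min * modifier_M


def half_life_min(k: float) -> float:
    """Time (min) for activity to fall to 50% under first-order decay."""
    return math.log(2) / k


def fraction_active(k: float, t_min: float) -> float:
    """Fraction of activity remaining after t_min minutes."""
    return math.exp(-k * t_min)


if __name__ == "__main__":
    conc = 1e-3  # hypothetical 1 mM modifier concentration
    for name, k2 in [("DEPC", 12.5), ("NEM", 33.0)]:
        k = k_obs(k2, conc)
        print(f"{name}: k_obs = {k:.4f} min^-1, t1/2 = {half_life_min(k):.1f} min")
```

At 1 mM, DEPC gives k_obs = 0.0125 min⁻¹ (half-life of about 55 min) and NEM gives k_obs = 0.033 min⁻¹ (half-life of about 21 min).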
Abstract:
The Basic Local Alignment Search Tool (BLAST) is one of the most widely used sequence alignment programs; it performs high-speed similarity searches of both protein and nucleic acid sequences against large databases. Many tools exist for processing BLAST output, but none of them provides three-dimensional structure visualization. The proposed tool, BLAST Server for Structural Biologists (BSSB), addresses this shortcoming by mapping BLAST output onto the three-dimensional structure of the subject protein. The structure of the subject protein is displayed with a three-color coding scheme based on the pairwise alignment (identical: red; similar: yellow; mismatch: white), so that the user can visualize a possible three-dimensional structure for the query protein sequence. This information can give deeper insight into sequence-structure correlation. The additional structure-level information also helps the user make informed decisions about which model structure or fragment to use as input for molecular replacement calculations. The tool is freely available to all users at http://bioserver1.physics.iisc.ernet.in/bssb/.
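BSSB's internal code is not described here, but the three-color mapping can be sketched in Python for a single BLAST hit. The sketch assumes BLAST's standard pairwise text layout, in which the middle line shows the residue letter for an identity, `+` for a positive-scoring (similar) substitution and a space for a mismatch or gap. The function name and the numbering convention are illustrative assumptions.

```python
def color_subject_residues(sbjct_aln, midline, sbjct_start):
    """Map subject residue numbers to BSSB-style colors for one BLAST hit.

    Assumes BLAST's pairwise text layout: the midline holds the residue letter
    for an identity, '+' for a positive-scoring (similar) pair, and a space
    for a mismatch or gap. sbjct_start is the subject number of the first
    aligned residue.
    """
    if len(sbjct_aln) != len(midline):
        raise ValueError("subject line and midline must have equal length")
    colors = {}
    resnum = sbjct_start
    for s_char, m_char in zip(sbjct_aln, midline):
        if s_char == "-":
            continue  # gap in the subject: no subject residue to color
        if m_char == "+":
            colors[resnum] = "yellow"  # similar
        elif m_char.isalpha():
            colors[resnum] = "red"  # identical
        else:
            colors[resnum] = "white"  # mismatch, or gap in the query
        resnum += 1
    return colors


if __name__ == "__main__":
    # Hypothetical hit: query MKIGLW vs subject MKV-LA, subject numbering from 10.
    print(color_subject_residues("MKV-LA", "MK+ L ", 10))
```

The resulting residue-to-color map could then be applied to the subject's coordinates in any molecular viewer.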
Abstract:
A palindrome is a sequence of characters that reads the same forwards and backwards. Although palindromic peptide sequences were discovered two decades ago, little effort has been made to understand their structural, functional and evolutionary significance. We have therefore developed an algorithm to identify all perfect palindromes (excluding palindromic subsets and tandem repeats) in a single protein sequence. The algorithm places no restriction on the length of the input sequence, and it will aid in identifying palindromic peptide sequences of varying lengths in a single protein sequence.
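One straightforward way to enumerate perfect palindromes is to expand outwards from every possible center, on a residue or between two residues. The Python sketch below does this and then applies the two exclusions under one possible reading: a palindrome lying entirely inside a longer reported palindrome is treated as a "palindromic subset", and a run of a single residue (such as QQQQ) is treated as a tandem repeat. Both readings, the `min_len` cut-off and the function name are assumptions, not the published algorithm.

```python
def find_palindromes(seq, min_len=3):
    """Return (start, end, peptide) for perfect palindromes in seq.

    Coordinates are 0-based with end exclusive. Assumed exclusions: runs of a
    single residue (e.g. 'QQQQ') are treated as tandem repeats, and palindromes
    nested inside a longer reported one are treated as palindromic subsets.
    """
    n = len(seq)
    found = set()
    for center in range(2 * n - 1):
        # Even 'center' values sit on a residue (odd-length palindromes);
        # odd values sit between two residues (even-length palindromes).
        lo = center // 2
        hi = lo + center % 2
        while lo >= 0 and hi < n and seq[lo] == seq[hi]:
            lo -= 1
            hi += 1
        start, end = lo + 1, hi  # widest matching window around this center
        if end - start >= min_len and len(set(seq[start:end])) > 1:
            found.add((start, end))
    maximal = [
        (s, e) for (s, e) in found
        if not any(s2 <= s and e <= e2 and (s2, e2) != (s, e) for (s2, e2) in found)
    ]
    return [(s, e, seq[s:e]) for (s, e) in sorted(maximal)]


if __name__ == "__main__":
    print(find_palindromes("MKABACABAQQQQL"))  # [(2, 9, 'ABACABA')]
```

Expanding around centers takes O(n²) time in the worst case (for example, long homopolymer runs), so it imposes no limit on input length but can be slow for very long sequences; Manacher's algorithm finds all maximal palindromes in linear time if that matters.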
Abstract:
Genomic sequences are far from random; they are made up of systematically ordered, information-rich patterns. Such repeated sequence patterns have been widely studied because of their fundamental importance for understanding genome function and organization. To this end, we have developed a comprehensive toolkit, RepEx, which extracts repeat patterns (inverted, everted and mirror) from the given genome sequence(s) without any constraints. The toolkit can also fetch the inverted repeats present in protein sequence(s). Further, it can extract exact and degenerate repeats with user-defined spacer intervals, and it is considerably more precise and sensitive than existing tools. Comprehensive case studies and a performance evaluation of the toolkit are presented to demonstrate its efficiency and accuracy. (C) 2013 Elsevier Inc. All rights reserved.
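Two of the repeat types can be illustrated with a short Python sketch for exact repeats in DNA. Here an inverted repeat is an arm followed, after a spacer, by its reverse complement, and a mirror repeat is an arm followed by its simple reverse on the same strand. Everted and degenerate (mismatch-tolerant) repeats are not covered. The fixed arm length, the spacer range and the function names are simplifying assumptions, not RepEx's actual search strategy.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Reverse complement of an uppercase ACGT string."""
    return dna.translate(COMPLEMENT)[::-1]


def find_repeats(seq, arm_len, max_spacer):
    """Find exact inverted and mirror repeats with spacers of 0..max_spacer nt.

    Returns (kind, left_start, right_start, left_arm, right_arm) tuples with
    0-based starts. A fixed arm length is a simplifying assumption.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - arm_len + 1):
        left = seq[i:i + arm_len]
        for spacer in range(max_spacer + 1):
            j = i + arm_len + spacer
            right = seq[j:j + arm_len]
            if len(right) < arm_len:
                break  # right arm would run past the end of the sequence
            if right == reverse_complement(left):
                hits.append(("inverted", i, j, left, right))
            if right == left[::-1]:
                hits.append(("mirror", i, j, left, right))
    return hits


if __name__ == "__main__":
    print(find_repeats("GAATTC", arm_len=3, max_spacer=0))
```

For example, the EcoRI site GAATTC is an inverted repeat with arm GAA and no spacer. Degenerate repeats could be handled by replacing the equality tests with a Hamming-distance threshold.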
Abstract:
Establishing functional relationships between multi-domain protein sequences is a non-trivial task. Traditionally, assigning function to proteins and delineating their relationships requires domain assignments as a prerequisite, a process that is sensitive to alignment quality and domain definitions. For multi-domain proteins, alignment quality is often poor for several reasons. We report the correspondence between the classification of proteins represented as full-length gene products and their functions. Our approach differs fundamentally from traditional methods in that it does not perform the classification at the level of domains. It is based on alignment-free computation of local matching scores (LMS) at the amino acid sequence level, followed by hierarchical clustering. As there is no gold standard for full-length protein sequence classification, we used Gene Ontology and domain-architecture-based similarity measures to assess our classification. The final clusters obtained using LMS show high functional and domain-architectural similarity. Comparison with alignment-based approaches at both the domain and the full-length protein level showed that the LMS scores performed better. Using this method, we have recreated objective relationships among different protein kinase sub-families and also classified immunoglobulin-containing proteins for which no sub-family definitions currently exist. The method can be applied to any set of protein sequences and will therefore be instrumental in the analysis of large numbers of full-length protein sequences.
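The hierarchical clustering step can be illustrated independently of how the scores are computed. The Python sketch below performs average-linkage agglomerative clustering on a precomputed pairwise distance matrix (for example, one minus a normalized similarity score; the paper's normalization is not given here, so this is an assumption) and stops merging once the closest clusters are farther apart than a threshold. The linkage choice, the threshold and the function name are illustrative assumptions.

```python
def average_linkage(dist, threshold):
    """Average-linkage agglomerative clustering on a symmetric distance matrix.

    Merges the closest pair of clusters until only one remains or the closest
    pair is farther apart than threshold. Returns clusters as sorted index lists.
    """
    clusters = [[i] for i in range(len(dist))]

    def cluster_dist(a, b):
        # Mean of all pairwise distances between members of a and b.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = cluster_dist(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        if d > threshold:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected
    return sorted(sorted(c) for c in clusters)


if __name__ == "__main__":
    toy = [[0.0, 0.1, 0.9, 0.9],
           [0.1, 0.0, 0.9, 0.9],
           [0.9, 0.9, 0.0, 0.2],
           [0.9, 0.9, 0.2, 0.0]]  # hypothetical distances
    print(average_linkage(toy, threshold=0.5))
```

Two tight pairs with large cross distances come out as two clusters. This naive loop is slow for large sets; SciPy's `scipy.cluster.hierarchy` module is the practical choice for real data.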
Abstract:
Background: The function of a protein can be deciphered more accurately from its structure than from its amino acid sequence. Because of the huge gap between the number of available protein sequences and the number of available structures, tools that can generate functionally homogeneous clusters using sequence information alone are of great importance. Traditional alignment-based tools work well for this in most cases, clustering proteins on the basis of sequence similarity. For multi-domain proteins, however, alignment quality may be poor because of differing protein lengths, domain shuffling or circular permutations. Multi-domain proteins are ubiquitous in nature, so alignment-free tools that overcome the shortcomings of alignment-based protein comparison methods are required. Further, existing tools classify proteins using only domain-level information and therefore miss the information encoded in tethered regions or accessory domains. Our method, by contrast, takes into account the full-length sequence of a protein, consolidating the complete sequence information to understand a given protein better. Results: Our web server, CLAP (Classification of Proteins), is one such alignment-free tool for automatic classification of protein sequences. It uses a pattern-matching algorithm that assigns local matching scores (LMS) to residues that are part of the matched patterns between the two sequences being compared. CLAP works on full-length sequences and does not require prior domain definitions. Earlier pilot studies on protein kinases and immunoglobulins showed that CLAP yields clusters with high functional and domain-architectural similarity. Moreover, parsing at a statistically determined cut-off produced clusters that agreed with the sub-family level classification of the corresponding domain family.
Conclusions: CLAP is a useful protein-clustering tool, independent of domain assignment, domain order, sequence length and domain diversity. Our method can be used for any set of protein sequences, yielding functionally relevant clusters with high domain architectural homogeneity. The CLAP web server is freely available for academic use at http://nslab.mbu.iisc.ernet.in/clap/.
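CLAP's pattern-matching algorithm is not specified in this summary, so the Python sketch below uses a deliberately simple stand-in to show how residue-level matching scores can be alignment-free: every exact k-mer shared between two sequences marks its residues, and each residue of the first sequence is scored by the number of shared k-mers covering it. Because k-mers are matched wherever they occur, the score does not depend on domain order, mirroring a property claimed for CLAP. The k-mer rule, `k = 3`, the coverage summary and the function names are assumptions, not CLAP's LMS definition.

```python
def residue_match_scores(a, b, k=3):
    """Score each residue of a by the number of k-mers shared with b that cover it."""
    b_kmers = {b[i:i + k] for i in range(len(b) - k + 1)}
    scores = [0] * len(a)
    for i in range(len(a) - k + 1):
        if a[i:i + k] in b_kmers:
            for pos in range(i, i + k):
                scores[pos] += 1
    return scores


def coverage_similarity(a, b, k=3):
    """Fraction of residues in a covered by at least one k-mer shared with b."""
    if not a:
        return 0.0
    scores = residue_match_scores(a, b, k)
    return sum(1 for s in scores if s > 0) / len(a)


if __name__ == "__main__":
    print(residue_match_scores("ACDEFG", "XXCDEYY"))
    # Swapping the order of two hypothetical "domains" still gives full coverage.
    print(coverage_similarity("AAAWWW", "WWWAAA"))
```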
Abstract:
The genomic sequences of several RNA plant viruses, including cucumber mosaic virus, brome mosaic virus, alfalfa mosaic virus and tobacco mosaic virus, have recently become available. The first two viruses have icosahedral particles, whereas the latter two have bullet-shaped and rod-shaped particles, respectively. The non-structural 3a proteins of cucumber mosaic virus and brome mosaic virus share 35% amino acid sequence homology and are hence evolutionarily related. In contrast, the coat proteins show little homology, although the circular dichroism spectra of these viruses are similar. The non-coding regions of the genome also show variable but extensive homology. Comparison of the brome mosaic virus and alfalfa mosaic virus sequences reveals that they are probably related, although at a much larger evolutionary distance. The coat protein polypeptide folds of three biologically distinct isometric plant viruses, tomato bushy stunt virus, southern bean mosaic virus and satellite tobacco necrosis virus, have been shown to bear a striking resemblance: all of them consist of a topologically similar eight-stranded β-barrel. The implications of these studies for understanding the evolution of plant viruses will be discussed.
Abstract:
The structure and properties of the double-helical form of the alternating copolymer poly(dA-dT) are considered. Different lines of evidence are interpreted in terms of a structure in which every second phosphodiester linkage has a conformation different from that of the normal B form. A rationale for this “alternating-B” structure is given, which explains the effects of chemical modifications of the T residues on the binding of poly(dA-dT)·poly(dA-dT) to the lac repressor of Escherichia coli.
Abstract:
Sesbania mosaic virus (SMV) is a plant virus infecting Sesbania grandiflora plants in Andhra Pradesh, India. The amino acid sequences of the tryptic peptides of the SMV coat protein were determined using a gas-phase sequenator. When aligned with the corresponding residues of southern bean mosaic virus (SBMV), these sequences showed identical amino acids at 69% of the positions. Crystals diffracting to better than 3 Å resolution were obtained by precipitating the virus with ammonium sulphate. The crystals belonged to the rhombohedral space group R3 with a = 291.4 Å and α = 61.9°. Three-dimensional X-ray diffraction data were collected on these crystals to 4.7 Å resolution using a Siemens-Nicolet area detector system. Self-rotation function studies revealed the icosahedral symmetry of the virus particles as well as their precise orientation in the unit cell. Cross-rotation function and modelling studies with SBMV showed that it is a valid starting model for determining the SMV structure. Low-resolution phases computed using a polyalanine model of SBMV were refined and extended by real-space electron density averaging and solvent flattening. The final electron density map revealed a polypeptide fold similar to that of SBMV. The single disulphide bridge of the SBMV coat protein is retained in SMV. Four icosahedrally independent cation-binding sites have been tentatively identified. Three of these sites, related by a quasi-threefold axis, are also found in SBMV. The fourth site lies on the quasi-threefold axis. Aspartic acid residues from the quasi-threefold-related subunits, which replace Ile218 of SBMV, are suitable ligands for the cation at this site.
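The 69% figure is a percent identity over aligned positions. A minimal Python sketch of that calculation follows. It counts identical residues over columns where neither sequence has a gap, which is only one of several conventions (others divide by the alignment length or by the length of the shorter sequence); the abstract does not say which was used, and the function name is hypothetical.

```python
def percent_identity(aln_a, aln_b):
    """Percent identical residues over aligned columns where neither sequence has a gap."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aln_a, aln_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical aligned fragments; the gap column is ignored.
    print(percent_identity("ACDE-F", "ACDQGF"))
```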
Abstract:
A complete cDNA encoding a novel hybrid Pro-rich protein (HyPRP) was identified by differentially screening 3 × 10⁴ recombinant plaques of a Cuscuta reflexa cytokinin-induced haustorial cDNA library constructed in lambda gt10. The nucleotide (nt) sequence consists of: (i) a 424-bp 5'-noncoding region containing five start codons (ATGs) and three upstream open reading frames (uORFs); (ii) an ORF of 987 bp with coding potential for a 329-amino-acid (aa) protein of Mr 35,203, with a hydrophobic N-terminal region including a stretch of nine consecutive Phe residues, followed by a Pro-rich sequence and a Cys-rich hydrophobic C terminus; and (iii) a 178-bp 3'-untranslated region (UTR). Comparison of the predicted aa sequence with the NBRF and SWISSPROT databases, and with a recent report of an embryo-specific protein of maize [Jose-Estanyol et al., Plant Cell 4 (1992) 413-423], showed it to be similar to the class of HyPRPs encoded by genes preferentially expressed in young tomato fruits, maize embryos and in vitro-cultured carrot embryos. Northern analysis revealed an mRNA of approximately 1.8 kb from this gene, expressed in the subapical region of the C. reflexa vine, the region that exhibited maximum sensitivity to cytokinin in haustorial induction.
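Counting upstream start codons and uORFs amounts to scanning for ATG triplets and following each one in frame to the first stop codon. The Python sketch below does that on a DNA string. Treating only ATG as a start, requiring an in-frame stop within the sequence, the `min_codons` filter and the function name are assumptions for illustration, not the authors' procedure.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=1):
    """Return (start, end) for every ATG-initiated ORF that reaches an in-frame stop.

    Coordinates are 0-based; end is exclusive and includes the stop codon.
    Each ATG is reported, so in-frame internal ATGs give nested ORFs that
    share a stop codon. ORFs without a stop inside seq are not reported.
    """
    seq = seq.upper()
    orfs = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        for pos in range(start, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:  # sense codons incl. ATG
                    orfs.append((start, pos + 3))
                break
    return orfs


if __name__ == "__main__":
    # Hypothetical leader with two short ORFs: ATG AAA TAG and ATG CCC TGA.
    print(find_orfs("CCATGAAATAGGATGCCCTGA"))
```

ORFs whose start lies inside the 5' leader (the first 424 nt for this cDNA) would be candidate uORFs.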
Abstract:
The structural proteins of mycobacteriophage I3 were analysed by sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE), radioiodination and immunoblotting. Based on their abundance, the 34- and 70-kDa bands appeared to represent the major structural proteins. The gene encoding the 70-kDa protein of phage I3 was cloned and expressed in Escherichia coli, and its complete nucleotide sequence was determined. A second (partial) open reading frame following the stop codon of the 70-kDa protein gene was also identified within the cloned fragment. The deduced amino acid sequence of the 70-kDa protein and its codon usage pattern indicated a preponderance of G+C-rich codons, as predicted from the high G+C content of the phage I3 genomic DNA.
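A high genomic G+C content is usually most visible at synonymous third codon positions. The Python sketch below computes the overall G+C content and the G+C content at third codon positions (GC3) of an in-frame coding sequence. It is a generic illustration rather than the analysis performed in the study, and the example sequence and function names are hypothetical.

```python
def gc_content(seq):
    """Fraction of G and C in a nucleotide string (0.0 for an empty string)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def gc3_content(cds):
    """G+C fraction at third codon positions of an in-frame coding sequence."""
    return gc_content(cds[2::3])


if __name__ == "__main__":
    toy_cds = "ATGGCCAAG"  # hypothetical 3-codon example: ATG GCC AAG
    print(gc_content(toy_cds), gc3_content(toy_cds))
```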
Abstract:
NSP3, an acidic nonstructural protein encoded by gene 7, has been implicated as the key player in assembling the 11 viral plus-strand RNAs into the early replication intermediates during rotavirus morphogenesis. To date, the NSP3 sequence has been determined for only three animal rotaviruses (SA11, SA114F and bovine UK), and none has been reported from a human strain. To determine the genetic diversity among gene 7 alleles of group A rotaviruses, we determined the nucleotide sequence of the NSP3 gene from 13 strains, from both humans and animals, belonging to nine different G serotypes. Based on amino acid sequence identity and phylogenetic analysis, NSP3 from group A rotaviruses falls into three evolutionarily related groups: the SA11 group, the Wa group and the S2 group. The SA11/SA114F gene appears to have a distant ancestral origin from the others and encodes a polypeptide 315 amino acids (aa) long. NSP3 from all other group A rotaviruses is only 313 aa long because of a 2-amino-acid deletion near the carboxy terminus. While the SA114F gene has the longest 3' untranslated region (UTR), of 132 nucleotides, the genes from other strains carry deletions of varying length at two positions downstream of the translational termination codon. Despite the divergence of the nucleotide (nt) sequence in the protein-coding region, a stretch of about 80 nt in the 3' UTR is highly conserved in the NSP3 gene of all the strains. This conserved sequence in the 3' UTR might play an important role in regulating expression of the NSP3 gene. (C) 1995 Academic Press, Inc.
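Locating a conserved stretch such as the roughly 80-nt 3' UTR element can be sketched as a search for the longest run of alignment columns that are identical and gap-free across all sequences. The Python sketch below assumes the sequences are already aligned to equal length. Requiring strict identity is a simplification, since "highly conserved" may tolerate occasional substitutions, and the function name is hypothetical.

```python
def longest_conserved_block(aligned):
    """Return (start, length) of the longest run of identical, gap-free columns."""
    if not aligned:
        return (0, 0)
    width = len(aligned[0])
    if any(len(s) != width for s in aligned):
        raise ValueError("all aligned sequences must have the same length")
    best_start, best_len, run_start = 0, 0, None
    for col in range(width):
        column = {s[col] for s in aligned}
        conserved = len(column) == 1 and "-" not in column
        if conserved:
            if run_start is None:
                run_start = col  # a new conserved run begins here
            run_len = col - run_start + 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_start = None
    return (best_start, best_len)


if __name__ == "__main__":
    # Hypothetical aligned fragments from three strains.
    print(longest_conserved_block(["ACGTTA", "ACGATA", "ACGTTA"]))
```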
Abstract:
Transition protein 2 (TP2), isolated from rat testes, was recently shown to be a zinc metalloprotein. We have now carried out a detailed analysis of the DNA-condensing properties of TP2 with various polynucleotides using circular dichroism spectroscopy. Condensation of the alternating copolymers poly(dG-dC)·poly(dG-dC) and poly(dA-dT)·poly(dA-dT) by TP2 (incubated with 10 μM ZnSO₄) was severalfold higher than condensation of either of the homoduplexes poly(dG)·poly(dC) and poly(dA)·poly(dT), or of rat oligonucleosomal DNA. Of the two alternating copolymers, poly(dG-dC)·poly(dG-dC) was condensed 3.2-fold more effectively than poly(dA-dT)·poly(dA-dT). Preincubation of TP2 with 5 mM EDTA significantly reduced its DNA-condensing ability. Interestingly, condensation of the alternating copolymer poly(dI-dC)·poly(dI-dC) by TP2 was much lower than that of poly(dG-dC)·poly(dG-dC). The V8 protease-derived N-terminal fragment (88 aa) condensed poly(dA-dT)·poly(dA-dT) to a very small extent but had no effect on poly(dG-dC)·poly(dG-dC). The C-terminal fragment (28 aa) condensed poly(dA-dT)·poly(dA-dT) more effectively than poly(dG-dC)·poly(dG-dC). These results suggest that TP2 in its zinc-coordinated form condenses GC-rich polynucleotides much more effectively than other types of polynucleotides. Neither the N-terminal two-thirds of TP2, which is the zinc-binding domain, nor the C-terminal basic domain is as effective as intact TP2 in bringing about DNA condensation.