160 results for protein sequence classification
in National Center for Biotechnology Information - NCBI
Abstract:
SBASE 8.0 is the eighth release of the SBASE library of protein domain sequences that contains 294 898 annotated structural, functional, ligand-binding and topogenic segments of proteins, cross-referenced to most major sequence databases and sequence pattern collections. The entries are clustered into over 2005 statistically validated domain groups (SBASE-A) and 595 non-validated groups (SBASE-B), provided with several WWW-based search and browsing facilities for online use. A domain-search facility was developed, based on non-parametric pattern recognition methods, including artificial neural networks. SBASE 8.0 is freely available by anonymous ‘ftp’ file transfer from ftp.icgeb.trieste.it. Automated searching of SBASE can be carried out with the WWW servers http://www.icgeb.trieste.it/sbase/ and http://sbase.abc.hu/sbase/.
Abstract:
We present a method for discovering conserved sequence motifs from families of aligned protein sequences. The method has been implemented as a computer program called emotif (http://motif.stanford.edu/emotif). Given an aligned set of protein sequences, emotif generates a set of motifs with a wide range of specificities and sensitivities. emotif can also generate motifs that describe possible subfamilies of a protein superfamily. A disjunction of such motifs can often represent the entire superfamily with high specificity and sensitivity. We have used emotif to generate sets of motifs from all 7,000 protein alignments in the blocks and prints databases. The resulting database, called identify (http://motif.stanford.edu/identify), contains more than 50,000 motifs. For each alignment, the database contains several motifs whose probabilities of matching a false positive range from 10⁻¹⁰ to 10⁻⁵. Highly specific motifs are well suited for searching entire proteomes while generating very few false predictions. identify assigns biological functions to 25–30% of all proteins encoded by the Saccharomyces cerevisiae genome and by several bacterial genomes. In particular, identify assigned functions to 172 proteins of unknown function in the yeast genome.
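The false-positive probability attached to each identify motif can be illustrated with a simplified model. The sketch below assumes that a motif is a list of allowed-residue sets (written in a hypothetical `[ST]x[RK]` notation, where `x` matches any residue), that positions are independent, and that every amino acid has a uniform background frequency of 1/20. emotif itself may use observed residue frequencies and a different motif syntax; all names here are illustrative. Under these assumptions, the probability that a random window matches is the product, over positions, of the summed background frequencies of the allowed residues.

```python
# Hypothetical, simplified motif model: one set of allowed residues per position.
# Assumes independent positions and a uniform 1/20 background per residue;
# this is an illustration, not emotif's actual scoring.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
UNIFORM_BACKGROUND = {aa: 1 / 20 for aa in AMINO_ACIDS}


def parse_motif(pattern: str) -> list[frozenset[str]]:
    """Parse e.g. '[ST]x[RK]' into allowed-residue sets; 'x' means any residue."""
    positions = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "[":
            end = pattern.index("]", i)
            positions.append(frozenset(pattern[i + 1:end]))
            i = end + 1
        elif ch == "x":
            positions.append(frozenset(AMINO_ACIDS))
            i += 1
        else:
            positions.append(frozenset(ch))
            i += 1
    return positions


def match_probability(motif, background=UNIFORM_BACKGROUND) -> float:
    """Probability that one random window of len(motif) residues matches."""
    p = 1.0
    for allowed in motif:
        p *= sum(background[aa] for aa in allowed)
    return p


def expected_chance_hits(motif, total_residues: int, background=UNIFORM_BACKGROUND) -> float:
    """Approximate number of chance matches when scanning total_residues positions."""
    return match_probability(motif, background) * total_residues


def find_matches(motif, sequence: str) -> list[int]:
    """0-based start positions where every motif position accepts the residue."""
    n = len(motif)
    return [
        start
        for start in range(len(sequence) - n + 1)
        if all(sequence[start + k] in allowed for k, allowed in enumerate(motif))
    ]
```

Under these assumptions `[ST]x[RK]` matches a random tripeptide with probability 0.1 × 1 × 0.1 = 0.01, so it would be expected to hit by chance many times in any proteome; a motif near 10⁻¹⁰ would produce very few chance hits even when scanning millions of residues.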
Abstract:
In this paper, a new way to think about, and to construct, pairwise as well as multiple alignments of DNA and protein sequences is proposed. Rather than forcing alignments to either align single residues or to introduce gaps by defining an alignment as a path running right from the source up to the sink in the associated dot-matrix diagram, we propose to consider alignments as consistent equivalence relations defined on the set of all positions occurring in all sequences under consideration. We also propose constructing alignments from whole segments exhibiting highly significant overall similarity rather than by aligning individual residues. Consequently, we present an alignment algorithm that (i) is based on segment-to-segment comparison instead of the commonly used residue-to-residue comparison and (ii) avoids the well-known difficulties concerning the choice of appropriate gap penalties: gaps are not treated explicitly, but remain as those parts of the sequences that do not belong to any of the aligned segments. Finally, we discuss the application of our algorithm to two test examples and compare it with commonly used alignment methods. As a first example, we aligned a set of 11 DNA sequences coding for functional helix-loop-helix proteins. Though the sequences show only low overall similarity, our program correctly aligned all of the 11 functional sites, which was a unique result among the methods tested. As a by-product, the reading frames of the sequences were identified. Next, we aligned a set of ribonuclease H proteins and compared our results with alignments produced by other programs as reported by McClure et al. [McClure, M. A., Vasi, T. K. & Fitch, W. M. (1994) Mol. Biol. Evol. 11, 571-592]. Our program was one of the best scoring programs. However, in contrast to other methods, our protein alignments are independent of user-defined parameters.
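The idea of building an alignment from consistent, ungapped segment pairs can be sketched for the two-sequence case. In this hypothetical illustration, each segment is weighted by its number of identical residues, and segments are accepted greedily in order of decreasing weight, keeping only those consistent with the segments already chosen: they must not overlap, and their order must agree in both sequences. The greedy rule, the identity weight, and all function names are assumptions for the sketch, not the significance-based weighting or the multiple-alignment procedure of the paper. Positions not covered by any accepted segment are simply left unaligned, so no gap penalty appears anywhere.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    """Ungapped segment pair: seq_a[a:a+length] aligned to seq_b[b:b+length]."""
    a: int
    b: int
    length: int


def identity_weight(seg: Segment, seq_a: str, seq_b: str) -> int:
    """Illustrative weight: number of identical residue pairs in the segment."""
    return sum(seq_a[seg.a + k] == seq_b[seg.b + k] for k in range(seg.length))


def consistent(s: Segment, t: Segment) -> bool:
    """Two segments can coexist if one lies entirely before the other in BOTH sequences."""
    s_before_t = s.a + s.length <= t.a and s.b + s.length <= t.b
    t_before_s = t.a + t.length <= s.a and t.b + t.length <= s.b
    return s_before_t or t_before_s


def greedy_segment_alignment(segments: list[Segment], seq_a: str, seq_b: str) -> list[Segment]:
    """Accept segments by decreasing weight, skipping any that conflict with those already chosen."""
    chosen: list[Segment] = []
    ranked = sorted(segments, key=lambda s: identity_weight(s, seq_a, seq_b), reverse=True)
    for seg in ranked:
        if all(consistent(seg, other) for other in chosen):
            chosen.append(seg)
    return sorted(chosen)


def aligned_pairs(chosen: list[Segment]) -> set[tuple[int, int]]:
    """Alignment as equivalent positions (i in seq_a, j in seq_b); all other positions are gaps."""
    return {(seg.a + k, seg.b + k) for seg in chosen for k in range(seg.length)}
```

For `ACGTACGT` against `ACGTTTACGT`, the two `ACGT` segments are kept, a crossing `ACG` segment is rejected, and the two `T`s at positions 4 and 5 of the second sequence end up as an implicit gap.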
Abstract:
The iProClass database is an integrated resource that provides comprehensive family relationships and structural and functional features of proteins, with rich links to various databases. It is extended from ProClass, a protein family database that integrates PIR superfamilies and PROSITE motifs. The iProClass currently consists of more than 200 000 non-redundant PIR and SWISS-PROT proteins organized with more than 28 000 superfamilies, 2600 domains, 1300 motifs, 280 post-translational modification sites and links to more than 30 databases of protein families, structures, functions, genes, genomes, literature and taxonomy. Protein and family summary reports provide rich annotations, including membership information with length, taxonomy and keyword statistics, full family relationships, comprehensive enzyme and PDB cross-references and graphical feature display. The database facilitates classification-driven annotation for protein sequence databases and complete genomes, and supports structural and functional genomic research. The iProClass is implemented in Oracle 8i object-relational system and available for sequence search and report retrieval at http://pir.georgetown.edu/iproclass/.
Abstract:
The Protein Information Resource, in collaboration with the Munich Information Center for Protein Sequences (MIPS) and the Japan International Protein Information Database (JIPID), produces the most comprehensive and expertly annotated protein sequence database in the public domain, the PIR-International Protein Sequence Database. To provide timely and high quality annotation and promote database interoperability, the PIR-International employs rule-based and classification-driven procedures based on controlled vocabulary and standard nomenclature and includes status tags to distinguish experimentally determined from predicted protein features. The database contains about 200 000 non-redundant protein sequences, which are classified into families and superfamilies and their domains and motifs identified. Entries are extensively cross-referenced to other sequence, classification, genome, structure and activity databases. The PIR web site features search engines that use sequence similarity and database annotation to facilitate the analysis and functional identification of proteins. The PIR-International databases and search tools are accessible on the PIR web site at http://pir.georgetown.edu/ and at the MIPS web site at http://www.mips.biochem.mpg.de. The PIR-International Protein Sequence Database and other files are also available by FTP.
Abstract:
We have identified and characterized CLARP, a caspase-like apoptosis-regulatory protein. Sequence analysis revealed that human CLARP contains two amino-terminal death effector domains fused to a carboxyl-terminal caspase-like domain. The structure and amino acid sequence of CLARP resemble those of caspase-8, caspase-10, and DCP2, a Drosophila melanogaster protein identified in this study. Unlike caspase-8, caspase-10, and DCP2, however, two important residues predicted to be involved in catalysis were lost in the caspase-like domain of CLARP. Analysis with fluorogenic substrates for caspase activity confirmed that CLARP is catalytically inactive. CLARP was found to interact with caspase-8 but not with FADD/MORT-1, an upstream death effector domain-containing protein of the Fas and tumor necrosis factor receptor 1 signaling pathway. Expression of CLARP induced apoptosis, which was blocked by the viral caspase inhibitor p35, dominant negative mutant caspase-8, and the synthetic caspase inhibitor benzyloxycarbonyl-Val-Ala-Asp-(OMe)-fluoromethylketone (zVAD-fmk). Moreover, CLARP augmented the killing ability of caspase-8 and FADD/MORT-1 in mammalian cells. The human clarp gene maps to 2q33. Thus, CLARP represents a regulator of the upstream caspase-8, which may play a role in apoptosis during tissue development and homeostasis.
Abstract:
Site-directed mutagenesis and combinatorial libraries are powerful tools for providing information about the relationship between protein sequence and structure. Here we report two extensions that expand the utility of combinatorial mutagenesis for the quantitative assessment of hypotheses about the determinants of protein structure. First, we show that resin-splitting technology, which allows the construction of arbitrarily complex libraries of degenerate oligonucleotides, can be used to construct more complex protein libraries for hypothesis testing than can be constructed from oligonucleotides limited to degenerate codons. Second, using eglin c as a model protein, we show that regression analysis of activity scores from library data can be used to assess the relative contributions to the specific activity of the amino acids that were varied in the library. The regression parameters derived from the analysis of a 455-member sample from a library wherein four solvent-exposed sites in an α-helix can contain any of nine different amino acids are highly correlated (P < 0.0001, R² = 0.97) to the relative helix propensities for those amino acids, as estimated by a variety of biophysical and computational techniques.
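The regression step can be illustrated with an additive model. The sketch below assumes, for illustration only, that each variant's activity score is the sum of one contribution per amino acid across the varied sites, regardless of which site the residue occupies, and fits those contributions by ordinary least squares through the normal equations. The abstract does not give the exact model; the feature encoding, function names, and pure-Python solver are hypothetical. The fitted contributions could then be compared with a helix-propensity scale through a Pearson correlation; for a simple linear fit, its square corresponds to R².

```python
def residue_counts(variant: str, alphabet: str) -> list[float]:
    """Feature vector: how many varied sites carry each amino acid in `alphabet`."""
    return [float(variant.count(aa)) for aa in alphabet]


def solve_linear(matrix: list[list[float]], rhs: list[float]) -> list[float]:
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def fit_contributions(variants: list[str], scores: list[float], alphabet: str) -> dict[str, float]:
    """Least-squares per-residue contributions from (X^T X) beta = X^T y, with no intercept."""
    X = [residue_counts(v, alphabet) for v in variants]
    k = len(alphabet)
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(X, scores)) for i in range(k)]
    return dict(zip(alphabet, solve_linear(xtx, xty)))


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation, e.g. between fitted contributions and a propensity scale."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5
```

No intercept is fitted because every variant carries the same number of varied sites, so an intercept would be collinear with the residue counts.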
Abstract:
The discovery of cyanobacterial phytochrome histidine kinases, together with the evidence that phytochromes from higher plants display protein kinase activity, bind ATP analogs, and possess C-terminal domains similar to bacterial histidine kinases, has fueled the controversial hypothesis that the eukaryotic phytochrome family of photoreceptors are light-regulated enzymes. Here we demonstrate that purified recombinant phytochromes from a higher plant and a green alga exhibit serine/threonine kinase activity similar to that of phytochrome isolated from dark grown seedlings. Phosphorylation of recombinant oat phytochrome is a light- and chromophore-regulated intramolecular process. Based on comparative protein sequence alignments and biochemical cross-talk experiments with the response regulator substrate of the cyanobacterial phytochrome Cph1, we propose that eukaryotic phytochromes are histidine kinase paralogs with serine/threonine specificity whose enzymatic activity diverged from that of a prokaryotic ancestor after duplication of the transmitter module.
Abstract:
High throughput genome (HTG) and expressed sequence tag (EST) sequences are currently the most abundant nucleotide sequence classes in the public database. The large volume, high degree of fragmentation and lack of gene structure annotations prevent efficient and effective searches of HTG and EST data for protein sequence homologies by standard search methods. Here, we briefly describe three newly developed resources that should make discovery of interesting genes in these sequence classes easier in the future, especially to biologists not having access to a powerful local bioinformatics environment. trEST and trGEN are regularly regenerated databases of hypothetical protein sequences predicted from EST and HTG sequences, respectively. Hits is a web-based data retrieval and analysis system providing access to precomputed matches between protein sequences (including sequences from trEST and trGEN) and patterns and profiles from Prosite and Pfam. The three resources can be accessed via the Hits home page (http://hits.isb-sib.ch).
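The abstract does not describe how trEST and trGEN predict protein sequences from nucleotide data. As a hypothetical, much-simplified illustration of the underlying idea, the sketch below translates a nucleotide sequence in all six reading frames with the standard genetic code and returns the longest stretch free of stop codons. Unlike a real pipeline, it ignores sequencing errors, frameshifts, and intron structure, all of which are common in EST and HTG data; the function names are illustrative.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated as
# TTT, TTC, TTA, TTG, TCT, ... using the base order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def translate(dna: str, frame: int = 0) -> str:
    """Translate one reading frame; stops become '*', codons with ambiguous bases become 'X'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(frame, len(dna) - 2, 3))


def six_frame_translations(dna: str) -> dict[str, str]:
    """Frames +1..+3 on the given strand and -1..-3 on the reverse complement."""
    dna = dna.upper()
    reverse_complement = dna.translate(COMPLEMENT)[::-1]
    frames = {}
    for f in range(3):
        frames[f"+{f + 1}"] = translate(dna, f)
        frames[f"-{f + 1}"] = translate(reverse_complement, f)
    return frames


def longest_stop_free_peptide(dna: str) -> str:
    """Longest run without a stop codon across all six frames (a crude hypothetical-protein guess)."""
    best = ""
    for peptide in six_frame_translations(dna).values():
        for piece in peptide.split("*"):
            if len(piece) > len(best):
                best = piece
    return best
```

For `ATGGCCTAA`, frame +1 reads `MA*`, while the reverse-complement frame -1 gives the longer stop-free stretch `LGH`.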
New approach for inhibiting Rev function and HIV-1 production using the influenza virus NS1 protein.
Abstract:
The Rev protein of HIV-1, which facilitates the nuclear export of HIV-1 pre-mRNAs, has been a target for antiviral therapy. Here we describe a new strategy for inhibiting Rev function and HIV-1 replication. In contrast to previous approaches, we use a wild-type rather than a mutant Rev protein and covalently link this Rev sequence to the NS1 protein of influenza A virus, a protein that inhibits the nuclear export of mRNAs. The NS1 protein contains an RNA-binding domain mutation (RM), so that the only functional RNA-binding domain in the chimeric protein (NS1RM-Rev) is in the Rev protein sequence. In the presence of the NS1RM-Rev chimeric protein, HIV-1 pre-mRNAs were retained in, rather than exported from, the nucleus. In addition, this chimeric protein effectively inhibited Rev function in trans in transfection experiments and effectively inhibited the production of HIV-1 in tissue culture cells transfected with an infectious molecular clone of HIV-1 DNA. The inhibitory activities of the NS1RM-Rev chimera were at least equivalent to those of the Rev M10 mutant protein, which has been considered to be the prototype trans inhibitor of Rev function and is currently in phase I clinical trials for the treatment of AIDS patients. We discuss (i) the potential for increasing the inhibitory activity of NS1-Rev chimeras against HIV-1 and (ii) the need for additional studies to evaluate these chimeras for the treatment of AIDS.
Abstract:
Recently, a large family of transducer proteins in the Archaeon Halobacterium salinarium was identified. On the basis of the comparison of the predicted structural domains of these transducers, three distinct subfamilies of transducers were proposed. Here we report the isolation, complete gene sequences, and analysis of the encoded primary structures of transducer gene htrII, a member of family B, and its blue light receptor gene (sopII) of sensory rhodopsin II (SRII). The start codon ATG of the 714-bp sopII gene is one nucleotide beyond the termination codon TGA of the 2298-bp htrII gene. The deduced protein sequence of HtrII predicts a eubacterial chemotaxis transducer type with two hydrophobic membrane-spanning segments connecting sizable domains in the periplasm and cytoplasm. HtrII has a common feature with HtrI, the sensory rhodopsin I transducer; like HtrI, HtrII possesses a hydrophilic loop structure just after the second transmembrane segment. The C-terminal 299 residues (765 amino acid residues total) of HtrII show strong homology to the signaling and methylation domain of eubacterial transducer Tsr. The hydropathy plot of the primary structure of SRII indicates seven membrane-spanning alpha-helical segments, a characteristic feature of retinylidene proteins ("rhodopsins") from a widespread family of photoactive pigments. SRII shows high identity with SRI (42%), bacteriorhodopsin (BR) (32%), and halorhodopsin (24%). The crucial positions for retinal binding sites in these proteins are nearly identical, with the exception of Met-118 (numbering according to the mature BR sequence), which is replaced by Val in SRII. In BR, residues Asp-85 and Asp-96 are crucial in proton pumping. In SRII, the position corresponding to Asp-85 in BR is conserved, but the corresponding position of Asp-96 is replaced by an aromatic Tyr.
Coexpression of the htrII and sopII genes restores SRII phototaxis to a mutant (Pho81) that contains a deletion in the htrI/sopI and insertion in htrII/sopII regions. This paper provides the first demonstration that both HtrI and HtrII exist in the same halobacterial cell, confirming that different sensory rhodopsins SRI and SRII in the same organism have their own distinct transducers.
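The hydropathy analysis mentioned for SRII can be sketched as a sliding-window average over a residue hydrophobicity scale. The abstract does not name the scale or window used; the sketch assumes the Kyte-Doolittle scale with a 19-residue window and a mean-hydropathy cutoff of 1.6, values commonly used with that scale to flag candidate membrane-spanning segments. Merging consecutive above-cutoff windows gives rough segment boundaries; for SRII the abstract reports seven such segments. Function names are illustrative.

```python
# Kyte-Doolittle hydropathy scale (standard 20 residues only; other symbols raise KeyError).
KYTE_DOOLITTLE = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}


def hydropathy_profile(seq: str, window: int = 19) -> list[float]:
    """Mean hydropathy of each full window; entry i covers residues i .. i+window-1."""
    values = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


def candidate_tm_segments(seq: str, window: int = 19, threshold: float = 1.6) -> list[tuple[int, int]]:
    """Merge consecutive above-threshold windows into (start, end) ranges, 0-based, end exclusive."""
    profile = hydropathy_profile(seq, window)
    segments = []
    start = None
    for i, h in enumerate(profile):
        if h >= threshold and start is None:
            start = i
        elif h < threshold and start is not None:
            # The last above-threshold window began at i - 1 and spans `window` residues.
            segments.append((start, i - 1 + window))
            start = None
    if start is not None:
        segments.append((start, len(profile) - 1 + window))
    return segments
```

Because of the window averaging, the reported boundaries are broader than the hydrophobic stretch itself: a run of 22 leucines flanked by lysines is reported as residues 15 to 46 rather than 20 to 41.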
Abstract:
Life falls into three fundamental domains, Archaea, Bacteria, and Eucarya (formerly archaebacteria, eubacteria, and eukaryotes, respectively). Though Archaea lack nuclei and share many morphological features with Bacteria, molecular analyses, principally of the transcription and translation machineries, have suggested that Archaea are more closely related to Eucarya than to Bacteria. Currently, little is known about the archaeal cell division apparatus. In Bacteria, a crucial component of the cell division machinery is FtsZ, a GTPase that localizes to a ring at the site of septation. Interestingly, FtsZ is distantly related in sequence to eukaryotic tubulins, which also interact with GTP and are components of the eukaryotic cell cytoskeleton. By screening for the ability to bind radiolabeled nucleotides, we have identified a protein of the hyperthermophilic archaeon Pyrococcus woesei that interacts tightly and specifically with GTP. Furthermore, through screening an expression library of P. woesei genomic DNA, we have cloned the gene encoding this protein. Sequence comparisons reveal that the P. woesei GTP-binding protein is strikingly related in sequence to eubacterial FtsZ and is marginally more similar to eukaryotic tubulins than are bacterial FtsZ proteins. Phylogenetic analyses reinforce the notion that there is an evolutionary linkage between FtsZ and tubulins. These findings suggest that the archaeal cell division apparatus may be fundamentally similar to that of Bacteria and lead us to consider the evolutionary relationships between Archaea, Bacteria, and Eucarya.
Abstract:
The three-dimensional structure of protein kinase C interacting protein 1 (PKCI-1) has been solved to high resolution by x-ray crystallography using single isomorphous replacement with anomalous scattering. The gene encoding human PKCI-1 was cloned from a cDNA library by using a partial sequence obtained from interactions identified in the yeast two-hybrid system between PKCI-1 and the regulatory domain of protein kinase C-beta. The PKCI-1 protein was expressed in Pichia pastoris as a dimer of two 13.7-kDa polypeptides. PKCI-1 is a member of the HIT family of proteins, shown by sequence identity to be conserved in a broad range of organisms including mycoplasma, plants, and humans. Despite the ubiquity of this protein sequence in nature, no distinct function has been shown for the protein product in vitro or in vivo. The PKCI-1 protomer has an alpha+beta meander fold containing a five-stranded antiparallel sheet and two helices. Two protomers come together to form a 10-stranded antiparallel sheet with extensive contacts between a helix and carboxy terminal amino acids of a protomer with the corresponding amino acids in the other protomer. PKCI-1 has been shown to interact specifically with zinc. The three-dimensional structure has been solved in the presence and absence of zinc and in two crystal forms. The structure of human PKCI-1 provides a model of this family of proteins which suggests a stable fold conserved throughout nature.
Abstract:
A novel cDNA, IA-2beta, was isolated from a mouse neonatal brain library. The predicted protein sequence revealed an extracellular domain, a transmembrane region, and an intracellular domain. The intracellular domain is 376 amino acids long and 74% identical to the intracellular domain of IA-2, a major autoantigen in insulin-dependent diabetes mellitus (IDDM). A partial sequence of the extracellular domain of IA-2beta indicates that it differs substantially (only 26% identical) from that of IA-2. Both molecules are expressed in islets and brain tissue. Forty-six percent (23 of 50) of the IDDM sera but none of the sera from normal controls (0 of 50) immunoprecipitated the intracellular domain of IA-2beta. Competitive inhibition experiments showed that IDDM sera have autoantibodies that recognize both common and distinct determinants on IA-2 and IA-2beta. Many IDDM sera are known to immunoprecipitate 37-kDa and 40-kDa tryptic fragments from islet cells, but the identity of the precursor protein(s) has remained elusive. The current study shows that treatment of recombinant IA-2beta and IA-2 with trypsin yields a 37-kDa fragment and a 40-kDa fragment, respectively, and that these fragments can be immunoprecipitated with diabetic sera. Absorption of diabetic sera with unlabeled recombinant IA-2 or IA-2beta, prior to incubation with radiolabeled 37-kDa and 40-kDa tryptic fragments derived from insulinoma or glucagonoma cells, blocks the immunoprecipitation of both of these radiolabeled tryptic fragments. We conclude that IA-2beta and IA-2 are the precursors of the 37-kDa and 40-kDa islet cell autoantigens, respectively, and that both IA-2 and IA-2beta are major autoantigens in IDDM.
Abstract:
Fas, a member of the tumor necrosis factor receptor family, can induce apoptosis when activated by Fas ligand binding or anti-Fas antibody crosslinking. Genetic studies have shown that a defect in Fas-mediated apoptosis resulted in abnormal development and function of the immune system in mice. A point mutation in the cytoplasmic domain of Fas (a single base change from T to A at base 786), replacing isoleucine with asparagine, abolishes the signal transducing property of Fas. Mice homozygous for this mutant allele (lprcg/lprcg mice) develop lymphadenopathy and a lupus-like autoimmune disease. Little is known about the mechanism of signal transduction in Fas-mediated apoptosis. In this study, we used the two-hybrid screen in yeast to isolate a Fas-associated protein factor, FAF1, which specifically interacts with the cytoplasmic domain of wild-type Fas but not the lprcg-mutated Fas protein. This interaction occurs not only in yeast but also in mammalian cells. When transiently expressed in L cells, FAF1 potentiated Fas-induced apoptosis. A search of available DNA and protein sequence data banks did not reveal significant homology between FAF1 and known proteins. Therefore, FAF1 is an unusual protein that binds to the wild type but not the inactive point mutant of Fas. FAF1 potentiates Fas-induced cell killing and is a candidate signal transducing molecule in the regulation of apoptosis.