13 results for Databases, Protein
in National Center for Biotechnology Information - NCBI
Abstract:
High throughput genome (HTG) and expressed sequence tag (EST) sequences are currently the most abundant nucleotide sequence classes in the public database. The large volume, high degree of fragmentation and lack of gene structure annotations prevent efficient and effective searches of HTG and EST data for protein sequence homologies by standard search methods. Here, we briefly describe three newly developed resources that should make discovery of interesting genes in these sequence classes easier in the future, especially to biologists not having access to a powerful local bioinformatics environment. trEST and trGEN are regularly regenerated databases of hypothetical protein sequences predicted from EST and HTG sequences, respectively. Hits is a web-based data retrieval and analysis system providing access to precomputed matches between protein sequences (including sequences from trEST and trGEN) and patterns and profiles from Prosite and Pfam. The three resources can be accessed via the Hits home page (http://hits.isb-sib.ch).
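As a rough illustration of what "hypothetical protein sequences predicted from EST and HTG sequences" involves, the sketch below performs a plain six-frame translation with the standard genetic code. This is a swapped-in, much simpler technique than whatever prediction method trEST and trGEN actually use, which the abstract does not describe. All function names here are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons; unknown codons (e.g. containing N) become X."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translations(dna):
    """Map frame labels (+1..+3, -1..-3) to their translations."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


def longest_orf(dna):
    """Longest stop-free peptide across all six frames (no start codon required)."""
    segments = (
        segment
        for protein in six_frame_translations(dna).values()
        for segment in protein.split("*")
    )
    return max(segments, key=len)
```

Fragmented, error-prone reads are exactly where such a naive translation falls short (frameshifts break the reading frame), which is presumably why dedicated predicted-protein databases are useful in the first place.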
Abstract:
The Drosophila retinal degeneration C (rdgC) gene encodes an unusual protein serine/threonine phosphatase in that it contains at least two EF-hand motifs at its carboxy terminus. By a combination of large-scale sequencing of human retina cDNA clones and searches of expressed sequence tag and genomic DNA databases, we have identified two sequences in mammals [Protein Phosphatase with EF-hands-1 and 2 (PPEF-1 and PPEF-2)] and one in Caenorhabditis elegans (PPEF) that closely resemble rdgC. In the adult, PPEF-2 is expressed specifically in retinal rod photoreceptors and the pineal. In the retina, several isoforms of PPEF-2 are predicted to arise from differential splicing. The isoform that most closely resembles rdgC is localized to rod inner segments. Together with the recently described localization of PPEF-1 transcripts to primary somatosensory neurons and inner ear cells in the developing mouse, these data suggest that the PPEF family of protein serine/threonine phosphatases plays a specific and conserved role in diverse sensory neurons.
Abstract:
Multiprotein bridging factor 1 (MBF1) is a transcriptional cofactor that bridges between the TATA box-binding protein (TBP) and the Drosophila melanogaster nuclear hormone receptor FTZ-F1 or its silkworm counterpart BmFTZ-F1. A cDNA clone encoding MBF1 was isolated from the silkworm Bombyx mori; its sequence predicts a basic protein consisting of 146 amino acids. Bacterially expressed recombinant MBF1 is functional in interactions with TBP and a positive cofactor MBF2. The recombinant MBF1 also makes a direct contact with FTZ-F1 through the C-terminal region of the FTZ-F1 DNA-binding domain and stimulates the FTZ-F1 binding to its recognition site. The central region of MBF1 (residues 35–113) is essential for the binding of FTZ-F1, MBF2, and TBP. When the recombinant MBF1 was added to a HeLa cell nuclear extract in the presence of MBF2 and FTZ622 bearing the FTZ-F1 DNA-binding domain, it supported selective transcriptional activation of the fushi tarazu gene as natural MBF1 did. Mutations disrupting the binding of FTZ622 to DNA or MBF1, or an MBF2 mutation disrupting the binding to MBF1, all abolished the selective activation of transcription. These results suggest that tethering of the positive cofactor MBF2 to an FTZ-F1-binding site through FTZ-F1 and MBF1 is essential for the binding site-dependent activation of transcription. A homology search in the databases revealed that the deduced amino acid sequence of MBF1 is conserved across species from yeast to human.
Abstract:
Monoclonal antibodies raised against axonemal proteins of sea urchin spermatozoa have been used to study regulatory mechanisms involved in flagellar motility. Here, we report that one of these antibodies, monoclonal antibody D-316, has an unusual perturbing effect on the motility of sea urchin sperm models; it does not affect the beat frequency, the amplitude of beating or the percentage of motile sperm models, but instead promotes a marked transformation of the flagellar beating pattern, which changes from a two-dimensional to a three-dimensional type of movement. On immunoblots of axonemal proteins separated by SDS-PAGE, D-316 recognized a single polypeptide of 90 kDa. This protein was purified following its extraction by exposure of axonemes to a brief heat treatment at 40°C. The protein copurified and coimmunoprecipitated with proteins of 43 and 34 kDa, suggesting that it exists as a complex in its native form. Using D-316 as a probe, a full-length cDNA clone encoding the 90-kDa protein was obtained from a sea urchin cDNA library. The sequence predicts a highly acidic (pI = 4.0) protein of 552 amino acids with a mass of 62,720 Da (p63). Comparison with protein sequences in databases indicated that the protein is related to radial spoke proteins 4 and 6 (RSP4 and RSP6) of Chlamydomonas reinhardtii, which share 37% and 25% similarity, respectively, with p63. However, the sea urchin protein possesses structural features distinct from RSP4 and RSP6, such as three major acidic stretches of 34, 22, and 14 amino acids that contain 25, 17, and 12 aspartate and glutamate residues, respectively, and that are predicted to form α-helical coiled-coil secondary structures. These results suggest a major role for p63 in the maintenance of a planar form of sperm flagellar beating and provide new tools to study the function of radial spoke heads in more evolved species.
Abstract:
G-substrate, an endogenous substrate for cGMP-dependent protein kinase, exists almost exclusively in cerebellar Purkinje cells, where it is possibly involved in the induction of long-term depression. A G-substrate cDNA was identified by screening expressed sequence tag databases from a human brain library. The deduced amino acid sequence of human G-substrate contained two putative phosphorylation sites (Thr-68 and Thr-119) with amino acid sequences [KPRRKDT(p)PALH] that were identical to those reported for rabbit G-substrate. G-substrate mRNA was expressed almost exclusively in the cerebellum as a single transcript. The human G-substrate gene was mapped to human chromosome 7p15 by radiation hybrid panel analysis. In vitro translation products of the cDNA showed an apparent molecular mass of 24 kDa on SDS/PAGE which was close to that of purified rabbit G-substrate (23 kDa). Bacterially expressed human G-substrate is a heat-stable and acid-soluble protein that cross-reacts with antibodies raised against rabbit G-substrate. Recombinant human G-substrate was phosphorylated efficiently by cGMP-dependent protein kinase exclusively at Thr residues, and it was recognized by antibodies specific for rabbit phospho-G-substrate. The amino acid sequences surrounding the sites of phosphorylation in G-substrate are related to those around Thr-34 and Thr-35 of the dopamine- and cAMP-regulated phosphoprotein DARPP-32 and inhibitor-1, respectively, two potent inhibitors of protein phosphatase 1. However, purified G-substrate phosphorylated by cGMP-dependent protein kinase inhibited protein phosphatase 2A more effectively than protein phosphatase 1, suggesting a distinct role as a protein phosphatase inhibitor.
Abstract:
MetaFam is a comprehensive relational database of protein family information. This web-accessible resource integrates data from several primary sequence and secondary protein family databases. By pooling together the information from these disparate sources, MetaFam is able to provide the most complete protein family sets available. Users are able to explore the interrelationships among these primary and secondary databases using a powerful graphical visualization tool, MetaFamView. Additionally, users can identify corresponding sequence entries among the sequence databases, obtain a quick summary of corresponding families (and their sequence members) among the family databases, and even attempt to classify their own unassigned sequences. Hypertext links to the appropriate source databases are provided at every level of navigation. Global family database statistics and information are also provided. Public access to the data is available at http://metafam.ahc.umn.edu/.
Abstract:
The Protein Information Resource, in collaboration with the Munich Information Center for Protein Sequences (MIPS) and the Japan International Protein Information Database (JIPID), produces the most comprehensive and expertly annotated protein sequence database in the public domain, the PIR-International Protein Sequence Database. To provide timely and high quality annotation and promote database interoperability, the PIR-International employs rule-based and classification-driven procedures based on controlled vocabulary and standard nomenclature and includes status tags to distinguish experimentally determined from predicted protein features. The database contains about 200 000 non-redundant protein sequences, which are classified into families and superfamilies and their domains and motifs identified. Entries are extensively cross-referenced to other sequence, classification, genome, structure and activity databases. The PIR web site features search engines that use sequence similarity and database annotation to facilitate the analysis and functional identification of proteins. The PIR-International databases and search tools are accessible on the PIR web site at http://pir.georgetown.edu/ and at the MIPS web site at http://www.mips.biochem.mpg.de. The PIR-International Protein Sequence Database and other files are also available by FTP.
Abstract:
The RESID Database is a comprehensive collection of annotations and structures for protein post-translational modifications including N-terminal, C-terminal and peptide chain cross-link modifications. The RESID Database includes systematic and frequently observed alternate names, Chemical Abstracts Service registry numbers, atomic formulas and weights, enzyme activities, taxonomic range, keywords, literature citations with database cross-references, structural diagrams and molecular models. The NRL-3D Sequence–Structure Database is derived from the three-dimensional structure of proteins deposited with the Research Collaboratory for Structural Bioinformatics Protein Data Bank. The NRL-3D Database includes standardized and frequently observed alternate names, sources, keywords, literature citations, experimental conditions and searchable sequences from model coordinates. These databases are freely accessible through the National Cancer Institute–Frederick Advanced Biomedical Computing Center at these web sites: http://www.ncifcrf.gov/RESID, http://www.ncifcrf.gov/NRL-3D; or at these National Biomedical Research Foundation Protein Information Resource web sites: http://pir.georgetown.edu/pirwww/dbinfo/resid.html, http://pir.georgetown.edu/pirwww/dbinfo/nrl3d.html
Abstract:
The iProClass database is an integrated resource that provides comprehensive family relationships and structural and functional features of proteins, with rich links to various databases. It is extended from ProClass, a protein family database that integrates PIR superfamilies and PROSITE motifs. The iProClass currently consists of more than 200 000 non-redundant PIR and SWISS-PROT proteins organized with more than 28 000 superfamilies, 2600 domains, 1300 motifs, 280 post-translational modification sites and links to more than 30 databases of protein families, structures, functions, genes, genomes, literature and taxonomy. Protein and family summary reports provide rich annotations, including membership information with length, taxonomy and keyword statistics, full family relationships, comprehensive enzyme and PDB cross-references and graphical feature display. The database facilitates classification-driven annotation for protein sequence databases and complete genomes, and supports structural and functional genomic research. The iProClass is implemented in the Oracle 8i object-relational system and available for sequence search and report retrieval at http://pir.georgetown.edu/iproclass/.
Abstract:
There is no control over the information provided with sequences when they are deposited in the sequence databases. Consequently, mistakes can seed the incorrect annotation of other sequences. Grouping genes into families and applying controlled annotation overcomes the problems of incorrect annotation associated with individual sequences. Two databases (http://www.mendel.ac.uk) were created to apply controlled annotation to plant genes and plant ESTs. Mendel-GFDb is a database of plant protein (gene) families based on gapped-BLAST analysis of all sequences in the SWISS-PROT family of databases. Sequences are aligned (ClustalW) and identical and similar residues shaded. The families are visually curated to ensure that one or more criteria, for example overall relatedness and/or domain similarity, relate all sequences within a family. Sequence families are assigned a ‘Gene Family Number’ and a unified description is developed which best describes the family and its members. If authority exists, the gene family is assigned a ‘Gene Family Name’. This information is placed in Mendel-GFDb. Mendel-ESTS is primarily a database of plant ESTs, which have been compared to Mendel-GFDb, completely sequenced genomes and domain databases. This approach associates ESTs with individual sequences and with the controlled annotation of gene families and protein domains; this information is placed in Mendel-ESTS. The controlled annotation applied to genes and ESTs provides a basis from which a plant transcription database can be developed.
Abstract:
SBASE 8.0 is the eighth release of the SBASE library of protein domain sequences that contains 294 898 annotated structural, functional, ligand-binding and topogenic segments of proteins, cross-referenced to most major sequence databases and sequence pattern collections. The entries are clustered into over 2005 statistically validated domain groups (SBASE-A) and 595 non-validated groups (SBASE-B), provided with several WWW-based search and browsing facilities for online use. A domain-search facility was developed, based on non-parametric pattern recognition methods, including artificial neural networks. SBASE 8.0 is freely available by anonymous ‘ftp’ file transfer from ftp.icgeb.trieste.it. Automated searching of SBASE can be carried out with the WWW servers http://www.icgeb.trieste.it/sbase/ and http://sbase.abc.hu/sbase/.
Abstract:
Signature databases are vital tools for identifying distant relationships in novel sequences and hence for inferring protein function. InterPro is an integrated documentation resource for protein families, domains and functional sites, which amalgamates the efforts of the PROSITE, PRINTS, Pfam and ProDom database projects. Each InterPro entry includes a functional description, annotation, literature references and links back to the relevant member database(s). Release 2.0 of InterPro (October 2000) contains over 3000 entries, representing families, domains, repeats and sites of post-translational modification encoded by a total of 6804 different regular expressions, profiles, fingerprints and Hidden Markov Models. Each InterPro entry lists all the matches against SWISS-PROT and TrEMBL (more than 1 000 000 hits from 462 500 proteins in SWISS-PROT and TrEMBL). The database is accessible for text- and sequence-based searches at http://www.ebi.ac.uk/interpro/. Questions can be emailed to interhelp@ebi.ac.uk.
Abstract:
We present a method for discovering conserved sequence motifs from families of aligned protein sequences. The method has been implemented as a computer program called emotif (http://motif.stanford.edu/emotif). Given an aligned set of protein sequences, emotif generates a set of motifs with a wide range of specificities and sensitivities. emotif also can generate motifs that describe possible subfamilies of a protein superfamily. A disjunction of such motifs often can represent the entire superfamily with high specificity and sensitivity. We have used emotif to generate sets of motifs from all 7,000 protein alignments in the blocks and prints databases. The resulting database, called identify (http://motif.stanford.edu/identify), contains more than 50,000 motifs. For each alignment, the database contains several motifs whose probabilities of matching a false positive range from 10⁻¹⁰ to 10⁻⁵. Highly specific motifs are well suited for searching entire proteomes, while generating very few false predictions. identify assigns biological functions to 25–30% of all proteins encoded by the Saccharomyces cerevisiae genome and by several bacterial genomes. In particular, identify assigned functions to 172 proteins of unknown function in the yeast genome.
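The idea of a motif with a tunable false-positive probability, and of a disjunction of motifs covering a superfamily, can be sketched in a few lines. The example below is only an illustration under stated assumptions: a motif is modelled as a list of allowed-residue groups (with "." as a wildcard), and residues are treated as independent and uniformly distributed. Neither the representation nor the probability model is taken from emotif itself, and all names are hypothetical.

```python
import re

# Assumption: uniform background over the 20 standard amino acids.
# Real residue frequencies differ and would change the estimates.
UNIFORM_FREQ = {aa: 1 / 20 for aa in "ACDEFGHIKLMNPQRSTVWY"}


def motif_to_regex(motif):
    """Compile a motif such as ["G", "KR", ".", "ST"] into a regular expression."""
    parts = []
    for allowed in motif:
        if allowed == "." or len(allowed) == 1:
            parts.append(allowed)
        else:
            parts.append("[" + "".join(sorted(allowed)) + "]")
    return re.compile("".join(parts))


def random_match_probability(motif, freq=UNIFORM_FREQ):
    """Chance that the motif matches at one fixed position of a random sequence."""
    probability = 1.0
    for allowed in motif:
        if allowed != ".":
            probability *= sum(freq[aa] for aa in allowed)
    return probability


def matches_any(motifs, sequence):
    """Disjunction of motifs: true if at least one motif occurs in the sequence."""
    return any(motif_to_regex(motif).search(sequence) for motif in motifs)
```

Under this toy model, narrowing a residue group or adding a constrained position lowers the random-match probability, which mirrors the trade-off between specificity and sensitivity described in the abstract. A call such as `matches_any([["G", "KR", ".", "ST"]], "AAGKAT")` tests a sequence against a set of subfamily motifs.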