995 results for Protein databases
Abstract:
We previously introduced two new protein databases (trEST and trGEN) of hypothetical protein sequences predicted from EST and HTG sequences, respectively. Here, we present the updates made to these two databases, plus a new database (trome), which uses alignments of EST data to HTG or full genomes to generate virtual transcripts and coding sequences. This new database is of higher quality and, because it stores the information in a much denser format, is much smaller. These new databases are in a Swiss-Prot-like format and are updated on a weekly basis (trEST and trGEN) or every 3 months (trome). They can be downloaded by anonymous FTP from ftp://ftp.isrec.isb-sib.ch/pub/databases.
Abstract:
To determine functional dependencies between two proteins, both represented as 3D structures, it is essential that they share one or more matching structural regions, called patches. Because 3D protein structures are large, complex and constantly evolving, identifying the possible locations and sizes of patches for a given protein against a large protein database is computationally expensive and very time-consuming. In this paper, we adopt a vector-space-based representation for protein structures, in which a patch is formed by the vectors within the region. Based on our previous work, a compact representation of the patch, named the patch signature, is applied here. A similarity measure for two patches is then derived from their signatures. To achieve fast patch matching in large protein databases, a match-and-expand strategy is proposed. Given a query patch, a set of small k-sized matching patches, called candidate patches, is generated in the match stage. The candidate patches are then filtered further by enlarging k in the expand stage. Our extensive experimental results demonstrate encouraging performance on this biologically critical but previously computationally prohibitive problem.
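The match-and-expand idea described in this abstract can be illustrated with a toy sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: a "patch" is simplified to a contiguous window over an ordered list of 3D points, the "signature" is the sorted list of pairwise distances inside the window, and the names `signature`, `similar` and `match_and_expand` are hypothetical.

```python
from math import dist


def signature(vectors):
    """Toy patch signature (assumption): sorted pairwise distances in the patch."""
    return sorted(
        dist(a, b) for i, a in enumerate(vectors) for b in vectors[i + 1:]
    )


def similar(sig_a, sig_b, tol):
    """Two signatures match when every paired distance agrees within tol."""
    return len(sig_a) == len(sig_b) and all(
        abs(x - y) <= tol for x, y in zip(sig_a, sig_b)
    )


def match_and_expand(query, protein, k=3, tol=1e-6):
    """Return start offsets in `protein` whose window matches the whole query."""
    n = len(query)
    k = min(k, n)
    # Match stage: compare only small k-sized patches to collect candidates.
    q_sig = signature(query[:k])
    candidates = [
        s for s in range(len(protein) - n + 1)
        if similar(signature(protein[s:s + k]), q_sig, tol)
    ]
    # Expand stage: enlarge k step by step and drop candidates that stop matching.
    while k < n and candidates:
        k += 1
        q_sig = signature(query[:k])
        candidates = [
            s for s in candidates
            if similar(signature(protein[s:s + k]), q_sig, tol)
        ]
    return candidates
```

The point of the two stages is that the cheap k-sized comparison runs over every position, while the more expensive comparisons on larger patches run only on the surviving candidates.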
Abstract:
We have initiated a gene discovery program in Schistosoma mansoni based on the technique of Expressed Sequence Tags (ESTs), i.e. partial sequences of cDNAs obtained from single passes in automatic DNA sequencers. ESTs can be used to identify genes on the basis of their homology with sequences from other species deposited in DNA or protein databases. Transcripts with sequences without matches in the databases may represent novel parasite-specific genes. This approach has proven to be very efficient, and in less than two years a broad range of novel genes has already been ascertained, more than doubling the number of known S. mansoni genes.
Abstract:
Selenoproteins are a diverse group of proteins usually misidentified and misannotated in sequence databases. The presence of an in-frame UGA (stop) codon in the coding sequence of selenoprotein genes precludes their identification and correct annotation. The in-frame UGA codons are recoded to cotranslationally incorporate selenocysteine, a rare selenium-containing amino acid. The development of ad hoc experimental and, more recently, computational approaches has allowed the efficient identification and characterization of the selenoproteomes of a growing number of species. Today, dozens of selenoprotein families have been described and more are being discovered in recently sequenced species, but the correct genomic annotation is not available for the majority of these genes. SelenoDB is a long-term project that aims to provide, through the collaborative effort of experimental and computational researchers, automatic and manually curated annotations of selenoprotein genes, proteins and SECIS elements. Version 1.0 of the database includes an initial set of eukaryotic genomic annotations, with special emphasis on the human selenoproteome, for immediate inspection by selenium researchers or incorporation into more general databases. SelenoDB is freely available at http://www.selenodb.org.
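A short sketch may help show why an in-frame UGA leads generic pipelines to truncate selenoproteins. It uses the standard genetic code; the function name `translate` and the `recode_uga` flag are hypothetical, and the rule "recode every UGA except the final codon" is a simplifying assumption (real recoding depends on a SECIS element, which this sketch ignores).

```python
# Standard genetic code, codons ordered U, C, A, G at each position.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(mrna, recode_uga=False):
    """Translate an mRNA coding sequence starting at its first codon.

    With recode_uga=False, translation stops at the first stop codon.
    With recode_uga=True, an in-frame UGA that is not the last codon is
    read as selenocysteine ("U"); this rule is a simplifying assumption.
    """
    codons = [mrna[i:i + 3] for i in range(0, len(mrna) - 2, 3)]
    protein = []
    for index, codon in enumerate(codons):
        is_last = index == len(codons) - 1
        if codon == "UGA" and recode_uga and not is_last:
            protein.append("U")
            continue
        residue = CODON_TABLE[codon]
        if residue == "*":
            break  # ordinary stop codon ends translation
        protein.append(residue)
    return "".join(protein)
```

For the made-up sequence `AUGGCUUGAGGUUAA`, the default call yields the truncated product `MA`, while `recode_uga=True` yields `MAUG`, with `U` standing for selenocysteine.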
Abstract:
CD8+ cytolytic T cells (CTL) are the main effectors of the adaptive immune system against infections and tumors. The recent identification of human tumor antigens recognized by cytolytic T cells is the basis for the development of antigen-specific cancer vaccines. The number of CTL-recognized tumor antigens that can be used as targets for vaccinating cancer patients is still limited. A new, simple and rapid technique has recently been proposed for identifying antigens recognized by CTL. It relies on combinatorial peptide libraries arranged in a positional scanning format (PS-SCL). The first part of this study validated this new technique through a detailed analysis of PS-SCL recognition by various CTL clones specific for known tumor-associated antigens (TAA), as well as by clones of unknown specificity. These analyses show that, for all clones, most of the amino acids that make up the sequence of the natural antigenic peptide were identified using PS-SCL. The results made it possible to identify analog peptides with increased antigenicity relative to the natural peptide, as well as peptides carrying multiple sequence modifications but showing the same reactivity as the natural peptide. The second part of this study consisted of biometric analyses of the complex results generated by PS-SCL. This approach allowed the sequences corresponding to the natural epitopes to be identified in public peptide databases. Among thousands of peptides, the natural sequences fall within the 30 sequences with the highest potential stimulation scores for each TAA studied. More importantly still, using PS-SCL with a clone reactive against tumor cells but of unknown specificity allowed us to identify the epitope recognized by this clone. The data presented here encourage the use of PS-SCL for identifying and optimizing epitopes for tumor-reactive CTL, as well as for studying degenerate antigen recognition by CTL.

CD8+ cytolytic T lymphocytes (CTL) are the main effector cells of the adaptive immune system against infection and tumors. The recent identification of molecularly defined human tumor Ags recognized by autologous CTL has opened new opportunities for the development of Ag-specific cancer vaccines. Despite extensive work, however, the number of CTL-defined tumor Ags that are suitable targets for the vaccination of cancer patients is still limited, especially because of the laborious and time-consuming nature of the procedures currently used for their identification. The use of combinatorial peptide libraries in positional scanning format (Positional Scanning Synthetic Combinatorial Libraries, PS-SCL) has recently been proposed as an alternative approach for the identification of these epitopes. To validate this approach, we analyzed in detail the recognition of PS-SCL by tumor-reactive CTL clones specific for multiple well-defined tumor-associated Ags (TAA) as well as by tumor-reactive CTL clones of unknown specificity. The results of these analyses revealed that for all the TAA-specific clones studied most of the amino acids composing the native antigenic peptide sequences could be identified through the use of PS-SCL. Based on the data obtained from the screening of PS-SCL, we could design peptide analogs of increased antigenicity as well as cross-reactive analog peptides containing multiple amino acid substitutions. In addition, the results of PS-SCL screening combined with a recently developed biometric data analysis (PS-SCL-based biometric database analysis) allowed the identification of the native peptides in public protein databases among the 30 most active sequences, and this was the case for all the TAA studied. More importantly, the screening of PS-SCL with a tumor-reactive CTL clone of unknown specificity resulted in the identification of the actual epitope. Overall, these data encourage the use of PS-SCL not only for the identification and optimization of tumor-associated CTL epitopes, but also for the analysis of degeneracy in T lymphocyte receptor (TCR) recognition of tumor Ags.

CD8+ cytolytic T cells are a type of white blood cell and are the main cells responsible for fighting infections and tumors. For years, immunologists have sought to identify molecules expressed and displayed on the surface of tumors that can be recognized by CD8+ cytolytic T cells, which can then kill these tumors specifically. Such molecules are the basis for developing cancer vaccines, since they could be injected into patients to induce an anti-tumor response. At present, very few molecules capable of stimulating the immune system against tumors are known, because the techniques developed so far to identify them are complex and slow. A new technique based on peptide libraries has recently been proposed for identifying this type of molecule. These libraries represent all possible combinations of the basic building blocks of the molecules being sought. The first part of this study validated this new technique using CD8+ cytolytic T cells that kill tumor cells by recognizing a known molecule present on their surface. We showed that the libraries make it possible to identify most of the basic building blocks of the molecule recognized by the CD8+ cytolytic T cells used. The second part of this study searched for potentially active molecules within proteins held in databases, using a computer program that ranks molecules according to their biological activity. Among the thousands of molecules in the database, those recognized by our CD8+ cytolytic T cells were found among the most active. More interestingly still, combining these two techniques allowed us to identify the molecule recognized by a population of CD8+ cytolytic T cells that has anti-tumor activity but whose specificity was unknown. Our results encourage the use of libraries to find and optimize molecules specifically recognized by CD8+ cytolytic T cells capable of killing tumors.
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
Protein databases contain a substantial number of proteins with determined structures but without functional annotation. Understanding the relationship between function and structure can be useful for predicting function on a large scale. We have analyzed the similarities in global physicochemical parameters for a set of enzymes classified according to the four Enzyme Commission (EC) hierarchical levels. Using relevance theory, we introduced a distance between proteins in the space of physicochemical characteristics. This was done by minimizing a cost function of the metric tensor built to reflect the EC classification system. Using an unsupervised clustering method on a set of 1025 enzymes, we obtained no relevant clusters compatible with the EC classification. The distance distributions between enzymes from the same EC group and from different EC groups were compared using histograms. The same analysis was also performed using sequence alignment similarity as a distance. Our results suggest that global structure parameters are not sufficient to segregate enzymes according to the EC hierarchy. This indicates that the features essential for function are local rather than global. Consequently, methods for predicting function based on global attributes should not achieve high accuracy in predicting the main EC classes without relying on similarities between enzymes in the training and validation datasets. Furthermore, these results are consistent with a substantial number of studies suggesting that function evolves fundamentally by recruitment, i.e., the same protein motif or fold can be used to perform different enzymatic functions, and a few specific amino acids (AAs) are actually responsible for enzyme activity. These essential amino acids should belong to active sites, and an effective method for predicting function should be able to recognize them. (C) 2012 Elsevier Ltd. All rights reserved.
Abstract:
A complete reference genome of the Apis mellifera Filamentous virus (AmFV) was determined using Illumina HiSeq sequencing. The AmFV genome is a double-stranded DNA molecule of approximately 498,500 nucleotides with a GC content of 50.8%. It encompasses 247 non-overlapping open reading frames (ORFs), equally distributed on both strands, which cover 65% of the genome. While most of the ORFs lacked threshold sequence alignments to reference protein databases, twenty-eight were found to display significant homologies with proteins present in other large double-stranded DNA viruses. Remarkably, 13 ORFs had strong similarity with typical baculovirus domains such as PIFs (per os infectivity factor genes: pif-1, pif-2, pif-3 and p74) and BRO (Baculovirus Repeated Open Reading Frame). The putative AmFV DNA polymerase is of type B, but is only distantly related to those of the baculoviruses. The ORFs encoding proteins involved in nucleotide metabolism had the highest percent identity to viral proteins in GenBank. Other notable features include the presence of several collagen-like, chitin-binding, kinesin and pacifastin domains. Due to the large size of the AmFV genome and its inconsistent affiliation with other large double-stranded DNA virus families infecting invertebrates, AmFV may belong to a new virus family.
Abstract:
For a large number of T cell-mediated immunopathologies, the disease-related antigens are not yet identified. Identification of T cell epitopes is of crucial importance for the development of immune-intervention strategies. We show that CD4+ T cell epitopes can be defined by using a new system for synthesis and screening of synthetic peptide libraries. These libraries are designed to bind to the HLA class II restriction molecule of the CD4+ T cell clone of interest. The screening is based on three selection rounds using partial release of 14-mer peptides from synthesis beads and subsequent sequencing of the remaining peptide attached to the bead. With this approach, two peptides were identified that stimulate the β cell-reactive CD4+ T cell clone 1c10, which was isolated from a newly diagnosed insulin-dependent diabetes mellitus patient. After performing amino acid-substitution studies and protein database searches, a Haemophilus influenzae TonB-derived peptide was identified that stimulates clone 1c10. The relevance of this finding for the pathogenesis of insulin-dependent diabetes mellitus is currently under investigation. We conclude that this system is capable of determining epitopes for (autoreactive) CD4+ T cell clones with previously unknown peptide specificity. This offers the possibility to define (auto)antigens by searching protein databases and/or to induce tolerance by using the peptide sequences identified. In addition the peptides might be used as leads to develop T cell receptor antagonists or anergy-inducing compounds.
Abstract:
C2-α-Mannosyltryptophan was discovered in human RNase 2, an enzyme that occurs in eosinophils and is involved in host defense. It represents a novel way of attaching carbohydrate to a protein in addition to the well-known N- and O-glycosylations. The reaction is specific: in RNase 2, Trp-7, but never Trp-10, is modified. In this article, we address which structural features provide the specificity of the reaction. Expression of chimeras of RNase 2 and nonglycosylated RNase 4, and of deletion mutants, in HEK293 cells identified residues 1–13 as sufficient for C-mannosylation. Site-directed mutagenesis revealed the sequence Trp-x-x-Trp, in which the first Trp becomes mannosylated, as the specificity determinant. The Trp residue at position +3 can be replaced by Phe, which reduces the efficiency of the reaction threefold. Interpretation of the data in the context of the three-dimensional structure of RNase 2 strongly suggests that the primary, rather than the tertiary, structure forms the determinant. The sequence motif occurs in 336 mammalian proteins currently present in protein databases. Two of these proteins were analyzed by protein chemistry, which showed partial C-glycosylation of recombinant human interleukin 12. The frequent occurrence of the protein recognition motif suggests that C-glycosides could be part of the structure of more proteins than assumed so far.
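Scanning protein sequences for the Trp-x-x-Trp motif reported in this abstract can be sketched with a regular expression over one-letter amino acid codes. The function name `c_mannosylation_candidates` and the `allow_phe` flag are hypothetical; the flag covers the reported case in which Phe replaces the Trp at position +3 with reduced efficiency. A motif hit only marks a candidate site and does not show that the residue is actually modified.

```python
import re


def c_mannosylation_candidates(sequence, allow_phe=False):
    """Return 1-based positions of Trp residues that start a Trp-x-x-Trp motif.

    With allow_phe=True, Phe is also accepted at position +3.
    """
    third = "[WF]" if allow_phe else "W"
    # A lookahead keeps the match zero-width, so overlapping motifs
    # such as W..W..W are all reported.
    pattern = re.compile(rf"(?=W..{third})")
    return [match.start() + 1 for match in pattern.finditer(sequence.upper())]
```

As a usage example, take a made-up 13-residue peptide, `KPPQFTWAQWFET`, laid out so that Trp falls at positions 7 and 10: the function reports position 7 only, because the Trp at position 10 has no Trp (or Phe) three residues downstream.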
Abstract:
The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl/) is maintained at the European Bioinformatics Institute (EBI) in an international collaboration with the DNA Data Bank of Japan (DDBJ) and GenBank at the NCBI (USA). Data is exchanged amongst the collaborating databases on a daily basis. The major contributors to the EMBL database are individual authors and genome project groups. Webin is the preferred web-based submission system for individual submitters, whilst automatic procedures allow incorporation of sequence data from large-scale genome sequencing centres and from the European Patent Office (EPO). Database releases are produced quarterly. Network services allow free access to the most up-to-date data collection via ftp, email and World Wide Web interfaces. EBI’s Sequence Retrieval System (SRS), a network browser for databanks in molecular biology, integrates and links the main nucleotide and protein databases plus many specialized databases. For sequence similarity searching a variety of tools (e.g. Blitz, Fasta, BLAST) are available which allow external users to compare their own sequences against the latest data in the EMBL Nucleotide Sequence Database and SWISS-PROT.
Abstract:
High throughput genome (HTG) and expressed sequence tag (EST) sequences are currently the most abundant nucleotide sequence classes in the public database. The large volume, high degree of fragmentation and lack of gene structure annotations prevent efficient and effective searches of HTG and EST data for protein sequence homologies by standard search methods. Here, we briefly describe three newly developed resources that should make discovery of interesting genes in these sequence classes easier in the future, especially for biologists who do not have access to a powerful local bioinformatics environment. trEST and trGEN are regularly regenerated databases of hypothetical protein sequences predicted from EST and HTG sequences, respectively. Hits is a web-based data retrieval and analysis system providing access to precomputed matches between protein sequences (including sequences from trEST and trGEN) and patterns and profiles from Prosite and Pfam. The three resources can be accessed via the Hits home page (http://hits.isb-sib.ch).
Abstract:
DBMODELING is a relational database of annotated comparative protein structure models and their metabolic pathway characterization. It is focused on enzymes identified in the genomes of Mycobacterium tuberculosis and Xylella fastidiosa. The main goal of the present database is to provide structural models to be used in docking simulations and drug design. However, since the accuracy of structural models is highly dependent on sequence identity between template and target, it is necessary to make clear to the user that only models which show high structural quality should be used in such efforts. Molecular modeling of these genomes generated a database in which all structural models were built using alignments with more than 30% sequence identity, generating models with medium and high accuracy. All models in the database are publicly accessible at http://www.biocristalografia.df.ibilce.unesp.br/tools. The DBMODELING user interface provides user-friendly menus, so that all information can be printed in one stop from any web browser. Furthermore, DBMODELING also provides a docking interface, which allows the user to carry out geometric docking simulations against the molecular models available in the database. There are three other important homology model databases: MODBASE, SWISSMODEL, and GTOP. The main applications of these databases are described in the present article. © 2007 Bentham Science Publishers Ltd.
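The 30% sequence identity cutoff mentioned in this abstract can be made concrete with a small sketch. The abstract does not say how identity was computed, so the definition used here (identical residues divided by aligned columns in which neither sequence has a gap) is an assumption, and the names `percent_identity` and `passes_modeling_cutoff` are hypothetical.

```python
def percent_identity(aligned_template, aligned_target, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_template) != len(aligned_target):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_template, aligned_target):
        if a == gap or b == gap:
            continue  # gapped columns are excluded from the denominator
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def passes_modeling_cutoff(aligned_template, aligned_target, cutoff=30.0):
    """True when the alignment exceeds the identity cutoff (default 30%)."""
    return percent_identity(aligned_template, aligned_target) > cutoff
```

Other definitions divide by the full alignment length or by the length of the shorter sequence, which can move a borderline template across the cutoff, so the choice of denominator should be stated alongside any reported identity.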
Abstract:
High-throughput screening of physical, genetic and chemical-genetic interactions opens important perspectives in the field of Systems Biology, as the analysis of these interactions provides new insights into protein/gene function, cellular metabolic variations and the validation of therapeutic targets and drug design. However, such analysis depends on a pipeline connecting different tools that can automatically integrate data from diverse sources and produce a more comprehensive dataset that can be properly interpreted. We describe here the Integrated Interactome System (IIS), an integrative platform with a web-based interface for the annotation, analysis and visualization of the interaction profiles of proteins/genes, metabolites and drugs of interest. IIS works in four connected modules: (i) Submission module, which receives raw data derived from Sanger sequencing (e.g. two-hybrid system); (ii) Search module, which enables the user to search for the processed reads to be assembled into contigs/singlets, or for lists of proteins/genes, metabolites and drugs of interest, and add them to the project; (iii) Annotation module, which assigns annotations from several databases for the contigs/singlets or lists of proteins/genes, generating tables with automatic annotation that can be manually curated; and (iv) Interactome module, which maps the contigs/singlets or the uploaded lists to entries in our integrated database, building networks that gather novel identified interactions, protein and metabolite expression/concentration levels, subcellular localization and computed topological metrics, GO biological processes and KEGG pathway enrichment. This module generates an XGMML file that can be imported into Cytoscape or be visualized directly on the web. We have developed IIS by integrating diverse databases in response to the need for appropriate tools for a systematic analysis of physical, genetic and chemical-genetic interactions. 
IIS was validated with yeast two-hybrid, proteomics and metabolomics datasets, but it is also extendable to other datasets. IIS is freely available online at: http://www.lge.ibi.unicamp.br/lnbio/IIS/.