74 results for Ontology Alignments Negotiations
Abstract:
The Microbe browser is a web server providing comparative microbial genomics data. It offers comprehensive, integrated data from GenBank, RefSeq, UniProt, InterPro, Gene Ontology and the Orthologs Matrix Project (OMA) database, displayed along with gene predictions from five software packages. The Microbe browser is updated daily from the source databases and includes all completely sequenced bacterial and archaeal genomes. The data are displayed in an easy-to-use, interactive website based on Ensembl software. The Microbe browser is available at http://microbe.vital-it.ch/. Programmatic access is available through the OMA application programming interface (API) at http://microbe.vital-it.ch/api.
Abstract:
Recent technological progress has greatly facilitated de novo genome sequencing. However, de novo assemblies consist of many pieces of contiguous sequence (contigs) arranged in thousands of scaffolds instead of small numbers of chromosomes. Confirming and improving the quality of such assemblies is critical for subsequent analysis. We present a method to evaluate genome scaffolding by aligning independently obtained transcriptome sequences to the genome and visually summarizing the alignments using the Cytoscape software. Applying this method to the genome of the red fire ant Solenopsis invicta allowed us to identify inconsistencies in 7%, confirm contig order in 20% and extend 16% of scaffolds. Scripts that generate tables for visualization in Cytoscape from FASTA sequence and scaffolding information files are publicly available at https://github.com/ksanao/TGNet.
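The idea of summarizing transcript-to-genome alignments as a network can be illustrated with a minimal sketch. It is not the TGNet implementation: the input format (pairs of transcript and contig identifiers), the function names and the `support` column are assumptions made for illustration. A transcript that aligns to two contigs is treated as evidence that those contigs are neighbours, and the result is written as a tab-separated edge table of the kind that Cytoscape can import.

```python
from collections import defaultdict
from itertools import combinations


def contig_edges(alignments):
    """Count, for each pair of contigs, how many transcripts align to both.

    alignments: iterable of (transcript_id, contig_id) pairs (hypothetical format).
    """
    contigs_by_transcript = defaultdict(set)
    for transcript_id, contig_id in alignments:
        contigs_by_transcript[transcript_id].add(contig_id)

    support = defaultdict(int)
    for contigs in contigs_by_transcript.values():
        # A transcript spanning several contigs links every pair among them.
        for a, b in combinations(sorted(contigs), 2):
            support[(a, b)] += 1
    return dict(support)


def to_edge_table(support):
    """Render the edge counts as a tab-separated table with a header row."""
    lines = ["source\ttarget\tsupport"]
    for (a, b), n in sorted(support.items()):
        lines.append(f"{a}\t{b}\t{n}")
    return "\n".join(lines)


# Invented example data: t1 and t2 both span contigs c1 and c2.
alignments = [
    ("t1", "c1"), ("t1", "c2"),
    ("t2", "c1"), ("t2", "c2"),
    ("t3", "c2"), ("t3", "c3"),
    ("t4", "c4"),
]
edges = contig_edges(alignments)
print(to_edge_table(edges))
```

Edges supported by transcripts can then be compared with the contig order recorded in the scaffolding file: agreement confirms the scaffold, while edges between contigs in different scaffolds suggest an extension or an inconsistency.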
Abstract:
In conducting genome-wide association studies (GWAS), analytical approaches leveraging biological information may further understanding of the pathophysiology of clinical traits. To discover novel associations with estimated glomerular filtration rate (eGFR), a measure of kidney function, we developed a strategy for integrating prior biological knowledge into the existing GWAS data for eGFR from the CKDGen Consortium. Our strategy focuses on single nucleotide polymorphisms (SNPs) in genes that are connected by functional evidence, determined by literature mining and gene ontology (GO) hierarchies, to genes near previously validated eGFR associations. It then requires association thresholds consistent with multiple testing, and finally evaluates novel candidates by independent replication. Among the samples of European ancestry, we identified a genome-wide significant SNP in FBXL20 (P = 5.6 × 10^-9) in meta-analysis of all available data, and additional SNPs at the INHBC, LRP2, PLEKHA1, SLC3A2 and SLC7A6 genes meeting multiple-testing corrected significance for replication and overall P-values of 4.5 × 10^-4 to 2.2 × 10^-7. Neither the novel PLEKHA1 nor FBXL20 associations, both further supported by association with eGFR among African Americans and with transcript abundance, would have been implicated by eGFR candidate gene approaches. LRP2, encoding the megalin receptor, was identified through connection with the previously known eGFR gene DAB2 and extends understanding of the megalin system in kidney function. These findings highlight integration of existing genome-wide association data with independent biological knowledge to uncover novel candidate eGFR associations, including candidates lacking known connections to kidney-specific pathways. The strategy may also be applicable to other clinical phenotypes, although more testing will be needed to assess its potential for discovery in general.
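The step that "requires association thresholds consistent with multiple testing" can be illustrated with a short sketch. The abstract does not state which correction was used, so the Bonferroni rule below is an assumption, and the SNP names and P-values are invented for illustration only.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def select_candidates(pvalues, alpha=0.05):
    """Keep the SNPs whose P-value is below the corrected threshold.

    pvalues: dict mapping SNP identifier to association P-value (must be non-empty).
    """
    threshold = bonferroni_threshold(alpha, len(pvalues))
    return {snp: p for snp, p in pvalues.items() if p < threshold}


# Invented P-values for four hypothetical candidate SNPs.
candidate_pvalues = {"snp_a": 1e-6, "snp_b": 0.004, "snp_c": 0.02, "snp_d": 0.3}
selected = select_candidates(candidate_pvalues)
print(sorted(selected))
```

Restricting the analysis to SNPs in functionally connected genes reduces the number of tests, so the corrected threshold is less stringent than the usual genome-wide level; candidates that pass would still need independent replication, as the abstract describes.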
Abstract:
Erythrocyte concentrates (ECs) are the major labile blood product transfused worldwide, with the aim of treating anemia of diverse origins. In Switzerland, ECs are stored at 4 °C for up to 42 days in saline-adenine-glucose-mannitol (SAGM). Such storage induces cellular lesions, altering red blood cell (RBC) metabolism, protein content and rheological properties. There is an active debate over the impact of storage lesions, and thus of the age of ECs, on transfusion-related adverse clinical outcomes. Several studies tend to show that poorer outcomes occur in patients receiving older blood products. However, no clear association has been demonstrated to date. While metabolic and early rheological changes are reversible through transfusion of the blood units, oxidized proteins cannot be repaired, and it is likely that such irreversible damage would affect the quality of the blood product and the efficiency of the transfusion. In vivo, RBCs are constantly exposed to oxygen fluxes, and are thus well equipped to deal with oxidative challenges. Moreover, functional 20S proteasome complexes allow for recognition and proteolysis of moderately oxidized proteins, and some proteins can be eliminated from RBCs by the release of microvesicles. The present PhD thesis is part of a larger research project whose goal is to characterize the effect of processing and storage on the quality of ECs. Assessing oxidative protein damage during RBC storage is of major importance to understand the mechanisms of aging of stored RBCs. To this end, redox proteomic-based investigations were conducted here. In the first part, cysteine oxidation and protein carbonylation were addressed via 2D-DIGE and derivatization-driven immunodetection approaches, respectively. Then, the oxidized sub-proteomes were characterized through LC-MS/MS identification of proteins in spots of interest (cysteine oxidation) or affinity-purified carbonylated proteins.
Gene ontology annotation made it possible to classify targets of oxidation according to their molecular functions. In the third part, the 20S proteasome (P20S) activity was evaluated throughout the storage period of ECs, and its susceptibility to a highly oxidized environment was investigated. The potential defensive role of microvesiculation was also addressed through the quantification of eliminated carbonylated proteins. We highlighted distinct protein groups differentially affected by cysteine oxidation, either reversibly or irreversibly. In addition, soluble extracts showed a decrease in carbonylation at the beginning of the storage and membrane extracts revealed increasing carbonylation after 4 weeks of storage. The molecular functions involved revealed that antioxidant (AO) defenses are oxidized reversibly at their cysteine residue(s), but irreversibly through carbonylation. Meanwhile, the 20S proteasome activity decreased by around 40% at the end of the storage period. Incubation of fresh RBC extracts with exogenous oxidized proteins showed a dose-dependent and protein-dependent inhibitory effect. Finally, we showed that the release of microvesicles allows the elimination of increasing quantities of carbonylated proteins. Taken together, these results revealed an oxidative pathway model of RBC storage, on which further investigation towards improved storage conditions will be based. -- Erythrocyte concentrates (ECs) are the most widely issued blood product in the world and are used to treat various forms of anemia. In Switzerland, ECs are stored at 4 °C for 42 days in a saline solution of adenine, glucose and mannitol (SAGM). Such storage induces storage lesions that alter the metabolism, proteins and rheological properties of the red blood cell (RBC).
An important debate concerns the impact of EC storage time on the risk of transfusion reactions, with some studies attempting to show that transfusions of old blood reduce patient life expectancy. However, no concrete association has been proven to date. While changes in metabolism and early changes in rheological properties are reversible after transfusion of the EC, oxidized proteins cannot be repaired, and it is likely that such lesions affect the quality and efficacy of blood products. In vivo, RBCs are constantly exposed to oxygen and are therefore well equipped to resist oxidative lesions. In addition, functional 20S proteasome complexes recognize and degrade moderately oxidized proteins, and some proteins can be eliminated through microparticles. This doctoral thesis is embedded in a larger research project whose objective is to characterize the effects of preparation and storage on RBC quality. Assessing oxidative damage to the RBC during storage is essential for understanding the aging mechanisms of blood products. To this end, redox proteomics investigations were conducted. In the first part, cysteine oxidation and protein carbonylation are assessed by two-dimensional differential gel electrophoresis and by immunodetection of derivatized proteins. Next, the proteins of interest and the affinity-purified carbonylated proteins are identified by tandem mass spectrometry. The protein targets of oxidation are classified according to their molecular function. In the third part, the proteolytic activity of the 20S proteasome is monitored throughout the storage period. The impact of oxidative stress on this activity was evaluated using exogenous proteins oxidized in vitro.
The potential defensive role of microvesiculation was also studied by quantifying the carbonylated proteins eliminated. In this work, we observed that different groups of proteins are affected by reversible or irreversible oxidation of their cysteines. In addition, a decrease in carbonylation at the beginning of storage in soluble extracts and an increase in carbonylation after 4 weeks in membrane extracts were shown. The molecular functions of the altered proteins show that antioxidant defenses are reversibly oxidized at their cysteine residues but are also irreversibly carbonylated. Meanwhile, the proteolytic activity of the 20S proteasome decreases by 40% at the end of storage. Incubation of RBC extracts from the beginning of storage with exogenous oxidized proteins shows a dose-dependent and protein-dependent inhibitory effect. Finally, microvesicles prove to eliminate increasing quantities of carbonylated proteins. Taken together, these results make it possible to model an oxidative pathway of RBC storage, from which future research will be conducted with the aim of improving storage conditions.
Abstract:
The GO annotation dataset provided by the UniProt Consortium (GOA: http://www.ebi.ac.uk/GOA) is a comprehensive set of evidence-based associations between terms from the Gene Ontology resource and UniProtKB proteins. Currently supplying over 100 million annotations to 11 million proteins in more than 360,000 taxa, this resource has increased 2-fold over the last 2 years. It has benefited from a wealth of checks to improve annotation correctness and consistency, and now supplies greater information content enabled by GO Consortium annotation format developments. Detailed, manual GO annotations obtained from the curation of peer-reviewed papers are directly contributed by all UniProt curators and supplemented with manual and electronic annotations from 36 model organism and domain-focused scientific resources. The inclusion of high-quality, automatic annotation predictions ensures the UniProt GO annotation dataset supplies functional information to a wide range of proteins, including those from poorly characterized, non-model organism species. UniProt GO annotations are freely available in a range of formats accessible by both file downloads and web-based views. In addition, the introduction of a new, normalized file format in 2010 has made for easier handling of the complete UniProt-GOA data set.
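As an illustration of working with downloaded annotation files, the sketch below reads records in the tab-separated GO Annotation File (GAF 2.x) layout and separates electronic from other annotations by evidence code. Treating the download as GAF 2.x is an assumption (the abstract does not name the formats), and the sample record, including its accession and reference, is invented.

```python
GAF_COLUMNS = [
    "db", "db_object_id", "db_object_symbol", "qualifier", "go_id",
    "db_reference", "evidence_code", "with_from", "aspect",
    "db_object_name", "db_object_synonym", "db_object_type", "taxon",
    "date", "assigned_by", "annotation_extension", "gene_product_form_id",
]


def parse_gaf(lines):
    """Yield one dict per annotation line, skipping '!' comment lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("!"):
            continue
        fields = line.split("\t")
        # Pad short lines so that every column name is present.
        fields += [""] * (len(GAF_COLUMNS) - len(fields))
        yield dict(zip(GAF_COLUMNS, fields))


def is_electronic(record):
    """IEA is the evidence code for annotations inferred electronically."""
    return record["evidence_code"] == "IEA"


# Invented record for illustration; not taken from the real dataset.
sample = [
    "!gaf-version: 2.2",
    "\t".join([
        "UniProtKB", "X00000", "EXMP1", "enables", "GO:0005515",
        "PMID:0000000", "IPI", "UniProtKB:X00001", "F",
        "Example protein", "", "protein", "taxon:9606",
        "20100101", "UniProt", "", "",
    ]),
]
records = list(parse_gaf(sample))
print(records[0]["go_id"], is_electronic(records[0]))
```

Filtering on the evidence code in this way is a common first step when only curated annotations are wanted.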
Abstract:
Peptide toxins synthesized by venomous animals have been extensively studied in recent decades. To be useful to the scientific community, this knowledge has been stored, annotated and made easy to retrieve by several databases. The aim of this article is to present what type of information users can access from each database. ArachnoServer and ConoServer focus on spider toxins and cone snail toxins, respectively. UniProtKB, a generalist protein knowledgebase, has an animal toxin-dedicated annotation program that includes toxins from all venomous animals. Finally, the ATDB metadatabase compiles data and annotations from other databases and provides a toxin ontology.
Abstract:
BACKGROUND: Fourmidable is an infrastructure to curate and share the emerging genetic, molecular, and functional genomic data and protocols for ants. DESCRIPTION: The Fourmidable assembly pipeline groups nucleotide sequences into clusters before independently assembling each cluster. Subsequently, assembled sequences are annotated via InterProScan and BLAST against general and insect-specific databases. Gene-specific information can be retrieved using gene identifiers, searching for similar sequences or browsing through inferred Gene Ontology annotations. The database will readily scale as ultra-high throughput sequence data and sequences from additional species become available. CONCLUSION: Fourmidable currently houses EST data from two ant species and microarray gene expression data for one of these. Fourmidable is publicly available at http://fourmidable.unil.ch.
Abstract:
During my PhD, my aim was to provide new tools to increase our capacity to analyse gene expression patterns, and to study on a large-scale basis the evolution of gene expression in animals. Gene expression patterns (when and where a gene is expressed) are a key feature in understanding gene function, notably in development. It now appears clear that the evolution of developmental processes and of phenotypes is shaped by evolution both at the coding sequence level and at the gene expression level. Studying gene expression evolution in animals, with complex expression patterns over tissues and developmental time, is still challenging. No tools are available to routinely compare expression patterns between different species, with precision, and on a large-scale basis. Studies on gene expression evolution are therefore performed only on small gene datasets, or using imprecise descriptions of expression patterns. The aim of my PhD was thus to develop and use novel bioinformatics resources to study the evolution of gene expression. To this end, I developed the database Bgee (Base for Gene Expression Evolution). The approach of Bgee is to transform heterogeneous expression data (ESTs, microarrays, and in-situ hybridizations) into present/absent calls, and to annotate them to standard representations of anatomy and development of different species (anatomical ontologies). An extensive mapping between anatomies of species is then developed based on hypotheses of homology. These precise annotations to anatomies, and this extensive mapping between species, are the major assets of Bgee, and have required the involvement of many co-workers over the years. My main personal contribution is the development and the management of both the Bgee database and the web application. Bgee is now on its ninth release, and includes an important gene expression dataset for 5 species (human, mouse, drosophila, zebrafish, Xenopus), with the most data from mouse, human and zebrafish.
Using these three species, I have conducted an analysis of gene expression evolution after duplication in vertebrates. Gene duplication is thought to be a major source of novelty in evolution, and to contribute to speciation. It has been suggested that the evolution of gene expression patterns might participate in the retention of duplicate genes. I performed a large-scale comparison of expression patterns of hundreds of duplicated genes to their singleton ortholog in an outgroup, including both small and large-scale duplicates, in three vertebrate species (human, mouse and zebrafish), and using highly accurate descriptions of expression patterns. My results showed unexpectedly high rates of de novo acquisition of expression domains after duplication (neofunctionalization), at least as high as or higher than rates of partitioning of expression domains (subfunctionalization). I found differences in the evolution of expression of small- and large-scale duplicates, with small-scale duplicates more prone to neofunctionalization. Duplicates with neofunctionalization seemed to evolve under more relaxed selective pressure on the coding sequence. Finally, even with abundant and precise expression data, the majority fate I recovered was neither neo- nor subfunctionalization of expression domains, suggesting a major role for other mechanisms in duplicate gene retention.
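The comparison of duplicates with their singleton ortholog can be sketched with sets of expression domains built from present/absent calls. The rules below are a simplified illustration, not the criteria used in the thesis: a domain present in a duplicate but absent from the outgroup counts as a gain (neofunctionalization), and ancestral domains split between the two copies count as partitioning (subfunctionalization). The function name and the tissue names are hypothetical.

```python
def classify_duplicates(outgroup, copy_a, copy_b):
    """Classify the expression fate of a duplicate pair.

    Each argument is a collection of anatomical structures in which the gene
    is called 'present'; the outgroup singleton stands in for the ancestor.
    """
    ancestral = set(outgroup)
    a, b = set(copy_a), set(copy_b)

    # Any domain absent from the outgroup is treated as newly acquired.
    if (a | b) - ancestral:
        return "neofunctionalization"

    # Together the copies cover the ancestral pattern, but each lost a part.
    if (a | b) == ancestral and a != ancestral and b != ancestral:
        return "subfunctionalization"

    return "other"


print(classify_duplicates({"brain", "liver"}, {"brain"}, {"liver"}))
print(classify_duplicates({"brain"}, {"brain"}, {"brain", "testis"}))
print(classify_duplicates({"brain", "liver"}, {"brain", "liver"}, {"brain", "liver"}))
```

The "other" category covers conserved patterns and simple losses, which matches the observation in the abstract that the majority fate was neither neo- nor subfunctionalization.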
Abstract:
ABSTRACT: BACKGROUND: It is accepted that a woman's lifetime risk of developing breast cancer after menopause is reduced by early full-term pregnancy and multiparity. This phenomenon is thought to be associated with the development and differentiation of the breast during pregnancy. METHODS: In order to understand the underlying molecular mechanisms of pregnancy-induced breast cancer protection, we profiled and compared the transcriptomes of normal breast tissue biopsies from 71 parous (P) and 42 nulliparous (NP) healthy postmenopausal women using Affymetrix Human Genome U133 Plus 2.0 arrays. To validate the results, we performed real-time PCR and immunohistochemistry. RESULTS: We identified 305 differentially expressed probesets (208 distinct genes). Of these, 267 probesets were up- and 38 down-regulated in parous breast samples; bioinformatics analysis using gene ontology enrichment revealed that up-regulated genes in the parous breast represented biological processes involving differentiation and development, anchoring of epithelial cells to the basement membrane, hemidesmosome and cell-substrate junction assembly, mRNA and RNA metabolic processes and RNA splicing machinery. The down-regulated genes represented biological processes that comprised cell proliferation, regulation of IGF-like growth factor receptor signaling, somatic stem cell maintenance, muscle cell differentiation and apoptosis. CONCLUSIONS: This study suggests that the differentiation of the breast imprints a genomic signature that is centered in the mRNA processing reactome. These findings indicate that pregnancy may induce a safeguard mechanism at post-transcriptional level that maintains the fidelity of the transcriptional process.
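Gene ontology enrichment of a list of differentially expressed genes is commonly assessed with a one-sided hypergeometric test. The sketch below shows that calculation; it is an assumption that a test of this kind was used in the study, and the counts in the example are invented rather than taken from the paper.

```python
from math import comb


def hypergeom_enrichment_p(k, population, annotated, drawn):
    """Probability of seeing at least k annotated genes in the drawn list.

    population: number of genes measured (background)
    annotated:  background genes carrying the GO term
    drawn:      size of the differentially expressed gene list
    k:          genes in the list carrying the GO term
    """
    upper = min(annotated, drawn)
    total = comb(population, drawn)
    tail = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(k, upper + 1)
    )
    return tail / total


# Invented counts: 5 of 5 listed genes carry a term held by 5 of 10 background genes.
p_value = hypergeom_enrichment_p(5, population=10, annotated=5, drawn=5)
print(p_value)
```

In practice one such test is run per GO term, so the resulting P-values need a multiple-testing correction before terms are reported as enriched.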
Abstract:
Centrifuge is a user-friendly system to simultaneously access Arabidopsis gene annotations and intra- and inter-organism sequence comparison data. The tool allows rapid retrieval of user-selected data for each annotated Arabidopsis gene providing, in any combination, data on the following features: predicted protein properties such as mass, pI, cellular location and transmembrane domains; SWISS-PROT annotations; InterPro domains; Gene Ontology records; verified transcription; BLAST matches to the proteomes of A. thaliana, Oryza sativa (rice), Caenorhabditis elegans, Drosophila melanogaster and Homo sapiens. The tool lends itself particularly well to the rapid analysis of contigs or of tens or hundreds of genes identified by high-throughput gene expression experiments. In these cases, a summary table of principal predicted protein features for all genes is given, followed by more detailed reports for each individual gene. Centrifuge can also be used for single gene analysis or in a word search mode. AVAILABILITY: http://centrifuge.unil.ch/ CONTACT: edward.farmer@unil.ch.
Abstract:
Genome-wide association studies (GWAS) are designed to identify the portion of single-nucleotide polymorphisms (SNPs) in genome sequences associated with a complex trait. Strategies based on the gene list enrichment concept are currently applied for the functional analysis of GWAS, according to which a significant overrepresentation of candidate genes associated with a biological pathway is used as a proxy to infer overrepresentation of candidate SNPs in the pathway. Here we show that such inference is not always valid and introduce the program SNP2GO, which implements a new method to properly test for the overrepresentation of candidate SNPs in biological pathways.
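A small counting example shows how gene-level and SNP-level overrepresentation can disagree. It is not the SNP2GO method. It assumes, for illustration, that a gene counts as a candidate when it contains at least one candidate SNP, and that pathway genes happen to be long and so contain many SNPs; all numbers are invented.

```python
def rates(genes, in_pathway):
    """Return (fraction of candidate genes, fraction of candidate SNPs)
    among the genes inside or outside the pathway."""
    chosen = [g for g in genes if g["pathway"] == in_pathway]
    candidate_genes = sum(1 for g in chosen if g["candidate_snps"] > 0)
    candidate_snps = sum(g["candidate_snps"] for g in chosen)
    total_snps = sum(g["snps"] for g in chosen)
    return candidate_genes / len(chosen), candidate_snps / total_snps


# Invented data: two long pathway genes, four short genes elsewhere.
genes = [
    {"name": "g1", "pathway": True, "snps": 100, "candidate_snps": 1},
    {"name": "g2", "pathway": True, "snps": 100, "candidate_snps": 1},
    {"name": "g3", "pathway": False, "snps": 10, "candidate_snps": 0},
    {"name": "g4", "pathway": False, "snps": 10, "candidate_snps": 1},
    {"name": "g5", "pathway": False, "snps": 10, "candidate_snps": 0},
    {"name": "g6", "pathway": False, "snps": 10, "candidate_snps": 0},
]
inside = rates(genes, in_pathway=True)
outside = rates(genes, in_pathway=False)
print("gene-level:", inside[0], "vs", outside[0])
print("SNP-level: ", inside[1], "vs", outside[1])
```

Here every pathway gene is a candidate (1.0 against 0.25 elsewhere), yet the proportion of candidate SNPs is lower inside the pathway (0.01 against 0.025), so gene-list enrichment would point in the wrong direction.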
Abstract:
Selectome (http://selectome.unil.ch/) is a database of positive selection, based on a branch-site likelihood test. This model estimates the number of nonsynonymous substitutions (dN) and synonymous substitutions (dS) to evaluate the variation in selective pressure (dN/dS ratio) over branches and over sites. Since the original release of Selectome, we have benchmarked and implemented a thorough quality control procedure on multiple sequence alignments, with the aim of minimizing false-positive results. We have also improved the computational efficiency of the branch-site test implementation, allowing larger data sets and more frequent updates. Release 6 of Selectome includes all gene trees from Ensembl for Primates and Glires, as well as a large set of vertebrate gene trees. A total of 6810 gene trees have some evidence of positive selection. Finally, the web interface has been improved to be more responsive and to facilitate searches and browsing.
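The two quantities mentioned here can be sketched in a few lines: the dN/dS ratio, and a likelihood ratio test comparing a null model with a model that allows positive selection. Comparing the statistic with a chi-square distribution with one degree of freedom is a common convention for branch-site tests, but it is an assumption here, as are the function names and the log-likelihood values.

```python
from math import erfc, sqrt


def dn_ds_ratio(dn, ds):
    """Selective pressure as the ratio of nonsynonymous to synonymous rates."""
    if ds <= 0:
        raise ValueError("dS must be positive to form the ratio")
    return dn / ds


def lrt_pvalue(lnl_null, lnl_alt):
    """P-value of the likelihood ratio test, assuming chi-square with 1 df.

    For one degree of freedom the chi-square survival function has the
    closed form erfc(sqrt(x / 2)), so no external library is needed.
    """
    statistic = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return erfc(sqrt(statistic / 2.0))


# Invented log-likelihoods for one branch of one gene tree.
omega = dn_ds_ratio(0.3, 0.1)
p_value = lrt_pvalue(lnl_null=-1000.0, lnl_alt=-998.0)
print(omega, p_value)
```

A ratio above 1 indicates an excess of nonsynonymous change, and a small P-value indicates that the model with positive selection fits significantly better; across thousands of gene trees the P-values would still need correction for multiple testing.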
Abstract:
We previously introduced two new protein databases (trEST and trGEN) of hypothetical protein sequences predicted from EST and HTG sequences, respectively. Here, we present the updates made on these two databases plus a new database (trome), which uses alignments of EST data to HTG or full genomes to generate virtual transcripts and coding sequences. This new database is of higher quality and since it contains the information in a much denser format it is of much smaller size. These new databases are in a Swiss-Prot-like format and are updated on a weekly basis (trEST and trGEN) or every 3 months (trome). They can be downloaded by anonymous ftp from ftp://ftp.isrec.isb-sib.ch/pub/databases.
Abstract:
BACKGROUND: Pseudogenes have long been considered as nonfunctional genomic sequences. However, recent evidence suggests that many of them might have some form of biological activity, and the possibility of functionality has increased interest in their accurate annotation and integration with functional genomics data. RESULTS: As part of the GENCODE annotation of the human genome, we present the first genome-wide pseudogene assignment for protein-coding genes, based on both large-scale manual annotation and in silico pipelines. A key aspect of this coupled approach is that it allows us to identify pseudogenes in an unbiased fashion as well as untangle complex events through manual evaluation. We integrate the pseudogene annotations with the extensive ENCODE functional genomics information. In particular, we determine the expression level, transcription-factor and RNA polymerase II binding, and chromatin marks associated with each pseudogene. Based on their distribution, we develop simple statistical models for each type of activity, which we validate with large-scale RT-PCR-Seq experiments. Finally, we compare our pseudogenes with conservation and variation data from primate alignments and the 1000 Genomes project, producing lists of pseudogenes potentially under selection. CONCLUSIONS: At one extreme, some pseudogenes possess conventional characteristics of functionality; these may represent genes that have recently died. On the other hand, we find interesting patterns of partial activity, which may suggest that dead genes are being resurrected as functioning non-coding RNAs. The activity data of each pseudogene are stored in an associated resource, psiDR, which will be useful for the initial identification of potentially functional pseudogenes.