877 results for "annotation sémantique" (semantic annotation)
Abstract:
Advanced neuroinformatics tools are required for connectome mapping, analysis, and visualization. The inherent multi-modality of connectome datasets poses new challenges for data organization, integration, and sharing. We have designed and implemented the Connectome Viewer Toolkit, a set of free and extensible open source neuroimaging tools written in Python. The key components of the toolkit are as follows: (1) the Connectome File Format, an XML-based container format that standardizes multi-modal data integration and structured metadata annotation; (2) the Connectome File Format Library, which enables management and sharing of connectome files; and (3) the Connectome Viewer, an integrated research and development environment for visualization and analysis of multi-modal connectome data. The Connectome Viewer's plugin architecture supports extensions with network analysis packages and an interactive scripting shell, to enable easy development and community contributions. Integration with tools from the scientific Python community makes it possible to leverage numerous existing libraries for connectome data mining, exploration, and comparison. We demonstrate the applicability of the Connectome Viewer Toolkit using diffusion MRI datasets processed by the Connectome Mapper. The Connectome Viewer Toolkit is available from http://www.cmtk.org/
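The container idea behind the Connectome File Format (one XML document that indexes several data modalities and carries structured metadata) can be sketched with Python's standard `xml.etree.ElementTree`. This is a minimal sketch under stated assumptions: the element names (`connectome`, `metadata`, `network`, `surface`, `volume`) and attributes (`name`, `src`, `modality`) are hypothetical and do not reproduce the actual Connectome File Format schema or the Connectome File Format Library API.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical manifest: tag and attribute names are illustrative only,
# NOT the real Connectome File Format schema.
MANIFEST = """<connectome>
  <metadata>
    <title>Example subject</title>
    <species>Homo sapiens</species>
  </metadata>
  <network name="fibers_lh" src="networks/fibers_lh.graphml" modality="dMRI"/>
  <surface name="pial_lh" src="surfaces/pial_lh.gii" modality="sMRI"/>
  <volume name="t1" src="volumes/t1.nii.gz" modality="sMRI"/>
</connectome>"""


@dataclass
class DataObject:
    kind: str      # element tag, e.g. "network", "surface", "volume"
    name: str
    src: str       # path of the payload file referenced by the container
    modality: str  # acquisition modality recorded as metadata


def read_manifest(xml_text: str) -> tuple[dict[str, str], list[DataObject]]:
    """Parse a container manifest into (metadata dict, list of data objects)."""
    root = ET.fromstring(xml_text)
    meta_el = root.find("metadata")
    # Elements without children are falsy, so compare against None explicitly.
    metadata = (
        {child.tag: (child.text or "").strip() for child in meta_el}
        if meta_el is not None
        else {}
    )
    objects = [
        DataObject(el.tag, el.get("name", ""), el.get("src", ""), el.get("modality", ""))
        for el in root
        if el.tag != "metadata"
    ]
    return metadata, objects


if __name__ == "__main__":
    meta, objs = read_manifest(MANIFEST)
    print(meta["title"])
    for obj in objs:
        print(obj.kind, obj.name, obj.modality)
```

Keeping metadata and data references in one manifest is what lets a single file describe heterogeneous modalities; the payload files themselves would live alongside it.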
Abstract:
BACKGROUND: Pseudogenes have long been considered nonfunctional genomic sequences. However, recent evidence suggests that many of them might have some form of biological activity, and the possibility of functionality has increased interest in their accurate annotation and integration with functional genomics data. RESULTS: As part of the GENCODE annotation of the human genome, we present the first genome-wide pseudogene assignment for protein-coding genes, based on both large-scale manual annotation and in silico pipelines. A key aspect of this coupled approach is that it allows us to identify pseudogenes in an unbiased fashion as well as to untangle complex events through manual evaluation. We integrate the pseudogene annotations with the extensive ENCODE functional genomics information. In particular, we determine the expression level, transcription-factor and RNA polymerase II binding, and chromatin marks associated with each pseudogene. Based on their distribution, we develop simple statistical models for each type of activity, which we validate with large-scale RT-PCR-Seq experiments. Finally, we compare our pseudogenes with conservation and variation data from primate alignments and the 1000 Genomes Project, producing lists of pseudogenes potentially under selection. CONCLUSIONS: At one extreme, some pseudogenes possess conventional characteristics of functionality; these may represent genes that have recently died. In addition, we find interesting patterns of partial activity, which may suggest that dead genes are being resurrected as functioning non-coding RNAs. The activity data for each pseudogene are stored in an associated resource, psiDR, which should be useful for the initial identification of potentially functional pseudogenes.
Abstract:
The broad aim of biomedical science in the postgenomic era is to link genomic and phenotype information to allow a deeper understanding of the processes leading from genomic changes to altered phenotype and disease. The EuroPhenome project (http://www.EuroPhenome.org) is a comprehensive resource for raw and annotated high-throughput phenotyping data arising from projects such as EUMODIC. EUMODIC is gathering data from the EMPReSSslim pipeline (http://www.empress.har.mrc.ac.uk/), which is performed on inbred mouse strains and knock-out lines arising from the EUCOMM project. The EuroPhenome interface allows the user to access the data via the phenotype or the genotype, and in a variety of ways, including graphical display, statistical analysis and access to the raw data via web services. The raw phenotyping data captured in EuroPhenome are annotated by a pipeline that automatically identifies mutants that differ statistically from the appropriate baseline and assigns ontology terms for that specific test. Mutant phenotypes can be quickly identified using two EuroPhenome tools: PhenoMap, a graphical representation of statistically relevant phenotypes, and mining for mutants using ontology terms. To assist with data definition and cross-database comparisons, phenotype data are annotated using combinations of terms from biological ontologies.
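The annotation pipeline described above does two things: it flags mutants that differ statistically from their baseline, then attaches an ontology term for that test. A minimal sketch of that two-step logic follows. It assumes a simple z-score of the mutant mean against the baseline distribution, which is a stand-in and not EuroPhenome's actual statistical method; the parameter name `"glucose"` and the term labels in `TERM_MAP` are hypothetical.

```python
from statistics import mean, stdev

# Hypothetical mapping from (test parameter, direction of change) to an
# ontology term label; real pipelines would use curated ontology identifiers.
TERM_MAP = {
    ("glucose", "increased"): "increased circulating glucose level",
    ("glucose", "decreased"): "decreased circulating glucose level",
}


def annotate(parameter, baseline, mutant, z_threshold=3.0):
    """Return an ontology term label if the mutant differs from baseline, else None.

    Assumed test: z = (mean(mutant) - mean(baseline)) / (sd(baseline) / sqrt(len(mutant))).
    The baseline needs at least two values with non-zero variance.
    """
    mu, sd = mean(baseline), stdev(baseline)
    if sd == 0:
        raise ValueError("baseline has zero variance")
    standard_error = sd / len(mutant) ** 0.5
    z = (mean(mutant) - mu) / standard_error
    if abs(z) < z_threshold:
        return None  # not statistically different from baseline: no annotation
    direction = "increased" if z > 0 else "decreased"
    return TERM_MAP.get((parameter, direction))  # None if no term is defined


if __name__ == "__main__":
    baseline = [10, 11, 9, 10, 10, 11, 9, 10]
    print(annotate("glucose", baseline, [14, 15, 14]))
```

Separating the statistical call from the term lookup mirrors the abstract's description: the same significance step can feed different ontology mappings per test.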
Differences in the evolutionary history of disease genes affected by dominant or recessive mutations
Abstract:
Background: Global analyses of human disease genes by computational methods have yielded important advances in the understanding of human diseases. Generally, these studies have treated disease genes as a uniform group, thus ignoring the type of disease-causing mutation (dominant or recessive). In this report, we present a comprehensive study of the evolutionary history of autosomal disease genes separated by mode of inheritance. Results: We examine differences in protein and coding sequence conservation between dominant and recessive human disease genes. Our analysis shows that disease genes affected by dominant mutations are more conserved than those affected by recessive mutations. This could be a consequence of the fact that recessive mutations remain hidden from selection while heterozygous. Furthermore, we use functional annotation analysis and investigations into disease severity to support this hypothesis. Conclusion: This study elucidates important differences between dominantly and recessively acting disease genes in terms of protein and DNA sequence conservation, paralogy, and essentiality. We propose that dividing disease genes by mode of inheritance will enhance both understanding of the disease process and the prediction of candidate disease genes in the future.
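To make the dominant-versus-recessive conservation comparison concrete, here is a sketch of a one-sided permutation test on per-gene conservation scores. The scores are hypothetical placeholders (for example, protein identity to an ortholog), and the permutation test is a generic choice for illustration, not necessarily the statistic used in the study.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """One-sided permutation test of H1: mean(group_a) > mean(group_b).

    Returns (observed difference in means, p-value).
    """
    rng = random.Random(seed)  # seeded so the result is reproducible
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign genes to the two groups
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (hits + 1) / (n_permutations + 1)
    return observed, p_value


if __name__ == "__main__":
    # Hypothetical conservation scores, not data from the study.
    dominant = [0.95, 0.93, 0.97, 0.91, 0.96, 0.94]
    recessive = [0.85, 0.88, 0.82, 0.90, 0.86, 0.84]
    print(permutation_test(dominant, recessive))
```

A permutation test avoids distributional assumptions about conservation scores, which are often bounded and skewed.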
Abstract:
Drug safety issues pose serious health threats to the population and are a major cause of mortality worldwide. Given their major implications for both public health and the pharmaceutical industry, it is very important to unravel the molecular mechanisms by which an adverse drug reaction can be elicited. These mechanisms can be investigated by placing a pharmacoepidemiologically detected adverse drug reaction in an information-rich context and by exploiting all currently available biomedical knowledge to substantiate it. We present a computational framework for the biological annotation of potential adverse drug reactions. First, the framework looks for previous evidence of the drug-event association in the biomedical literature (signal filtering). Then, it seeks to provide a biological explanation (signal substantiation) by exploring mechanistic connections that might explain why a drug produces a specific adverse reaction. These mechanistic connections include the activity of the drug, related compounds and drug metabolites on protein targets; the association of protein targets with clinical events; and the annotation of proteins (both protein targets and proteins associated with clinical events) to biological pathways. Hence, the workflows for signal filtering and substantiation integrate modules for literature and database mining, in silico drug-target profiling, and analyses based on gene-disease networks and biological pathways. Application examples of these workflows on selected drug safety signals are discussed. The methodology and workflows presented offer a novel approach to exploring the molecular mechanisms underlying adverse drug reactions.
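The signal-substantiation step searches for mechanistic chains linking a drug to a clinical event (drug or metabolite, protein target, pathway, event-associated protein, event). The sketch below enumerates such chains as simple paths in a small directed graph using breadth-first search. Every node and edge in `GRAPH` is hypothetical; the actual framework draws these links from literature and database mining, in silico target profiling, and pathway resources rather than from a hand-written dictionary.

```python
from collections import deque

# Hypothetical mechanistic graph: edges point from a drug towards clinical events.
GRAPH: dict[str, list[str]] = {
    "drug:D": ["metabolite:M", "protein:T1"],  # drug metabolism / direct target
    "metabolite:M": ["protein:T2"],            # metabolite acting on a target
    "protein:T1": ["pathway:W"],               # target annotated to a pathway
    "pathway:W": ["protein:T3"],               # another member of that pathway
    "protein:T2": ["event:E"],                 # target associated with the event
    "protein:T3": ["event:E"],                 # event-associated protein
}


def mechanistic_paths(graph, source, target, max_edges=4):
    """Return all simple paths from source to target with at most max_edges
    edges, in breadth-first order (shorter explanations first)."""
    found = []
    queue = deque([[source]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            found.append(path)
            continue
        if len(path) - 1 >= max_edges:
            continue  # this path already uses the maximum number of edges
        for nxt in graph.get(node, []):
            if nxt not in path:  # keep paths simple (no cycles)
                queue.append(path + [nxt])
    return found


if __name__ == "__main__":
    for p in mechanistic_paths(GRAPH, "drug:D", "event:E"):
        print(" -> ".join(p))
```

Capping path length keeps candidate explanations short and interpretable; longer chains tend to be weaker mechanistic evidence.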
Abstract:
BACKGROUND: Scientists have long been trying to understand the molecular mechanisms of diseases in order to design preventive and therapeutic strategies. For some diseases, it has become evident that a catalogue of disease-related genes is not enough; we must also uncover how disruptions of molecular networks in the cell give rise to disease phenotypes. Moreover, given the unprecedented wealth of available information, even obtaining such a catalogue is extremely difficult. PRINCIPAL FINDINGS: We developed a comprehensive gene-disease association database by integrating associations from several sources that cover different biomedical aspects of diseases. In particular, we focus on current knowledge of human genetic diseases, including mendelian, complex and environmental diseases. To assess the concept of modularity of human diseases, we performed a systematic study of the emergent properties of human gene-disease networks by means of network topology and functional annotation analysis. The results indicate a highly shared genetic origin of human diseases and show that functional modules exist for most diseases, including mendelian, complex and environmental diseases. Moreover, a core set of biological pathways is associated with most human diseases. We obtained similar results when studying clusters of diseases, suggesting that related diseases might arise from dysfunction of common biological processes in the cell. CONCLUSIONS: For the first time, we include mendelian, complex and environmental diseases in an integrated gene-disease association database and show that the concept of modularity applies to all of them. We also provide a functional analysis of disease-related modules that yields important new biological insights, which might not be discovered when considering each gene-disease association repository independently.
Hence, we present a suitable framework for studying how genetic and environmental factors, such as drugs, contribute to diseases. AVAILABILITY: The gene-disease networks used in this study and part of the analysis are available at http://ibi.imim.es/DisGeNET/DisGeNETweb.html#Download
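A common way to examine the shared genetic origin of diseases is to project a bipartite list of gene-disease associations onto a disease-disease network, linking two diseases when they share genes. The sketch below shows that projection with the number of shared genes as the edge weight. The gene and disease identifiers are hypothetical, and this illustrates the general idea only; it is not the DisGeNET construction itself.

```python
from collections import defaultdict
from itertools import combinations


def disease_network(associations):
    """Project (gene, disease) pairs onto a disease-disease network.

    Returns {(disease_1, disease_2): number_of_shared_genes} for every pair of
    diseases (in sorted order) that shares at least one gene.
    """
    genes_by_disease = defaultdict(set)
    for gene, disease in associations:
        genes_by_disease[disease].add(gene)
    edges = {}
    for d1, d2 in combinations(sorted(genes_by_disease), 2):
        shared = genes_by_disease[d1] & genes_by_disease[d2]
        if shared:
            edges[(d1, d2)] = len(shared)
    return edges


if __name__ == "__main__":
    # Hypothetical associations, not taken from any database.
    pairs = [
        ("GENE1", "disease_a"), ("GENE2", "disease_a"),
        ("GENE2", "disease_b"), ("GENE3", "disease_b"),
        ("GENE2", "disease_c"), ("GENE3", "disease_c"),
        ("GENE4", "disease_d"),
    ]
    print(disease_network(pairs))
```

Densely connected groups in the projected network are natural candidates for the disease modules discussed in the abstract.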
Abstract:
Cells respond to different kinds of stress through the coordinated activation of signaling pathways such as MAPK or p53. Identifying the molecular mechanisms involved requires understanding how cells adapt. Ribosomal protein S6 kinase 1 (S6K1) is a common downstream target of signaling by hormonal or nutritional stress. Here, we investigated the initial contribution of the S6K1 and MAPK signaling pathways to the cell response to oxidative stress produced by hydrogen peroxide (H2O2). To analyze S6K1 activation, we used the commercial anti-phospho-Thr389-S6K1 antibody most frequently cited in the literature. We found that this antibody detected an 80-90 kDa protein that was rapidly phosphorylated in response to H2O2 in several human cell types. Unexpectedly, this phosphorylation was insensitive to both mTOR and PI3K inhibitors, and knock-down experiments showed that this protein was not S6K1. RSK and MSK proteins were candidate targets of this phosphorylation. We demonstrated that H2O2 stimulated phosphorylation of RSK and MSK kinases at residues homologous to Thr389 in S6K1. This phosphorylation required the activity of either p38 or ERK MAP kinases. Kinase assays showed activation of RSK and MSK by H2O2. Experiments with mouse embryonic fibroblasts from p38 knockout animals confirmed these observations. Altogether, these findings show that the S6K1 signaling pathway is not activated under these conditions, clarify previous observations that were probably misinterpreted because of non-specific detection of RSK and MSK by the anti-phospho-Thr389-S6K1 antibody, and demonstrate the specific activation of MAPK signaling through ERK/p38/RSK/MSK by H2O2.
Abstract:
Next-generation sequencing techniques such as exome sequencing can detect all genetic variants in a human exome, and, combined with variant filters, they have been useful for identifying disease-causing mutations. Two filters are mainly used to identify such mutations: low allele frequency and computational annotation of the genetic variant. Bioinformatic tools that predict the effect of a given variant may produce errors because of existing biases in databases, and different tools sometimes show limited agreement with one another. Advances in functional and comparative genomics are needed to annotate these variants properly. This study has two goals. First, we functionally annotate Common Variable Immunodeficiency (CVID) variants with the available bioinformatic methods in order to assess the reliability of these strategies. Second, because developing new methods to reduce the number of candidate genetic variants is an active and necessary field of research, we explore the utility of gene function information at the organism level as a filter for identifying rare disease genes. It has recently been proposed that only 10-15% of human genes are essential, so we would expect severe rare diseases to be caused mostly by mutations in these genes. Our goal is to determine whether these rare and severe diseases are caused by deleterious mutations in essential genes. If this hypothesis holds, essential-gene status would be a useful filter for identifying disease-causing mutations.
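The filtering strategy in this abstract (rare allele frequency, a computational prediction of deleteriousness, and the proposed essential-gene filter) can be sketched as a chain of predicates. The 1% allele-frequency cut-off, the `Variant` fields, and the gene names are assumptions for illustration; the abstract does not specify them.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str
    allele_frequency: float      # population allele frequency, 0 to 1
    predicted_deleterious: bool  # output of some in silico effect predictor


def filter_candidates(variants, essential_genes, max_af=0.01, require_essential=True):
    """Keep variants that pass every filter; max_af=0.01 is an assumed cut-off."""
    kept = []
    for v in variants:
        if v.allele_frequency >= max_af:
            continue  # filter 1: keep only rare variants
        if not v.predicted_deleterious:
            continue  # filter 2: computational annotation of the variant effect
        if require_essential and v.gene not in essential_genes:
            continue  # proposed filter 3: restrict to essential genes
        kept.append(v)
    return kept


if __name__ == "__main__":
    # Hypothetical variants and essential-gene set.
    variants = [
        Variant("GENE_A", 0.001, True),
        Variant("GENE_B", 0.2, True),
        Variant("GENE_C", 0.0005, False),
        Variant("GENE_D", 0.002, True),
    ]
    print([v.gene for v in filter_candidates(variants, {"GENE_A"})])
```

Making the essential-gene step optional (`require_essential`) lets one compare candidate lists with and without it, which is the comparison the study's hypothesis calls for.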
Abstract:
Although paraphrasing is the linguistic mechanism underlying many cases of plagiarism, little attention has been paid to its analysis in the framework of automatic plagiarism detection. As a result, state-of-the-art plagiarism detectors find it difficult to detect cases of paraphrase plagiarism. In this article, we analyse the relationship between paraphrasing and plagiarism, paying special attention to which paraphrase phenomena underlie acts of plagiarism and which of them are detected by plagiarism detection systems. To this end, we created the P4P corpus, a new resource that uses a paraphrase typology to annotate a subset of the PAN-PC-10 corpus for automatic plagiarism detection. The results of the Second International Competition on Plagiarism Detection were analysed in the light of this annotation. The experiments show that (i) more complex paraphrase phenomena and a high density of paraphrase mechanisms make plagiarism detection more difficult, (ii) lexical substitutions are the most frequently used paraphrase mechanisms when plagiarising, and (iii) paraphrase mechanisms tend to shorten the plagiarised text. For the first time, the paraphrase mechanisms behind plagiarism have been analysed, providing critical insights for the improvement of automatic plagiarism detection systems.
Abstract:
Introduction: The Pharmaceutical Assistance unit of the HUG Pharmacy operates as a drug information centre and manages information made available on the web. This information is intended primarily for HUG healthcare staff and is accessible on the intranet/Internet site (http://www.hcuge.ch/Pharmacie), launched in 1998. The aim of this work was to evaluate the quality of the information on the intranet/Internet site and to make the necessary improvements. Method: The HUG Pharmacy intranet/Internet site was evaluated in autumn 2004 using two tools. NetScoring: a grid for evaluating the quality of health information on the Internet (http://www.chu-rouen.fr/netscoring/). It comprises 49 criteria grouped into 8 categories. Each criterion is scored on a 5-level scale and then weighted according to its importance (multiplied by 3 if the criterion is essential, by 2 if it is important, or by 1 if it is minor). AMDEC analysis (the French equivalent of FMECA): a method for breaking a process down into steps and analysing its failure modes, their effects and their criticality (Qual Saf Health Care 2005;14(2):93-98). Each identified failure mode is scored for frequency, severity and detectability. Multiplying the three scores gives an overall criticality result (criticality index, CI, max. 810), which makes it possible to rank the risks. Results: NetScoring baseline assessment: the overall quality of the intranet/Internet site was good (202/312 points). Its strengths were the relevance and usefulness of the site, the quality of its content, search engine and design, fast page loading, the selection of external links, and respect for medical confidentiality.
Its weaknesses were the lack of a regular update policy, of systematic annotation of each document's update status, of an editorial and scientific committee, of English keywords, and of a list identifying the authors. AMDEC analysis: four categories (document creation, conversion, site structure and document publication) and 19 failure modes were characterised. Three failure modes had a high CI: errors when creating a document (CI 256), inadequate information because a practice was not validated or a recommendation could not be generalised (CI 147), and no proofreading after conversion of the document into a publishable format, e.g. PDF (CI 144). Corrective measures: A standard operating procedure (SOP) was drawn up for managing the intranet/Internet site. It clearly defines the standard format of the information (author's initials, creation and update dates, pharmacy logo), the validation and update policy for documents, and the archiving procedure. A tracking sheet accompanying each document was created to record all modifications made and the review frequency to be observed. Discussion and conclusion: This study identified and quantified the critical points needing improvement on the HUG Pharmacy intranet/Internet site. The corrective measures undertaken should remedy the main weaknesses and failures identified. Setting up an editorial and scientific committee will need to be evaluated in the future. NetScoring and AMDEC analysis are useful tools for evaluating and continuously improving the quality of a website, provided the results are interpreted critically before corrective measures are implemented. Despite their completely different approaches, both tools revealed similar shortcomings.
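Both scoring schemes in the method section reduce to simple arithmetic: NetScoring multiplies each criterion score by an importance weight (3, 2 or 1) and sums the results, and the AMDEC criticality index is the product of the frequency, severity and detectability scores. The sketch below implements those two rules. The failure-mode names and component scores in the usage example are hypothetical, since the abstract reports only the final indices.

```python
# Importance weights as stated in the method description.
WEIGHTS = {"essential": 3, "important": 2, "minor": 1}


def netscoring_total(criteria):
    """criteria: iterable of (score, importance); returns the weighted sum."""
    return sum(score * WEIGHTS[importance] for score, importance in criteria)


def criticality_index(frequency, severity, detectability):
    """AMDEC/FMECA criticality index: product of the three component scores."""
    return frequency * severity * detectability


def rank_failure_modes(modes):
    """modes: {name: (frequency, severity, detectability)}.

    Returns [(name, CI), ...] sorted from most to least critical."""
    scored = {name: criticality_index(*scores) for name, scores in modes.items()}
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical component scores, not the study's data.
    modes = {
        "failure mode A": (2, 5, 3),
        "failure mode B": (4, 6, 5),
        "failure mode C": (3, 3, 2),
    }
    print(rank_failure_modes(modes))
    print(netscoring_total([(3, "essential"), (2, "minor"), (4, "important")]))
```

Because the index is a product, a failure mode that is both frequent and hard to detect can outrank a more severe but easily caught one, which is why the ranking step matters.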
Abstract:
Textual autocorrelation is a broad and pervasive concept, referring to the similarity between nearby textual units: lexical repetitions across consecutive sentences, semantic association between neighbouring lexemes, persistence of discourse types (narrative, descriptive, dialogal...) and so on. Textual autocorrelation can also be negative, as illustrated by alternating phonological or morpho-syntactic categories, or by the succession of word lengths. This contribution proposes a general Markov formalism for textual navigation, inspired by spatial statistics. The formalism can express well-known constructs in textual data analysis, such as term-document matrices, references and hyperlink navigation, (web) information retrieval, and in particular textual autocorrelation, as measured by Moran's I relative to the exchange matrix associated with neighbourhoods of various possible types. Four case studies (word-length alternation, lexical repulsion, part-of-speech autocorrelation, and semantic autocorrelation) illustrate the theory. In particular, one observes a short-range repulsion between nouns together with a short-range attraction between verbs, both at the lexical and semantic levels.
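As a rough illustration of how textual autocorrelation can be measured, the sketch below computes the classic Moran's I for a sequence of values (for example, word lengths) using binary weights that link each position to its immediate neighbour. This is a simplification under stated assumptions: the contribution defines Moran's I relative to an exchange matrix, which generalizes these plain adjacency weights, so values from this sketch need not match the paper's. A perfectly alternating sequence such as [1, 3, 1, 3, 1, 3] yields I = -1, the kind of negative autocorrelation the abstract describes for word lengths.

```python
def adjacency_weights(n: int, lag: int = 1) -> list[list[float]]:
    """Symmetric binary weights linking positions i and i + lag."""
    w = [[0.0] * n for _ in range(n)]
    for i in range(n - lag):
        w[i][i + lag] = w[i + lag][i] = 1.0
    return w


def morans_i(values: list[float], weights: list[list[float]]) -> float:
    """Classic Moran's I = (n / W) * sum_ij w_ij z_i z_j / sum_i z_i^2,
    where z = x - mean(x) and W is the sum of all weights."""
    n = len(values)
    mean = sum(values) / n
    z = [x - mean for x in values]
    total_weight = sum(map(sum, weights))
    cross = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    variance_term = sum(zi * zi for zi in z)
    if total_weight == 0 or variance_term == 0:
        raise ValueError("need non-zero weights and non-constant values")
    return (n / total_weight) * cross / variance_term


if __name__ == "__main__":
    words = "the cat sat on the mat and then it slept".split()
    lengths = [len(w) for w in words]
    print(morans_i(lengths, adjacency_weights(len(lengths))))
```

Positive values indicate that neighbouring units resemble each other (attraction), negative values that they tend to differ (repulsion); changing `lag` probes longer-range neighbourhoods.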
Abstract:
Within the ENCODE Consortium, GENCODE aimed to accurately annotate all protein-coding genes, pseudogenes, and noncoding transcribed loci in the human genome through manual curation and computational methods. Annotated transcript structures were assessed, and less well-supported loci were systematically validated by experiment. Predicted exon-exon junctions were evaluated by RT-PCR amplification followed by highly multiplexed sequencing readout, a method we called RT-PCR-seq. Seventy-nine percent of all assessed junctions were confirmed by this evaluation procedure, demonstrating the high quality of the GENCODE gene set. RT-PCR-seq was also efficient for screening gene models predicted from the Human Body Map (HBM) RNA-seq data. We validated 73% of these predictions, thus confirming 1168 novel genes, mostly noncoding, which will further complement the GENCODE annotation. Our novel experimental validation pipeline is extremely sensitive, far more so than unbiased transcriptome profiling through RNA sequencing, which is becoming the norm. For example, exon-exon junctions unique to GENCODE-annotated transcripts are five times more likely to be corroborated with our targeted approach than with extensive large-scale human transcriptome profiling. Data sets such as the HBM and ENCODE RNA-seq data fail to sample lowly expressed transcripts. Our targeted RT-PCR-seq approach also has the advantage of identifying novel exons of known genes, as we discovered unannotated exons in ~11% of assessed introns. We thus estimate that at least 18% of known loci have yet-unannotated exons. Our work demonstrates that cataloging all of the genic elements encoded in the human genome will require a coordinated effort between unbiased and targeted approaches, such as RNA-seq and RT-PCR-seq.
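Abstract:
Ff. 1-113: Litteralis expositio bibliotece secundum archiepiscopum Canthuariensem (Literal exposition of the Bible according to the Archbishop of Canterbury); f. 115: Moralitates super historias scolasticas (Moralizations on the scholastic histories); ff. 143-190: Glose et moralitates quorumdam librorum sacre scripture (Glosses and moralizations on certain books of Holy Scripture). Incomplete manuscript (with lacunae).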
Abstract:
Pseudomonas knackmussii B13, isolated in 1974, was the first strain found to degrade chlorinated aromatic hydrocarbons. This discovery paved the way for the subsequent characterization of numerous bacterial metabolic pathways and for genetic and biochemical studies, and it spurred ideas for pollutant bioremediation. In this study, we determined the complete genome sequence of B13 using next-generation sequencing technologies and optical mapping. Genome annotation indicated that B13 has a variety of metabolic pathways for degrading monoaromatic hydrocarbons, including chlorobenzoate, aminophenol, anthranilate and hydroxyquinol, but not polyaromatic compounds. Comparative genome analysis revealed that B13 is most closely related to Pseudomonas denitrificans and Pseudomonas aeruginosa. The B13 genome contains at least eight genomic islands [prophages and integrative conjugative elements (ICEs)] that are absent in closely related pseudomonads. We confirm that two ICEs are identical copies of the 103 kb self-transmissible element ICEclc, which carries the genes for chlorocatechol metabolism. Comparison of ICEclc showed that it is composed of a variable region and a 'core' region; the core is highly conserved among proteobacterial genomes, suggesting a widely distributed family of so far uncharacterized ICEs. Resequencing of two spontaneous B13 mutants revealed a number of single nucleotide substitutions, as well as excision of a large 220 kb region and of a prophage, which drastically changed the host's metabolic capacity and survivability.
Abstract:
This work describes a knowledge base of human ALU elements. The ontology incorporates SO and GO terms and is designed to describe the genomic context of the set of ALUs. For each ALU element, the nearest gene and transcript are stored, together with their functional annotation according to GO, the state of the surrounding chromatin, and the transcription factors present in the ALU. Semantic rules have been incorporated to facilitate the storage, querying and integration of the information. The ALU ontology can be fully analysed with reasoners such as Pellet and has been partially transferred to a semantic wiki.