680 results for Annotation
Abstract:
Magdeburg, Univ., Fak. für Informatik, Diss., 2014
Abstract:
Dramatic improvements in DNA sequencing technologies have led to a more than 1,000-fold reduction in sequencing costs over the past five years. Genome-wide research approaches can thus now be applied beyond medically relevant questions to examine the molecular-genetic basis of behavior, development and unique life histories in almost any organism. A first step for an emerging model organism is usually establishing a reference genome sequence. I offer insight gained from the fire ant genome project. First, I detail how the project came to be and how sequencing, assembly and annotation strategies were chosen. Subsequently, I describe some of the issues linked to working with data from recently sequenced genomes. Finally, I discuss an approach undertaken in a follow-up project based on the fire ant genome sequence.
Abstract:
Résumé Phosphate is transferred from the roots to the leaves through the xylem. It was previously shown that the AtPHO1 protein is indispensable for transferring phosphate into the xylem vessels of the roots in the model plant Arabidopsis thaliana. Sequencing and annotation of the Arabidopsis genome identified ten sequences with a significant level of similarity to the AtPHO1 gene, which together constitute a new gene family called the AtPHO1 family. Based on a molecular and genetic study, this thesis provides elements toward determining the role, unknown to date, of the AtPHO1 family members in Arabidopsis. First, a bioinformatic analysis of the protein sequences of the AtPHO1 family members revealed a domain named SPX in their N-terminal region. This domain is conserved among many proteins involved in phosphate homeostasis in yeast, which reinforces the hypothesis that the AtPHO1 family members, like AtPHO1, play a role in phosphate balance in the plant. In parallel, the tissue localization of AtPHO gene expression in Arabidopsis was identified by analyzing transgenic plants expressing the uidA reporter gene under the control of the respective AtPHO gene promoters. An expression profile of each AtPHO gene over the course of plant development was obtained. Predominant expression was observed in the vascular tissues of roots, leaves, stems and flowers, suggesting that the AtPHO genes could have redundant functions in the transfer of phosphate within the vascular cylinder of these organs. However, several AtPHO promoter regions also drive a non-vascular GUS expression pattern, indicating a putative role of the AtPHO genes in the acquisition or recycling of phosphate in the plant. Second, analysis of AtPHO gene expression during phosphate deficiency established that only the expression of the AtPHO1, AtPHO1;H1 and AtPHO1;H10 genes is regulated by this deficiency. An in-depth study of their expression in response to treatments affecting phosphate homeostasis in the plant then demonstrated that they are regulated by different signalling pathways. Next, a detailed analysis of the regulation of AtPHO1;H10 expression in wounded or dehydrated Arabidopsis leaves revealed that this gene is the first marker gene of a new signalling pathway that is induced by OPDA, not by JA, and depends on the COI1 protein. These results demonstrate for the first time that OPDA and JA can activate different genes via COI1-dependent signalling pathways. Finally, this thesis reports the identification of a new role for the AtPHO1 protein in regulating ABA action during stomatal closure and seed germination in Arabidopsis. Although the exact functions of the AtPHO proteins remain to be determined, this thesis work suggests their involvement in the propagation of different signals in the plant by modulating the membrane potential and/or altering the ion composition of cells, as many ion transporters or regulators of ion transport do.
Summary Phosphate is transferred from the roots to the shoot via the xylem. The requirement for AtPHO1 protein to transfer phosphate to the xylem vessels of the root has been previously demonstrated in Arabidopsis thaliana. 
The sequencing and annotation of the Arabidopsis genome allowed the identification of ten sequences that show a significant level of similarity with the AtPHO1 gene. These 10 genes, of unknown function, constitute a new gene family called the AtPHO1 gene family. Based on a molecular and genetic study, this thesis provides some of the information needed to understand the role of the AtPHO1 family members in Arabidopsis. First, a bioinformatics study revealed that the AtPHO sequences contain, in the N-terminal hydrophilic region, a motif called SPX that is conserved among multiple proteins involved in phosphate homeostasis in yeast. This finding reinforces the hypothesis that all AtPHO1 family members have, like AtPHO1, a role in phosphate homeostasis. In parallel, we identified the pattern of expression of AtPHO genes in Arabidopsis via analysis of transgenic plants expressing the uidA reporter gene under the control of the respective AtPHO promoter regions. The results show predominant expression of AtPHO genes in vascular tissues of all organs of the plant, implying that these AtPHO genes could have redundant functions in the transfer of phosphate to the vascular cylinder of various organs. The GUS expression pattern for several AtPHO promoter regions was also detected in non-vascular tissue, indicating a broad role of AtPHO genes in the acquisition or recycling of phosphate in the plant. In a second step, the analysis of the expression of AtPHO genes during phosphate starvation established that only the expression of the AtPHO1, AtPHO1;H1 and AtPHO1;H10 genes was regulated by Pi starvation. Interestingly, different signalling pathways appeared to regulate these three genes during various treatments affecting Pi homeostasis in the plant. The third chapter presents a detailed analysis of the signalling pathways regulating the expression of the AtPHO1;H10 gene in Arabidopsis leaves during wounding and dehydration stress. 
Surprisingly, the expression of AtPHO1;H10 was found to be regulated by OPDA (the precursor of JA) but not by JA itself, and via the COI1 protein (the central regulator of the JA signalling pathway). These results demonstrated for the first time that OPDA and JA could activate distinct genes via COI1-dependent pathways. Finally, this thesis presents the identification of a novel role of the AtPHO1 protein in the regulation of ABA action in Arabidopsis guard cells and during seed germination. Although the exact role and function of AtPHO1 still need to be determined, these last findings suggest that AtPHO1, and by extension other AtPHO proteins, could mediate the propagation of various signals in the plant by modulating the membrane potential and/or by affecting cellular ion composition, as is the case for many ion transporters or regulators of ion transport.
Abstract:
Functional RNA structures play an important role both in the context of noncoding RNA transcripts and as regulatory elements in mRNAs. Here we present a computational study to detect functional RNA structures within the ENCODE regions of the human genome. Since structural RNAs in general lack characteristic signals in primary sequence, comparative approaches evaluating evolutionary conservation of structures are most promising. We have used three recently introduced programs based on either phylogenetic-stochastic context-free grammar (EvoFold) or energy directed folding (RNAz and AlifoldZ), yielding several thousand candidate structures (corresponding to approximately 2.7% of the ENCODE regions). EvoFold has its highest sensitivity in highly conserved and relatively AU-rich regions, while RNAz favors slightly GC-rich regions, resulting in a relatively small overlap between methods. Comparison with the GENCODE annotation points to functional RNAs in all genomic contexts, with a slightly increased density in 3'-UTRs. While we estimate a significant false discovery rate of approximately 50%-70%, many of the predictions can be further substantiated by additional criteria: 248 loci are predicted by both RNAz and EvoFold, and an additional 239 RNAz or EvoFold predictions are supported by the (more stringent) AlifoldZ algorithm. Five hundred seventy RNAz structure predictions fall into regions that show signs of selection pressure also on the sequence level (i.e., conserved elements). More than 700 predictions overlap with noncoding transcripts detected by oligonucleotide tiling arrays. One hundred seventy-five selected candidates were tested by RT-PCR in six tissues, and expression could be verified in 43 cases (24.6%).
Abstract:
In this project we present a method for generating pedestrian image databases, required for training or validating example-based learning systems, in a virtual environment. We developed a platform that simulates the navigation of a camera through a virtual scene and retrieves the video stream together with its ground truth. Using this platform removes the annotation process needed to obtain ground truth in real environments, and working in a virtual environment reduces costs.
Abstract:
2 Abstract
2.1 In French
Sequencing the human genome is a fundamental prerequisite for understanding human biology. With this project completed, scientists had to face an equally important task: understanding the sequence of 3 billion letters that makes up our genome. The ENCODE (ENCyclopedia Of DNA Elements) consortium was formed as a logical continuation of the human genome project. Its role is to identify all the functional elements of our genome, including transcribed regions, transcription factor binding sites, DNase I hypersensitive sites and histone modification marks. As part of my doctoral thesis, I took part in two ENCODE sub-projects. First, I was tasked with developing and optimizing a high-throughput experimental validation technique for gene models, which allowed me to estimate the quality of the most recent manual annotation. This new validation process is far more efficient than the RNA-seq technique that is currently becoming the norm. This RT-PCR-based technique notably allowed me to discover new exons in 10% of the regions interrogated. Second, I took part in a study aiming to identify the ends of all genes on human chromosomes 21 and 22. This study enabled the large-scale identification of chimeric transcripts containing sequences from two distinct genes that may lie far apart from each other.
2.2 In English
The completion of the human genome sequence is the prerequisite to fully understanding the biology of human beings. With this project achieved, scientists had to face another challenging task: understanding the meaning of the 3 billion letters composing this genome. As a logical continuation of the human genome project, the ENCODE (ENCyclopedia Of DNA Elements) consortium was formed with the aim of annotating all its functional elements. These elements include transcribed regions, transcription factor binding sites, DNase I hypersensitive sites and histone modification marks. In the frame of my PhD thesis, I was involved in two sub-projects of ENCODE. Firstly, I developed and optimized a high-throughput method to validate gene models, which allowed me to assess the quality of the most recent manually curated annotation. This novel experimental validation pipeline is extremely effective, far more so than transcriptome profiling through RNA sequencing, which is becoming the norm. This targeted RT-PCR-seq approach is likewise particularly efficient in identifying novel exons, as we discovered unannotated exons in about 10% of loci. Secondly, I participated in a study aiming to identify the boundaries of all genes on human chromosomes 21 and 22. This study led to the identification of chimeric transcripts composed of sequences coming from two distinct genes that can map far away from each other.
Abstract:
The Eukaryotic Promoter Database (EPD) is an annotated non-redundant collection of eukaryotic POL II promoters for which the transcription start site has been determined experimentally. Access to promoter sequences is provided by pointers to positions in nucleotide sequence entries. The annotation part of an entry includes a description of the initiation site mapping data, exhaustive cross-references to the EMBL nucleotide sequence database, SWISS-PROT, TRANSFAC and other databases, as well as bibliographic references. EPD is structured in a way that facilitates dynamic extraction of biologically meaningful promoter subsets for comparative sequence analysis. WWW-based interfaces have been developed that enable the user to view EPD entries in different formats, to select and extract promoter sequences according to a variety of criteria, and to navigate to related databases exploiting different cross-references. The EPD web site also features yearly updated base frequency matrices for major eukaryotic promoter elements. EPD can be accessed at http://www.epd.isb-sib.ch
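As an illustration of the base frequency matrices mentioned above, the sketch below computes, for each column of a set of pre-aligned, equal-length promoter-element sequences, the fraction of A, C, G and T. The function name `base_frequency_matrix` and the example sequences are hypothetical and are not taken from EPD; EPD's own matrices may be derived differently.

```python
from collections import Counter


def base_frequency_matrix(aligned):
    """Return one {base: frequency} dict per alignment column.

    `aligned` is a list of equal-length DNA strings (an assumption of this
    sketch: the sequences are already aligned on the promoter element).
    """
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all sequences must have the same length")

    total = len(aligned)
    matrix = []
    for i in range(length):
        # Count the bases observed in column i across all sequences.
        counts = Counter(seq[i].upper() for seq in aligned)
        matrix.append({base: counts.get(base, 0) / total for base in "ACGT"})
    return matrix


# Hypothetical TATA-box-like fragments, for illustration only.
example = ["TATA", "TATA", "TAAA", "CATA"]
matrix = base_frequency_matrix(example)
```

Each entry of `matrix` describes one position of the element; columns dominated by a single base indicate a strongly conserved position.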
Abstract:
Signature databases are vital tools for identifying distant relationships in novel sequences and hence for inferring protein function. InterPro is an integrated documentation resource for protein families, domains and functional sites, which amalgamates the efforts of the PROSITE, PRINTS, Pfam and ProDom database projects. Each InterPro entry includes a functional description, annotation, literature references and links back to the relevant member database(s). Release 2.0 of InterPro (October 2000) contains over 3000 entries, representing families, domains, repeats and sites of post-translational modification encoded by a total of 6804 different regular expressions, profiles, fingerprints and Hidden Markov Models. Each InterPro entry lists all the matches against SWISS-PROT and TrEMBL (more than 1,000,000 hits from 462,500 proteins in SWISS-PROT and TrEMBL). The database is accessible for text- and sequence-based searches at http://www.ebi.ac.uk/interpro/. Questions can be emailed to interhelp@ebi.ac.uk.
Abstract:
In 1927 M. R. James published Latin Infancy Gospels, identified by him in two related but not identical manuscripts (one the British Library Arundel 404; the other from Hereford), together with a parallel text from the Irish manuscript known as the Leabhar Breac. Later research brought to light more manuscripts of this Latin work, and also of the Irish text. James recognized that his apocryphal Latin Infancy text was compiled from a combination of the Protevangelium of James and a hitherto unknown text which he named "The Source". Recent research has identified a full Latin translation of the Protevangelium of James. A hitherto unrecognized Irish Infancy Narrative has also been identified in the Dublin manuscript known as the Liber Flavus Fergusiorum. A deep study of this related tradition was called for. This has been carried out over the past ten years by an Irish team in conjunction with Professor Daniel Kaestli and AELAC. The fruits of this labour are published in these two volumes. Volume 13 has a general introduction with a historical sketch of New Testament apocrypha in Ireland and a history of research on the subject. This is followed by a comparison of the Infancy Narratives in the Leabhar Breac and the Liber Flavus Fergusiorum. There are special introductions to these Infancy texts, followed by critical editions of the Irish texts, accompanied by English translations and rich annotation. Next there is similar treatment of the Irish versified Narrative (from ca. 700) of the Childhood Deeds of Jesus (commonly known as the Infancy Narrative (or Gospel) of Thomas). There is then (in volume 14, but with continuous pagination) the edition and translation of an Irish thirteenth-century poem with elements from Infancy Narratives, and both Latin and Irish texts on the wonders at Christ's birth, accompanied by translations and notes. 
The edition of the Irish material is followed by a critical edition of the full Arundel and Hereford forms of the Infancy Narrative (here referred to as the "J Compilation"), together with a detailed study of all the questions relating to this work. The volume concludes with a critical edition (by Rita Beyers) of the Latin text of the Protevangelium of James, accompanied by a detailed study of the work. The work contains a detailed study of the Latin translations of the Protevangelium of James and the transmission of this work in the West. The "J Compilation" (a combination of the Protevangelium and texts of Pseudo-Matthew) can be traced back in manuscript transmission to ca. 800, and must have originated some time earlier. Behind it stands an earlier "I ("I" for Irish) Compilation" without influence from Pseudo-Matthew, the form found in the Irish witnesses. It is argued that M. R. James's "Source" may be of Judaeo-Christian origin and may really be the Gospel of the Nazoreans. Among the indexes there is a list of all the Irish words found in the texts. This edition of the Irish and related Latin texts is a major contribution to the study of the apocryphal Infancy Narratives. It should also be of particular interest to Celtic scholars, to students of Irish ecclesiastical learning, and in general to all medievalists.
Abstract:
DNA microarray technology has arguably caught the attention of the worldwide life science community and is now systematically supporting major discoveries in many fields of study. The majority of the initial technical challenges of conducting experiments are being resolved, only to be replaced with new informatics hurdles, including statistical analysis, data visualization, interpretation, and storage. Two systems of databases, one containing expression data and one containing annotation data, are quickly becoming essential knowledge repositories of the research community. This paper surveys several databases that are considered "pillars" of research and important nodes in the network. It focuses on a generalized workflow scheme typical for microarray experiments, using two examples related to cancer research. The workflow is used to reference appropriate databases and tools for each step in the process of array experimentation. Additionally, benefits and drawbacks of current array databases are addressed, and suggestions are made for their improvement.
Abstract:
The identification of all human chromosome 21 (HC21) genes is a necessary step in understanding the molecular pathogenesis of trisomy 21 (Down syndrome). The first analysis of the sequence of 21q included 127 previously characterized genes and predicted an additional 98 novel anonymous genes. Recently we evaluated the quality of this annotation by characterizing a set of HC21 open reading frames (C21orfs) identified by mapping spliced expressed sequence tags (ESTs) and predicted genes (PREDs), identified only in silico. This study underscored the limitations of in silico-only gene prediction, as many PREDs were incorrectly predicted. To refine the HC21 annotation, we have developed a reliable algorithm to extract and stringently map sequences that contain bona fide 3' transcript ends to the genome. We then created a specific 21q graphical display allowing an integrated view of the data that incorporates new ESTs as well as features such as CpG islands, repeats, and gene predictions. Using these tools we identified 27 new putative genes. To validate these, we sequenced previously cloned cDNAs and carried out RT-PCR, 5'- and 3'-RACE procedures, and comparative mapping. These approaches substantiated 19 new transcripts, thus increasing the HC21 gene count by 9.5%. These transcripts were likely not previously identified because they are small and encode small proteins. We also identified four transcriptional units that are spliced but contain no obvious open reading frame. The HC21 data presented here further emphasize that current gene prediction algorithms miss a substantial number of transcripts that nevertheless can be identified using a combination of experimental approaches and multiple refined algorithms.
Abstract:
The ultimate goal of this work is to learn about and understand semantic annotation in web pages through a research study and a practical case.
Abstract:
Ants are some of the most abundant and familiar animals on Earth, and they play vital roles in most terrestrial ecosystems. Although all ants are eusocial, and display a variety of complex and fascinating behaviors, few genomic resources exist for them. Here, we report the draft genome sequence of a particularly widespread and well-studied species, the invasive Argentine ant (Linepithema humile), which was accomplished using a combination of 454 (Roche) and Illumina sequencing and community-based funding rather than federal grant support. Manual annotation of >1,000 genes from a variety of different gene families and functional classes reveals unique features of the Argentine ant's biology, as well as similarities to Apis mellifera and Nasonia vitripennis. Distinctive features of the Argentine ant genome include remarkable expansions of gustatory (116 genes) and odorant receptors (367 genes), an abundance of cytochrome P450 genes (>110), lineage-specific expansions of yellow/major royal jelly proteins and desaturases, and complete CpG DNA methylation and RNAi toolkits. The Argentine ant genome contains fewer immune genes than Drosophila and Tribolium, which may reflect the prominent role played by behavioral and chemical suppression of pathogens. Analysis of the ratio of observed to expected CpG nucleotides for genes in the reproductive development and apoptosis pathways suggests higher levels of methylation than in the genome overall. The resources provided by this genome sequence will offer an abundance of tools for researchers seeking to illuminate the fascinating biology of this emerging model organism.
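The ratio of observed to expected CpG nucleotides mentioned above is commonly computed as (number of CG dinucleotides × sequence length) / (number of C × number of G). The sketch below assumes that common normalization; the abstract does not state which variant the authors used, and the function name `cpg_obs_exp` and the toy sequences are hypothetical.

```python
def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio of a DNA sequence.

    Uses the common normalization (#CG * N) / (#C * #G), which is an
    assumption of this sketch. Returns 0.0 when the sequence contains
    no C or no G, since the expected count is then zero.
    """
    s = seq.upper()
    n = len(s)
    c = s.count("C")
    g = s.count("G")
    cg = s.count("CG")  # "CG" cannot overlap itself, so count() is exact
    if c == 0 or g == 0:
        return 0.0
    return cg * n / (c * g)


# Toy sequences, for illustration only.
ratio_enriched = cpg_obs_exp("CGCG")  # CpG more frequent than expected
ratio_neutral = cpg_obs_exp("CCGG")   # CpG as frequent as expected
```

Because methylated cytosines in CpG context tend to mutate to thymine over evolutionary time, a ratio well below 1 is usually read as a sign of historical germline methylation, while a ratio near or above 1 suggests little methylation.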