Digital Library

83 results for Geo-ontology

VIKI: a semiotic-based system for multilingual knowledge retrieval

Relevance:

10.00%

Publisher:

Abstract:

Since its creation, the Internet has permeated our daily life. The web is omnipresent for communication, research and organization. This exploitation has resulted in the rapid development of the Internet. Nowadays, the Internet is the biggest container of resources. Information databases such as Wikipedia, Dmoz and the open data available on the net hold great informational potential for mankind. Easy and free web access is one of the major features characterizing the Internet culture. Ten years ago, the web was completely dominated by English. Today, the web community is no longer only English-speaking; it is becoming a genuinely multilingual community. The availability of content is intertwined with the availability of logical organizations (ontologies), for which multilinguality plays a fundamental role. In this work we introduce a very high-level logical organization fully based on semiotic assumptions. We present the theoretical foundations as well as the ontology itself, named Linguistic Meta-Model. The most important feature of Linguistic Meta-Model is its ability to support the representation of different knowledge sources developed according to different underlying semiotic theories. This is possible because most knowledge representation schemata, either formal or informal, can be put into the context of the so-called semiotic triangle. In order to show the main characteristics of Linguistic Meta-Model from a practical point of view, we developed VIKI (Virtual Intelligence for Knowledge Induction). VIKI is a work-in-progress system aiming at exploiting the Linguistic Meta-Model structure for knowledge expansion. It is a modular system in which each module accomplishes a natural language processing task, from terminology extraction to knowledge retrieval.
VIKI is a supporting system to Linguistic Meta-Model and its main task is to give some empirical evidence regarding the use of Linguistic Meta-Model without claiming to be thorough.

The InterPro protein families database: the classification resource after 15 years.

Relevance:

10.00%

Publisher:

Abstract:

The InterPro database (http://www.ebi.ac.uk/interpro/) is a freely available resource that can be used to classify sequences into protein families and to predict the presence of important domains and sites. Central to the InterPro database are predictive models, known as signatures, from a range of different protein family databases that have different biological focuses and use different methodological approaches to classify protein families and domains. InterPro integrates these signatures, capitalizing on the respective strengths of the individual databases, to produce a powerful protein classification resource. Here, we report on the status of InterPro as it enters its 15th year of operation, and give an overview of new developments with the database and its associated Web interfaces and software. In particular, the new domain architecture search tool is described and the process of mapping of Gene Ontology terms to InterPro is outlined. We also discuss the challenges faced by the resource given the explosive growth in sequence data in recent years. InterPro (version 48.0) contains 36 766 member database signatures integrated into 26 238 InterPro entries, an increase of over 3993 entries (5081 signatures), since 2012.

Structural variation-associated expression changes are paralleled by chromatin architecture modifications.

Relevance:

10.00%

Publisher:

Abstract:

Copy number variants (CNVs) influence the expression of genes that map not only within the rearrangement, but also to its flanks. To assess the possible mechanism(s) underlying this "neighboring effect", we compared intrachromosomal interactions and histone modifications in cell lines of patients affected by genomic disorders and control individuals. Using chromosome conformation capture (4C-seq), we observed that a set of genes flanking the Williams-Beuren Syndrome critical region (WBSCR) were often looping together. The newly identified interacting genes include AUTS2, mutations of which are associated with autism and intellectual disabilities. Deletion of the WBSCR disrupts the expression of this group of flanking genes, as well as long-range interactions between them and the rearranged interval. We also pinpointed concomitant changes in histone modifications between samples. We conclude that large genomic rearrangements can lead to chromatin conformation changes that extend far away from the structural variant, thereby possibly modulating expression globally and modifying the phenotype. GEO SERIES ACCESSION NUMBER: GSE33784, GSE33867.

Mapping of Environmental Data Using Kernel-Based Methods

Relevance:

10.00%

Publisher:

Abstract:

Recently, kernel-based machine learning methods have gained great popularity in many data analysis and data mining fields: pattern recognition, biocomputing, speech and vision, engineering, remote sensing, etc. The paper describes the use of kernel methods for processing large datasets from environmental monitoring networks. Several typical problems of the environmental sciences and their solutions provided by kernel-based methods are considered: classification of categorical data (soil type classification), mapping of continuous environmental and pollution information (pollution of soil by radionuclides), and mapping with auxiliary information (climatic data from the Aral Sea region). Promising developments, such as automatic emergency hot spot detection and monitoring network optimization, are discussed as well.
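The abstract does not say which kernel methods the paper applies, so the following is only a minimal sketch of the general idea of kernel-based mapping. It uses a Nadaraya-Watson kernel smoother with a Gaussian kernel, chosen here for brevity. The station coordinates, measured values, bandwidth and function names are all hypothetical.

```python
import math

def gaussian_kernel(a, b, bandwidth):
    """Gaussian (RBF) kernel between two 2-D points."""
    sq_dist = (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))

def kernel_map(stations, values, query, bandwidth=1.0):
    """Kernel-weighted average of station values at a query location."""
    weights = [gaussian_kernel(s, query, bandwidth) for s in stations]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("query is too far from every station for this bandwidth")
    return sum(w * v for w, v in zip(weights, values)) / total

# Hypothetical monitoring stations (x, y) and their measured values.
stations = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
values = [10.0, 20.0, 30.0, 40.0]

# The query point is equidistant from all four stations, so every station
# gets the same weight and the estimate is the plain mean of the values.
center_estimate = kernel_map(stations, values, (0.5, 0.5))
```

The bandwidth controls how local the map is: a small bandwidth makes the estimate at a station's location nearly equal to that station's value, while a large one smooths towards the global mean.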

The Microbe browser for comparative genomics.

Relevance:

10.00%

Publisher:

Abstract:

The Microbe browser is a web server providing comparative microbial genomics data. It offers comprehensive, integrated data from GenBank, RefSeq, UniProt, InterPro, Gene Ontology and the Orthologs Matrix Project (OMA) database, displayed along with gene predictions from five software packages. The Microbe browser is updated daily from the source databases and includes all completely sequenced bacterial and archaeal genomes. The data are displayed in an easy-to-use, interactive website based on Ensembl software. The Microbe browser is available at http://microbe.vital-it.ch/. Programmatic access is available through the OMA application programming interface (API) at http://microbe.vital-it.ch/api.

A genetic isolation gradient of populations of the Balearic green toad (Bufo balearicus) follows rising eastward fragmentation of the rural landscapes on the island of Menorca.

Relevance:

10.00%

Publisher:

Abstract:

We present the first approach to the genetic diversity and structure of the Balearic toad (Bufo balearicus Boettger, 1880) for the island of Menorca. Forty-one individuals from 21 localities were analyzed for ten microsatellite loci. We used geo-referenced individual multilocus genotypes and a model-based clustering method for the inference of the number of populations and of the spatial location of genetic discontinuities between those populations. Only six of the microsatellites analyzed were polymorphic. We revealed a northwestern area inhabited by a single population with several well-connected localities and another set of populations in the southeast that includes a few unconnected small units with genetically significant differences among them as well as with the individuals from the northwest of the island. The observed fragmentation may be explained by shifts from agricultural to tourism practices that have been taking place on the island of Menorca since the 1960s. The abandonment of rural activities in favor of urbanization and concomitant service areas has mostly affected the southeast of the island and is currently threatening the overall geographic connectivity between the different farming areas of the island that are inhabited by the Balearic toad.

1st radiometric dating of a paleontologically dated Bathonian level from Georgia (USSR) - Use of the cathodoluminescence for selection of suitable plagioclases

Relevance:

10.00%

Publisher:

Abstract:

Jurassic volcanic formations interlayered with (ammonite-bearing) sediments are common in the Caucasus area; this situation is of interest for the numerical calibration of the poorly documented Jurassic portion of the time scale. However, following petrographic study of thin sections, no whole-rocks can be considered reliable geochronometers due to subsequent alteration. From about 20 samples, two were selected for plagioclase dating: one (V134) is probably early Kimmeridgian in age; the other (V136) is probably located in the Lower Bathonian stage according to diagnostic ammonites. Cathodoluminescence (CTL) study has shown that sample V136 was similar to usual volcanic feldspars (blue to green colour); however, the lack of CTL of the V134 plagioclase is a character common to diagenetic feldspars. Consequently, in spite of good optical preservation, this geochronometer cannot give an age representative of the time of emplacement of the lava flow. We have combined CTL observation with microprobe analysis in order to document the poorly known CTL behaviour of volcanic feldspars. The cations Ti4+ and Fe2+ play a major role in the CTL colour of plagioclases and are able to document the growth history of these feldspars: phenocrysts are initially rich in Fe2+ (core of the crystals, green in colour), then richer in Ti toward the exterior; microcrysts are even richer in Ti (blue to bright blue). We have also observed that the natural CTL colour was modified by acid "cleaning" of the separated feldspars: the initial blue or green colour tends to change to yellow or violet, respectively, after acid treatment, probably due to oxidation of Fe2+ to Fe3+. X-ray and microprobe analyses both indicated that plagioclases from sample V134 were near the sodic end member (albite), suggesting a diagenetic origin in this andesitic basalt. In contrast, sample V136 contains a calcic plagioclase of common composition for a doleritic basalt.

The conventional K-Ar technique was applied as a preliminary tool for radiometric analysis. The Kimmeridgian Na-plagioclase sample gave a "rejuvenated" (85 Ma) apparent age, which confirms a late genesis for the separated plagioclase phase. This interpretation is based on CTL observation, X-ray analysis, and microprobe analysis; these techniques are able to distinguish samples that have undergone diagenetic alteration from those that have not. An age consistent with the stratigraphic location has been obtained from sample V136. This age of 161 ± 3 (2-sigma) Ma is the first one available from a sample palaeontologically located with reasonable precision within the mid-Jurassic.
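For readers unfamiliar with the conventional K-Ar technique mentioned above, the standard age equation is reproduced here as background. It is not taken from the paper, and the abstract does not state which decay constants the authors used.

```latex
% Standard K-Ar age equation (textbook form, not quoted from the paper).
% {}^{40}\mathrm{Ar}^{*} : radiogenic argon-40 measured in the mineral
% {}^{40}\mathrm{K}      : potassium-40 remaining in the mineral
% \lambda_{\varepsilon}  : partial decay constant of 40K to 40Ar
% \lambda_{\beta}        : partial decay constant of 40K to 40Ca
\[
  t = \frac{1}{\lambda}
      \ln\!\left( 1 + \frac{\lambda}{\lambda_{\varepsilon}}
      \cdot \frac{{}^{40}\mathrm{Ar}^{*}}{{}^{40}\mathrm{K}} \right),
  \qquad
  \lambda = \lambda_{\varepsilon} + \lambda_{\beta}
\]
```

With the commonly adopted constants λ_ε = 0.581 × 10^-10 yr^-1 and λ_β = 4.962 × 10^-10 yr^-1, the total is λ = 5.543 × 10^-10 yr^-1. Because the age depends on the retained radiogenic argon, a late-formed (diagenetic) plagioclase such as that of sample V134 yields an apparent age younger than the lava flow.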

Integration of genome-wide association studies with biological knowledge identifies six novel genes related to kidney function.

Relevance:

10.00%

Publisher:

Abstract:

In conducting genome-wide association studies (GWAS), analytical approaches leveraging biological information may further understanding of the pathophysiology of clinical traits. To discover novel associations with estimated glomerular filtration rate (eGFR), a measure of kidney function, we developed a strategy for integrating prior biological knowledge into the existing GWAS data for eGFR from the CKDGen Consortium. Our strategy focuses on single nucleotide polymorphisms (SNPs) in genes that are connected by functional evidence, determined by literature mining and gene ontology (GO) hierarchies, to genes near previously validated eGFR associations. It then requires association thresholds consistent with multiple testing, and finally evaluates novel candidates by independent replication. Among the samples of European ancestry, we identified a genome-wide significant SNP in FBXL20 (P = 5.6 × 10^-9) in meta-analysis of all available data, and additional SNPs at the INHBC, LRP2, PLEKHA1, SLC3A2 and SLC7A6 genes meeting multiple-testing corrected significance for replication and overall P-values of 4.5 × 10^-4 to 2.2 × 10^-7. Neither the novel PLEKHA1 nor FBXL20 associations, both further supported by association with eGFR among African Americans and with transcript abundance, would have been implicated by eGFR candidate gene approaches. LRP2, encoding the megalin receptor, was identified through connection with the previously known eGFR gene DAB2 and extends understanding of the megalin system in kidney function. These findings highlight integration of existing genome-wide association data with independent biological knowledge to uncover novel candidate eGFR associations, including candidates lacking known connections to kidney-specific pathways. The strategy may also be applicable to other clinical phenotypes, although more testing will be needed to assess its potential for discovery in general.
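The abstract does not specify how the multiple-testing thresholds were computed. As a rough illustration of why restricting the analysis to a knowledge-derived candidate set can help, the sketch below uses a simple Bonferroni correction, which is an assumption rather than the authors' stated method. The test counts (one million for a genome-wide scan, 100 for a candidate set) are hypothetical round numbers; only the two P-values come from the abstract.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests

def passes(p_value, alpha, n_tests):
    """True if p_value is below the Bonferroni-corrected threshold."""
    return p_value < bonferroni_threshold(alpha, n_tests)

ALPHA = 0.05
GENOME_WIDE_TESTS = 1_000_000  # assumption: conventional ~1e6 independent tests
CANDIDATE_TESTS = 100          # hypothetical size of a knowledge-derived SNP set

fbxl20_p = 5.6e-9   # FBXL20 P-value reported in the abstract
weakest_p = 4.5e-4  # upper end of the reported overall P-value range

fbxl20_is_genome_wide = passes(fbxl20_p, ALPHA, GENOME_WIDE_TESTS)
weakest_is_genome_wide = passes(weakest_p, ALPHA, GENOME_WIDE_TESTS)
weakest_passes_candidate = passes(weakest_p, ALPHA, CANDIDATE_TESTS)
```

Under these assumed counts, the FBXL20 signal clears the genome-wide threshold of 5 × 10^-8, whereas a P-value of 4.5 × 10^-4 only clears the looser threshold that applies when far fewer hypotheses are tested.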

PROTEOMICS OF OXIDATIVE LESIONS OCCURRING DURING TRANSFUSION-PURPOSED STORAGE OF ERYTHROCYTE CONCENTRATES

Relevance:

10.00%

Publisher:

Abstract:

Erythrocyte concentrates (ECs) are the major labile blood product transfused worldwide, with the aim of treating anemia of diverse origins. In Switzerland, ECs are stored at 4 °C for up to 42 days in saline-adenine-glucose-mannitol (SAGM). Such storage induces cellular lesions, altering red blood cell (RBC) metabolism, protein content and rheological properties. There is a heated debate regarding the impact of storage lesions, and thus of the age of ECs, on transfusion-related adverse clinical outcomes. Several studies tend to show that poorer outcomes occur in patients receiving older blood products. However, no clear association has been demonstrated to date. While metabolic and early rheological changes are reversible through transfusion of the blood units, oxidized proteins cannot be repaired, and it is likely that such irreversible damage affects the quality of the blood product and the efficiency of the transfusion. In vivo, RBCs are constantly exposed to oxygen fluxes, and are thus well equipped to deal with oxidative challenges. Moreover, functional 20S proteasome complexes allow for recognition and proteolysis of moderately oxidized proteins, and some proteins can be eliminated from RBCs by the release of microvesicles. The present PhD thesis is part of a global research project whose goal is to characterize the effect of processing and storage on the quality of ECs. Assessing oxidative protein damage during RBC storage is of major importance for understanding the aging mechanisms of stored RBCs. To this end, redox proteomic-based investigations were conducted here. In a first part, cysteine oxidation and protein carbonylation were addressed via 2D-DIGE and derivatization-driven immunodetection approaches, respectively. Then, the oxidized sub-proteomes were characterized through LC-MS/MS identification of proteins in spots of interest (cysteine oxidation) or affinity-purified carbonylated proteins. Gene ontology annotation allowed targets of oxidation to be classified according to their molecular functions. In a third part, the 20S proteasome activity was evaluated throughout the storage period of ECs, and its susceptibility to a highly oxidized environment was investigated. The potential defensive role of microvesiculation was also addressed through the quantification of eliminated carbonylated proteins. We highlighted distinct protein groups differentially affected by cysteine oxidation, either reversibly or irreversibly. In addition, soluble extracts showed a decrease in carbonylation at the beginning of storage, and membrane extracts revealed increasing carbonylation after 4 weeks of storage. Engaged molecular functions revealed that antioxidants (AO) are rather reversibly oxidized at their cysteine residue(s), but are irreversibly oxidized through carbonylation. Meanwhile, the 20S proteasome activity decreases by around 40 % at the end of the storage period. Incubation of fresh RBC extracts with exogenous oxidized proteins showed a dose-dependent and protein-dependent inhibitory effect. Finally, we showed that the release of microvesicles allows the elimination of increasing quantities of carbonylated proteins. Taken together, these results revealed an oxidative pathway model of RBC storage, on which further investigation towards improved storage conditions will be based.

-- Erythrocyte concentrates (ECs) are the most widely delivered blood product in the world and are used to treat various forms of anemia. In Switzerland, ECs are stored at 4 °C for 42 days in a saline solution of adenine, glucose and mannitol (SAGM). Such storage induces storage lesions that alter the metabolism, proteins and rheological properties of the red blood cell (RBC). An important debate concerns the impact of EC storage time on the risk of transfusion reactions, with some studies attempting to show that transfusions of old blood would reduce patients' life expectancy. However, no concrete association has been proven to date. While changes in metabolism and early changes in rheological properties are reversible following transfusion of the EC, oxidized proteins cannot be repaired, and it is likely that such lesions affect the quality and efficacy of blood products. In vivo, RBCs are constantly exposed to oxygen and are therefore well equipped to resist oxidative lesions. In addition, functional 20S proteasome complexes recognize and degrade moderately oxidized proteins, and some proteins can be eliminated via microparticles. This doctoral thesis is part of a global research project whose objective is to characterize the effects of preparation and storage on the quality of RBCs. Assessing oxidative damage to RBCs during storage is essential to understanding the aging mechanisms of blood products. To this end, redoxomics-oriented research was conducted. In a first part, cysteine oxidation and protein carbonylation are assessed by two-dimensional differential gel electrophoresis and by immunodetection of derivatized proteins. Then, the proteins of interest and the affinity-purified carbonylated proteins are identified by tandem mass spectrometry. The proteins targeted by oxidation are classified according to their molecular function. In a third part, the proteolytic activity of the 20S proteasome is monitored throughout the storage period. The impact of oxidative stress on this activity was evaluated using exogenous proteins oxidized in vitro. The potential defensive role of microvesiculation was also studied by quantifying the eliminated carbonylated proteins. In this work, we observed that different groups of proteins are affected by reversible or irreversible oxidation of their cysteines. In addition, a decrease in carbonylation at the beginning of storage in soluble extracts and an increase in carbonylation after 4 weeks in membrane extracts were shown. The molecular functions engaged by the altered proteins show that antioxidant defenses are reversibly oxidized at their cysteine residues but are also irreversibly carbonylated. Meanwhile, the proteolytic activity of the 20S proteasome decreases by 40 % at the end of storage. Incubation of RBC extracts from the beginning of storage with exogenous oxidized proteins shows a dose-dependent and protein-dependent inhibitory effect. Finally, microvesicles prove to eliminate increasing quantities of carbonylated proteins. Taken together, these results make it possible to model an oxidative pathway of RBC storage, on which future research aimed at improving storage conditions will be based.

The UniProt-GO Annotation database in 2011.

Relevance:

10.00%

Publisher:

Abstract:

The GO annotation dataset provided by the UniProt Consortium (GOA: http://www.ebi.ac.uk/GOA) is a comprehensive set of evidence-based associations between terms from the Gene Ontology resource and UniProtKB proteins. Currently supplying over 100 million annotations to 11 million proteins in more than 360,000 taxa, this resource has increased 2-fold over the last 2 years. It has benefited from a wealth of checks to improve annotation correctness and consistency, and now supplies greater information content, enabled by GO Consortium annotation format developments. Detailed, manual GO annotations obtained from the curation of peer-reviewed papers are directly contributed by all UniProt curators and supplemented with manual and electronic annotations from 36 model organism and domain-focused scientific resources. The inclusion of high-quality, automatic annotation predictions ensures the UniProt GO annotation dataset supplies functional information to a wide range of proteins, including those from poorly characterized, non-model organism species. UniProt GO annotations are freely available in a range of formats accessible by both file downloads and web-based views. In addition, the introduction of a new, normalized file format in 2010 has made for easier handling of the complete UniProt-GOA data set.

BIO-INSPIRED COMPUTATIONAL TECHNIQUES APPLIED TO THE CLUSTERING AND VISUALIZATION OF SPATIO-TEMPORAL GEOSPATIAL DATA

Relevance:

10.00%

Publisher:

Abstract:

The coverage and volume of geo-referenced datasets are extensive and incessantly growing. The systematic capture of geo-referenced information generates large volumes of spatio-temporal data to be analyzed. Clustering and visualization play a key role in exploratory data analysis and in the extraction of knowledge embedded in these data. However, new challenges in visualization and clustering are posed by the special characteristics of these data, for instance their complex structures, large number of samples, variables involved in a temporal context, high dimensionality and large variability in cluster shapes.

The central aim of my thesis is to propose new algorithms and methodologies for clustering and visualization, in order to assist knowledge extraction from spatio-temporal geo-referenced data, thus improving decision-making processes. I present two original algorithms, one for clustering, the Fuzzy Growing Hierarchical Self-Organizing Networks (FGHSON), and one for exploratory visual data analysis, the Tree-structured Self-organizing Maps Component Planes. In addition, I present methodologies that, combined with FGHSON and the Tree-structured SOM Component Planes, allow the integration of space and time seamlessly and simultaneously in order to extract knowledge embedded in a temporal context.

The originality of the FGHSON lies in its capability to reflect the underlying structure of a dataset in a hierarchical fuzzy way. A hierarchical fuzzy representation of clusters is crucial when data include complex structures with large variability of cluster shapes, variances, densities and number of clusters. The most important characteristics of the FGHSON include: (1) It does not require an a priori setup of the number of clusters. (2) The algorithm executes several self-organizing processes in parallel. Hence, when dealing with large datasets, the processes can be distributed, reducing the computational cost. (3) Only three parameters are necessary to set up the algorithm.

In the case of the Tree-structured SOM Component Planes, the novelty of the algorithm lies in its ability to create a structure that allows the visual exploratory data analysis of large high-dimensional datasets. This algorithm creates a hierarchical structure of Self-Organizing Map Component Planes, arranging similar variables' projections in the same branches of the tree. Hence, similarities in variables' behavior can be easily detected (e.g. local correlations, maximal and minimal values, and outliers).

Both FGHSON and the Tree-structured SOM Component Planes were applied to several agroecological problems, proving to be very efficient in the exploratory analysis and clustering of spatio-temporal datasets.

In this thesis I also tested three soft competitive learning algorithms. Two of them are well-known unsupervised soft competitive algorithms, namely the Self-Organizing Maps (SOMs) and the Growing Hierarchical Self-Organizing Maps (GHSOMs); the third is our original contribution, the FGHSON. Although the algorithms presented here have been used in several areas, to my knowledge there is no work applying and comparing the performance of these techniques on spatio-temporal geospatial data, as is presented in this thesis.

I propose original methodologies to explore spatio-temporal geo-referenced datasets through time. Our approach uses time windows to capture temporal similarities and variations by using the FGHSON clustering algorithm. The developed methodologies are used in two case studies. In the first, the objective was to find similar agroecozones through time, and in the second it was to find similar environmental patterns shifted in time.

Several results presented in this thesis have led to new contributions to agroecological knowledge, for instance in sugar cane and blackberry production.

Finally, in the framework of this thesis we developed several software tools: (1) a Matlab toolbox that implements the FGHSON algorithm, and (2) a program called BIS (Bio-inspired Identification of Similar agroecozones), an interactive graphical user interface tool that integrates the FGHSON algorithm with Google Earth in order to show zones with similar agroecological characteristics.
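The abstract names the classic Self-Organizing Map as the starting point for FGHSON but gives no algorithmic detail. The sketch below shows a single online update step of a plain one-dimensional SOM, as commonly described in the literature. It is not the FGHSON algorithm: the fuzzy, growing and hierarchical extensions are not represented, and the function names, learning rate and neighbourhood width are illustrative assumptions.

```python
import math

def best_matching_unit(weights, x):
    """Index of the unit whose weight vector is closest to x (squared Euclidean)."""
    def sq_dist(w):
        return sum((wi - xi) ** 2 for wi, xi in zip(w, x))
    return min(range(len(weights)), key=lambda i: sq_dist(weights[i]))

def som_step(weights, x, learning_rate=0.5, sigma=1.0):
    """One online update of a 1-D SOM; returns the new weights and the BMU index."""
    bmu = best_matching_unit(weights, x)
    updated = []
    for i, w in enumerate(weights):
        # Gaussian neighbourhood measured on the 1-D grid of unit indices.
        h = math.exp(-((i - bmu) ** 2) / (2.0 * sigma ** 2))
        # Move each unit towards the sample, scaled by its grid proximity to the BMU.
        updated.append([wi + learning_rate * h * (xi - wi) for wi, xi in zip(w, x)])
    return updated, bmu

# Hypothetical map of three units and one 2-D input sample.
units = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
sample = [1.2, 0.8]
new_units, bmu = som_step(units, sample)
```

Repeating this step over many samples while shrinking the learning rate and neighbourhood width is what lets neighbouring units come to represent similar inputs, the property that the component-plane visualizations rely on.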

Animal Toxins: How is Complexity Represented in Databases?

Relevance:

10.00%

Publisher:

Abstract:

Peptide toxins synthesized by venomous animals have been extensively studied in the last decades. To be useful to the scientific community, this knowledge has been stored, annotated and made easy to retrieve by several databases. The aim of this article is to present what type of information users can access from each database. ArachnoServer and ConoServer focus on spider toxins and cone snail toxins, respectively. UniProtKB, a generalist protein knowledgebase, has an animal toxin-dedicated annotation program that includes toxins from all venomous animals. Finally, the ATDB metadatabase compiles data and annotations from other databases and provides toxin ontology.

Fourmidable: a database for ant genomics.

Relevance:

10.00%

Publisher:

Abstract:

BACKGROUND: Fourmidable is an infrastructure to curate and share the emerging genetic, molecular, and functional genomic data and protocols for ants. DESCRIPTION: The Fourmidable assembly pipeline groups nucleotide sequences into clusters before independently assembling each cluster. Subsequently, assembled sequences are annotated via Interproscan and BLAST against general and insect-specific databases. Gene-specific information can be retrieved using gene identifiers, searching for similar sequences or browsing through inferred Gene Ontology annotations. The database will readily scale as ultra-high throughput sequence data and sequences from additional species become available. CONCLUSION: Fourmidable currently houses EST data from two ant species and microarray gene expression data for one of these. Fourmidable is publicly available at http://fourmidable.unil.ch.

Potential influence of the chemical composition of water on the stable oxygen isotope composition of continental ostracods

Relevance:

10.00%

Publisher:

Abstract:

Many studies in continental areas have successfully used the oxygen isotope composition of fossil ostracod valves to reconstruct past hydrological conditions associated with large changes in climate. Yet, ostracods are known to crystallise their valves out of isotopic equilibrium for oxygen and they generally have higher 18O contents compared to inorganic calcite grown at equilibrium under the same conditions. A review of vital offsets determined for continental ostracods indicates that vital offsets might change from site to site, questioning a potential influence of environmental conditions on oxygen isotope fractionation in ostracods. Results from the literature suggest that pH has no influence on ostracod vital offset. A re-evaluation of results from Li and Liu (J Paleolimnol 43:111-120, 2010) suggests that salinity may influence oxygen isotope fractionation in ostracods, with lower vital offsets for higher salinities. Such a relationship was also observed for the vital offsets determined by Chivas et al. (The ostracoda: applications in quaternary research. American Geophysical Union, Washington, DC, 2002). Yet, when results of all studies are compiled, the correlation between vital offsets and salinity is low while the correlation between vital offsets and host water Mg/Ca is higher, suggesting that ionic composition of water and/or relative abundance of major ions may also control oxygen isotope fractionation in ostracods. Lack of data on host water ionic composition for the different studies precludes more detailed examination at this stage. Further studies such as natural or laboratory cultures done under strictly controlled conditions are needed to better understand the potential influence of varying environmental conditions on oxygen isotope compositions of ostracod valves.

Analysis of gene expression patterns in animals

Relevance:

10.00%

Publisher:

Abstract:

During my PhD, my aim was to provide new tools to increase our capacity to analyse gene expression patterns, and to study the evolution of gene expression in animals on a large scale. Gene expression patterns (when and where a gene is expressed) are a key feature in understanding gene function, notably in development. It now appears clear that the evolution of developmental processes and of phenotypes is shaped by evolution both at the coding sequence level and at the gene expression level.

Studying gene expression evolution in animals, with complex expression patterns over tissues and developmental time, is still challenging. No tools are available to routinely compare expression patterns between different species with precision and on a large scale. Studies on gene expression evolution are therefore performed only on small gene datasets, or using imprecise descriptions of expression patterns.

The aim of my PhD was thus to develop and use novel bioinformatics resources to study the evolution of gene expression. To this end, I developed the database Bgee (Base for Gene Expression Evolution). The approach of Bgee is to transform heterogeneous expression data (ESTs, microarrays, and in-situ hybridizations) into present/absent calls, and to annotate them to standard representations of the anatomy and development of different species (anatomical ontologies). An extensive mapping between the anatomies of species is then developed based on hypotheses of homology. These precise annotations to anatomies, and this extensive mapping between species, are the major assets of Bgee, and have required the involvement of many co-workers over the years. My main personal contribution is the development and management of both the Bgee database and the web application.

Bgee is now on its ninth release, and includes an important gene expression dataset for 5 species (human, mouse, drosophila, zebrafish, Xenopus), with the most data from mouse, human and zebrafish. Using these three species, I have conducted an analysis of gene expression evolution after duplication in vertebrates.

Gene duplication is thought to be a major source of novelty in evolution, and to contribute to speciation. It has been suggested that the evolution of gene expression patterns might participate in the retention of duplicate genes. I performed a large-scale comparison of the expression patterns of hundreds of duplicated genes to their singleton ortholog in an outgroup, including both small- and large-scale duplicates, in three vertebrate species (human, mouse and zebrafish), using highly accurate descriptions of expression patterns. My results showed unexpectedly high rates of de novo acquisition of expression domains after duplication (neofunctionalization), at least as high as or higher than rates of partitioning of expression domains (subfunctionalization). I found differences in the evolution of expression of small- and large-scale duplicates, with small-scale duplicates more prone to neofunctionalization. Duplicates with neofunctionalization seemed to evolve under more relaxed selective pressure on the coding sequence. Finally, even with abundant and precise expression data, the majority fate I recovered was neither neo- nor subfunctionalization of expression domains, suggesting a major role for other mechanisms in duplicate gene retention.
