29 results for orthology
Abstract:
With the increasing availability of various 'omics data, high-quality orthology assignment is crucial for evolutionary and functional genomics studies. We here present the fourth version of the eggNOG database (available at http://eggnog.embl.de) that derives nonsupervised orthologous groups (NOGs) from complete genomes, and then applies a comprehensive characterization and analysis pipeline to the resulting gene families. Compared with the previous version, we have more than tripled the underlying species set to cover 3686 organisms, keeping track with genome project completions while prioritizing the inclusion of high-quality genomes to minimize error propagation from incomplete proteome sets. Major technological advances include (i) a robust and scalable procedure for the identification and inclusion of high-quality genomes, (ii) provision of orthologous groups for 107 different taxonomic levels compared with 41 in eggNOGv3, (iii) identification and annotation of particularly closely related orthologous groups, facilitating analysis of related gene families, (iv) improvements of the clustering and functional annotation approach, (v) adoption of a revised tree building procedure based on the multiple alignments generated during the process and (vi) implementation of quality control procedures throughout the entire pipeline. As in previous versions, eggNOGv4 provides multiple sequence alignments and maximum-likelihood trees, as well as broad functional annotation. Users can access the complete database of orthologous groups via a web interface, as well as through bulk download.
Abstract:
The Linked Data initiative offers a straightforward method to publish structured data on the World Wide Web and link it to other data, resulting in a worldwide network of semantically codified data known as the Linked Open Data cloud. The size of the Linked Open Data cloud, i.e. the amount of data published using Linked Data principles, is growing exponentially, including life sciences data. However, key information for biological research is still missing from the Linked Open Data cloud. For example, the relation between orthologous genes and genetic diseases is absent, even though such information can be used for hypothesis generation regarding human diseases. The OGOLOD system, an extension of the OGO Knowledge Base, publishes ortholog/disease information using Linked Data. This gives scientists the ability to query the structured information in connection with other Linked Data and to discover new information related to orthologs and human diseases in the cloud.
Abstract:
Background: Hexamerins are hemocyanin-derived proteins that have lost the ability to bind copper ions and transport oxygen; instead, they became storage proteins. The current study aimed to broaden our knowledge of the hexamerin genes found in the honey bee genome by exploring their structural characteristics, expression profiles, evolution, and functions in the life cycle of workers, drones and queens. Results: The hexamerin genes of the honey bee (hex 70a, hex 70b, hex 70c and hex 110) diverge considerably in structure, so that the overall amino acid identity shared among their deduced protein subunits varies from 30 to 42%. A bioinformatics search for motifs in the respective upstream control regions (UCRs) revealed six overrepresented motifs, including a potential binding site for Ultraspiracle (Usp), a target of juvenile hormone (JH). The expression of these genes was induced by topical application of JH on worker larvae. The four genes are highly transcribed by the larval fat body, although with significant differences in transcript levels, but only hex 110 and hex 70a are re-induced in the adult fat body in a caste- and sex-specific fashion, with workers showing the highest expression. Transcripts for hex 110, hex 70a and hex 70b were detected in developing ovaries and testes, and hex 110 was highly transcribed in the ovaries of egg-laying queens. A phylogenetic analysis revealed that HEX 110 is located at the most basal position among the holometabolan hexamerins and that, like HEX 70a and HEX 70c, it shares a potential orthology relationship with hexamerins from other hymenopteran species. Conclusions: Striking differences were found in the structure and developmental expression of the four hexamerin genes in the honey bee. The presence of a potential binding site for Usp in the respective 5' UCRs, and the results of experiments on JH level manipulation in vivo, support the hypothesis of regulation by JH.
Transcript levels and patterns in the fat body and gonads suggest that, in addition to their primary role in supplying amino acids for metamorphosis, hexamerins serve as storage proteins for gonad development, egg production, and to support foraging activity. A phylogenetic analysis including the four deduced hexamerins and related proteins revealed a complex pattern of evolution, with independent radiation in insect orders.
Abstract:
Ten microsatellite loci are described in Araucaria cunninghamii, the first reported in the Araucariaceae. Eight were tested in sections Eutacta and Bunya, which diverged more than 200 MYA, and in the sister genus Agathis. Specific amplification products within the expected size range were obtained for six to eight loci in section Eutacta (depending on species), five loci in section Bunya and three loci in Agathis. Two of the loci (CRCAc1 and CRCAc2, both GA repeats) produced specific amplification products in all taxa, with orthology confirmed by sequence analysis. The repeats were perfect in all taxa. The flanking sequences were extremely conserved, with sequence divergence of 0% to 2.0% within Araucaria species and 2.9% to 7.5% between Araucaria and Agathis. These microsatellites represent some of the most conserved microsatellite loci reported in plants. This may be due to a low evolutionary rate in the Araucariaceae genome, or the loci may be closely associated with highly conserved, unreported genes.
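The divergence percentages quoted above can be illustrated with a minimal sketch. It assumes divergence is the uncorrected proportion of mismatched positions between two aligned flanking sequences, with gapped columns skipped; the abstract does not state how divergence was computed, and the function name and sequences below are invented for illustration.

```python
def percent_divergence(seq_a: str, seq_b: str) -> float:
    """Uncorrected divergence (%) between two aligned sequences.

    Columns containing a gap ('-') in either sequence are ignored.
    This is an assumed definition, not one taken from the abstract.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != "-" and b != "-"
    ]
    if not columns:
        raise ValueError("no comparable positions")
    mismatches = sum(a != b for a, b in columns)
    return 100 * mismatches / len(columns)


# Invented 40-nt flanking sequences differing at a single position.
flank_1 = "ACGTACGTAC" * 4
flank_2 = "T" + flank_1[1:]
print(percent_divergence(flank_1, flank_2))  # 1 mismatch in 40 positions
```

One mismatch in 40 compared positions gives 2.5%, which would fall between the within-Araucaria and the Araucaria-Agathis ranges reported above.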
Abstract:
One of the main motivations to study amphioxus is its potential for understanding the last common ancestor of chordates, which notably gave rise to the vertebrates. An important feature in this respect is the slow evolutionary rate that seems to have characterized the cephalochordate lineage, making amphioxus an interesting proxy for the chordate ancestor, as well as a key lineage to include in comparative studies. Whereas slow evolution was first noticed at the phenotypic level, it has also been described at the genomic level. Here, we examine whether the amphioxus genome is indeed a good proxy for the genome of the chordate ancestor, with a focus on protein-coding genes. We investigate genome features, such as synteny, gene duplication and gene loss, and contrast the amphioxus genome with those of other deuterostomes that are used in comparative studies, such as Ciona, Oikopleura and urchin.
Abstract:
Given the rapid increase of species with a sequenced genome, the need to identify orthologous genes between them has emerged as a central bioinformatics task. Many different methods exist for orthology detection, which makes it difficult to decide which one to choose for a particular application. Here, we review the latest developments and issues in the orthology field, and summarize the most recent results reported at the third 'Quest for Orthologs' meeting. We focus on community efforts such as the adoption of reference proteomes, standard file formats and benchmarking. Progress in these areas is good, and they are already beneficial to both orthology consumers and providers. However, a major current issue is that the massive increase in complete proteomes poses computational challenges to many of the ortholog database providers, as most orthology inference algorithms scale at least quadratically with the number of proteomes. The Quest for Orthologs consortium is an open community with a number of working groups that join efforts to enhance various aspects of orthology analysis, such as defining standard formats and datasets, documenting community resources and benchmarking. AVAILABILITY AND IMPLEMENTATION: All such materials are available at http://questfororthologs.org. CONTACT: erik.sonnhammer@scilifelab.se or c.dessimoz@ucl.ac.uk.
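The scaling issue mentioned above can be made concrete with a minimal sketch. It assumes an all-against-all design in which every unordered pair of proteomes is compared once; this is a simplification for illustration, not the procedure of any particular database, and the function name is hypothetical.

```python
def pairwise_comparisons(n_proteomes: int) -> int:
    """Number of unordered proteome pairs: n * (n - 1) / 2."""
    if n_proteomes < 0:
        raise ValueError("number of proteomes must be non-negative")
    return n_proteomes * (n_proteomes - 1) // 2


# Doubling the number of proteomes roughly quadruples the work.
for n in (100, 1000, 2000):
    print(n, pairwise_comparisons(n))
```

Under this assumption, going from 1000 to 2000 proteomes raises the number of pairs from 499,500 to 1,999,000.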
Abstract:
Summary: Gene duplication and neofunctionalization are important processes in the evolution of phenotypic complexity. They account for important evolutionary novelties that confer ecological adaptation, such as the major histocompatibility complex (MHC), a multigene family with a central role in the adaptive immune system of vertebrates. Multigene families, which evolved in large part through duplication, represent promising systems to study the still strongly debated relative roles of neutral and adaptive processes in the evolution of phenotypic complexity. Detailed knowledge of ecological function and a well-characterized evolutionary history place the mammalian MHC amongst ideal study systems. However, mammalian MHCs usually encompass several million base pairs and hold a large number of functional and non-functional duplicate genes, which makes their study complex. Avian MHCs, on the other hand, are usually far more compact, but the reconstruction of their evolutionary history has proven notoriously difficult. Moreover, no focused attempt has been undertaken so far to study the avian MHC evolutionary history in a broad phylogenetic context and using adequate gene regions. In the present PhD thesis, we were able to make important contributions to the understanding of the long-term evolution of the avian MHC class II B (MHCIIB). First, we isolated and characterized MHCIIB genes in barn owl (Tyto alba, Strigiformes, Tytonidae), a species from an avian lineage in which MHC has not been studied so far. Our results revealed that, with only two functional MHCIIB genes, the MHC organization of barn owl may be similar to the 'minimal essential' MHC of chicken (Gallus gallus), indicating that simple MHC organization may be ancestral to birds. Taking advantage of the sequence information from barn owl, we studied the evolution of MHCIIB genes in 13 additional species of 'typical' owls (Strigiformes, Strigidae).
Phylogenetic analyses revealed that, in line with their functions, the peptide-binding region (PBR)-encoding exon 2 and the non-PBR-encoding exon 3 evolve in different patterns in owls. Exon 2 exhibited an evolutionary history of positive selection and recombination, while exon 3 traced duplication history and revealed two paralogs evolving divergently from each other in owls and in a shorebird, the great snipe (Gallinago media). The results from exon 3 were the first ever from birds to demonstrate gene orthology in species that diverged tens of millions of years ago, and strongly questioned whether the taxa studied before provided an adequate picture of avian MHC evolution. In a follow-up study, we aimed at explaining a striking pattern revealed by phylogenetic trees analyzing the owl sequences along with MHCIIB sequences from other birds: one owl paralog (termed DAB1) grouped with sequences of passerines and falcons, while the other (DAB2) grouped with wildfowl, penguins and birds of prey. This could be explained either by a duplication event preceding the evolution of these bird orders, or by convergent evolution of similar sequences in a number of orders. With extensive phylogenetic analyses we were able to show that a duplication event indeed preceded the major avian radiation ~100 million years ago, and that following this duplication, the paralogs evolved under positive selection.
Furthermore, we showed that the divergently evolving amino acid residues in the MHCIIB-encoded β-chain potentially interact with the MHCII α-chain, and that molecular coevolution of the interacting residues may have been involved in the divergent evolution of the MHCIIB paralogs. The findings of this PhD thesis are of particular interest to the understanding of the evolutionary history of the avian MHC and, by providing essential information on long-term gene history in the avian MHC, open promising perspectives for advances in the understanding of the evolution of multigene families in general, and of avian MHC organization in particular. Among other topics, I discuss the importance of including protein structure in the phylogenetic study of multigene families, and the roles of ecological versus molecular selection pressures. I conclude by providing a population genomic perspective on avian MHC, which may serve as a basis for future research to investigate the relative roles of neutral processes involving effective population size effects and of adaptation in the evolution of avian MHC diversity and organization. Résumé: Gene duplication and neofunctionalization are important processes in the evolution of phenotypic complexity. They are involved in the emergence of important evolutionary novelties that promote ecological adaptation, as is the case for the major histocompatibility complex
Abstract:
Phylogenomic databases provide orthology predictions for species with fully sequenced genomes. Although the goal seems well-defined, the content of these databases differs greatly. Seven ortholog databases (Ensembl Compara, eggNOG, HOGENOM, InParanoid, OMA, OrthoDB, Panther) were compared on the basis of reference trees. For three well-conserved protein families, we observed a generally high specificity of orthology assignments for these databases. We show that differences in the completeness of predicted gene relationships and in the phylogenetic information are, for the great majority, not due to the methods used, but to differences in the underlying database concepts. According to our metrics, none of the databases provides a fully correct and comprehensive protein classification. Our results provide a framework for meaningful and systematic comparisons of phylogenomic databases. In the future, a sustainable set of 'Gold standard' phylogenetic trees could provide a robust method for phylogenomic databases to assess their current quality status, measure changes following new database releases and diagnose improvements subsequent to an upgrade of the analysis procedure.
Abstract:
Background: The variety of DNA microarray formats and datasets presently available offers an unprecedented opportunity to perform insightful comparisons of heterogeneous data. Cross-species studies, in particular, have the power of identifying conserved, functionally important molecular processes. Validation of discoveries can now often be performed in readily available public data, which frequently requires cross-platform studies. Cross-platform and cross-species analyses require matching probes on different microarray formats. This can be achieved using the information in microarray annotations and additional molecular biology databases, such as orthology databases. Although annotations and other biological information are stored using modern database models (e.g. relational), they are very often distributed and shared as tables in text files, i.e. flat file databases. This common flat database format thus provides a simple and robust solution to flexibly integrate various sources of information and a basis for the combined analysis of heterogeneous gene expression profiles. Results: We provide annotationTools, a Bioconductor-compliant R package to annotate microarray experiments and integrate heterogeneous gene expression profiles using annotation and other molecular biology information available as flat file databases. First, annotationTools contains a specialized set of functions for mining this widely used database format in a systematic manner. It thus offers a straightforward solution for annotating microarray experiments.
Second, building on these basic functions and relying on the combination of information from several databases, it provides tools to easily perform cross-species analyses of gene expression data. Here, we present two example applications of annotationTools that are of direct relevance for the analysis of heterogeneous gene expression profiles, namely a cross-platform mapping of probes and a cross-species mapping of orthologous probes using different orthology databases. We also show how to perform an explorative comparison of disease-related transcriptional changes in human patients and in a genetic mouse model. Conclusion: The R package annotationTools provides a simple solution to handle microarray annotation and orthology tables, as well as other flat molecular biology databases. Thereby, it allows easy integration and analysis of heterogeneous microarray experiments across different technological platforms or species.
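The cross-species probe mapping described above can be sketched as a chain of lookups through three flat tables. The following is a minimal Python illustration of the general idea only; it is not the annotationTools API (which is an R package), and the table contents, column names and function names are all invented. It also assumes one-to-one orthology, so a gene with several orthologs would keep only the last one listed.

```python
import csv
import io

# Hypothetical tab-delimited flat files; every identifier is invented.
HUMAN_ANNOT = "probe\tgene\nh_p1\tHG1\nh_p2\tHG2\nh_p3\tHG3\n"
ORTHOLOGY = "human_gene\tmouse_gene\nHG1\tMG1\nHG2\tMG2\n"
MOUSE_ANNOT = "probe\tgene\nm_p1\tMG1\nm_p2\tMG1\nm_p3\tMG9\n"


def read_table(text):
    """Parse a tab-delimited table with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def map_probes(human_annot, orthology, mouse_annot):
    """Map each human probe to the mouse probes of its orthologous gene."""
    probe_to_gene = {r["probe"]: r["gene"] for r in read_table(human_annot)}
    ortholog_of = {r["human_gene"]: r["mouse_gene"] for r in read_table(orthology)}
    gene_to_probes = {}
    for row in read_table(mouse_annot):
        gene_to_probes.setdefault(row["gene"], []).append(row["probe"])
    # Probes without an ortholog, or whose ortholog has no probe, map to [].
    return {
        probe: gene_to_probes.get(ortholog_of.get(gene), [])
        for probe, gene in probe_to_gene.items()
    }


mapping = map_probes(HUMAN_ANNOT, ORTHOLOGY, MOUSE_ANNOT)
for probe, targets in mapping.items():
    print(probe, targets)
```

In this toy data, only h_p1 finds mouse probes; h_p2 has an ortholog with no probe on the mouse array, and h_p3 has no ortholog entry at all.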
Abstract:
Owing to its special mode of evolution and central role in the adaptive immune system, the major histocompatibility complex (MHC) has become the focus of diverse disciplines such as immunology, evolutionary ecology, and molecular evolution. MHC evolution has been studied extensively in diverse vertebrate lineages over the last few decades, and it has been suggested that birds differ from the established mammalian norm. Mammalian MHC genes evolve independently, and duplication history (i.e., orthology) can usually be traced back within lineages. In birds, this has been observed in only 3 pairs of closely related species. Here we report strong evidence for the persistence of orthology of MHC genes throughout an entire avian order. Phylogenetic reconstructions of MHC class II B genes in 14 species of owls trace back orthology over tens of millions of years in exon 3. Moreover, exon 2 sequences from several species show closer relationships than sequences within species, resembling transspecies evolution typically observed in mammals. Thus, although previous studies suggested that long-term evolutionary dynamics of the avian MHC were characterized by high rates of concerted evolution, resulting in rapid masking of orthology, our results question the generality of this conclusion. The owl MHC thus opens new perspectives for a more comprehensive understanding of avian MHC evolution.
Abstract:
Background: Many complex systems can be represented and analysed as networks. The recent availability of large-scale datasets has made it possible to elucidate some of the organisational principles and rules that govern their function, robustness and evolution. However, one of the main limitations in using protein-protein interactions for function prediction is the availability of interaction data, especially for Mollicutes. If we could harness predicted interactions, such as those from Protein-Protein Association Networks (PPAN), combining several protein-protein network function-inference methods with semantic similarity calculations, the use of protein-protein interactions for functional inference in these species would become potentially more useful. Results: In this work we show that using PPAN data combined with other approximations, such as functional module detection, orthology exploitation methods and Gene Ontology (GO)-based information measures, helps to predict protein function in Mycoplasma genitalium. Conclusions: To our knowledge, the proposed method is the first that combines functional module detection among species, exploiting an orthology procedure and using information theory-based GO semantic similarity in PPAN of the Mycoplasma species. The results of an evaluation show a higher recall than previously reported methods that focused on only one organism network.
Abstract:
Phylogenetic trees representing the evolutionary relationships of homologous genes are the entry point for many evolutionary analyses. For instance, the use of a phylogenetic tree can aid in the inference of orthology and paralogy relationships, and in the detection of relevant evolutionary events such as gene family expansions and contractions, horizontal gene transfer, recombination or incomplete lineage sorting. Similarly, given the plurality of evolutionary histories among genes encoded in a given genome, there is a need for the combined analysis of genome-wide collections of phylogenetic trees (phylomes). Here, we introduce a new release of PhylomeDB (http://phylomedb.org), a public repository of phylomes. Currently, PhylomeDB hosts 120 public phylomes, comprising >1.5 million maximum likelihood trees and multiple sequence alignments. In the current release, phylogenetic trees are annotated with taxonomic, protein-domain arrangement, functional and evolutionary information. PhylomeDB is also a major source for phylogeny-based predictions of orthology and paralogy, covering >10 million proteins across 1059 sequenced species. Here we describe newly implemented PhylomeDB features, and discuss a benchmark of the orthology predictions provided by the database, the impact of proteome updates and the use of the phylome approach in the analysis of newly sequenced genomes and transcriptomes.
Abstract:
Quest for Orthologs (QfO) is a community effort with the goal to improve and benchmark orthology predictions. As quality assessment assumes prior knowledge on species phylogenies, we investigated the congruency between existing species trees by comparing the relationships of 147 QfO reference organisms from six Tree of Life (ToL)/species tree projects: the National Center for Biotechnology Information (NCBI) taxonomy, Opentree of Life, the sequenced species/species ToL, the 16S ribosomal RNA (rRNA) database, and trees published by Ciccarelli et al. (Ciccarelli FD, et al. 2006. Toward automatic reconstruction of a highly resolved tree of life. Science 311:1283-1287) and by Huerta-Cepas et al. (Huerta-Cepas J, Marcet-Houben M, Gabaldon T. 2014. A nested phylogenetic reconstruction approach provides scalable resolution in the eukaryotic Tree Of Life. PeerJ PrePrints 2:223). Our study reveals that each species tree suggests a different phylogeny: 87 of the 146 (60%) possible splits of a dichotomous and rooted tree are congruent, while all other splits are incongruent in at least one of the species trees. Topological differences are observed not only at deep speciation events, but also within younger clades, such as Hominidae, Rodentia, Laurasiatheria, or rosids. The evolutionary relationships of 27 archaea and bacteria are highly inconsistent. By assessing 458,108 gene trees from 65 genomes, we show that consistent species topologies are more often supported by gene phylogenies than contradicting ones. The largest concordant species tree includes 77 of the QfO reference organisms at the most. Results are summarized in the form of a consensus ToL (http://swisstree.vital-it.ch/species_tree) that can serve different benchmarking purposes.
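The split counts quoted above can be checked with a short calculation. The sketch below assumes that the 146 possible splits correspond to the internal nodes of a rooted, fully dichotomous tree with 147 leaves (such a tree has one fewer internal node than leaves); the function name is hypothetical.

```python
def internal_nodes(n_leaves: int) -> int:
    """Internal nodes (root included) of a rooted, fully bifurcating tree."""
    if n_leaves < 1:
        raise ValueError("a tree needs at least one leaf")
    return n_leaves - 1


n_organisms = 147      # QfO reference organisms in the abstract
congruent_splits = 87  # congruent splits reported in the abstract

possible_splits = internal_nodes(n_organisms)
congruent_share = congruent_splits / possible_splits
print(possible_splits, f"{congruent_share:.0%}")
```

Under that assumption the tree has 146 splits, and 87 of them is just under 60%, matching the rounded figure in the abstract.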
Abstract:
MOTIVATION: The functional impact of small molecules is increasingly being assessed in different eukaryotic species through large-scale phenotypic screening initiatives. Identifying the targets of these molecules is crucial to mechanistically understand their function and uncover new therapeutically relevant modes of action. However, despite extensive work carried out in model organisms and human, it is still unclear to what extent one can use information obtained in one species to make predictions in other species. RESULTS: Here, for the first time, we explore and validate at a large scale the use of protein homology relationships to predict the targets of small molecules across different species. Our results show that exploiting target homology can significantly improve the predictions, especially for molecules experimentally tested in other species. Interestingly, when considering separately orthology and paralogy relationships, we observe that mapping small molecule interactions among orthologs improves prediction accuracy, while including paralogs does not improve and even sometimes worsens the prediction accuracy. Overall, our results provide a novel approach to integrate chemical screening results across multiple species and highlight the promises and remaining challenges of using protein homology for small molecule target identification. AVAILABILITY AND IMPLEMENTATION: Homology-based predictions can be tested on our website http://www.swisstargetprediction.ch. CONTACT: david.gfeller@unil.ch or vincent.zoete@isb-sib.ch. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Abstract:
During plant development, from the establishment of the cellular identity of the first organs to pollen tube guidance, cell-to-cell communication is of crucial importance. Accordingly, molecular signalling pathways have been elaborated to perceive an external signal and transduce it into a gene-expression response via an intracellular cascade. Receptor kinases are among the proteins that perceive stimuli and, in plants, form a highly abundant class of proteins for which very little detailed information is available to date. A family of receptor kinases in Arabidopsis thaliana, AtORK11 (Arabidopsis thaliana Ovule Receptor Kinase 11), was identified by orthology to an ovary-specific receptor in a wild solanaceous species, Solanum chacoense. The presumed function of this family of leucine-rich repeat receptor kinases, suggested by its expression pattern, involves events related to gametophyte development and reproduction. To characterize the function of the four genes of the family (AtORK11a, AtORK11b, AtORK11c and AtORK11d), a strategy combining the analysis of T-DNA insertion mutants with an assessment of the mode of action by bimolecular fluorescence complementation (BiFC) was undertaken. No precise function could be attributed to the double insertion mutants; however, overexpression of a dominant-negative construct indicates a role in gametophyte development. It was also shown that the four receptors can interact by homodimerization as well as by heterodimerization. A hypothesis of functional redundancy within the AtORK11 gene family is thus brought to light.