18 resultados para protein evolution

em Consorci de Serveis Universitaris de Catalunya (CSUC), Spain


Relevância:

70.00% 70.00%

Publicador:

Resumo:

Low-complexity regions (LCRs) in proteins are tracts that are highly enriched in one or a few aminoacids. Given their high abundance, and their capacity to expand in relatively short periods of time through replication slippage, they can greatly contribute to increase protein sequence space and generate novel protein functions. However, little is known about the global impact of LCRs on protein evolution. We have traced back the evolutionary history of 2,802 LCRs from a large set of homologous protein families from H.sapiens, M.musculus, G.gallus, D.rerio and C.intestinalis. Transcriptional factors and other regulatory functions are overrepresented in proteins containing LCRs. We have found that the gain of novel LCRs is frequently associated with repeat expansion whereas the loss of LCRs is more often due to accumulation of amino acid substitutions as opposed to deletions. This dichotomy results in net protein sequence gain over time. We have detected a significant increase in the rate of accumulation of novel LCRs in the ancestral Amniota and mammalian branches, and a reduction in the chicken branch. Alanine and/or glycine-rich LCRs are overrepresented in recently emerged LCR sets from all branches, suggesting that their expansion is better tolerated than for other LCR types. LCRs enriched in positively charged amino acids show the contrary pattern, indicating an important effect of purifying selection in their maintenance. We have performed the first large-scale study on the evolutionary dynamics of LCRs in protein families. The study has shown that the composition of an LCR is an important determinant of its evolutionary pattern.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Proteins are composed of a combination of discrete, well-defined, sequence domains, associated with specific functions that have arisen at different times during evolutionary history. The emergence of novel domains is related to protein functional diversification and adaptation. But currently little is known about how novel domains arise and how they subsequently evolve. To gain insights into the impact of recently emerged domains in protein evolution we have identified all human young protein domains that have emerged in approximately the past 550 million years. We have classified them into vertebrate-specific and mammalian-specific groups, and compared them to older domains. We have found 426 different annotated young domains, totalling 995 domain occurrences, which represent about 12.3% of all human domains. We have observed that 61.3% of them arose in newly formed genes, while the remaining 38.7% are found combined with older domains, and have very likely emerged in the context of a previously existing protein. Young domains are preferentially located at the N-terminus of the protein, indicating that, at least in vertebrates, novel functional sequences often emerge there. Furthermore, young domains show significantly higher non-synonymous to synonymous substitution rates than older domains using human and mouse orthologous sequence comparisons. This is also true when we compare young and old domains located in the same protein, suggesting that recently arisen domains tend to evolve in a less constrained manner than older domains. We conclude that proteins tend to gain domains over time, becoming progressively longer. We show that many proteins are made of domains of different age, and that the fastest evolving parts correspond to the domains that have been acquired more recently.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In Drosophila, the insulin-signaling pathway controls some life history traits, such as fertility and lifespan, and it is considered to be the main metabolic pathway involved in establishing adult body size. Several observations concerning variation in body size in the Drosophila genus are suggestive of its adaptive character. Genes encoding proteins in this pathway are, therefore, good candidates to have experienced adaptive changes and to reveal the footprint of positive selection. The Drosophila insulin-like peptides (DILPs) are the ligands that trigger the insulin-signaling cascade. In Drosophila melanogaster, there are several peptides that are structurally similar to the single mammalian insulin peptide. The footprint of recent adaptive changes on nucleotide variation can be unveiled through the analysis of polymorphism and divergence. With this aim, we have surveyed nucleotide sequence variation at the dilp1-7 genes in a natural population of D. melanogaster. The comparison of polymorphism in D. melanogaster and divergence from D. simulans at different functional classes of the dilp genes provided no evidence of adaptive protein evolution after the split of the D. melanogaster and D. simulans lineages. However, our survey of polymorphism at the dilp gene regions of D. melanogaster has provided some evidence for the action of positive selection at or near these genes. The regions encompassing the dilp1-4 genes and the dilp6 gene stand out as likely affected by recent adaptive events.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Background: Chemoreception is a widespread mechanism that is involved in critical biologic processes, including individual and social behavior. The insect peripheral olfactory system comprises three major multigene families: the olfactory receptor (Or), the gustatory receptor (Gr), and the odorant-binding protein (OBP) families. Members of the latter family establish the first contact with the odorants, and thus constitute the first step in the chemosensory transduction pathway.Results: Comparative analysis of the OBP family in 12 Drosophila genomes allowed the identification of 595 genes that encode putative functional and nonfunctional members in extant species, with 43 gene gains and 28 gene losses (15 deletions and 13 pseudogenization events). The evolution of this family shows tandem gene duplication events, progressive divergence in DNA and amino acid sequence, and prevalence of pseudogenization events in external branches of the phylogenetic tree. We observed that the OBP arrangement in clusters is maintained across the Drosophila species and that purifying selection governs the evolution of the family; nevertheless, OBP genes differ in their functional constraints levels. Finally, we detect that the OBP repertoire evolves more rapidly in the specialist lineages of the Drosophila melanogaster group (D. sechellia and D. erecta) than in their closest generalists.Conclusion: Overall, the evolution of the OBP multigene family is consistent with the birth-and-death model. We also found that members of this family exhibit different functional constraints, which is indicative of some functional divergence, and that they might be involved in some of the specialization processes that occurred through the diversification of the Drosophila genus.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Assessing the contribution of promoters and coding sequences to gene evolution is an important step toward discovering the major genetic determinants of human evolution. Many specific examples have revealed the evolutionary importance of cis-regulatory regions. However, the relative contribution of regulatory and coding regions to the evolutionary process and whether systemic factors differentially influence their evolution remains unclear. To address these questions, we carried out an analysis at the genome scale to identify signatures of positive selection in human proximal promoters. Next, we examined whether genes with positively selected promoters (Prom+ genes) show systemic differences with respect to a set of genes with positively selected protein-coding regions (Cod+ genes). We found that the number of genes in each set was not significantly different (8.1% and 8.5%, respectively). Furthermore, a functional analysis showed that, in both cases, positive selection affects almost all biological processes and only a few genes of each group are located in enriched categories, indicating that promoters and coding regions are not evolutionarily specialized with respect to gene function. On the other hand, we show that the topology of the human protein network has a different influence on the molecular evolution of proximal promoters and coding regions. Notably, Prom+ genes have an unexpectedly high centrality when compared with a reference distribution (P = 0.008, for Eigenvalue centrality). Moreover, the frequency of Prom+ genes increases from the periphery to the center of the protein network (P = 0.02, for the logistic regression coefficient). This means that gene centrality does not constrain the evolution of proximal promoters, unlike the case with coding regions, and further indicates that the evolution of proximal promoters is more efficient in the center of the protein network than in the periphery. These results show that proximal promoters have had a systemic contribution to human evolution by increasing the participation of central genes in the evolutionary process.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Arising from either retrotransposition or genomic duplication of functional genes, pseudogenes are “genomic fossils” valuable for exploring the dynamics and evolution of genes and genomes. Pseudogene identification is an important problem in computational genomics, and is also critical for obtaining an accurate picture of a genome’s structure and function. However, no consensus computational scheme for defining and detecting pseudogenes has been developed thus far. As part of the ENCyclopedia Of DNA Elements (ENCODE) project, we have compared several distinct pseudogene annotation strategies and found that different approaches and parameters often resulted in rather distinct sets of pseudogenes. We subsequently developed a consensus approach for annotating pseudogenes (derived from protein coding genes) in the ENCODE regions, resulting in 201 pseudogenes, two-thirds of which originated from retrotransposition. A survey of orthologs for these pseudogenes in 28 vertebrate genomes showed that a significant fraction (∼80%) of the processed pseudogenes are primate-specific sequences, highlighting the increasing retrotransposition activity in primates. Analysis of sequence conservation and variation also demonstrated that most pseudogenes evolve neutrally, and processed pseudogenes appear to have lost their coding potential immediately or soon after their emergence. In order to explore the functional implication of pseudogene prevalence, we have extensively examined the transcriptional activity of the ENCODE pseudogenes. We performed systematic series of pseudogene-specific RACE analyses. These, together with complementary evidence derived from tiling microarrays and high throughput sequencing, demonstrated that at least a fifth of the 201 pseudogenes are transcribed in one or more cell lines or tissues.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Background: The RPS4 gene codifies for ribosomal protein S4, a very well-conserved protein present in all kingdoms. In primates, RPS4 is codified by two functional genes located on both sex chromosomes: the RPS4X and RPS4Y genes. In humans, RPS4Y is duplicated and the Y chromosome therefore carries a third functional paralog: RPS4Y2, which presents a testis-specific expression pattern. Results: DNA sequence analysis of the intronic and cDNA regions of RPS4Y genes from species covering the entire primate phylogeny showed that the duplication event leading to the second Y-linked copy occurred after the divergence of New World monkeys, about 35 million years ago. Maximum likelihood analyses of the synonymous and non-synonymous substitutions revealed that positive selection was acting on RPS4Y2 gene in the human lineage, which represents the first evidence of positive selection on a ribosomal protein gene. Putative positive amino acid replacements affected the three domains of the protein: one of these changes is located in the KOW protein domain and affects the unique invariable position of this motif, and might thus have a dramatic effect on the protein function.Conclusion: Here, we shed new light on the evolutionary history of RPS4Y gene family, especially on that of RPS4Y2. The results point that the RPS4Y1 gene might be maintained to compensate gene dosage between sexes, while RPS4Y2 might have acquired a new function, at least in the lineage leading to humans.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Chemoreception is a biological process essential for the survival of animals, as it allows the recognition of important volatile cues for the detection of food, egg-laying substrates, mates or predators, among other purposes. Furthermore, its role in pheromone detection may contribute to evolutionary processes such as reproductive isolation and speciation. This key role in several vital biological processes makes chemoreception a particularly interesting system for studying the role of natural selection in molecular adaptation. Two major gene families are involved in the perireceptor events of the chemosensory system: the odorant-binding protein (OBP) and chemosensory protein (CSP) families. Here, we have conducted an exhaustive comparative genomic analysis of these gene families in twenty Arthropoda species. We show that the evolution of the OBP and CSP gene families is highly dynamic, with a high number of gains and losses of genes, pseudogenes and independent origins of subfamilies. Taken together, our data clearly support the birth-and-death model for the evolution of these gene families with an overall high gene-turnover rate. Moreover, we show that the genome organization of the two families is significantly more clustered than expected by chance and, more important, that this pattern appears to be actively maintained across the Drosophila phylogeny. Finally, we suggest the homologous nature of the OBP and CSP gene families, dating back their MRCA (most recent common ancestor) to 380¿420 Mya, and we propose a scenario for the origin and diversification of these families.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Background: It has been shown in a variety of organisms, including mammals, that genes that appeared recently in evolution, for example orphan genes, evolve faster than older genes. Low functional constraints at the time of origin of novel genes may explain these results. However, this observation has been recently attributed to an artifact caused by the inability of Blast to detect the fastest genes in different eukaryotic genomes. Distinguishing between these two possible explanations would be of great importance for any studies dealing with the taxon distribution of proteins and the origin of novel genes. Results: Here we used simulations of protein sequences to examine the capacity of Blast to detect proteins of diverse evolutionary rates in the different species of an eukaryotic phylogenetic tree that included metazoans, fungi and plants. We simulated the evolution of protein genes with the same evolutionary rates than those observed in functional mammalian genes and with among-site rate heterogeneity. Under these conditions, we found that only a very small percentage of simulated ancestral eukaryotic proteins was affected by the Blast artifact. We show that the good detectability of Blast is due to the heterogeneity of protein evolutionary rates at different sites, since only a small conserved motif in a sequence suffices to detect its homologues. Our results indicate that Blast, at least when applied within eukaryotes, only misses homologues of extremely fast-evolving sequences, which are rare in the mammalian genome, as well as sequences evolving homogeneously or pseudogenes.Conclusion: Although great care should be exercised in the recognition of remote homologues, most functional mammalian genes can be detected in eukaryotic genomes by Blast. That is, the majority of functional mammalian genes are not as fast as for not being detected in other metazoans, fungi or plants, if they had been present in these organisms. Thus, the correlation previously found between age and rate seems not to be due to a pure Blast artifact, at least for mammals. This may have important implications to understand the mechanisms by which novel genes originate.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Abstract Background: Many complex systems can be represented and analysed as networks. The recent availability of large-scale datasets, has made it possible to elucidate some of the organisational principles and rules that govern their function, robustness and evolution. However, one of the main limitations in using protein-protein interactions for function prediction is the availability of interaction data, especially for Mollicutes. If we could harness predicted interactions, such as those from a Protein-Protein Association Networks (PPAN), combining several protein-protein network function-inference methods with semantic similarity calculations, the use of protein-protein interactions for functional inference in this species would become more potentially useful. Results: In this work we show that using PPAN data combined with other approximations, such as functional module detection, orthology exploitation methods and Gene Ontology (GO)-based information measures helps to predict protein function in Mycoplasma genitalium. Conclusions: To our knowledge, the proposed method is the first that combines functional module detection among species, exploiting an orthology procedure and using information theory-based GO semantic similarity in PPAN of the Mycoplasma species. The results of an evaluation show a higher recall than previously reported methods that focused on only one organism network.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Fungi are a large group of eukaryotes found in nearly all ecosystems. More than 250 fungal genomes have already been sequenced, greatly improving our understanding of fungal evolution, physiology, and development. However, for the Pezizomycetes, an early-diverging lineage of filamentous ascomycetes, there is so far only one genome available, namely that of the black truffle, Tuber melanosporum, a mycorrhizal species with unusual subterranean fruiting bodies. To help close the sequence gap among basal filamentous ascomycetes, and to allow conclusions about the evolution of fungal development, we sequenced the genome and assayed transcriptomes during development of Pyronema confluens, a saprobic Pezizomycete with a typical apothecium as fruiting body. With a size of 50 Mb and ~13,400 protein-coding genes, the genome is more characteristic of higher filamentous ascomycetes than the large, repeat-rich truffle genome; however, some typical features are different in the P. confluens lineage, e.g. the genomic environment of the mating type genes that is conserved in higher filamentous ascomycetes, but only partly conserved in P. confluens. On the other hand, P. confluens has a full complement of fungal photoreceptors, and expression studies indicate that light perception might be similar to distantly related ascomycetes and, thus, represent a basic feature of filamentous ascomycetes. Analysis of spliced RNA-seq sequence reads allowed the detection of natural antisense transcripts for 281 genes. The P. confluens genome contains an unusually high number of predicted orphan genes, many of which are upregulated during sexual development, consistent with the idea of rapid evolution of sex-associated genes. Comparative transcriptomics identified the transcription factor gene pro44 that is upregulated during development in P. confluens and the Sordariomycete Sordaria macrospora. The P. confluens pro44 gene (PCON_06721) was used to complement the S. macrospora pro44 deletion mutant, showing functional conservation of this developmental regulator.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Background: Protein domains represent the basic units in the evolution of proteins. Domain duplication and shuffling by recombination and fusion, followed by divergence are the most common mechanisms in this process. Such domain fusion and recombination events are predicted to occur only once for a given multidomain architecture. However, other scenarios may be relevant in the evolution of specific proteins, such as convergent evolution of multidomain architectures. With this in mind, we study glutaredoxin (GRX) domains, because these domains of approximately one hundred amino acids are widespread in archaea, bacteria and eukaryotes and participate in fusion proteins. GRXs are responsible for the reduction of protein disulfides or glutathione-protein mixed disulfides and are involved in cellular redox regulation, although their specific roles and targets are often unclear. Results: In this work we analyze the distribution and evolution of GRX proteins in archaea,bacteria and eukaryotes. We study over one thousand GRX proteins, each containing at least one GRX domain, from hundreds of different organisms and trace the origin and evolution of the GRX domain within the tree of life. Conclusion: Our results suggest that single domain GRX proteins of the CGFS and CPYC classes have, each, evolved through duplication and divergence from one initial gene that was present in the last common ancestor of all organisms. Remarkably, we identify a case of convergent evolution in domain architecture that involves the GRX domain. Two independent recombination events of a TRX domain to a GRX domain are likely to have occurred, which is an exception to the dominant mechanism of domain architecture evolution.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A number of bacterial species, mostly proteobacteria, possess monothiol glutaredoxins homologous to the Saccharomyces cerevisiae mitochondrial protein Grx5, which is involved in iron–sulphur cluster synthesis. Phylogenetic profiling is used to predict that bacterial monothiol glutaredoxins also participate in the iron–sulphur cluster (ISC) assembly machinery, because their phylogenetic profiles are similar to the profiles of the bacterial homologues of yeast ISC proteins. High evolutionary cooccurrence is observed between the Grx5 homologues and the homologues of the Yah1 ferredoxin, the scaffold proteins Isa1 and Isa2, the frataxin protein Yfh1 and the Nfu1 protein. This suggests that a specific functional interaction exists between these ISC machinery proteins. Physical interaction analyses using low-definition protein docking predict the formation of strong and specific complexes between Grx5 and several components of the yeast ISC machinery. Two-hybrid analysis has confirmed the in vivo interaction between Grx5 and Isa1. Sequence comparison techniques and cladistics indicate that the other two monothiol glutaredoxins of S. cerevisiae, Grx3 and Grx4, have evolved from the fusion of a thioredoxin gene with a monothiol glutaredoxin gene early in the eukaryotic lineage, leading to differential functional specialization. While bacteria do not contain these chimaeric glutaredoxins, in many eukaryotic species Grx5 and Grx3/4-type monothiol glutaredoxins coexist in the cell.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Background: Bacterial populations are highly successful at colonizing new habitats and adapting to changing environmental conditions, partly due to their capacity to evolve novel virulence and metabolic pathways in response to stress conditions and to shuffle them by horizontal gene transfer (HGT). A common theme in the evolution of new functions consists of gene duplication followed by functional divergence. UlaG, a unique manganese-dependent metallo-b-lactamase (MBL) enzyme involved in L-ascorbate metabolism by commensal and symbiotic enterobacteria, provides a model for the study of the emergence of new catalytic activities from the modification of an ancient fold. Furthermore, UlaG is the founding member of the so-called UlaG-like (UlaGL) protein family, a recently established and poorly characterized family comprising divalent (and perhaps trivalent)metal-binding MBLs that catalyze transformations on phosphorylated sugars and nucleotides. Results: Here we combined protein structure-guided and sequence-only molecular phylogenetic analyses to dissect the molecular evolution of UlaG and to study its phylogenomic distribution, its relatedness with present-day UlaGL protein sequences and functional conservation. Phylogenetic analyses indicate that UlaGL sequences are present in Bacteria and Archaea, with bona fide orthologs found mainly in mammalian and plant-associated Gramnegative and Gram-positive bacteria. The incongruence between the UlaGL tree and known species trees indicates exchange by HGT and suggests that the UlaGL-encoding genes provided a growth advantage under changing conditions. Our search for more distantly related protein sequences aided by structural homology has uncovered that UlaGL sequences have a common evolutionary origin with present-day RNA processing and metabolizing MBL enzymes widespread in Bacteria, Archaea, and Eukarya. This observation suggests an ancient origin for the UlaGL family within the broader trunk of the MBL superfamily by duplication, neofunctionalization and fixation. Conclusions: Our results suggest that the forerunner of UlaG was present as an RNA metabolizing enzyme in the last common ancestor, and that the modern descendants of that ancestral gene have a wide phylogenetic distribution and functional roles. We propose that the UlaGL family evolved new metabolic roles among bacterial and possibly archeal phyla in the setting of a close association with metazoans, such as in the mammalian gastrointestinal tract or in animal and plant pathogens, as well as in environmental settings. Accordingly, the major evolutionary forces shaping the UlaGL family include vertical inheritance and lineage-specific duplication and acquisition of novel metabolic functions, followed by HGT and numerous lineage-specific gene loss events.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Gene turnover rates and the evolution of gene family sizes are important aspects of genome evolution. Here, we use curated sequence data of the major chemosensory gene families from Drosophila-the gustatory receptor, odorant receptor, ionotropic receptor, and odorant-binding protein families-to conduct a comparative analysis among families, exploring different methods to estimate gene birth and death rates, including an ad hoc simulation study. Remarkably, we found that the state-of-the-art methods may produce very different rate estimates, which may lead to disparate conclusions regarding the evolution of chemosensory gene family sizes in Drosophila. Among biological factors, we found that a peculiarity of D. sechellia's gene turnover rates was a major source of bias in global estimates, whereas gene conversion had negligible effects for the families analyzed herein. Turnover rates vary considerably among families, subfamilies, and ortholog groups although all analyzed families were quite dynamic in terms of gene turnover. Computer simulations showed that the methods that use ortholog group information appear to be the most accurate for the Drosophila chemosensory families. Most importantly, these results reveal the potential of rate heterogeneity among lineages to severely bias some turnover rate estimation methods and the need of further evaluating the performance of these methods in a more diverse sampling of gene families and phylogenetic contexts. Using branch-specific codon substitution models, we find further evidence of positive selection in recently duplicated genes, which attests to a nonneutral aspect of the gene birth-and-death process.