20 resultados para Lineage-specific domain
em Consorci de Serveis Universitaris de Catalunya (CSUC), Spain
Resumo:
Genome duplications increase genetic diversity and may facilitate the evolution of gene subfunctions. Little attention, however, has focused on the evolutionary impact of lineage-specific gene loss. Here, we show that identifying lineage-specific gene loss after genome duplication is important for understanding the evolution of gene subfunctions in surviving paralogs and for improving functional connectivity among human and model organism genomes. We examine the general principles of gene loss following duplication, coupled with expression analysis of the retinaldehyde dehydrogenase Aldh1a gene family during retinoic acid signaling in eye development as a case study. Humans have three ALDH1A genes, but teleosts have just one or two. We used comparative genomics and conserved syntenies to identify loss of ohnologs (paralogs derived from genome duplication) and to clarify uncertain phylogenies. Analysis showed that Aldh1a1 and Aldh1a2 form a clade that is sister to Aldh1a3-related genes. Genome comparisons showed secondarily loss of aldh1a1 in teleosts, revealing that Aldh1a1 is not a tetrapod innovation and that aldh1a3 was recently lost in medaka, making it the first known vertebrate with a single aldh1a gene. Interestingly, results revealed asymmetric distribution of surviving ohnologs between co-orthologous teleost chromosome segments, suggesting that local genome architecture can influence ohnolog survival. We propose a model that reconstructs the chromosomal history of the Aldh1a family in the ancestral vertebrate genome, coupled with the evolution of gene functions in surviving Aldh1a ohnologs after R1, R2, and R3 genome duplications. Results provide evidence for early subfunctionalization and late subfunction-partitioning and suggest a mechanistic model based on altered regulation leading to heterochronic gene expression to explain the acquisition or modification of subfunctions by surviving ohnologs that preserve unaltered ancestral developmental programs in the face of gene loss.
Resumo:
Proteins are composed of a combination of discrete, well-defined, sequence domains, associated with specific functions that have arisen at different times during evolutionary history. The emergence of novel domains is related to protein functional diversification and adaptation. But currently little is known about how novel domains arise and how they subsequently evolve. To gain insights into the impact of recently emerged domains in protein evolution we have identified all human young protein domains that have emerged in approximately the past 550 million years. We have classified them into vertebrate-specific and mammalian-specific groups, and compared them to older domains. We have found 426 different annotated young domains, totalling 995 domain occurrences, which represent about 12.3% of all human domains. We have observed that 61.3% of them arose in newly formed genes, while the remaining 38.7% are found combined with older domains, and have very likely emerged in the context of a previously existing protein. Young domains are preferentially located at the N-terminus of the protein, indicating that, at least in vertebrates, novel functional sequences often emerge there. Furthermore, young domains show significantly higher non-synonymous to synonymous substitution rates than older domains using human and mouse orthologous sequence comparisons. This is also true when we compare young and old domains located in the same protein, suggesting that recently arisen domains tend to evolve in a less constrained manner than older domains. We conclude that proteins tend to gain domains over time, becoming progressively longer. We show that many proteins are made of domains of different age, and that the fastest evolving parts correspond to the domains that have been acquired more recently.
Resumo:
Insects are the most diverse group of animals on the planet, comprising over 90% of all metazoan life forms, and have adapted to a wide diversity of ecosystems in nearly all environments. They have evolved highly sensitive chemical senses that are central to their interaction with their environment and to communication between individuals. Understanding the molecular bases of insect olfaction is therefore of great importance from both a basic and applied perspective. Odorant binding proteins (OBPs) are some of most abundant proteins found in insect olfactory organs, where they are the first component of the olfactory transduction cascade, carrying odorant molecules to the olfactory receptors. We carried out a search for OBPs in the genome of the parasitoid wasp Nasonia vitripennis and identified 90 sequences encoding putative OBPs. This is the largest OBP family so far reported in insects. We report unique features of the N. vitripennis OBPs, including the presence and evolutionary origin of a new subfamily of double-domain OBPs (consisting of two concatenated OBP domains), the loss of conserved cysteine residues and the expression of pseudogenes. This study also demonstrates the extremely dynamic evolution of the insect OBP family: (i) the number of different OBPs can vary greatly between species; (ii) the sequences are highly diverse, sometimes as a result of positive selection pressure with even the canonical cysteines being lost; (iii) new lineage specific domain arrangements can arise, such as the double domain OBP subfamily of wasps and mosquitoes.
Resumo:
Background: Different regions in a genome evolve at different rates depending on structural and functional constraints. Some genomic regions are highly conserved during metazoan evolution, while other regions may evolve rapidly, either in all species or in a lineage-specific manner. A strong or even moderate change in constraints in functional regions, for example in coding regions, can have significant evolutionary consequences. Results: Here we discuss a novel framework, 'BaseDiver', to classify groups of genes in humans based on the patterns of evolutionary constraints on polymorphic positions in their coding regions. Comparing the nucleotide-level divergence among mammals with the extent of deviation from the ancestral base in the human lineage, we identify patterns of evolutionary pressure on nonsynonymous base-positions in groups of genes belonging to the same functional category. Focussing on groups of genes in functional categories, we find that transcription factors contain a significant excess of nonsynonymous base-positions that are conserved in other mammals but changed in human, while immunity related genes harbour mutations at base-positions that evolve rapidly in all mammals including humans due to strong preference for advantageous alleles. Genes involved in olfaction also evolve rapidly in all mammals, and in humans this appears to be due to weak negative selection. Conclusion: While recent studies have identified genes under positive selection in humans, our approach identifies evolutionary constraints on Gene Ontology groups identifying changes in humans relative to some of the other mammals.
Resumo:
It is generally accepted that the extent of phenotypic change between human and great apes is dissonant with the rate of molecular change. Between these two groups, proteins are virtually identical, cytogenetically there are few rearrangements that distinguish ape-human chromosomes, and rates of single-base-pair change and retrotransposon activity have slowed particularly within hominid lineages when compared to rodents or monkeys. Studies of gene family evolution indicate that gene loss and gain are enriched within the primate lineage. Here, we perform a systematic analysis of duplication content of four primate genomes (macaque, orang-utan, chimpanzee and human) in an effort to understand the pattern and rates of genomic duplication during hominid evolution. We find that the ancestral branch leading to human and African great apes shows the most significant increase in duplication activity both in terms of base pairs and in terms of events. This duplication acceleration within the ancestral species is significant when compared to lineage-specific rate estimates even after accounting for copy-number polymorphism and homoplasy. We discover striking examples of recurrent and independent gene-containing duplications within the gorilla and chimpanzee that are absent in the human lineage. Our results suggest that the evolutionary properties of copy-number mutation differ significantly from other forms of genetic mutation and, in contrast to the hominid slowdown of single-base-pair mutations, there has been a genomic burst of duplication activity at this period during human evolution.
Resumo:
Motivation: The comparative analysis of gene gain and loss rates is critical for understanding the role of natural selection and adaptation in shaping gene family sizes. Studying complete genome data from closely related species allows accurate estimation of gene family turnover rates. Current methods and software tools, however, are not well designed for dealing with certain kinds of functional elements, such as microRNAs or transcription factor binding sites. Results: Here, we describe BadiRate, a new software tool to estimate family turnover rates, as well as the number of elements in internal phylogenetic nodes, by likelihood-based methods and parsimony. It implements two stochastic population models, which provide the appropriate statistical framework for testing hypothesis, such as lineage-specific gene family expansions or contractions. We have assessed the accuracy of BadiRate by computer simulations, and have also illustrated its functionality by analyzing a representative empirical dataset.
Resumo:
Motivation: The comparative analysis of gene gain and loss rates is critical for understanding the role of natural selection and adaptation in shaping gene family sizes. Studying complete genome data from closely related species allows accurate estimation of gene family turnover rates. Current methods and software tools, however, are not well designed for dealing with certain kinds of functional elements, such as microRNAs or transcription factor binding sites. Results: Here, we describe BadiRate, a new software tool to estimate family turnover rates, as well as the number of elements in internal phylogenetic nodes, by likelihood-based methods and parsimony. It implements two stochastic population models, which provide the appropriate statistical framework for testing hypothesis, such as lineage-specific gene family expansions or contractions. We have assessed the accuracy of BadiRate by computer simulations, and have also illustrated its functionality by analyzing a representative empirical dataset.
Resumo:
Sugar beet (Beta vulgaris ssp. vulgaris) is an important crop of temperate climates which provides nearly 30% of the world's annual sugar production and is a source for bioethanol and animal feed. The species belongs to the order of Caryophylalles, is diploid with 2n = 18 chromosomes, has an estimated genome size of 714-758 megabases and shares an ancient genome triplication with other eudicot plants. Leafy beets have been cultivated since Roman times, but sugar beet is one of the most recently domesticated crops. It arose in the late eighteenth century when lines accumulating sugar in the storage root were selected from crosses made with chard and fodder beet. Here we present a reference genome sequence for sugar beet as the first non-rosid, non-asterid eudicot genome, advancing comparative genomics and phylogenetic reconstructions. The genome sequence comprises 567 megabases, of which 85% could be assigned to chromosomes. The assembly covers a large proportion of the repetitive sequence content that was estimated to be 63%. We predicted 27,421 protein-coding genes supported by transcript data and annotated them on the basis of sequence homology. Phylogenetic analyses provided evidence for the separation of Caryophyllales before the split of asterids and rosids, and revealed lineage-specific gene family expansions and losses. We sequenced spinach (Spinacia oleracea), another Caryophyllales species, and validated features that separate this clade from rosids and asterids. Intraspecific genomic variation was analysed based on the genome sequences of sea beet (Beta vulgaris ssp. maritima; progenitor of all beet crops) and four additional sugar beet accessions. We identified seven million variant positions in the reference genome, and also large regions of low variability, indicating artificial selection. The sugar beet genome sequence enables the identification of genes affecting agronomically relevant traits, supports molecular breeding and maximizes the plant's potential in energy biotechnology.
Resumo:
Background: Bacterial populations are highly successful at colonizing new habitats and adapting to changing environmental conditions, partly due to their capacity to evolve novel virulence and metabolic pathways in response to stress conditions and to shuffle them by horizontal gene transfer (HGT). A common theme in the evolution of new functions consists of gene duplication followed by functional divergence. UlaG, a unique manganese-dependent metallo-b-lactamase (MBL) enzyme involved in L-ascorbate metabolism by commensal and symbiotic enterobacteria, provides a model for the study of the emergence of new catalytic activities from the modification of an ancient fold. Furthermore, UlaG is the founding member of the so-called UlaG-like (UlaGL) protein family, a recently established and poorly characterized family comprising divalent (and perhaps trivalent)metal-binding MBLs that catalyze transformations on phosphorylated sugars and nucleotides. Results: Here we combined protein structure-guided and sequence-only molecular phylogenetic analyses to dissect the molecular evolution of UlaG and to study its phylogenomic distribution, its relatedness with present-day UlaGL protein sequences and functional conservation. Phylogenetic analyses indicate that UlaGL sequences are present in Bacteria and Archaea, with bona fide orthologs found mainly in mammalian and plant-associated Gramnegative and Gram-positive bacteria. The incongruence between the UlaGL tree and known species trees indicates exchange by HGT and suggests that the UlaGL-encoding genes provided a growth advantage under changing conditions. Our search for more distantly related protein sequences aided by structural homology has uncovered that UlaGL sequences have a common evolutionary origin with present-day RNA processing and metabolizing MBL enzymes widespread in Bacteria, Archaea, and Eukarya. This observation suggests an ancient origin for the UlaGL family within the broader trunk of the MBL superfamily by duplication, neofunctionalization and fixation. Conclusions: Our results suggest that the forerunner of UlaG was present as an RNA metabolizing enzyme in the last common ancestor, and that the modern descendants of that ancestral gene have a wide phylogenetic distribution and functional roles. We propose that the UlaGL family evolved new metabolic roles among bacterial and possibly archeal phyla in the setting of a close association with metazoans, such as in the mammalian gastrointestinal tract or in animal and plant pathogens, as well as in environmental settings. Accordingly, the major evolutionary forces shaping the UlaGL family include vertical inheritance and lineage-specific duplication and acquisition of novel metabolic functions, followed by HGT and numerous lineage-specific gene loss events.
Resumo:
Background In most eumetazoans studied so far, Hox genes determine the identity of structures along the main body axis. They are usually linked in genomic clusters and, in the case of the vertebrate embryo, are expressed with spatial and temporal colinearity. Outside vertebrates, temporal colinearity has been reported in the cephalochordate amphioxus (the least derived living relative of the chordate ancestor) but only for anterior and central genes, namely Hox1 to Hox4 and Hox6. However, most of the Hox gene expression patterns in amphioxus have not been reported. To gain global insights into the evolution of Hox clusters in chordates, we investigated a more extended expression profile of amphioxus Hox genes. Results Here we report an extended expression profile of the European amphioxus Branchiostoma lanceolatum Hox genes and describe that all Hox genes, except Hox13, are expressed during development. Interestingly, we report the breaking of both spatial and temporal colinearity for at least Hox6 and Hox14, which thus have escaped from the classical Hox code concept. We show a previously unidentified Hox6 expression pattern and a faint expression for posterior Hox genes in structures such as the posterior mesoderm, notochord, and hindgut. Unexpectedly, we found that amphioxus Hox14 had the most divergent expression pattern. This gene is expressed in the anterior cerebral vesicle and pharyngeal endoderm. Amphioxus Hox14 expression represents the first report of Hox gene expression in the most anterior part of the central nervous system. Nevertheless, despite these divergent expression patterns, amphioxus Hox6 and Hox14 seem to be still regulated by retinoic acid. Conclusions Escape from colinearity by Hox genes is not unusual in either vertebrates or amphioxus and we suggest that those genes escaping from it are probably associated with the patterning of lineage-specific morphological traits, requiring the loss of those developmental constraints that kept them colinear.
Resumo:
The genome of the bladderwort Utricularia gibba provides an unparalleled opportunity to uncover the adaptive landscape of an aquatic carnivorous plant with unique phenotypic features such as absence of roots, development of water-filled suction bladders, and a highly ramified branching pattern. Despite its tiny size, the U. gibba genome accommodates approximately as many genes as other plant genomes. To examine the relationship between the compactness of its genome and gene turnover, we compared the U. gibba genome with that of four other eudicot species, defining a total of 17,324 gene families (orthogroups). These families were further classified as either 1) lineage-specific expanded/contracted or 2) stable in size. The U. gibba-expanded families are generically related to three main phenotypic features: 1) trap physiology, 2) key plant morphogenetic/developmental pathways, and 3) response to environmental stimuli, including adaptations to life in aquatic environments. Further scans for signatures of protein functional specialization permitted identification of seven candidate genes with amino acid changes putatively fixed by positive Darwinian selection in the U. gibba lineage. The Arabidopsis orthologs of these genes (AXR, UMAMIT41, IGS, TAR2, SOL1, DEG9, and DEG10) are involved in diverse plant biological functions potentially relevant for U. gibba phenotypic diversification, including 1) auxin metabolism and signal transduction, 2) flowering induction and floral meristem transition, 3) root development, and 4) peptidases. Taken together, our results suggest numerous candidate genes and gene families as interesting targets for further experimental confirmation of their functional and adaptive roles in the U. gibba's unique lifestyle and highly specialized body plan.
Resumo:
The objective of the PANACEA ICT-2007.2.2 EU project is to build a platform that automates the stages involved in the acquisition,production, updating and maintenance of the large language resources required by, among others, MT systems. The development of a Corpus Acquisition Component (CAC) for extracting monolingual and bilingual data from the web is one of the most innovative building blocks of PANACEA. The CAC, which is the first stage in the PANACEA pipeline for building Language Resources, adopts an efficient and distributed methodology to crawl for web documents with rich textual content in specific languages and predefined domains. The CAC includes modules that can acquire parallel data from sites with in-domain content available in more than one language. In order to extrinsically evaluate the CAC methodology, we have conducted several experiments that used crawled parallel corpora for the identification and extraction of parallel sentences using sentence alignment. The corpora were then successfully used for domain adaptation of Machine Translation Systems.
Resumo:
This paper proposes an exploration of the methodology of utilityfunctions that distinguishes interpretation from representation. Whilerepresentation univocally assigns numbers to the entities of the domainof utility functions, interpretation relates these entities withempirically observable objects of choice. This allows us to makeexplicit the standard interpretation of utility functions which assumesthat two objects have the same utility if and only if the individual isindifferent among them. We explore the underlying assumptions of suchan hypothesis and propose a non-standard interpretation according towhich objects of choice have a well-defined utility although individualsmay vary in the way they treat these objects in a specific context.We provide examples of such a methodological approach that may explainsome reversal of preferences and suggest possible mathematicalformulations for further research.
Resumo:
It is very well known that the first succesful valuation of a stock option was done by solving a deterministic partial differential equation (PDE) of the parabolic type with some complementary conditions specific for the option. In this approach, the randomness in the option value process is eliminated through a no-arbitrage argument. An alternative approach is to construct a replicating portfolio for the option. From this viewpoint the payoff function for the option is a random process which, under a new probabilistic measure, turns out to be of a special type, a martingale. Accordingly, the value of the replicating portfolio (equivalently, of the option) is calculated as an expectation, with respect to this new measure, of the discounted value of the payoff function. Since the expectation is, by definition, an integral, its calculation can be made simpler by resorting to powerful methods already available in the theory of analytic functions. In this paper we use precisely two of those techniques to find the well-known value of a European call
Resumo:
The thermodynamic functions of a Fermi gas with spin population imbalance are studied in the temperature-asymmetry plane in the BCS limit. The low-temperature domain is characterized by an anomalous enhancement of the entropy and the specific heat above their values in the unpaired state, decrease of the gap and eventual unpairing phase transition as the temperature is lowered. The unpairing phase transition induces a second jump in the specific heat, which can be measured in calorimetric experiments. While the superfluid is unstable against a supercurrent carrying state, it may sustain a metastable state if cooled adiabatically down from the stable high-temperature domain. In the latter domain the temperature dependence of the gap and related functions is analogous to the predictions of the BCS theory.