989 results for SEQUENCE EVOLUTION
Abstract:
Molecular phylogenetics complements paleontological and geological studies by making it possible to reconstruct the phylogenetic relationships among species and to estimate their divergence times. However, once a phylogenetic tree has been inferred, researchers tend to focus on its topology, that is, the relative branching order of the nodes. The branch lengths of the phylogeny are often treated as by-products: nuisance parameters carrying little information. Yet they are the primary information used in molecular dating. Saturation, the occurrence of multiple substitutions at the same position, is an artifact that leads to a systematic underestimation of branch lengths. We set out to estimate the extent of saturation and its impact on estimates of divergence ages. We chose to study the mammalian mitochondrial genome, which is thought to be highly saturated and is available for many species. Moreover, the phylogenetic relationships of mammals are well established, which allowed us to fix the topology and thereby control one of the parameters that influence branch lengths. We used two main approaches to improve the detection of multiple substitutions: (i) increasing the number of species in order to break up the longest branches of the tree, and (ii) models of sequence evolution of varying realism. The results showed that branch lengths were heavily underestimated (by up to a factor of 3) and that using a large number of species improves the detection of multiple substitutions far more than improving the models of sequence evolution does.
This suggests that even the most complex evolutionary models currently available (for example, the CAT+Covarion model, which accounts for heterogeneity of the substitution process across sites and of evolutionary rates over time) are still far from capturing the full complexity of biological processes. Despite the magnitude of the branch-length underestimation, its impact on dating turned out to be relatively small, because the underestimation is more or less homothetic (roughly proportional across branches). This is particularly true for the evolutionary models. However, because multiple substitutions are detected most effectively by breaking branches into the shortest possible segments through the addition of species, taxon sampling bias becomes a problem: extinction throughout the history of life on Earth has removed many lineages. Because this bias produces a non-homothetic underestimation, we consider it essential to improve models of sequence evolution, and we propose that the protocol developed in this work can be used to assess how well they cope with saturation.
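To illustrate why saturation leads to underestimated branch lengths, the sketch below compares the observed proportion of differing sites (p-distance) with the Jukes-Cantor (JC69) corrected distance, the simplest textbook correction for unseen multiple substitutions. This is an illustrative assumption, not the study's method: the work described above relied on far richer models (up to CAT+Covarion) and on dense taxon sampling.

```python
import math

def p_distance(seq1, seq2):
    """Observed proportion of differing sites, ignoring gapped positions."""
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)

def jc69_distance(p):
    """Jukes-Cantor (1969) estimate of substitutions per site from a p-distance.

    Returns infinity when p >= 0.75: the sequences are then as different as
    random sequences (full saturation) and the correction is undefined.
    """
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

if __name__ == "__main__":
    for p in (0.1, 0.3, 0.5, 0.7):
        d = jc69_distance(p)
        print(f"observed p = {p:.1f}  corrected d = {d:.3f}  ratio d/p = {d / p:.2f}")
```

The ratio d/p rises from about 1.07 at p = 0.1 to about 2.9 at p = 0.7, so the more divergent the sequences, the more substitutions the raw count misses. More realistic models and shorter branches both aim to recover that hidden change.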
Abstract:
The explosion in the number of available sequences has allowed phylogenomics, the study of relationships among species using large multigene alignments, to flourish. It is undoubtedly a way to overcome the stochastic errors of single-gene phylogenies, but many problems remain despite progress in modeling the evolutionary process. In this thesis, we characterize certain aspects of model misfit and study their impact on the accuracy of inference. Unlike heterotachy, variation over time of the amino acid substitution process has so far received little attention. We show not only that this heterogeneity is widespread in animals, but also that it can degrade the quality of phylogenomic inference. Thus, in the absence of an adequate model, removing the heterogeneous columns that the model handles poorly can eliminate a reconstruction artifact. In a phylogenomic setting, the sequencing techniques used often mean that not every gene is available for every species. The controversy over the impact of the number of empty cells has recently been revived, but most studies of missing data are based on small simulated datasets. We therefore set out to quantify this impact on a large alignment of real data. For a reasonable proportion of missing data, the incompleteness of the alignment appears to affect the accuracy of inference less than the choice of model does. Conversely, adding an incomplete sequence that breaks a long branch can restore, at least partially, an erroneous phylogeny.
Because model violations remain the main limitation on the accuracy of phylogenetic inference, improving the sampling of species and genes remains a useful alternative in the absence of an adequate model. We therefore developed sequence-selection software that builds reproducible datasets on the basis of the amount of data present, the rate of evolution, and compositional biases. In the course of this study we showed that, for now, human expertise still contributes indispensable knowledge. The analyses carried out for this thesis all point to the paramount importance of the evolutionary model.
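The selection criteria mentioned above (amount of data present, compositional bias) can be sketched as a simple per-sequence filter. Everything here is hypothetical: the function names, the set of missing-data symbols and the thresholds are illustrative assumptions, not the thesis software. The rate-of-evolution criterion is omitted because it requires a tree.

```python
from collections import Counter

GAP_CHARS = set("-?X")  # hypothetical: symbols counted as missing data

def missing_fraction(seq):
    """Fraction of alignment positions that are gaps or unknown residues."""
    return sum(c in GAP_CHARS for c in seq) / len(seq)

def composition(counts, alphabet):
    """Relative frequencies of the alphabet symbols in a Counter."""
    total = sum(counts[a] for a in alphabet)
    return [counts[a] / total if total else 0.0 for a in alphabet]

def select_sequences(alignment, alphabet, max_missing=0.5, max_deviation=0.2):
    """Keep sequences with little missing data and a composition close to the pooled one.

    alignment: dict mapping sequence name -> aligned sequence string.
    Thresholds are illustrative defaults, not values from the thesis.
    """
    pooled = Counter()
    for seq in alignment.values():
        pooled.update(seq)
    reference = composition(pooled, alphabet)
    kept = []
    for name, seq in alignment.items():
        comp = composition(Counter(seq), alphabet)
        # total variation distance between this sequence and the pooled composition
        deviation = 0.5 * sum(abs(x - r) for x, r in zip(comp, reference))
        if missing_fraction(seq) <= max_missing and deviation <= max_deviation:
            kept.append(name)
    return kept
```

Using a pooled reference composition keeps the filter reproducible: the same alignment and thresholds always yield the same selection, which is the property the abstract emphasizes.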
Abstract:
The rate at which a given site in a gene sequence alignment evolves over time may vary. This phenomenon, known as heterotachy, can bias or distort phylogenetic trees inferred from models of sequence evolution that assume rates of evolution are constant. Here, we describe a phylogenetic mixture model designed to accommodate heterotachy. The method sums the likelihood of the data at each site over more than one set of branch lengths on the same tree topology. A branch-length set that is best for one site may differ from the branch-length set that is best for some other site, thereby allowing different sites to have different rates of change throughout the tree. Because rate variation may not be present in all branches, we use a reversible-jump Markov chain Monte Carlo algorithm to identify those branches in which reliable amounts of heterotachy occur. We implement the method in combination with our 'pattern-heterogeneity' mixture model, applying it to simulated data and five published datasets. We find that complex evolutionary signals of heterotachy are routinely present over and above variation in the rate or pattern of evolution across sites, that the reversible-jump method requires far fewer parameters than conventional mixture models to describe it, and that it serves to identify the regions of the tree in which heterotachy is most pronounced. The reversible-jump procedure also removes the need for a posteriori tests of 'significance' such as the Akaike or Bayesian information criterion tests, or Bayes factors. Heterotachy has important consequences for the correct reconstruction of phylogenies as well as for tests of hypotheses that rely on accurate branch-length information. These include molecular clocks, analyses of tempo and mode of evolution, comparative studies and ancestral state reconstruction. The model is available from the authors' website, and can be used for the analysis of both nucleotide and morphological data.
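The core idea of summing each site's likelihood over several branch-length sets on one fixed topology can be shown in its smallest form: a three-taxon star tree under the JC69 model. This is a simplified sketch under stated assumptions; it does not implement the authors' reversible-jump MCMC or their pattern-heterogeneity model, and the branch lengths and mixture weights are taken as given rather than estimated.

```python
import math

BASES = "ACGT"

def jc_prob(i, j, t):
    """JC69 probability of ending in base j after branch length t, starting from base i."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e

def site_likelihood(site, branch_lengths):
    """Likelihood of one site on a three-taxon star tree (one branch per tip).

    The unobserved state x at the central node is summed out, weighted by
    the JC69 stationary frequency 1/4.
    """
    total = 0.0
    for x in BASES:
        term = 0.25
        for base, t in zip(site, branch_lengths):
            term *= jc_prob(x, base, t)
        total += term
    return total

def mixture_log_likelihood(sites, branch_length_sets, weights):
    """Sum each site's likelihood over several branch-length sets on the same topology."""
    log_l = 0.0
    for site in sites:
        site_l = sum(w * site_likelihood(site, bls)
                     for w, bls in zip(weights, branch_length_sets))
        log_l += math.log(site_l)
    return log_l

if __name__ == "__main__":
    sites = ["AAA", "AAG", "ACG", "CCC"]
    one_set = mixture_log_likelihood(sites, [(0.1, 0.1, 0.1)], [1.0])
    two_sets = mixture_log_likelihood(
        sites, [(0.05, 0.05, 0.05), (0.5, 0.5, 0.5)], [0.5, 0.5])
    print(one_set, two_sets)
```

Because the mixture is taken inside the logarithm for each site, a site can be explained mostly by the short-branch set while another is explained by the long-branch set. This is the mechanism that lets rates differ among sites across the tree.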
Abstract:
We investigate the performance of phylogenetic mixture models in reducing a well-known and pervasive artifact of phylogenetic inference known as the node-density effect, comparing them to partitioned analyses of the same data. The node-density effect refers to the tendency for the amount of evolutionary change in longer branches of phylogenies to be underestimated compared to that in regions of the tree where there are more nodes and thus branches are typically shorter. Mixture models allow more than one model of sequence evolution to describe the sites in an alignment without prior knowledge of the evolutionary processes that characterize the data or how they correspond to different sites. If multiple evolutionary patterns are common in sequence evolution, mixture models may be capable of reducing node-density effects by characterizing the evolutionary processes more accurately. In gene-sequence alignments simulated to have heterogeneous patterns of evolution, we find that mixture models can reduce node-density effects to negligible levels or remove them altogether, performing as well as partitioned analyses based on the known simulated patterns. The mixture models achieve this without knowledge of the patterns that generated the data and even in some cases without specifying the full or true model of sequence evolution known to underlie the data. The latter result is especially important in real applications, as the true model of evolution is seldom known. We find the same patterns of results for two real data sets with evidence of complex patterns of sequence evolution: mixture models substantially reduced node-density effects and returned better likelihoods compared to partitioning models specifically fitted to these data. 
We suggest that the presence of more than one pattern of evolution in the data is a common source of error in phylogenetic inference and that mixture models can often detect these patterns even without prior knowledge of their presence in the data. Routine use of mixture models alongside other approaches to phylogenetic inference may often reveal hidden or unexpected patterns of sequence evolution and can improve phylogenetic inference.
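The node-density effect relates two per-tip quantities: the number of nodes on the root-to-tip path and the length of that path. The sketch below computes both from a tree stored as a child-to-parent map and fits an ordinary least-squares slope across tips. It is only an illustration under stated assumptions (node count approximated by the number of branches on the path, the hypothetical tree format shown); it is not the test used in the study, and a formal node-density test must also account for the shared structure of the tree.

```python
def root_to_tip(tree, tip):
    """Return (branch count, path length) from the root to a tip.

    tree maps each non-root node to (parent, branch_length).
    """
    branches, length = 0, 0.0
    node = tip
    while node in tree:
        parent, branch_length = tree[node]
        length += branch_length
        branches += 1
        node = parent
    return branches, length

def ols_slope(xs, ys):
    """Least-squares slope of ys on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

if __name__ == "__main__":
    tree = {"A": ("root", 0.5), "n1": ("root", 0.1),
            "B": ("n1", 0.2), "C": ("n1", 0.3)}
    pairs = [root_to_tip(tree, tip) for tip in ("A", "B", "C")]
    print(pairs, ols_slope([b for b, _ in pairs], [l for _, l in pairs]))
```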
Abstract:
Genomic sequence comparison across species has enabled the elucidation of important coding and regulatory sequences encoded within DNA. Of particular interest are the noncoding regulatory sequences, which influence gene transcriptional and posttranscriptional processes. A phylogenetic footprinting strategy was employed to identify noncoding conservation patterns of 39 human and bovine orthologous genes. Seventy-three conserved noncoding sequences were identified that shared greater than 70% identity over at least 100 bp. Thirteen of these conserved sequences were also identified in the mouse genome. Evolutionary conservation of noncoding sequences across diverse species may have functional significance, and these conserved sequences may be good candidates for regulatory elements.
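The conservation criterion quoted above (greater than 70% identity over at least 100 bp) can be sketched as a sliding-window scan over a pairwise alignment. Assumptions: the two sequences are already aligned to equal length with "-" for gaps, identity is counted over the full window (gap columns count as mismatches), and overlapping qualifying windows are merged, so a merged region may contain short stretches below the threshold. The abstract does not describe the tool or scoring actually used.

```python
def conserved_regions(aligned_a, aligned_b, window=100, min_identity=0.70):
    """Return merged (start, end) alignment intervals of windows above the identity threshold."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    # 1 where both sequences carry the same non-gap base, else 0
    match = [int(a == b and a != "-") for a, b in zip(aligned_a, aligned_b)]
    regions = []
    if len(match) < window:
        return regions
    matches = sum(match[:window])
    for start in range(len(match) - window + 1):
        if start > 0:  # slide the window one column to the right
            matches += match[start + window - 1] - match[start - 1]
        if matches / window > min_identity:
            end = start + window
            if regions and start <= regions[-1][1]:
                regions[-1] = (regions[-1][0], end)  # extend the current region
            else:
                regions.append((start, end))
    return regions
```

Updating the match count incrementally keeps the scan linear in alignment length, which matters when whole intergenic regions from several orthologous genes are compared.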
Abstract:
Toadlets of the genus Brachycephalus are endemic to the Atlantic rainforests of southeastern and southern Brazil. The 14 species currently described have snout-vent lengths of less than 18 mm and are thought to have evolved through miniaturization: an evolutionary process leading to an extremely small adult body size. Here, we present the first comprehensive phylogenetic analysis of Brachycephalus, using a multilocus approach based on two nuclear (Rag-1 and Tyr) and three mitochondrial (Cyt b, 12S, and 16S rRNA) gene regions. Phylogenetic relationships were inferred using a partitioned Bayesian analysis of concatenated sequences and the hierarchical Bayesian method (BEST), which estimates species trees under the multispecies coalescent model. Individual gene trees showed conflict and also varied in resolution. With the exception of the mitochondrial gene tree, no gene tree was completely resolved. The concatenated gene tree was completely resolved and is identical in topology and degree of statistical support to the individual mtDNA gene tree. In contrast, the BEST species tree showed fewer significantly supported nodes than the concatenated tree and recovered a basal trichotomy, although some bipartitions near the tips of the species tree were significantly supported. Comparison of the log likelihoods of the concatenated and BEST trees suggests that the method implemented in BEST explains the multilocus data for Brachycephalus better than the Bayesian analysis of concatenated data. Landmark-based geometric morphometrics revealed marked variation in cranial shape among the species of Brachycephalus. In addition, a statistically significant association was demonstrated between variation in cranial shape and genetic distances estimated from the mtDNA and nuclear loci. Notably, B. ephippium and B. garbeana, which are predicted to be sister species in the individual and concatenated gene trees and in the BEST species tree, share an evolutionary novelty: the hyperossified dorsal plate. © 2011 Elsevier Inc.
Abstract:
Graduate Program in Geography - IGCE
Abstract:
The use of molecular data for species delimitation in Anthozoa is still a very delicate issue. This is probably due to the low genetic variation found in the molecular markers (primarily mitochondrial) commonly used for Anthozoa. Ceriantharia is an anthozoan group that has not yet been tested for genetic divergence at the species level. Recently, all three species described for the genus Isarachnanthus in the Atlantic Ocean were synonymized, on the basis of morphological similarities, under a single species: Isarachnanthus maderensis. Here, we aimed to verify whether genetic relationships (using the COI, 16S, ITS1 and ITS2 molecular markers) confirmed the morphological affinities among members of Isarachnanthus from different regions across the Atlantic Ocean. Results from all four DNA markers were completely congruent and revealed that two different species exist in the Atlantic Ocean. The low identification success and substantial overlap between intra- and interspecific COI distances render the Anthozoa unsuitable for DNA barcoding, but this is not true for Ceriantharia. In addition, genetic divergence within and between Ceriantharia species is more similar to that found in Medusozoa (Hydrozoa and Scyphozoa) than to that in Anthozoa and Porifera, which have divergence rates similar to those of typical metazoans. The two genetic species could also be separated on the basis of micromorphological characteristics of their cnidomes. Using a specimen of Isarachnanthus bandanensis from the Pacific Ocean as an outgroup, it was possible to estimate the minimum divergence date between the clades. The cladogenetic event that formed the Atlantic species is estimated to have occurred around 8.5 million years ago (Miocene), and several possible speciation scenarios are discussed.
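The barcoding argument above rests on comparing intraspecific with interspecific distances: barcoding works when the largest within-species distance is smaller than the smallest between-species distance (a "barcoding gap"). The sketch below checks for such a gap using uncorrected p-distances; the input format and function names are illustrative assumptions, and real barcoding studies often use model-corrected distances such as K2P instead.

```python
from itertools import combinations

def p_distance(seq1, seq2):
    """Uncorrected proportion of differing sites between two aligned sequences."""
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != "-" and b != "-"]
    return sum(a != b for a, b in pairs) / len(pairs)

def barcode_gap(samples):
    """samples: list of (species_label, aligned_sequence).

    Returns (max intraspecific distance, min interspecific distance, gap_exists).
    """
    intra, inter = [], []
    for (sp1, s1), (sp2, s2) in combinations(samples, 2):
        (intra if sp1 == sp2 else inter).append(p_distance(s1, s2))
    if not intra or not inter:
        raise ValueError("need at least two species, one of them sampled twice")
    max_intra, min_inter = max(intra), min(inter)
    return max_intra, min_inter, max_intra < min_inter
```

An overlap, where the gap flag is False, corresponds to the situation the abstract reports for Anthozoa COI, in which distance alone cannot assign specimens to species.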
Abstract:
Protein databases contain a substantial number of proteins whose structures have been determined but that lack functional annotation. Understanding the relationship between function and structure can be useful for predicting function on a large scale. We analyzed the similarities in global physicochemical parameters for a set of enzymes classified according to the four hierarchical levels of the Enzyme Commission (EC). Using relevance theory, we introduced a distance between proteins in the space of physicochemical characteristics. This was done by minimizing a cost function of the metric tensor, built to reflect the EC classification system. Using an unsupervised clustering method on a set of 1025 enzymes, we obtained no relevant clusters compatible with the EC classification. The distributions of distances between enzymes from the same EC group and from different EC groups were compared using histograms. The same analysis was also performed using sequence alignment similarity as the distance. Our results suggest that global structural parameters are not sufficient to segregate enzymes according to the EC hierarchy. This indicates that the features essential for function are local rather than global. Consequently, methods that predict function from global attributes are unlikely to achieve high accuracy in predicting the main EC classes unless they rely on similarities between enzymes in the training and validation datasets. Furthermore, these results are consistent with a substantial number of studies suggesting that function evolves fundamentally by recruitment, i.e., the same protein motif or fold can be used to perform different enzymatic functions, and a few specific amino acids (AAs) are actually responsible for enzyme activity. These essential amino acids should belong to active sites, and an effective method for predicting function should be able to recognize them. (C) 2012 Elsevier Ltd. All rights reserved.
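A distance defined by a metric tensor M has the quadratic form d(x, y) = sqrt((x - y)^T M (x - y)); with M equal to the identity it reduces to the Euclidean distance, and other choices reweight or couple the physicochemical parameters. The sketch below only evaluates this distance for a given M. It does not implement the relevance-theory cost function used to learn M in the study, and the feature vectors are hypothetical.

```python
import math

def metric_distance(x, y, metric):
    """Distance sqrt((x - y)^T M (x - y)) for a symmetric positive semidefinite M.

    x, y: equal-length feature vectors (e.g. hypothetical global physicochemical
    parameters of two enzymes); metric: square matrix as a list of rows.
    """
    diff = [a - b for a, b in zip(x, y)]
    n = len(diff)
    quad = sum(diff[i] * metric[i][j] * diff[j] for i in range(n) for j in range(n))
    return math.sqrt(max(quad, 0.0))  # clamp tiny negative round-off to zero
```

For example, a diagonal M with a zero entry makes the distance ignore that parameter entirely. This is how a learned metric can discard features that do not help separate EC classes.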
Abstract:
Mitochondrial genomes are generally thought to be under selection for compactness, owing to their small size, consistent gene content, and lack of introns or intergenic spacers. As more animal mitochondrial genomes are fully sequenced, rearrangements and partial duplications are being identified with increasing frequency, particularly in birds (Class Aves). In this study, we investigate the evolutionary history of mitochondrial control region states within the avian order Psittaciformes (parrots and cockatoos). To this end, we reconstructed a comprehensive multi-locus phylogeny of parrots, used PCR of three diagnostic fragments to classify the mitochondrial control region state as single or duplicated, and mapped these states onto the phylogeny. We further sequenced 44 selected species to validate these inferences of control region state. Ancestral state reconstruction using a range of weighting schemes identified six independent origins of mitochondrial control region duplications within Psittaciformes. Analysis of the sequence data showed varying levels of mitochondrial gene and tRNA homology and degradation within a given clade exhibiting duplications. Divergence between the control regions within an individual ranged from 0 to 10.9%, with the differences occurring mainly between 51 and 225 nucleotides 3' of the goose hairpin in domain I. Further investigation of the fates of duplicated mitochondrial genes, the potential costs and benefits of having a second control region, and the complex relationship between evolutionary rates, selection, and time since duplication is needed to fully explain these patterns in the mitochondrial genome. (C) 2012 Elsevier Inc. All rights reserved.
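"Weighting schemes" in ancestral state reconstruction can be illustrated with weighted (Sankoff) parsimony, in which each change between states carries its own cost. The abstract does not name the reconstruction algorithm, so this is one standard option chosen for illustration; the state names, costs and tree below are hypothetical.

```python
import math

def sankoff(tree, root, tip_states, states, cost):
    """Weighted (Sankoff) parsimony: minimal subtree cost for each state at the root.

    tree: dict mapping each internal node to its (left, right) children.
    tip_states: dict mapping each tip to its observed state.
    cost[a][b]: cost of a change from parent state a to child state b.
    """
    def score(node):
        if node not in tree:  # tip: only the observed state is allowed
            return {s: (0.0 if s == tip_states[node] else math.inf) for s in states}
        left, right = tree[node]
        left_scores, right_scores = score(left), score(right)
        return {
            s: min(cost[s][c] + left_scores[c] for c in states)
            + min(cost[s][c] + right_scores[c] for c in states)
            for s in states
        }
    return score(root)

if __name__ == "__main__":
    states = ("single", "duplicated")
    # hypothetical weighting: gaining a duplication costs twice as much as losing one
    cost = {"single": {"single": 0, "duplicated": 2},
            "duplicated": {"single": 1, "duplicated": 0}}
    tree = {"root": ("n1", "D"), "n1": ("A", "B")}
    tips = {"A": "single", "B": "single", "D": "duplicated"}
    print(sankoff(tree, "root", tips, states, cost))
```

Under this 2:1 weighting the example prints `{'single': 2.0, 'duplicated': 1.0}`: the cheapest reconstruction places a duplicated control region at the root, followed by one loss, rather than a single gain on the branch to D. Repeating the analysis across cost ratios shows how sensitive the inferred number of origins is to the weighting scheme.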
Abstract:
Cichlid fishes are famous for large, diverse and replicated adaptive radiations in the Great Lakes of East Africa. To understand the molecular mechanisms underlying cichlid phenotypic diversity, we sequenced the genomes and transcriptomes of five lineages of African cichlids: the Nile tilapia (Oreochromis niloticus), an ancestral lineage with low diversity; and four members of the East African lineage: Neolamprologus brichardi/pulcher (older radiation, Lake Tanganyika), Metriaclima zebra (recent radiation, Lake Malawi), Pundamilia nyererei (very recent radiation, Lake Victoria), and Astatotilapia burtoni (riverine species around Lake Tanganyika). We found an excess of gene duplications in the East African lineage compared to tilapia and other teleosts, an abundance of non-coding element divergence, accelerated coding sequence evolution, expression divergence associated with transposable element insertions, and regulation by novel microRNAs. In addition, we analysed sequence data from sixty individuals representing six closely related species from Lake Victoria, and show genome-wide diversifying selection on coding and regulatory variants, some of which were recruited from ancient polymorphisms. We conclude that a number of molecular mechanisms shaped East African cichlid genomes, and that amassing of standing variation during periods of relaxed purifying selection may have been important in facilitating subsequent evolutionary diversification.
Abstract:
Ribosome display was applied for the affinity selection of antibody single-chain fragments (scFv) from a diverse library generated from mice immunized with a variant peptide of the dimerization domain of the transcription factor GCN4. After three rounds of ribosome display, positive scFvs were isolated and characterized. Several different scFvs were selected, but those in the largest group were closely related to each other and differed by 0 to 5 amino acid residues from their consensus sequence, the likely common progenitor. The best scFv had a dissociation constant of (4 ± 1) × 10⁻¹¹ M, measured in solution. A single amino acid residue in complementarity determining region L1 was found to be responsible for a 65-fold higher affinity than that of the likely progenitor. This high-affinity scFv appears to have been selected from mutations arising during ribosome display in vitro, which constitutes an affinity maturation inherent in the method. The in vitro-selected scFvs could be functionally expressed in the Escherichia coli periplasm with good yields or prepared by in vitro refolding. Thus, ribosome display can be a powerful methodology for in vitro library screening and simultaneous sequence evolution.
Abstract:
The Parnaíba Basin is an intracratonic basin whose succession of rocks is arranged in five supersequences. The Upper Carboniferous-Lower Triassic Sequence represents the third major sedimentary cycle and corresponds to the Balsas Group, which is divided into four units, from base to top: the Piauí, Pedra de Fogo, Motuca and Sambaíba formations. In recent decades, several authors have proposed different interpretations of the depositional systems and environments of each unit in this sequence. In general, it is described as a thick package of siliciclastic sediments deposited under complex conditions, ranging from clastic/evaporitic shallow marine to lacustrine and desert environments. To clarify the evolution of this sedimentary sequence, this work carried out a stratigraphic analysis of the Upper Carboniferous-Lower Triassic deposits by applying modern concepts of sequence stratigraphy to a well and seismic database. Three main higher-frequency depositional sequences were identified in each well analyzed. Sequence 1 corresponds to rocks initially deposited by a braided fluvial system, which evolved into a shallow marine setting with coastal sabkha conditions during a transgressive stage and later into a deltaic system. Sequence 2 corresponds to rocks deposited in a lacustrine/desert environment with associated sabkhas, formed during a period of increasing aridity in the area occupied by the Parnaíba Basin. Sequence 2 records a major regressive phase that evolved into the dominantly desert environment recorded in Sequence 3. Seismic stratigraphic analyses allowed the definition of a series of stratigraphic surfaces and related genetic units, as well as the inference of their lateral extent.
Seismic facies associated with these sequences are dominantly parallel and sub-parallel, with good lateral continuity, suggesting that the sedimentation rate was relatively constant during deposition.
Abstract:
Hepatitis C virus is a positive-sense single-stranded RNA virus. The gene junction separating the viral glycoproteins E1 and E2 displays concurrent sequence evolution, with the 3′-end of E1 highly conserved and the 5′-end of E2 highly heterogeneous. This gene junction is also believed to contain structured RNA elements, and a growing body of evidence suggests that such structures can act as an additional level of control over viral replication and transcription. We previously used ultradeep pyrosequencing to analyze an amplicon library spanning the E1/E2 gene junction from a treatment-naïve patient whose samples were collected over 10 years of chronic HCV infection. During this timeframe, maintenance of an in-frame insertion, recombination, and humoral immune targeting of discrete virus sub-populations were reported. In the current study, we present evidence of epistatic evolution across the E1/E2 gene junction and observe the development of co-varying networks of codons against a background of a complex virome with periodic shifts in population dominance. Over time, the number of actively mutating codons decreases in all virus groupings. We identify strong synonymous co-variation between codon sites in a group of sequences harbouring a 3 bp in-frame insertion and propose that synonymous mutation acts to stabilize the RNA structural backbone.
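Co-variation between codon sites is often quantified with mutual information (MI) between alignment columns: MI is zero when the codon at one site tells nothing about the codon at the other, and grows as the two sites vary together. The abstract does not state which statistic was used, so this is an illustrative assumption. In practice, raw MI is usually corrected for sample size and shared ancestry before co-varying networks are built.

```python
import math
from collections import Counter

def mutual_information(column_a, column_b):
    """Mutual information (in nats) between two alignment columns.

    Each column is a list of codons (or any symbols), one per sequence,
    listed in the same sequence order.
    """
    if len(column_a) != len(column_b):
        raise ValueError("columns must have the same length")
    n = len(column_a)
    count_a = Counter(column_a)
    count_b = Counter(column_b)
    count_ab = Counter(zip(column_a, column_b))
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        p_ab = n_ab / n
        mi += p_ab * math.log(p_ab / ((count_a[a] / n) * (count_b[b] / n)))
    return mi
```

Two perfectly co-varying sites with two equally frequent codons each give MI = ln 2, the maximum for that case. Sites whose codons vary independently give MI = 0.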