930 resultados para PHYLOGENETIC INFERENCE
Resumo:
The complete internal transcribed spacer 1 (ITS1), 5.8S ribosomal DNA, and ITS2 region of the ribosomal DNA from 60 specimens belonging to two closely related bucephalid digeneans (Dollfustrema vaneyi and Dollfustrema hefeiensis) from different localities, hosts, and microhabitat sites were cloned to examine the level of sequence variation and the taxonomic levels to show utility in species identification and phylogeny estimation. Our data show that these molecular markers can help to discriminate the two species, which are morphologically very close and difficult to separate by classical methods. We found 21 haplotypes defined by 44 polymorphic positions in 38 individuals of D. vaneyi, and 16 haplotypes defined by 43 polymorphic positions in 22 individuals of D. hefeiensis. There is no shared haplotypes between the two species. Haplotype rather than nucleotide diversity is similar between the two species. Phylogenetic analyses reveal two robustly supported clades, one corresponding to D. vaneyi and the other corresponding to D. hefeiensis. However, the population structures between the two species seem to be incongruent and show no geographic and host-specific structure among them, further indicating that the two species may have had a more complex evolutionary history than expected.
Resumo:
The rate at which a given site in a gene sequence alignment evolves over time may vary. This phenomenon-known as heterotachy-can bias or distort phylogenetic trees inferred from models of sequence evolution that assume rates of evolution are constant. Here, we describe a phylogenetic mixture model designed to accommodate heterotachy. The method sums the likelihood of the data at each site over more than one set of branch lengths on the same tree topology. A branch-length set that is best for one site may differ from the branch-length set that is best for some other site, thereby allowing different sites to have different rates of change throughout the tree. Because rate variation may not be present in all branches, we use a reversible-jump Markov chain Monte Carlo algorithm to identify those branches in which reliable amounts of heterotachy occur. We implement the method in combination with our 'pattern-heterogeneity' mixture model, applying it to simulated data and five published datasets. We find that complex evolutionary signals of heterotachy are routinely present over and above variation in the rate or pattern of evolution across sites, that the reversible-jump method requires far fewer parameters than conventional mixture models to describe it, and serves to identify the regions of the tree in which heterotachy is most pronounced. The reversible-jump procedure also removes the need for a posteriori tests of 'significance' such as the Akaike or Bayesian information criterion tests, or Bayes factors. Heterotachy has important consequences for the correct reconstruction of phylogenies as well as for tests of hypotheses that rely on accurate branch-length information. These include molecular clocks, analyses of tempo and mode of evolution, comparative studies and ancestral state reconstruction. The model is available from the authors' website, and can be used for the analysis of both nucleotide and morphological data.
Resumo:
5S rDNA sequences have proven to be valuable as genetic markers to distinguish closely related species and also in the understanding of the dynamic of repetitive sequences in the genomes. In the aim to contribute to the knowledge of the evolutionary history of Leporinus (Anostomidae) and also to contribute to the understanding of the 5S rDNA sequences organization in the fish genome, analyses of 5S rDNA sequences were conducted in seven species of this genus. The 5S rRNA gene sequence was highly conserved among Leporinus species, whereas NTS exhibit high levels of variations related to insertions, deletions, microrepeats, and base substitutions. The phylogenetic analysis of the 5S rDNA sequences clustered the species into two clades that are in agreement with cytogenetic and morphological data.
Resumo:
Bayesian phylogenetic analyses are now very popular in systematics and molecular evolution because they allow the use of much more realistic models than currently possible with maximum likelihood methods. There are, however, a growing number of examples in which large Bayesian posterior clade probabilities are associated with very short edge lengths and low values for non-Bayesian measures of support such as nonparametric bootstrapping. For the four-taxon case when the true tree is the star phylogeny, Bayesian analyses become increasingly unpredictable in their preference for one of the three possible resolved tree topologies as data set size increases. This leads to the prediction that hard (or near-hard) polytomies in nature will cause unpredictable behavior in Bayesian analyses, with arbitrary resolutions of the polytomy receiving very high posterior probabilities in some cases. We present a simple solution to this problem involving a reversible-jump Markov chain Monte Carlo (MCMC) algorithm that allows exploration of all of tree space, including unresolved tree topologies with one or more polytomies. The reversible-jump MCMC approach allows prior distributions to place some weight on less-resolved tree topologies, which eliminates misleadingly high posteriors associated with arbitrary resolutions of hard polytomies. Fortunately, assigning some prior probability to polytomous tree topologies does not appear to come with a significant cost in terms of the ability to assess the level of support for edges that do exist in the true tree. Methods are discussed for applying arbitrary prior distributions to tree topologies of varying resolution, and an empirical example showing evidence of polytomies is analyzed and discussed.
Resumo:
Bayesian phylogenetic analyses are now very popular in systematics and molecular evolution because they allow the use of much more realistic models than currently possible with maximum likelihood methods. There are, however, a growing number of examples in which large Bayesian posterior clade probabilities are associated with very short edge lengths and low values for non-Bayesian measures of support such as nonparametric bootstrapping. For the four-taxon case when the true tree is the star phylogeny, Bayesian analyses become increasingly unpredictable in their preference for one of the three possible resolved tree topologies as data set size increases. This leads to the prediction that hard (or near-hard) polytomies in nature will cause unpredictable behavior in Bayesian analyses, with arbitrary resolutions of the polytomy receiving very high posterior probabilities in some cases. We present a simple solution to this problem involving a reversible-jump Markov chain Monte Carlo (MCMC) algorithm that allows exploration of all of tree space, including unresolved tree topologies with one or more polytomies. The reversible-jump MCMC approach allows prior distributions to place some weight on less-resolved tree topologies, which eliminates misleadingly high posteriors associated with arbitrary resolutions of hard polytomies. Fortunately, assigning some prior probability to polytomous tree topologies does not appear to come with a significant cost in terms of the ability to assess the level of support for edges that do exist in the true tree. Methods are discussed for applying arbitrary prior distributions to tree topologies of varying resolution, and an empirical example showing evidence of polytomies is analyzed and discussed.
Resumo:
Phylogenetic inference consist in the search of an evolutionary tree to explain the best way possible genealogical relationships of a set of species. Phylogenetic analysis has a large number of applications in areas such as biology, ecology, paleontology, etc. There are several criterias which has been defined in order to infer phylogenies, among which are the maximum parsimony and maximum likelihood. The first one tries to find the phylogenetic tree that minimizes the number of evolutionary steps needed to describe the evolutionary history among species, while the second tries to find the tree that has the highest probability of produce the observed data according to an evolutionary model. The search of a phylogenetic tree can be formulated as a multi-objective optimization problem, which aims to find trees which satisfy simultaneously (and as much as possible) both criteria of parsimony and likelihood. Due to the fact that these criteria are different there won't be a single optimal solution (a single tree), but a set of compromise solutions. The solutions of this set are called "Pareto Optimal". To find this solutions, evolutionary algorithms are being used with success nowadays.This algorithms are a family of techniques, which aren’t exact, inspired by the process of natural selection. They usually find great quality solutions in order to resolve convoluted optimization problems. The way this algorithms works is based on the handling of a set of trial solutions (trees in the phylogeny case) using operators, some of them exchanges information between solutions, simulating DNA crossing, and others apply aleatory modifications, simulating a mutation. The result of this algorithms is an approximation to the set of the “Pareto Optimal” which can be shown in a graph with in order that the expert in the problem (the biologist when we talk about inference) can choose the solution of the commitment which produces the higher interest. In the case of optimization multi-objective applied to phylogenetic inference, there is open source software tool, called MO-Phylogenetics, which is designed for the purpose of resolving inference problems with classic evolutionary algorithms and last generation algorithms. REFERENCES [1] C.A. Coello Coello, G.B. Lamont, D.A. van Veldhuizen. Evolutionary algorithms for solving multi-objective problems. Spring. Agosto 2007 [2] C. Zambrano-Vega, A.J. Nebro, J.F Aldana-Montes. MO-Phylogenetics: a phylogenetic inference software tool with multi-objective evolutionary metaheuristics. Methods in Ecology and Evolution. En prensa. Febrero 2016.
Resumo:
Perez-Losada et al. [1] analyzed 72 complete genomes corresponding to nine mammalian (67 strains) and 2 avian (5 strains) polyomavirus species using maximum likelihood and Bayesian methods of phylogenetic inference. Because some data of 2 genomes in their work are now not available in GenBank, in this work, we analyze the phylogenetic relationship of the remaining 70 complete genomes corresponding to nine mammalian (65 strains) and two avian (5 strains) polyomavirus species using a dynamical language model approach developed by our group (Yu et al., [26]). This distance method does not require sequence alignment for deriving species phylogeny based on overall similarities of the complete genomes. Our best tree separates the bird polyomaviruses (avian polyomaviruses and goose hemorrhagic polymaviruses) from the mammalian polyomaviruses, which supports the idea of splitting the genus into two subgenera. Such a split is consistent with the different viral life strategies of each group. In the mammalian polyomavirus subgenera, mouse polyomaviruses (MPV), simian viruses 40 (SV40), BK viruses (BKV) and JC viruses (JCV) are grouped as different branches as expected. The topology of our best tree is quite similar to that of the tree constructed by Perez-Losada et al.
Resumo:
Mark Pagel, Andrew Meade (2004). A phylogenetic mixture model for detecting pattern-heterogeneity in gene sequence or character-state data. Systematic Biology, 53(4), 571-581. RAE2008
Resumo:
Tese de mestrado. Biologia (Biologia Molecular e Genética). Universidade de Lisboa, Faculdade de Ciências, 2014
Resumo:
Bien que les champignons soient régulièrement utilisés comme modèle d'étude des systèmes eucaryotes, leurs relations phylogénétiques soulèvent encore des questions controversées. Parmi celles-ci, la classification des zygomycètes reste inconsistante. Ils sont potentiellement paraphylétiques, i.e. regroupent de lignées fongiques non directement affiliées. La position phylogénétique du genre Schizosaccharomyces est aussi controversée: appartient-il aux Taphrinomycotina (précédemment connus comme archiascomycetes) comme prédit par l'analyse de gènes nucléaires, ou est-il plutôt relié aux Saccharomycotina (levures bourgeonnantes) tel que le suggère la phylogénie mitochondriale? Une autre question concerne la position phylogénétique des nucléariides, un groupe d'eucaryotes amiboïdes que l'on suppose étroitement relié aux champignons. Des analyses multi-gènes réalisées antérieurement n'ont pu conclure, étant donné le choix d'un nombre réduit de taxons et l'utilisation de six gènes nucléaires seulement. Nous avons abordé ces questions par le biais d'inférences phylogénétiques et tests statistiques appliqués à des assemblages de données phylogénomiques nucléaires et mitochondriales. D'après nos résultats, les zygomycètes sont paraphylétiques (Chapitre 2) bien que le signal phylogénétique issu du jeu de données mitochondriales disponibles est insuffisant pour résoudre l'ordre de cet embranchement avec une confiance statistique significative. Dans le Chapitre 3, nous montrons à l'aide d'un jeu de données nucléaires important (plus de cent protéines) et avec supports statistiques concluants, que le genre Schizosaccharomyces appartient aux Taphrinomycotina. De plus, nous démontrons que le regroupement conflictuel des Schizosaccharomyces avec les Saccharomycotina, venant des données mitochondriales, est le résultat d'un type d'erreur phylogénétique connu: l'attraction des longues branches (ALB), un artéfact menant au regroupement d'espèces dont le taux d'évolution rapide n'est pas représentatif de leur véritable position dans l'arbre phylogénétique. Dans le Chapitre 4, en utilisant encore un important jeu de données nucléaires, nous démontrons avec support statistique significatif que les nucleariides constituent le groupe lié de plus près aux champignons. Nous confirmons aussi la paraphylie des zygomycètes traditionnels tel que suggéré précédemment, avec support statistique significatif, bien que ne pouvant placer tous les membres du groupe avec confiance. Nos résultats remettent en cause des aspects d'une récente reclassification taxonomique des zygomycètes et de leurs voisins, les chytridiomycètes. Contrer ou minimiser les artéfacts phylogénétiques telle l'attraction des longues branches (ALB) constitue une question récurrente majeure. Dans ce sens, nous avons développé une nouvelle méthode (Chapitre 5) qui identifie et élimine dans une séquence les sites présentant une grande variation du taux d'évolution (sites fortement hétérotaches - sites HH); ces sites sont connus comme contribuant significativement au phénomène d'ALB. Notre méthode est basée sur un test de rapport de vraisemblance (likelihood ratio test, LRT). Deux jeux de données publiés précédemment sont utilisés pour démontrer que le retrait graduel des sites HH chez les espèces à évolution accélérée (sensibles à l'ALB) augmente significativement le support pour la topologie « vraie » attendue, et ce, de façon plus efficace comparée à d'autres méthodes publiées de retrait de sites de séquences. Néanmoins, et de façon générale, la manipulation de données préalable à l'analyse est loin d’être idéale. Les développements futurs devront viser l'intégration de l'identification et la pondération des sites HH au processus d'inférence phylogénétique lui-même.
Resumo:
We present the first assessment of phylogenetic utility of a potential novel low-copy nuclear gene region in flowering plants. A fragment of the MORE AXILLARY GROWTH 4 gene (MAX4, also known as RAMOSUS1 and DECREASED APICAL DOMINANCE1), predicted to span two introns, was isolated from members of Digitalis/Isoplexis. Phylogenetic analyses, under both maximum parsimony and Bayesian inference, were performed and revealed evidence of putative MAX4-like paralogues. The MAX4-like trees were compared with those obtained for Digitalis/Isoplexis using ITS and trnL-F, revealing a high degree of incongruence between these different DNA regions. Network analyses indicate complex patterns of evolution between the MAX4 sequences, which cannot be adequately represented on bifurcating trees. The incidence of paralogy restricts the use of MAX4 in phylogenetic inference within the study group, although MAX4 could potentially be used in combination with other DNA regions for resolving species relationships in cases where paralogues can be clearly identified.
Resumo:
We present the first assessment of phylogenetic utility of a potential novel low-copy nuclear gene region in flowering plants. A fragment of the MORE AXILLARY GROWTH 4 gene (MAX4, also known as RAMOSUS1 and DECREASED APICAL DOMINANCE1), predicted to span two introns, was isolated from members of Digitalis/Isoplexis. Phylogenetic analyses, under both maximum parsimony and Bayesian inference, were performed and revealed evidence of putative MAX4-like paralogues. The MAX4-like trees were compared with those obtained for Digitalis/Isoplexis using ITS and trnL-F, revealing a high degree of incongruence between these different DNA regions. Network analyses indicate complex patterns of evolution between the MAX4 sequences, which cannot be adequately represented on bifurcating trees. The incidence of paralogy restricts the use of MAX4 in phylogenetic inference within the study group, although MAX4 could potentially be used in combination with other DNA regions for resolving species relationships in cases where paralogues can be clearly identified.
Resumo:
We investigate the performance of phylogenetic mixture models in reducing a well-known and pervasive artifact of phylogenetic inference known as the node-density effect, comparing them to partitioned analyses of the same data. The node-density effect refers to the tendency for the amount of evolutionary change in longer branches of phylogenies to be underestimated compared to that in regions of the tree where there are more nodes and thus branches are typically shorter. Mixture models allow more than one model of sequence evolution to describe the sites in an alignment without prior knowledge of the evolutionary processes that characterize the data or how they correspond to different sites. If multiple evolutionary patterns are common in sequence evolution, mixture models may be capable of reducing node-density effects by characterizing the evolutionary processes more accurately. In gene-sequence alignments simulated to have heterogeneous patterns of evolution, we find that mixture models can reduce node-density effects to negligible levels or remove them altogether, performing as well as partitioned analyses based on the known simulated patterns. The mixture models achieve this without knowledge of the patterns that generated the data and even in some cases without specifying the full or true model of sequence evolution known to underlie the data. The latter result is especially important in real applications, as the true model of evolution is seldom known. We find the same patterns of results for two real data sets with evidence of complex patterns of sequence evolution: mixture models substantially reduced node-density effects and returned better likelihoods compared to partitioning models specifically fitted to these data. We suggest that the presence of more than one pattern of evolution in the data is a common source of error in phylogenetic inference and that mixture models can often detect these patterns even without prior knowledge of their presence in the data. Routine use of mixture models alongside other approaches to phylogenetic inference may often reveal hidden or unexpected patterns of sequence evolution and can improve phylogenetic inference.
Resumo:
We describe a general likelihood-based 'mixture model' for inferring phylogenetic trees from gene-sequence or other character-state data. The model accommodates cases in which different sites in the alignment evolve in qualitatively distinct ways, but does not require prior knowledge of these patterns or partitioning of the data. We call this qualitative variability in the pattern of evolution across sites "pattern-heterogeneity" to distinguish it from both a homogenous process of evolution and from one characterized principally by differences in rates of evolution. We present studies to show that the model correctly retrieves the signals of pattern-heterogeneity from simulated gene-sequence data, and we apply the method to protein-coding genes and to a ribosomal 12S data set. The mixture model outperforms conventional partitioning in both these data sets. We implement the mixture model such that it can simultaneously detect rate- and pattern-heterogeneity. The model simplifies to a homogeneous model or a rate- variability model as special cases, and therefore always performs at least as well as these two approaches, and often considerably improves upon them. We make the model available within a Bayesian Markov-chain Monte Carlo framework for phylogenetic inference, as an easy-to-use computer program.