67 resultados para Markov Chain Monte Carlo
Resumo:
The rate at which a given site in a gene sequence alignment evolves over time may vary. This phenomenon-known as heterotachy-can bias or distort phylogenetic trees inferred from models of sequence evolution that assume rates of evolution are constant. Here, we describe a phylogenetic mixture model designed to accommodate heterotachy. The method sums the likelihood of the data at each site over more than one set of branch lengths on the same tree topology. A branch-length set that is best for one site may differ from the branch-length set that is best for some other site, thereby allowing different sites to have different rates of change throughout the tree. Because rate variation may not be present in all branches, we use a reversible-jump Markov chain Monte Carlo algorithm to identify those branches in which reliable amounts of heterotachy occur. We implement the method in combination with our 'pattern-heterogeneity' mixture model, applying it to simulated data and five published datasets. We find that complex evolutionary signals of heterotachy are routinely present over and above variation in the rate or pattern of evolution across sites, that the reversible-jump method requires far fewer parameters than conventional mixture models to describe it, and serves to identify the regions of the tree in which heterotachy is most pronounced. The reversible-jump procedure also removes the need for a posteriori tests of 'significance' such as the Akaike or Bayesian information criterion tests, or Bayes factors. Heterotachy has important consequences for the correct reconstruction of phylogenies as well as for tests of hypotheses that rely on accurate branch-length information. These include molecular clocks, analyses of tempo and mode of evolution, comparative studies and ancestral state reconstruction. The model is available from the authors' website, and can be used for the analysis of both nucleotide and morphological data.
Resumo:
Varroa destructor is a parasitic mite of the Eastern honeybee Apis cerana. Fifty years ago, two distinct evolutionary lineages (Korean and Japanese) invaded the Western honeybee Apis mellifera. This haplo-diploid parasite species reproduces mainly through brother sister matings, a system which largely favors the fixation of new mutations. In a worldwide sample of 225 individuals from 21 locations collected on Western honeybees and analyzed at 19 microsatellite loci, a series of de novo mutations was observed. Using historical data concerning the invasion, this original biological system has been exploited to compare three mutation models with allele size constraints for microsatellite markers: stepwise (SMM) and generalized (GSM) mutation models, and a model with mutation rate increasing exponentially with microsatellite length (ESM). Posterior probabilities of the three models have been estimated for each locus individually using reversible jump Markov Chain Monte Carlo. The relative support of each model varies widely among loci, but the GSM is the only model that always receives at least 9% support, whatever the locus. The analysis also provides robust estimates of mutation parameters for each locus and of the divergence time of the two invasive lineages (67,000 generations with a 90% credibility interval of 35,000-174,000). With an average of 10 generations per year, this divergence time fits with the last post-glacial Korea Japan land separation. (c) 2005 Elsevier Inc. All rights reserved.
Resumo:
We describe a Bayesian method for investigating correlated evolution of discrete binary traits on phylogenetic trees. The method fits a continuous-time Markov model to a pair of traits, seeking the best fitting models that describe their joint evolution on a phylogeny. We employ the methodology of reversible-jump ( RJ) Markov chain Monte Carlo to search among the large number of possible models, some of which conform to independent evolution of the two traits, others to correlated evolution. The RJ Markov chain visits these models in proportion to their posterior probabilities, thereby directly estimating the support for the hypothesis of correlated evolution. In addition, the RJ Markov chain simultaneously estimates the posterior distributions of the rate parameters of the model of trait evolution. These posterior distributions can be used to test among alternative evolutionary scenarios to explain the observed data. All results are integrated over a sample of phylogenetic trees to account for phylogenetic uncertainty. We implement the method in a program called RJ Discrete and illustrate it by analyzing the question of whether mating system and advertisement of estrus by females have coevolved in the Old World monkeys and great apes.
Resumo:
This article presents a statistical method for detecting recombination in DNA sequence alignments, which is based on combining two probabilistic graphical models: (1) a taxon graph (phylogenetic tree) representing the relationship between the taxa, and (2) a site graph (hidden Markov model) representing interactions between different sites in the DNA sequence alignments. We adopt a Bayesian approach and sample the parameters of the model from the posterior distribution with Markov chain Monte Carlo, using a Metropolis-Hastings and Gibbs-within-Gibbs scheme. The proposed method is tested on various synthetic and real-world DNA sequence alignments, and we compare its performance with the established detection methods RECPARS, PLATO, and TOPAL, as well as with two alternative parameter estimation schemes.
Resumo:
In this paper we analyse applicability and robustness of Markov chain Monte Carlo algorithms for eigenvalue problems. We restrict our consideration to real symmetric matrices. Almost Optimal Monte Carlo (MAO) algorithms for solving eigenvalue problems are formulated. Results for the structure of both - systematic and probability error are presented. It is shown that the values of both errors can be controlled independently by different algorithmic parameters. The results present how the systematic error depends on the matrix spectrum. The analysis of the probability error is presented. It shows that the close (in some sense) the matrix under consideration is to the stochastic matrix the smaller is this error. Sufficient conditions for constructing robust and interpolation Monte Carlo algorithms are obtained. For stochastic matrices an interpolation Monte Carlo algorithm is constructed. A number of numerical tests for large symmetric dense matrices are performed in order to study experimentally the dependence of the systematic error from the structure of matrix spectrum. We also study how the probability error depends on the balancing of the matrix. (c) 2007 Elsevier Inc. All rights reserved.
Resumo:
Many well-established statistical methods in genetics were developed in a climate of severe constraints on computational power. Recent advances in simulation methodology now bring modern, flexible statistical methods within the reach of scientists having access to a desktop workstation. We illustrate the potential advantages now available by considering the problem of assessing departures from Hardy-Weinberg (HW) equilibrium. Several hypothesis tests of HW have been established, as well as a variety of point estimation methods for the parameter which measures departures from HW under the inbreeding model. We propose a computational, Bayesian method for assessing departures from HW, which has a number of important advantages over existing approaches. The method incorporates the effects-of uncertainty about the nuisance parameters--the allele frequencies--as well as the boundary constraints on f (which are functions of the nuisance parameters). Results are naturally presented visually, exploiting the graphics capabilities of modern computer environments to allow straightforward interpretation. Perhaps most importantly, the method is founded on a flexible, likelihood-based modelling framework, which can incorporate the inbreeding model if appropriate, but also allows the assumptions of the model to he investigated and, if necessary, relaxed. Under appropriate conditions, information can be shared across loci and, possibly, across populations, leading to more precise estimation. The advantages of the method are illustrated by application both to simulated data and to data analysed by alternative methods in the recent literature.
Resumo:
In this paper we deal with performance analysis of Monte Carlo algorithm for large linear algebra problems. We consider applicability and efficiency of the Markov chain Monte Carlo for large problems, i.e., problems involving matrices with a number of non-zero elements ranging between one million and one billion. We are concentrating on analysis of the almost Optimal Monte Carlo (MAO) algorithm for evaluating bilinear forms of matrix powers since they form the so-called Krylov subspaces. Results are presented comparing the performance of the Robust and Non-robust Monte Carlo algorithms. The algorithms are tested on large dense matrices as well as on large unstructured sparse matrices.
Resumo:
Undirected graphical models are widely used in statistics, physics and machine vision. However Bayesian parameter estimation for undirected models is extremely challenging, since evaluation of the posterior typically involves the calculation of an intractable normalising constant. This problem has received much attention, but very little of this has focussed on the important practical case where the data consists of noisy or incomplete observations of the underlying hidden structure. This paper specifically addresses this problem, comparing two alternative methodologies. In the first of these approaches particle Markov chain Monte Carlo (Andrieu et al., 2010) is used to efficiently explore the parameter space, combined with the exchange algorithm (Murray et al., 2006) for avoiding the calculation of the intractable normalising constant (a proof showing that this combination targets the correct distribution in found in a supplementary appendix online). This approach is compared with approximate Bayesian computation (Pritchard et al., 1999). Applications to estimating the parameters of Ising models and exponential random graphs from noisy data are presented. Each algorithm used in the paper targets an approximation to the true posterior due to the use of MCMC to simulate from the latent graphical model, in lieu of being able to do this exactly in general. The supplementary appendix also describes the nature of the resulting approximation.
Resumo:
The steadily accumulating literature on technical efficiency in fisheries attests to the importance of efficiency as an indicator of fleet condition and as an object of management concern. In this paper, we extend previous work by presenting a Bayesian hierarchical approach that yields both efficiency estimates and, as a byproduct of the estimation algorithm, probabilistic rankings of the relative technical efficiencies of fishing boats. The estimation algorithm is based on recent advances in Markov Chain Monte Carlo (MCMC) methods—Gibbs sampling, in particular—which have not been widely used in fisheries economics. We apply the method to a sample of 10,865 boat trips in the US Pacific hake (or whiting) fishery during 1987–2003. We uncover systematic differences between efficiency rankings based on sample mean efficiency estimates and those that exploit the full posterior distributions of boat efficiencies to estimate the probability that a given boat has the highest true mean efficiency.
Resumo:
The evolutionary history of gains and losses of vegetative reproductive propagules (soredia) in Porpidia s.l., a group of lichen-forming ascomycetes, was clarified using Bayesian Markov chain Monte Carlo (MCMC) approaches to monophyly tests and a combined MCMC and maximum likelihood approach to ancestral character state reconstructions. The MCMC framework provided confidence estimates for the reconstructions of relationships and ancestral character states, which formed the basis for tests of evolutionary hypotheses. Monophyly tests rejected all hypotheses that predicted any clustering of reproductive modes in extant taxa. In addition, a nearest-neighbor statistic could not reject the hypothesis that the vegetative reproductive mode is randomly distributed throughout the group. These results show that transitions between presence and absence of the vegetative reproductive mode within Porpidia s.l. occurred several times and independently of each other. Likelihood reconstructions of ancestral character states at selected nodes suggest that - contrary to previous thought - the ancestor to Porpidia s.l. already possessed the vegetative reproductive mode. Furthermore, transition rates are reconstructed asymmetrically with the vegetative reproductive mode being gained at a much lower rate than it is lost. A cautious note has to be added, because a simulation study showed that the ancestral character state reconstructions were highly dependent on taxon sampling. However, our central conclusions, particularly the higher rate of change from vegetative reproductive mode present to absent than vice versa within Porpidia s.l., were found to be broadly independent of taxon sampling. [Ancestral character state reconstructions; Ascomycota, Bayesian inference; hypothesis testing; likelihood; MCMC; Porpidia; reproductive systems]
Resumo:
Micromorphological characters of the fruiting bodies, such as ascus-type and hymenial amyloidity, and secondary chemistry have been widely employed as key characters in Ascomycota classification. However, the evolution of these characters has yet not been studied using molecular phylogenies. We have used a combined Bayesian and maximum likelihood based approach to trace character evolution on a tree inferred from a combined analysis of nuclear and mitochondrial ribosomal DNA sequences. The maximum likelihood aspect overcomes simplifications inherent in maximum parsimony methods, whereas the Markov chain Monte Carlo aspect renders results independent of any particular phylogenetic tree. The results indicate that the evolution of the two chemical characters is quite different, being stable once developed for the medullary lecanoric acid, whereas the cortical chlorinated xanthones appear to have been lost several times. The current ascus-types and the amyloidity of the hymenial gel in Pertusariaceae appear to have been developed within the family. The basal ascus-type of pertusarialean fungi remains unknown. (c) 2006 The Linnean Society of London, Biological Journal of the Linnean Society, 2006, 89, 615-626.
Resumo:
The identification of signatures of natural selection in genomic surveys has become an area of intense research, stimulated by the increasing ease with which genetic markers can be typed. Loci identified as subject to selection may be functionally important, and hence (weak) candidates for involvement in disease causation. They can also be useful in determining the adaptive differentiation of populations, and exploring hypotheses about speciation. Adaptive differentiation has traditionally been identified from differences in allele frequencies among different populations, summarised by an estimate of F-ST. Low outliers relative to an appropriate neutral population-genetics model indicate loci subject to balancing selection, whereas high outliers suggest adaptive (directional) selection. However, the problem of identifying statistically significant departures from neutrality is complicated by confounding effects on the distribution of F-ST estimates, and current methods have not yet been tested in large-scale simulation experiments. Here, we simulate data from a structured population at many unlinked, diallelic loci that are predominantly neutral but with some loci subject to adaptive or balancing selection. We develop a hierarchical-Bayesian method, implemented via Markov chain Monte Carlo (MCMC), and assess its performance in distinguishing the loci simulated under selection from the neutral loci. We also compare this performance with that of a frequentist method, based on moment-based estimates of F-ST. We find that both methods can identify loci subject to adaptive selection when the selection coefficient is at least five times the migration rate. Neither method could reliably distinguish loci under balancing selection in our simulations, even when the selection coefficient is twenty times the migration rate.
Resumo:
Biologists frequently attempt to infer the character states at ancestral nodes of a phylogeny from the distribution of traits observed in contemporary organisms. Because phylogenies are normally inferences from data, it is desirable to account for the uncertainty in estimates of the tree and its branch lengths when making inferences about ancestral states or other comparative parameters. Here we present a general Bayesian approach for testing comparative hypotheses across statistically justified samples of phylogenies, focusing on the specific issue of reconstructing ancestral states. The method uses Markov chain Monte Carlo techniques for sampling phylogenetic trees and for investigating the parameters of a statistical model of trait evolution. We describe how to combine information about the uncertainty of the phylogeny with uncertainty in the estimate of the ancestral state. Our approach does not constrain the sample of trees only to those that contain the ancestral node or nodes of interest, and we show how to reconstruct ancestral states of uncertain nodes using a most-recent-common-ancestor approach. We illustrate the methods with data on ribonuclease evolution in the Artiodactyla. Software implementing the methods ( BayesMultiState) is available from the authors.
Resumo:
We describe a general likelihood-based 'mixture model' for inferring phylogenetic trees from gene-sequence or other character-state data. The model accommodates cases in which different sites in the alignment evolve in qualitatively distinct ways, but does not require prior knowledge of these patterns or partitioning of the data. We call this qualitative variability in the pattern of evolution across sites "pattern-heterogeneity" to distinguish it from both a homogenous process of evolution and from one characterized principally by differences in rates of evolution. We present studies to show that the model correctly retrieves the signals of pattern-heterogeneity from simulated gene-sequence data, and we apply the method to protein-coding genes and to a ribosomal 12S data set. The mixture model outperforms conventional partitioning in both these data sets. We implement the mixture model such that it can simultaneously detect rate- and pattern-heterogeneity. The model simplifies to a homogeneous model or a rate- variability model as special cases, and therefore always performs at least as well as these two approaches, and often considerably improves upon them. We make the model available within a Bayesian Markov-chain Monte Carlo framework for phylogenetic inference, as an easy-to-use computer program.
Resumo:
This article introduces a new general method for genealogical inference that samples independent genealogical histories using importance sampling (IS) and then samples other parameters with Markov chain Monte Carlo (MCMC). It is then possible to more easily utilize the advantages of importance sampling in a fully Bayesian framework. The method is applied to the problem of estimating recent changes in effective population size from temporally spaced gene frequency data. The method gives the posterior distribution of effective population size at the time of the oldest sample and at the time of the most recent sample, assuming a model of exponential growth or decline during the interval. The effect of changes in number of alleles, number of loci, and sample size on the accuracy of the method is described using test simulations, and it is concluded that these have an approximately equivalent effect. The method is used on three example data sets and problems in interpreting the posterior densities are highlighted and discussed.