926 results for Approximate Bayesian computation
Abstract:
Background: The estimation of demographic parameters from genetic data often requires the computation of likelihoods. However, the likelihood function is computationally intractable for many realistic evolutionary models, and the use of Bayesian inference has therefore been limited to very simple models. The situation changed recently with the advent of Approximate Bayesian Computation (ABC) algorithms, which allow one to obtain parameter posterior distributions from simulations without requiring likelihood computations. Results: Here we present ABCtoolbox, a series of open source programs to perform Approximate Bayesian Computations (ABC). It implements various ABC algorithms including rejection sampling, MCMC without likelihood, a particle-based sampler and ABC-GLM. ABCtoolbox is bundled with, but not limited to, a program that allows parameter inference in a population genetics context and the simultaneous use of different types of markers with different ploidy levels. In addition, ABCtoolbox can also interact with most simulation and summary statistics computation programs. The usability of ABCtoolbox is demonstrated by inferring the evolutionary history of two evolutionary lineages of Microtus arvalis. Using nuclear microsatellites and mitochondrial sequence data in the same estimation procedure enabled us to infer sex-specific population sizes and migration rates and to find that males show smaller population sizes but much higher levels of migration than females. Conclusion: ABCtoolbox allows a user to perform all the necessary steps of a full ABC analysis, from parameter sampling from prior distributions, through data simulation, computation of summary statistics, estimation of posterior distributions, model choice, and validation of the estimation procedure, to visualization of the results.
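The rejection-sampling variant of ABC named in this abstract can be illustrated with a minimal sketch. This is not ABCtoolbox code: the function `abc_rejection`, the toy Gaussian simulator, the uniform prior, the sample mean as summary statistic and the tolerance value are all assumptions chosen for illustration.

```python
import random
import statistics


def abc_rejection(observed, prior_sampler, simulator, summary, tolerance, n_draws, rng):
    """Keep the prior draws whose simulated summary lies within `tolerance` of the observed one."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sampler(rng)         # 1. sample a parameter from the prior
        simulated = simulator(theta, rng)  # 2. simulate data under that parameter
        # 3. compare summary statistics; no likelihood is ever evaluated
        if abs(summary(simulated) - s_obs) <= tolerance:
            accepted.append(theta)
    return accepted


rng = random.Random(1)

# Hypothetical "observed" data: 50 draws from a normal distribution with mean 3.
observed = [rng.gauss(3.0, 1.0) for _ in range(50)]

posterior_sample = abc_rejection(
    observed=observed,
    prior_sampler=lambda r: r.uniform(-10.0, 10.0),  # assumed flat prior on the mean
    simulator=lambda theta, r: [r.gauss(theta, 1.0) for _ in range(50)],
    summary=statistics.fmean,                        # summary statistic: sample mean
    tolerance=0.2,
    n_draws=5000,
    rng=rng,
)

print(len(posterior_sample), statistics.fmean(posterior_sample))
```

The accepted values approximate a sample from the posterior of the mean. Shrinking `tolerance` tightens the approximation at the cost of a lower acceptance rate.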
Abstract:
Approximate Bayesian computation (ABC) is a highly flexible technique that allows the estimation of parameters under demographic models that are too complex to be handled by full-likelihood methods. We assess the utility of this method to estimate the parameters of range expansion in a two-dimensional stepping-stone model, using samples from either a single deme or multiple demes. A minor modification to the ABC procedure is introduced, which leads to an improvement in the accuracy of estimation. The method is then used to estimate the expansion time and migration rates for five natural common vole populations in Switzerland typed for a sex-linked marker and a nuclear marker. Estimates based on both markers suggest that expansion occurred < 10,000 years ago, after the most recent glaciation, and that migration rates are strongly male biased.
Abstract:
Undirected graphical models are widely used in statistics, physics and machine vision. However, Bayesian parameter estimation for undirected models is extremely challenging, since evaluation of the posterior typically involves the calculation of an intractable normalising constant. This problem has received much attention, but very little of this has focussed on the important practical case where the data consists of noisy or incomplete observations of the underlying hidden structure. This paper specifically addresses this problem, comparing two alternative methodologies. In the first of these approaches, particle Markov chain Monte Carlo (Andrieu et al., 2010) is used to efficiently explore the parameter space, combined with the exchange algorithm (Murray et al., 2006) to avoid the calculation of the intractable normalising constant (a proof showing that this combination targets the correct distribution is found in a supplementary appendix online). This approach is compared with approximate Bayesian computation (Pritchard et al., 1999). Applications to estimating the parameters of Ising models and exponential random graphs from noisy data are presented. Each algorithm used in the paper targets an approximation to the true posterior because MCMC is used to simulate from the latent graphical model, since exact simulation is not possible in general. The supplementary appendix also describes the nature of the resulting approximation.
Abstract:
The present distribution of freshwater fish in the Alpine region has been strongly affected by colonization events occurring after the last glacial maximum (LGM), some 20,000 years ago. Here we use a spatially explicit simulation framework to model and better understand their colonization dynamics in the Swiss Rhine basin. This approach is applied to the European bullhead (Cottus gobio), which is an ideal model organism for studying past demographic processes in fish since it has not been managed by humans. The molecular diversity of eight sampled populations is simulated and compared to observed data at six microsatellite loci under an approximate Bayesian computation framework to estimate the parameters of the colonization process. Our demographic estimates fit well with current knowledge about the biology of this species, but they suggest that the Swiss Rhine basin was colonized very recently, after the Younger Dryas some 6600 years ago. We discuss the implications of this result, as well as the strengths and limits of the spatially explicit approach coupled to the approximate Bayesian computation framework.
Abstract:
Performing organization: Dept. of Statistics, University of Michigan.
Abstract:
This thesis is concerned with approximate inference in dynamical systems, from a variational Bayesian perspective. When modelling real-world dynamical systems, stochastic differential equations appear as a natural choice, mainly because of their ability to model the noise of the system by adding a variant of some stochastic process to the deterministic dynamics. Hence, inference in such processes has drawn much attention. Here two new extended frameworks are derived and presented that are based on basis function expansions and local polynomial approximations of a recently proposed variational Bayesian algorithm. It is shown that the new extensions converge to the original variational algorithm and can be used for state estimation (smoothing). However, the main focus is on estimating the (hyper-) parameters of these systems (i.e. drift parameters and diffusion coefficients). The new methods are numerically validated on a range of different systems which vary in dimensionality and non-linearity. These are the Ornstein-Uhlenbeck process, for which the exact likelihood can be computed analytically, the univariate and highly non-linear stochastic double well, and the multivariate chaotic stochastic Lorenz '63 (3-dimensional) model. The algorithms are also applied to the 40-dimensional stochastic Lorenz '96 system. In this investigation these new approaches are compared with a variety of other well-known methods such as the ensemble Kalman filter / smoother, a hybrid Monte Carlo sampler, the dual unscented Kalman filter (for jointly estimating the system states and model parameters) and full weak-constraint 4D-Var. An empirical analysis of their asymptotic behaviour as the observation density or the length of the time window increases is provided.
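The remark that the Ornstein-Uhlenbeck process has an analytically computable likelihood can be made concrete with a small sketch. It does not reproduce the thesis's variational algorithm. It assumes the parameterisation dX = -theta X dt + sigma dW, simulates a path with the Euler-Maruyama scheme, and evaluates the exact Gaussian transition likelihood. The function names and parameter values are hypothetical.

```python
import math
import random


def simulate_ou(theta, sigma, x0, dt, n_steps, rng):
    """Euler-Maruyama discretisation of dX = -theta * X dt + sigma dW."""
    path = [x0]
    for _ in range(n_steps):
        x = path[-1]
        noise = sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        path.append(x - theta * x * dt + noise)
    return path


def ou_log_likelihood(path, theta, sigma, dt):
    """Exact log-likelihood of a discretely observed OU path.

    The transition X_{t+dt} | X_t is Gaussian with mean X_t * exp(-theta * dt)
    and variance sigma^2 * (1 - exp(-2 * theta * dt)) / (2 * theta).
    """
    decay = math.exp(-theta * dt)
    var = sigma ** 2 * (1.0 - decay ** 2) / (2.0 * theta)
    total = 0.0
    for x_prev, x_next in zip(path, path[1:]):
        resid = x_next - x_prev * decay
        total += -0.5 * (math.log(2.0 * math.pi * var) + resid ** 2 / var)
    return total


rng = random.Random(0)
path = simulate_ou(theta=2.0, sigma=0.5, x0=1.0, dt=0.01, n_steps=5000, rng=rng)

# Compare the log-likelihood at the generating drift and at a badly wrong one.
ll_true = ou_log_likelihood(path, theta=2.0, sigma=0.5, dt=0.01)
ll_wrong = ou_log_likelihood(path, theta=8.0, sigma=0.5, dt=0.01)
print(ll_true, ll_wrong)
```

Because the likelihood is available in closed form here, the process is a convenient test case: an approximate method's estimates can be checked against the value that maximises `ou_log_likelihood`.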
Abstract:
Summary (in English) Computer simulations provide a practical way to address scientific questions that would be otherwise intractable. In evolutionary biology, and in population genetics in particular, the investigation of evolutionary processes frequently involves the implementation of complex models, making simulations a particularly valuable tool in the area. In this thesis work, I explored three questions involving the geographical range expansion of populations, taking advantage of spatially explicit simulations coupled with approximate Bayesian computation. First, the neutral evolutionary history of the human spread around the world was investigated, leading to a surprisingly simple model: a straightforward diffusion process of migrations from east Africa throughout a world map with homogeneous landmasses replicated to a very large extent the complex patterns observed in real human populations, suggesting a more continuous (as opposed to structured) view of the distribution of modern human genetic diversity, which may serve better as a base model for further studies. Second, the postglacial evolution of the European barn owl, with the formation of a remarkable coat-color cline, was inspected with two rounds of simulations to (i) determine the demographic background history and (ii) test the probability that a phenotypic cline, like the one observed in the natural populations, appears without natural selection. We verified that the modern barn owl population originated from a single Iberian refugium and that the color cline formed not through neutral evolution but with the necessary participation of selection. The third and last part of this thesis refers to a simulation-only study inspired by the barn owl case above. In this chapter, we showed that selection is, indeed, effective during range expansions and that it leaves a distinctive signature, which can then be used to detect and measure natural selection in range-expanding populations.
Summary (translated from French) Simulations provide a practical way to answer scientific questions that would otherwise be intractable. In population genetics, the study of evolutionary processes often involves implementing complex models, and simulations are a particularly valuable tool in this field. In this thesis, I explored three questions using spatially explicit simulations within an approximate Bayesian computation (ABC) framework. First, the history of worldwide human colonization and the evolution of neutral parts of the genome were studied using a surprisingly simple model. A diffusion process of migrants from East Africa across a world with homogeneous landmasses reproduced, to a very large extent, the complex genetic signatures observed in real human populations. Such a continuous model (as opposed to a model structured into populations) could be very useful as a base model for future studies of human genetics. Second, the postglacial evolution of a color gradient in the European barn owl (Tyto alba) was examined with two series of simulations to (i) determine the underlying demographic history and (ii) test the probability that a phenotypic gradient, such as the one observed in natural populations, could appear without natural selection. We showed that the current owl population emerged from a single Iberian refugium and that the color gradient cannot have formed neutrally (without the action of natural selection). The third part of this thesis refers to a simulation study inspired by the barn owl study.
In this last chapter, we showed that selection is indeed also effective during range expansions and that it leaves a unique signature, which can be used to detect it and estimate its strength.
Abstract:
Report for the scientific sojourn at the University of Reading, United Kingdom, from January until May 2008. The main objectives were, firstly, to infer population structure and parameters in demographic models using a total of 13 microsatellite loci to genotype approximately 30 individuals per population in 10 Palinurus elephas populations from both Mediterranean and Atlantic waters and, secondly, to develop statistical methods to identify discrepant loci, possibly under selection, and to implement those methods in the R software environment. It is important to consider that the calculation of the probability distribution of the demographic and mutational parameters for a full genetic data set is numerically difficult for complex demographic histories (Stephens 2003). Approximate Bayesian Computation (ABC), which uses summary statistics to infer posterior distributions of variable parameters without explicit likelihood calculations, can surmount this difficulty. This would make it possible to gather information on different demographic prior values (i.e. effective population sizes, migration rate, microsatellite mutation rate, mutational processes) and to assess the sensitivity of inferences to demographic priors by assuming different priors.
Abstract:
The use of molecular data to reconstruct the history of divergence and gene flow between populations of closely related taxa represents a challenging problem. It has been proposed that the long-standing debate about the geography of speciation can be resolved by comparing the likelihoods of a model of isolation with migration and a model of secondary contact. However, data are commonly only fit to a model of isolation with migration and rarely tested against the secondary contact alternative. Furthermore, most demographic inference methods have neglected variation in introgression rates and assume that the gene flow parameter (Nm) is similar among loci. Here, we show that neglecting this source of variation can give misleading results. We analysed DNA sequences sampled from populations of the marine mussels, Mytilus edulis and M. galloprovincialis, across a well-studied mosaic hybrid zone in Europe and evaluated various scenarios of speciation, with or without variation in introgression rates, using an Approximate Bayesian Computation (ABC) approach. Models with heterogeneous gene flow across loci always outperformed models assuming equal migration rates irrespective of the history of gene flow being considered. By incorporating this heterogeneity, the best-supported scenario was a long period of allopatric isolation during the first three-quarters of the time since divergence followed by secondary contact and introgression during the last quarter. By contrast, constraining migration to be homogeneous failed to discriminate among any of the different models of gene flow tested. Our simulations thus provide statistical support for the secondary contact scenario in the European Mytilus hybrid zone that the standard coalescent approach failed to confirm. Our results demonstrate that genomic variation in introgression rates can have profound impacts on the biological conclusions drawn from inference methods and needs to be incorporated in future studies.
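Comparing scenarios with ABC, as this abstract describes, is commonly done by simulating from each candidate model and counting which models produce accepted simulations. The sketch below illustrates that idea only. The two toy models ("homogeneous" versus "heterogeneous" per-locus levels), the priors, the summary statistics and the tolerance are assumptions for illustration and stand in for the authors' coalescent models of gene flow, which are not reproduced here.

```python
import math
import random
import statistics


def simulate(model, n_loci, rng):
    """Toy simulator: one value per locus under a shared or a locus-specific level."""
    base = rng.uniform(0.0, 5.0)  # assumed prior on the shared level
    if model == "homogeneous":
        levels = [base] * n_loci
    else:  # "heterogeneous": each locus gets its own level around the shared one
        levels = [rng.gauss(base, 3.0) for _ in range(n_loci)]
    return [rng.gauss(level, 1.0) for level in levels]


def summaries(data):
    """Summary statistics: mean and standard deviation across loci."""
    return (statistics.fmean(data), statistics.pstdev(data))


def abc_model_choice(observed, models, n_draws, tolerance, rng):
    """Estimate posterior model probabilities as each model's share of accepted simulations."""
    s_obs = summaries(observed)
    counts = {m: 0 for m in models}
    for _ in range(n_draws):
        model = rng.choice(models)  # uniform prior over models
        s_sim = summaries(simulate(model, len(observed), rng))
        if math.dist(s_obs, s_sim) <= tolerance:
            counts[model] += 1
    total = sum(counts.values())
    return {m: (counts[m] / total if total else float("nan")) for m in models}


rng = random.Random(7)
models = ["homogeneous", "heterogeneous"]

# Hypothetical "observed" data generated under the heterogeneous toy model.
observed = simulate("heterogeneous", 40, rng)

posterior = abc_model_choice(observed, models, n_draws=10000, tolerance=0.5, rng=rng)
print(posterior)
```

In this toy setting, the spread across loci is the statistic that separates the two models, which mirrors the abstract's point that ignoring among-locus variation can hide the signal needed to tell scenarios apart.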
Abstract:
Genome-wide scans of genetic differentiation between hybridizing taxa can identify genome regions with unusual rates of introgression. Regions of high differentiation might represent barriers to gene flow, while regions of low differentiation might indicate adaptive introgression, that is, the spread of selectively beneficial alleles between reproductively isolated genetic backgrounds. Here we conduct a scan for unusual patterns of differentiation in a mosaic hybrid zone between two mussel species, Mytilus edulis and M. galloprovincialis. One outlying locus, mac-1, showed a characteristic footprint of local introgression, with abnormally high frequency of edulis-derived alleles in a patch of M. galloprovincialis enclosed within the mosaic zone, but low frequencies outside of the zone. Further analysis of DNA sequences showed that almost all of the edulis allelic diversity had introgressed into the M. galloprovincialis background in this patch. We then used a variety of approaches to test the hypothesis that there had been adaptive introgression at mac-1. Simulations and model fitting with maximum-likelihood and approximate Bayesian computation approaches suggested that adaptive introgression could generate a "soft sweep," which was qualitatively consistent with our data. Although the migration rate required was high, it was compatible with the functioning of an effective barrier to gene flow as revealed by demographic inferences. As such, adaptive introgression could explain both the reduced intraspecific differentiation around mac-1 and the high diversity of introgressed alleles, although a localized change in barrier strength may also be invoked. Together, our results emphasize the need to account for the complex history of secondary contacts in interpreting outlier loci.
Abstract:
Whether or not species participating in specialized and obligate interactions display similar and simultaneous demographic variations at the intraspecific level remains an open question in phylogeography. In the present study, we used the mutualistic nursery pollination occurring between the European globeflower Trollius europaeus and its specialized pollinators in the genus Chiastocheta as a case study. Explicitly, we investigated whether the phylogeographies of the pollinating flies are significantly different from the expectation under a scenario of plant-insect congruence. Based on a large-scale sampling, we first used mitochondrial data to infer the phylogeographical histories of each fly species. Then, we defined phylogeographical scenarios of congruence with the plant history, and used maximum likelihood and Bayesian approaches to test for plant-insect phylogeographical congruence for the three Chiastocheta species. We show that the phylogeographical histories of the three fly species differ. Only Chiastocheta lophota and Chiastocheta dentifera display strong spatial genetic structures, which do not appear to be statistically different from those expected under scenarios of phylogeographical congruence with the plant. The results of the present study indicate that the fly species responded in independent and different ways to shared evolutionary forces, displaying varying levels of congruence with the plant genetic structure.
Abstract:
Gradients of variation, or clines, have always intrigued biologists. Classically, they have been interpreted as the outcomes of antagonistic interactions between selection and gene flow. Alternatively, clines may also establish neutrally with isolation by distance (IBD) or secondary contact between previously isolated populations. The relative importance of natural selection and these two neutral processes in the establishment of clinal variation can be tested by comparing genetic differentiation at neutral genetic markers and at the studied trait. A third neutral process, surfing of a newly arisen mutation during the colonization of a new habitat, is more difficult to test. Here, we designed a spatially explicit approximate Bayesian computation (ABC) simulation framework to evaluate whether the strong cline in the genetically based reddish coloration observed in the European barn owl (Tyto alba) arose as a by-product of a range expansion or whether selection has to be invoked to explain this colour cline, for which we have previously ruled out the actions of IBD or secondary contact. Using ABC simulations and genetic data on 390 individuals from 20 locations genotyped at 22 microsatellite loci, we first determined how barn owls colonized Europe after the last glaciation. Using these results in new simulations on the evolution of the colour phenotype, and assuming various genetic architectures for the colour trait, we demonstrate that the observed colour cline cannot be due to the surfing of a neutral mutation. Taking advantage of spatially explicit ABC, which proved to be a powerful method to disentangle the respective roles of selection and drift in range expansions, we conclude that the formation of the colour cline observed in the barn owl must be due to natural selection.
Abstract:
Sex chromosomes are expected to evolve suppressed recombination, which leads to degeneration of the Y and heteromorphism between the X and Y. Some sex chromosomes remain homomorphic, however, and the factors that prevent degeneration of the Y in these cases are not well understood. The homomorphic sex chromosomes of the European tree frogs (Hyla spp.) present an interesting paradox. Recombination in males has never been observed in crossing experiments, but molecular data are suggestive of occasional recombination between the X and Y. The hypothesis that these sex chromosomes recombine has not been tested statistically, however, nor has the X-Y recombination rate been estimated. Here, we use approximate Bayesian computation coupled with coalescent simulations of sex chromosomes to quantify the X-Y recombination rate from existing data. We find that microsatellite data from H. arborea, H. intermedia and H. molleri support a recombination rate between X and Y that is significantly different from zero. We estimate that rate to be approximately 10^5 times smaller than that between X chromosomes. Our findings support the notion that a very low recombination rate may be sufficient to maintain homomorphism in sex chromosomes.