77 results for markov chains monte carlo methods
Abstract:
We describe a general likelihood-based 'mixture model' for inferring phylogenetic trees from gene-sequence or other character-state data. The model accommodates cases in which different sites in the alignment evolve in qualitatively distinct ways, but does not require prior knowledge of these patterns or partitioning of the data. We call this qualitative variability in the pattern of evolution across sites "pattern-heterogeneity" to distinguish it both from a homogeneous process of evolution and from one characterized principally by differences in rates of evolution. We present studies to show that the model correctly retrieves the signals of pattern-heterogeneity from simulated gene-sequence data, and we apply the method to protein-coding genes and to a ribosomal 12S data set. The mixture model outperforms conventional partitioning in both these data sets. We implement the mixture model such that it can simultaneously detect rate- and pattern-heterogeneity. The model simplifies to a homogeneous model or a rate-variability model as special cases, and therefore always performs at least as well as these two approaches, and often considerably improves upon them. We make the model available within a Bayesian Markov-chain Monte Carlo framework for phylogenetic inference, as an easy-to-use computer program.
Abstract:
This article introduces a new general method for genealogical inference that samples independent genealogical histories using importance sampling (IS) and then samples other parameters with Markov chain Monte Carlo (MCMC). It is then possible to more easily utilize the advantages of importance sampling in a fully Bayesian framework. The method is applied to the problem of estimating recent changes in effective population size from temporally spaced gene frequency data. The method gives the posterior distribution of effective population size at the time of the oldest sample and at the time of the most recent sample, assuming a model of exponential growth or decline during the interval. The effect of changes in number of alleles, number of loci, and sample size on the accuracy of the method is described using test simulations, and it is concluded that these have an approximately equivalent effect. The method is used on three example data sets and problems in interpreting the posterior densities are highlighted and discussed.
Abstract:
Analyses of high-density single-nucleotide polymorphism (SNP) data, such as genetic mapping and linkage disequilibrium (LD) studies, require phase-known haplotypes to allow for the correlation between tightly linked loci. However, current SNP genotyping technology cannot determine phase, which must be inferred statistically. In this paper, we present a new Bayesian Markov chain Monte Carlo (MCMC) algorithm for population haplotype frequency estimation, particularly in the context of LD assessment. The novel feature of the method is the incorporation of a log-linear prior model for population haplotype frequencies. We present simulations to suggest that 1) the log-linear prior model is more appropriate than the standard coalescent process in the presence of recombination (>0.02cM between adjacent loci), and 2) there is substantial inflation in measures of LD obtained by a "two-stage" approach to the analysis by treating the "best" haplotype configuration as correct, without regard to uncertainty in the recombination process. Genet Epidemiol 25:106-114, 2003. © 2003 Wiley-Liss, Inc.
Abstract:
Population subdivision complicates analysis of molecular variation. Even if neutrality is assumed, three evolutionary forces need to be considered: migration, mutation, and drift. Simplification can be achieved by assuming that the process of migration among and drift within subpopulations occurs fast compared to mutation and drift in the entire population. This allows a two-step approach in the analysis: (i) analysis of population subdivision and (ii) analysis of molecular variation in the migrant pool. We model population subdivision using an infinite island model, where we allow the migration/drift parameter Theta to vary among populations. Thus, central and peripheral populations can be differentiated. For inference of Theta, we use a coalescence approach, implemented via a Markov chain Monte Carlo (MCMC) integration method that allows estimation of allele frequencies in the migrant pool. The second step of this approach (analysis of molecular variation in the migrant pool) uses the estimated allele frequencies in the migrant pool for the study of molecular variation. We apply this method to a Drosophila ananassae sequence data set. We find little indication of isolation by distance, but large differences in the migration parameter among populations. The population as a whole seems to be expanding. A population from Bogor (Java, Indonesia) shows the highest variation and seems closest to the species center.
Abstract:
The Boltzmann equation in the presence of boundary and initial conditions, which describes the general case of carrier transport in microelectronic devices, is analysed in terms of Monte Carlo theory. The classical Ensemble Monte Carlo algorithm, which was devised from merely phenomenological considerations of the initial and boundary carrier contributions, is now derived in a formal way. The approach suggests a set of event-biasing algorithms for statistical enhancement as an alternative to the population control technique, which is virtually the only algorithm currently used in particle simulators. The scheme of the self-consistent coupling of the Boltzmann and Poisson equations is considered for the case of weighted particles. It is shown that particles survive the successive iteration steps.
Abstract:
We describe a Bayesian approach to analyzing multilocus genotype or haplotype data to assess departures from gametic (linkage) equilibrium. Our approach employs a Markov chain Monte Carlo (MCMC) algorithm to approximate the posterior probability distributions of disequilibrium parameters. The distributions are computed exactly in some simple settings. Among other advantages, posterior distributions can be presented visually, which allows the uncertainties in parameter estimates to be readily assessed. In addition, background knowledge can be incorporated, where available, to improve the precision of inferences. The method is illustrated by application to previously published datasets; implications for multilocus forensic match probabilities and for simple association-based gene mapping are also discussed.
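As a rough illustration of how an MCMC algorithm approximates a posterior that can also be computed exactly in a simple setting, the sketch below runs a random-walk Metropolis sampler for a single binomial proportion with a uniform prior. This is a stand-in chosen for brevity, not the disequilibrium model of the abstract; the function names, the counts (7 out of 20), the step size, and the burn-in length are all assumptions made for the example.

```python
import math
import random


def log_posterior(p, k, n):
    """Unnormalised log posterior of a binomial proportion p under a uniform prior."""
    if p <= 0.0 or p >= 1.0:
        return -math.inf  # zero posterior mass outside (0, 1)
    return k * math.log(p) + (n - k) * math.log(1.0 - p)


def metropolis(k, n, n_steps=20000, burn_in=2000, step=0.1, seed=0):
    """Random-walk Metropolis sampler; returns the draws kept after burn-in."""
    rng = random.Random(seed)
    p = 0.5
    current = log_posterior(p, k, n)
    samples = []
    for i in range(n_steps):
        proposal = p + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, k, n)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            p, current = proposal, candidate
        if i >= burn_in:
            samples.append(p)
    return samples


# Hypothetical data: 7 copies of an allele observed among 20 sampled gametes.
samples = metropolis(k=7, n=20)
mcmc_mean = sum(samples) / len(samples)
exact_mean = (7 + 1) / (20 + 2)  # mean of the exact Beta(8, 14) posterior
```

Comparing `mcmc_mean` with `exact_mean` gives a quick sanity check of the sampler, and a histogram of `samples` is one way to present the posterior visually.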
Abstract:
This paper introduces a method for simulating multivariate samples that have exact means, covariances, skewness and kurtosis. We introduce a new class of rectangular orthogonal matrix which is fundamental to the methodology and we call these matrices L matrices. They may be deterministic, parametric or data specific in nature. The target moments determine the L matrix; infinitely many random samples with the same exact moments may then be generated by multiplying the L matrix by arbitrary random orthogonal matrices. This methodology is thus termed “ROM simulation”. Considering certain elementary types of random orthogonal matrices, we demonstrate that they generate samples with different characteristics. ROM simulation has applications to many problems that are resolved using standard Monte Carlo methods. However, no parametric assumptions are required (unless parametric L matrices are used), so there is no sampling error caused by the discrete approximation of a continuous distribution, which is a major source of error in standard Monte Carlo simulations. For illustration, we apply ROM simulation to determine the value-at-risk of a stock portfolio.
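The core idea of matching moments exactly can be sketched in a few lines, under simplifying assumptions. The example below matches only the mean and covariance (not skewness or kurtosis), and it builds its rectangular orthogonal matrix from the QR decomposition of a centred Gaussian matrix rather than from the L matrices defined in the paper. The function name, the use of the divisor m in the covariance, and the numerical inputs are assumptions for illustration.

```python
import numpy as np


def exact_moment_sample(mu, cov, m, seed=0):
    """Return an (m, n) sample whose mean is exactly mu and whose
    covariance (divisor m) is exactly cov. Requires m > n."""
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float)
    cov = np.asarray(cov, dtype=float)
    n = mu.size

    # Rectangular matrix with orthonormal columns that each sum to zero:
    # centre a Gaussian matrix, then orthonormalise its columns with QR.
    z = rng.standard_normal((m, n))
    z -= z.mean(axis=0)
    l_mat, _ = np.linalg.qr(z)  # shape (m, n), l_mat.T @ l_mat = I

    # Random orthogonal (n, n) matrix from the QR of a square Gaussian matrix.
    r_mat, _ = np.linalg.qr(rng.standard_normal((n, n)))

    # Factor a with a.T @ a = cov.
    a = np.linalg.cholesky(cov).T

    return mu + np.sqrt(m) * l_mat @ r_mat @ a


# Hypothetical target moments for three assets.
mu = np.array([0.01, 0.02, -0.01])
cov = np.array([[0.04, 0.01, 0.00],
                [0.01, 0.09, 0.02],
                [0.00, 0.02, 0.16]])
x = exact_moment_sample(mu, cov, m=50)
```

Changing `seed` changes the random orthogonal matrix and therefore the sample, while `x.mean(axis=0)` and `np.cov(x, rowvar=False, bias=True)` stay equal to the targets up to floating-point error.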
Abstract:
We exploit a theory of price linkages that lends itself readily to empirical examination using Markov chain Monte Carlo methods. The methodology facilitates classification and discrimination among alternative regimes in economic time series. The theory and procedures are applied to annual series (1955-1992) on the U.S. beef sector.
Abstract:
Many applications, such as intermittent data assimilation, lead to a recursive application of Bayesian inference within a Monte Carlo context. Popular data assimilation algorithms include sequential Monte Carlo methods and ensemble Kalman filters (EnKFs). These methods differ in the way Bayesian inference is implemented. Sequential Monte Carlo methods rely on importance sampling combined with a resampling step, while EnKFs utilize a linear transformation of Monte Carlo samples based on the classic Kalman filter. While EnKFs have proven to be quite robust even for small ensemble sizes, they are not consistent since their derivation relies on a linear regression ansatz. In this paper, we propose another transform method, which does not rely on any a priori assumptions on the underlying prior and posterior distributions. The new method is based on solving an optimal transportation problem for discrete random variables. © 2013, Society for Industrial and Applied Mathematics
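The importance-sampling-plus-resampling step that sequential Monte Carlo methods rely on can be sketched for a single Bayesian update as follows. This illustrates the standard approach the abstract contrasts with, not the transport-based transform it proposes. The scalar Gaussian prior and likelihood, the observation value, and all names are assumptions chosen so that the exact posterior, N(0.5, 0.5), is known.

```python
import numpy as np


def importance_resample(prior_samples, log_likelihood, rng):
    """One Bayesian update: weight prior samples by the likelihood,
    then resample with replacement according to the weights."""
    log_w = log_likelihood(prior_samples)
    log_w = log_w - log_w.max()  # stabilise before exponentiating
    weights = np.exp(log_w)
    weights /= weights.sum()
    idx = rng.choice(prior_samples.size, size=prior_samples.size, p=weights)
    return weights, prior_samples[idx]


rng = np.random.default_rng(0)
prior = rng.standard_normal(50_000)  # prior: N(0, 1)
y_obs, obs_var = 1.0, 1.0            # hypothetical observation and noise variance


def gaussian_log_likelihood(x):
    return -0.5 * (y_obs - x) ** 2 / obs_var


weights, posterior = importance_resample(prior, gaussian_log_likelihood, rng)
weighted_mean = float(np.sum(weights * prior))  # importance-sampling estimate
resampled_mean = float(posterior.mean())        # estimate after resampling
```

Both estimates should lie close to the exact posterior mean of 0.5. Resampling returns equally weighted samples at the cost of some extra Monte Carlo noise.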
Abstract:
Bayesian analysis is given of an instrumental variable model that allows for heteroscedasticity in both the structural equation and the instrument equation. Specifically, the approach for dealing with heteroscedastic errors in Geweke (1993) is extended to the Bayesian instrumental variable estimator outlined in Rossi et al. (2005). Heteroscedasticity is treated by modelling the variance for each error using a hierarchical prior that is Gamma distributed. The computation is carried out by using a Markov chain Monte Carlo sampling algorithm with an augmented draw for the heteroscedastic case. An example using real data illustrates the approach and shows that ignoring heteroscedasticity in the instrument equation when it exists may lead to biased estimates.
Abstract:
The political economy literature on agriculture emphasizes influence over political outcomes via lobbying conduits in general, political action committee contributions in particular and the pervasive view that political preferences with respect to agricultural issues are inherently geographic. In this context, ‘interdependence’ in Congressional vote behaviour manifests itself in two dimensions. One dimension is the intensity by which neighboring vote propensities influence one another and the second is the geographic extent of voter influence. We estimate these facets of dependence using data on a Congressional vote on the 2001 Farm Bill using routine Markov chain Monte Carlo procedures and Bayesian model averaging, in particular. In so doing, we develop a novel procedure to examine both the reliability and the consequences of different model representations for measuring both the ‘scale’ and the ‘scope’ of spatial (geographic) co-relations in voting behaviour.
Abstract:
We present an analysis of seven primary transit observations of the hot Neptune GJ436b at 3.6, 4.5, and 8 μm obtained with the Infrared Array Camera on the Spitzer Space Telescope. After correcting for systematic effects, we fitted the light curves using the Markov Chain Monte Carlo technique. Combining these new data with the EPOXI, Hubble Space Telescope, and ground-based V, I, H, and Ks published observations, the range 0.5-10 μm can be covered. Due to the low level of activity of GJ436, the effect of starspots on the combination of transits at different epochs is negligible at the accuracy of the data set. Representative climate models were calculated by using a three-dimensional, pseudospectral general circulation model with idealized thermal forcing. Simulated transit spectra of GJ436b were generated using line-by-line radiative transfer models including the opacities of the molecular species expected to be present in such a planetary atmosphere. A new, ab-initio-calculated, line list for hot ammonia has been used for the first time. The photometric data observed at multiple wavelengths can be interpreted with methane being the dominant absorption after molecular hydrogen, possibly with minor contributions from ammonia, water, and other molecules. No clear evidence of carbon monoxide and carbon dioxide is found from transit photometry. We discuss this result in the light of a recent paper where photochemical disequilibrium is hypothesized to interpret secondary transit photometric data. We show that the emission photometric data are not incompatible with the presence of abundant methane, but further spectroscopic data are desirable to confirm this scenario.
Abstract:
In this paper, we study jumps in commodity prices. Contrary to what existing models of commodity price dynamics assume, a simple analysis of the data reveals that the probability of tail events is not constant but depends on the time of the year, i.e. exhibits seasonality. We propose a stochastic volatility jump–diffusion model to capture this seasonal variation. Applying the Markov Chain Monte Carlo (MCMC) methodology, we estimate our model using 20 years of futures data from four different commodity markets. We find strong statistical evidence to suggest that our model with seasonal jump intensity outperforms models featuring a constant jump intensity. To demonstrate the practical relevance of our findings, we show that our model typically improves Value-at-Risk (VaR) forecasts.
Abstract:
Nonlinear adjustment toward long-run price equilibrium relationships in the sugar-ethanol-oil nexus in Brazil is examined. We develop generalized bivariate error correction models that allow for cointegration between sugar, ethanol, and oil prices, where dynamic adjustments are potentially nonlinear functions of the disequilibrium errors. A range of models are estimated using Bayesian Monte Carlo Markov Chain algorithms and compared using Bayesian model selection methods. The results suggest that the long-run drivers of Brazilian sugar prices are oil prices and that there are nonlinearities in the adjustment processes of sugar and ethanol prices to oil price but linear adjustment between ethanol and sugar prices.
Abstract:
We have estimated the speed and direction of propagation of a number of Coronal Mass Ejections (CMEs) using single-spacecraft data from the STEREO Heliospheric Imager (HI) wide-field cameras. In general, these values are in good agreement with those predicted by Thernisien, Vourlidas, and Howard in Solar Phys. 256, 111–130 (2009) using a forward modelling method to fit CMEs imaged by the STEREO COR2 coronagraphs. The directions of the CMEs predicted by both techniques are in good agreement despite the fact that many of the CMEs under study travel in directions that cause them to fade rapidly in the HI images. The velocities estimated from both techniques are in general agreement although there are some interesting differences that may provide evidence for the influence of the ambient solar wind on the speed of CMEs. The majority of CMEs with a velocity estimated to be below 400 km s⁻¹ in the COR2 field of view have higher estimated velocities in the HI field of view, while, conversely, those with COR2 velocities estimated to be above 400 km s⁻¹ have lower estimated HI velocities. We interpret this as evidence for the deceleration of fast CMEs and the acceleration of slower CMEs by interaction with the ambient solar wind beyond the COR2 field of view. We also show that the uncertainties in our derived parameters are influenced by the range of elongations over which each CME can be tracked. In order to reduce the uncertainty in the predicted arrival time of a CME at 1 Astronomical Unit (AU) to within six hours, the CME needs to be tracked out to at least 30 degrees elongation. This is in good agreement with predictions of the accuracy of our technique based on Monte Carlo simulations.