67 results for Markov Chain Monte Carlo
Abstract:
Analyses of high-density single-nucleotide polymorphism (SNP) data, such as genetic mapping and linkage disequilibrium (LD) studies, require phase-known haplotypes to allow for the correlation between tightly linked loci. However, current SNP genotyping technology cannot determine phase, which must be inferred statistically. In this paper, we present a new Bayesian Markov chain Monte Carlo (MCMC) algorithm for population haplotype frequency estimation, particularly in the context of LD assessment. The novel feature of the method is the incorporation of a log-linear prior model for population haplotype frequencies. We present simulations to suggest that 1) the log-linear prior model is more appropriate than the standard coalescent process in the presence of recombination (>0.02 cM between adjacent loci), and 2) there is substantial inflation in measures of LD obtained by a "two-stage" approach to the analysis by treating the "best" haplotype configuration as correct, without regard to uncertainty in the recombination process. Genet Epidemiol 25:106-114, 2003. (C) 2003 Wiley-Liss, Inc.
Abstract:
Population subdivision complicates analysis of molecular variation. Even if neutrality is assumed, three evolutionary forces need to be considered: migration, mutation, and drift. Simplification can be achieved by assuming that migration among and drift within subpopulations occur fast compared to mutation and drift in the entire population. This allows a two-step approach in the analysis: (i) analysis of population subdivision and (ii) analysis of molecular variation in the migrant pool. We model population subdivision using an infinite island model, where we allow the migration/drift parameter Theta to vary among populations. Thus, central and peripheral populations can be differentiated. For inference of Theta, we use a coalescence approach, implemented via a Markov chain Monte Carlo (MCMC) integration method that allows estimation of allele frequencies in the migrant pool. The second step of this approach (analysis of molecular variation in the migrant pool) uses the estimated allele frequencies in the migrant pool for the study of molecular variation. We apply this method to a Drosophila ananassae sequence data set. We find little indication of isolation by distance, but large differences in the migration parameter among populations. The population as a whole seems to be expanding. A population from Bogor (Java, Indonesia) shows the highest variation and seems closest to the species center.
Abstract:
We describe a Bayesian approach to analyzing multilocus genotype or haplotype data to assess departures from gametic (linkage) equilibrium. Our approach employs a Markov chain Monte Carlo (MCMC) algorithm to approximate the posterior probability distributions of disequilibrium parameters. The distributions are computed exactly in some simple settings. Among other advantages, posterior distributions can be presented visually, which allows the uncertainties in parameter estimates to be readily assessed. In addition, background knowledge can be incorporated, where available, to improve the precision of inferences. The method is illustrated by application to previously published datasets; implications for multilocus forensic match probabilities and for simple association-based gene mapping are also discussed.
Abstract:
Statistical methods of inference typically require the likelihood function to be computable in a reasonable amount of time. The class of “likelihood-free” methods termed Approximate Bayesian Computation (ABC) is able to eliminate this requirement, replacing the evaluation of the likelihood with simulation from it. Likelihood-free methods have gained in efficiency and popularity in the past few years, following their integration with Markov Chain Monte Carlo (MCMC) and Sequential Monte Carlo (SMC) in order to better explore the parameter space. They have been applied primarily to estimating the parameters of a given model, but can also be used to compare models. Here we present novel likelihood-free approaches to model comparison, based upon the independent estimation of the evidence of each model under study. Key advantages of these approaches over previous techniques are that they allow the exploitation of MCMC or SMC algorithms for exploring the parameter space, and that they do not require a sampler able to mix between models. We validate the proposed methods using a simple exponential family problem before providing a realistic problem from human population genetics: the comparison of different demographic models based upon genetic data from the Y chromosome.
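The idea of replacing likelihood evaluation with simulation can be illustrated with a minimal ABC rejection sampler. This is only a sketch under stated assumptions, not the method of the paper: the unit-variance normal model, the uniform prior on [-5, 5], the sample mean as summary statistic, the tolerance `eps`, and the function name `abc_rejection` are all chosen here for illustration.

```python
import random
import statistics


def abc_rejection(observed, n_draws=20000, eps=0.1, seed=0):
    """ABC rejection sampling for the mean of a unit-variance normal model.

    The likelihood is never evaluated: a parameter draw is kept only if data
    simulated from it has a summary statistic close to the observed one.
    """
    rng = random.Random(seed)
    n = len(observed)
    s_obs = statistics.fmean(observed)  # summary statistic of the observed data
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(-5.0, 5.0)  # draw from the (assumed) uniform prior
        simulated = [rng.gauss(theta, 1.0) for _ in range(n)]  # simulate from the model
        if abs(statistics.fmean(simulated) - s_obs) < eps:
            accepted.append(theta)  # approximate posterior draw
    return accepted


# Synthetic data generated with true mean 2.0, purely for illustration.
data_rng = random.Random(1)
observed = [data_rng.gauss(2.0, 1.0) for _ in range(20)]

posterior = abc_rejection(observed)
posterior_mean = statistics.fmean(posterior)
```

The accepted draws approximate the posterior of the mean; shrinking `eps` tightens the approximation at the cost of a lower acceptance rate, which is the inefficiency that the MCMC and SMC variants mentioned above aim to reduce.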
Abstract:
We present, pedagogically, the Bayesian approach to composed error models under alternative, hierarchical characterizations; demonstrate, briefly, the Bayesian approach to model comparison using recent advances in Markov Chain Monte Carlo (MCMC) methods; and illustrate, empirically, the value of these techniques to natural resource economics and coastal fisheries management, in particular. The Bayesian approach to fisheries efficiency analysis is interesting for at least three reasons. First, it is a robust and highly flexible alternative to commonly applied, frequentist procedures, which dominate the literature. Second, the Bayesian approach is extremely simple to implement, requiring only a modest addition to most natural-resource economists' toolkits. Third, despite its attractions, applications of Bayesian methodology in coastal fisheries management are few.
Abstract:
The steadily accumulating literature on technical efficiency in fisheries attests to the importance of efficiency as an indicator of fleet condition and as an object of management concern. In this paper, we extend previous work by presenting a Bayesian hierarchical approach that yields both efficiency estimates and, as a byproduct of the estimation algorithm, probabilistic rankings of the relative technical efficiencies of fishing boats. The estimation algorithm is based on recent advances in Markov Chain Monte Carlo (MCMC) methods, Gibbs sampling in particular, which have not been widely used in fisheries economics. We apply the method to a sample of 10,865 boat trips in the US Pacific hake (or whiting) fishery during 1987–2003. We uncover systematic differences between efficiency rankings based on sample mean efficiency estimates and those that exploit the full posterior distributions of boat efficiencies to estimate the probability that a given boat has the highest true mean efficiency.
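The difference between ranking by mean efficiency and ranking by the probability of being best can be sketched from posterior draws. The example below is a hypothetical illustration: the boat labels, the four made-up draws, and the helper name `prob_best` are assumptions introduced here and do not come from the paper's data or code.

```python
from collections import Counter


def prob_best(draws):
    """Estimate, for each boat, the posterior probability of being the most efficient.

    draws: list of dicts, one per posterior draw, mapping boat -> efficiency.
    """
    # Count how often each boat has the highest efficiency within a draw.
    wins = Counter(max(d, key=d.get) for d in draws)
    return {boat: wins[boat] / len(draws) for boat in draws[0]}


# Made-up posterior draws for three hypothetical boats (illustration only).
draws = [
    {"A": 0.82, "B": 0.80, "C": 0.60},
    {"A": 0.78, "B": 0.85, "C": 0.62},
    {"A": 0.81, "B": 0.79, "C": 0.58},
    {"A": 0.70, "B": 0.88, "C": 0.61},
]

p_best = prob_best(draws)
mean_eff = {boat: sum(d[boat] for d in draws) / len(draws) for boat in draws[0]}
```

In these made-up draws boat B has the higher mean efficiency, yet A and B are each best in half of the draws, which is the kind of discrepancy between the two ranking criteria that the abstract describes.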
Abstract:
The Homeric epics are among the greatest masterpieces of literature, but when they were produced is not known with certainty. Here we apply evolutionary-linguistic phylogenetic statistical methods to differences in Homeric, Modern Greek and ancient Hittite vocabulary items to estimate a date of approximately 710–760 BCE for these great works. Our analysis compared a common set of vocabulary items among the three pairs of languages, recording for each item whether the words in the two languages were cognate – derived from a shared ancestral word – or not. We then used a likelihood-based Markov chain Monte Carlo procedure to estimate the most probable times in years separating these languages given the percentage of words they shared, combined with knowledge of the rates at which different words change. Our date for the epics is in close agreement with historians' and classicists' beliefs derived from historical and archaeological sources.
Abstract:
A Bayesian analysis is given of an instrumental variable model that allows for heteroscedasticity in both the structural equation and the instrument equation. Specifically, the approach for dealing with heteroscedastic errors in Geweke (1993) is extended to the Bayesian instrumental variable estimator outlined in Rossi et al. (2005). Heteroscedasticity is treated by modelling the variance for each error using a hierarchical prior that is Gamma distributed. The computation is carried out by using a Markov chain Monte Carlo sampling algorithm with an augmented draw for the heteroscedastic case. An example using real data illustrates the approach and shows that ignoring heteroscedasticity in the instrument equation when it exists may lead to biased estimates.
Abstract:
The political economy literature on agriculture emphasizes influence over political outcomes via lobbying conduits in general, political action committee contributions in particular and the pervasive view that political preferences with respect to agricultural issues are inherently geographic. In this context, ‘interdependence’ in Congressional vote behaviour manifests itself in two dimensions. One dimension is the intensity by which neighboring vote propensities influence one another and the second is the geographic extent of voter influence. We estimate these facets of dependence using data on a Congressional vote on the 2001 Farm Bill using routine Markov chain Monte Carlo procedures and Bayesian model averaging, in particular. In so doing, we develop a novel procedure to examine both the reliability and the consequences of different model representations for measuring both the ‘scale’ and the ‘scope’ of spatial (geographic) co-relations in voting behaviour.
Abstract:
We present an analysis of seven primary transit observations of the hot Neptune GJ436b at 3.6, 4.5, and 8 μm obtained with the Infrared Array Camera on the Spitzer Space Telescope. After correcting for systematic effects, we fitted the light curves using the Markov Chain Monte Carlo technique. Combining these new data with the EPOXI, Hubble Space Telescope, and ground-based V, I, H, and Ks published observations, the range 0.5-10 μm can be covered. Due to the low level of activity of GJ436, the effect of starspots on the combination of transits at different epochs is negligible at the accuracy of the data set. Representative climate models were calculated by using a three-dimensional, pseudospectral general circulation model with idealized thermal forcing. Simulated transit spectra of GJ436b were generated using line-by-line radiative transfer models including the opacities of the molecular species expected to be present in such a planetary atmosphere. A new, ab-initio-calculated, line list for hot ammonia has been used for the first time. The photometric data observed at multiple wavelengths can be interpreted with methane being the dominant absorption after molecular hydrogen, possibly with minor contributions from ammonia, water, and other molecules. No clear evidence of carbon monoxide and carbon dioxide is found from transit photometry. We discuss this result in the light of a recent paper where photochemical disequilibrium is hypothesized to interpret secondary transit photometric data. We show that the emission photometric data are not incompatible with the presence of abundant methane, but further spectroscopic data are desirable to confirm this scenario.
Abstract:
In this paper, we study jumps in commodity prices. Contrary to what existing models of commodity price dynamics assume, a simple analysis of the data reveals that the probability of tail events is not constant but depends on the time of the year, i.e. exhibits seasonality. We propose a stochastic volatility jump–diffusion model to capture this seasonal variation. Applying the Markov Chain Monte Carlo (MCMC) methodology, we estimate our model using 20 years of futures data from four different commodity markets. We find strong statistical evidence to suggest that our model with seasonal jump intensity outperforms models featuring a constant jump intensity. To demonstrate the practical relevance of our findings, we show that our model typically improves Value-at-Risk (VaR) forecasts.
Abstract:
Monte Carlo algorithms often aim to draw from a distribution π by simulating a Markov chain with transition kernel P such that π is invariant under P. However, there are many situations for which it is impractical or impossible to draw from the transition kernel P. For instance, this is the case with massive datasets, where it is prohibitively expensive to calculate the likelihood, and is also the case for intractable likelihood models arising from, for example, Gibbs random fields, such as those found in spatial statistics and network analysis. A natural approach in these cases is to replace P by an approximation P̂. Using theory from the stability of Markov chains we explore a variety of situations where it is possible to quantify how 'close' the chain given by the transition kernel P̂ is to the chain given by P. We apply these results to several examples from spatial statistics and network analysis.
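The statement that π is invariant under P can be checked numerically on a small discrete example. The sketch below is illustrative only and assumes a Metropolis kernel with a symmetric nearest-neighbour proposal on a ring of states; the target `pi` and the function name `metropolis_kernel` are hypothetical and are not taken from the paper.

```python
def metropolis_kernel(pi):
    """Transition matrix of a Metropolis chain on states 0..n-1 (n >= 2).

    Proposal: move to the left or right neighbour on a ring, each with
    probability 1/2. Acceptance: min(1, pi[j] / pi[i]).
    """
    n = len(pi)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in ((i - 1) % n, (i + 1) % n):
            P[i][j] += 0.5 * min(1.0, pi[j] / pi[i])
        # Rejected proposals keep the chain in state i.
        P[i][i] += 1.0 - sum(P[i])
    return P


def step(dist, P):
    """One step of the chain applied to a distribution: returns dist @ P."""
    n = len(dist)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


pi = [0.1, 0.2, 0.3, 0.4]  # hypothetical target distribution
P = metropolis_kernel(pi)
pi_after = step(pi, P)  # equals pi up to rounding if pi is invariant under P
```

Invariance holds here because the kernel satisfies detailed balance, pi[i] * P[i][j] = pi[j] * P[j][i]. An approximate kernel P̂ would in general have a different invariant distribution, and the size of that discrepancy is the quantity the abstract sets out to bound.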
Abstract:
In this work we study the computational complexity of a class of grid Monte Carlo algorithms for integral equations. The idea of the algorithms is to approximate the integral equation by a system of algebraic equations. A Markov chain iterative Monte Carlo method is then used to solve the system. The assumption here is that the corresponding Neumann series for the iteration matrix does not necessarily converge or converges slowly. We use a special technique to accelerate the convergence. An estimate of the computational complexity of the Monte Carlo algorithm using the considered approach is obtained. The estimate of the complexity is compared with the corresponding quantity for the complexity of the grid-free Monte Carlo algorithm. The conditions under which the class of grid Monte Carlo algorithms is more efficient are given.
Abstract:
A partial phase diagram is constructed for diblock copolymer melts using lattice-based Monte Carlo simulations. This is done by locating the order-disorder transition (ODT) with the aid of a recently proposed order parameter and identifying the ordered phase over a wide range of copolymer compositions (0.2 <= f <= 0.8). Consistent with experiments, the disordered phase is found to exhibit direct first-order transitions to each of the ordered morphologies. This includes the spontaneous formation of a perforated-lamellar phase, which presumably forms in place of the gyroid morphology due to finite-size and/or nonequilibrium effects. Also included in our study is a detailed examination of disordered cylinder-forming (f=0.3) diblock copolymers, revealing a substantial degree of pretransitional chain stretching and short-range order that set in well before the ODT, as observed previously in analogous studies on lamellar-forming (f=0.5) molecules. (c) 2006 American Institute of Physics.
Abstract:
In this paper we consider hybrid (fast stochastic approximation and deterministic refinement) algorithms for Matrix Inversion (MI) and Solving Systems of Linear Equations (SLAE). Monte Carlo methods are used for the stochastic approximation, since it is known that they are very efficient in finding a quick rough approximation of the element or a row of the inverse matrix or finding a component of the solution vector. We show how the stochastic approximation of the MI can be combined with a deterministic refinement procedure to obtain MI with the required precision and further solve the SLAE using MI. We employ a splitting A = D - C of a given non-singular matrix A, where D is a diagonally dominant matrix and matrix C is a diagonal matrix. In our algorithm for solving SLAE and MI different choices of D can be considered in order to control the norm of matrix T = D^{-1}C of the resulting SLAE and to minimize the number of Markov chains required to reach a given precision. Further we run the algorithms on a mini-Grid and investigate their efficiency depending on the granularity. Corresponding experimental results are presented.
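The way Markov chains can estimate a single component of the solution of a linear system can be sketched as follows. This is a minimal illustration under assumptions that are not taken from the paper: it uses the Jacobi choice D = diag(A) so that x = Tx + f with T = D^{-1}C and f = D^{-1}b, fixed-length random walks, and a hypothetical function name `mc_solve_component`. It also omits the deterministic refinement step.

```python
import random


def mc_solve_component(A, b, i, n_walks=5000, length=30, seed=0):
    """Monte Carlo estimate of component i of the solution of A x = b.

    Assumes the Jacobi splitting A = D - C with D = diag(A), and that the
    infinity norm of T = D^{-1} C is below 1 so the Neumann series converges.
    """
    rng = random.Random(seed)
    n = len(A)
    T = [[0.0 if r == c else -A[r][c] / A[r][r] for c in range(n)] for r in range(n)]
    f = [b[r] / A[r][r] for r in range(n)]
    abs_T = [[abs(t) for t in row] for row in T]
    row_abs = [sum(row) for row in abs_T]
    states = range(n)

    total = 0.0
    for _ in range(n_walks):
        state, weight, score = i, 1.0, f[i]
        for _ in range(length):
            if row_abs[state] == 0.0:
                break  # no outgoing transitions: the series terminates here
            # Move to nxt with probability |T[state][nxt]| / row_abs[state].
            nxt = rng.choices(states, weights=abs_T[state])[0]
            # Importance weight T[state][nxt] / p = sign(T[state][nxt]) * row_abs[state].
            weight *= row_abs[state] if T[state][nxt] > 0 else -row_abs[state]
            state = nxt
            score += weight * f[state]  # adds one term of the Neumann series
        total += score
    return total / n_walks


# Small diagonally dominant test system with known solution x = [1, 2, 3].
A = [[4.0, 1.0, 1.0], [1.0, 5.0, 2.0], [0.0, 1.0, 3.0]]
b = [9.0, 17.0, 11.0]
x0_estimate = mc_solve_component(A, b, 0)
```

Each walk is an unbiased estimate of the Neumann series for x_i truncated at `length` terms. A smaller norm of T makes the weights decay faster, so fewer and shorter chains are needed, which is why the abstract stresses controlling that norm through the choice of D.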