71 results for Markov chain Monte Carlo method
Abstract:
The Monte Carlo Independent Column Approximation (McICA) is a flexible method for representing subgrid-scale cloud inhomogeneity in radiative transfer schemes. It does, however, introduce conditional random errors, but these have been shown to have little effect on climate simulations, where spatial and temporal scales of interest are large enough for effects of noise to be averaged out. This article considers the effect of McICA noise on a numerical weather prediction (NWP) model, where the time and spatial scales of interest are much closer to those at which the errors manifest themselves; this, as we show, means that noise is more significant. We suggest methods for efficiently reducing the magnitude of McICA noise and test these methods in a global NWP version of the UK Met Office Unified Model (MetUM). The resultant errors are put into context by comparison with errors due to the widely used assumption of maximum-random-overlap of plane-parallel homogeneous cloud. For a simple implementation of the McICA scheme, forecasts of near-surface temperature are found to be worse than those obtained using the plane-parallel, maximum-random-overlap representation of clouds. However, by applying the methods suggested in this article, we can reduce noise enough to give forecasts of near-surface temperature that are an improvement on the plane-parallel maximum-random-overlap forecasts. We conclude that the McICA scheme can be used to improve the representation of clouds in NWP models, with the proviso that the associated noise is sufficiently small.
Abstract:
Analyses of high-density single-nucleotide polymorphism (SNP) data, such as genetic mapping and linkage disequilibrium (LD) studies, require phase-known haplotypes to allow for the correlation between tightly linked loci. However, current SNP genotyping technology cannot determine phase, which must be inferred statistically. In this paper, we present a new Bayesian Markov chain Monte Carlo (MCMC) algorithm for population haplotype frequency estimation, particularly in the context of LD assessment. The novel feature of the method is the incorporation of a log-linear prior model for population haplotype frequencies. We present simulations to suggest that 1) the log-linear prior model is more appropriate than the standard coalescent process in the presence of recombination (>0.02cM between adjacent loci), and 2) there is substantial inflation in measures of LD obtained by a "two-stage" approach to the analysis by treating the "best" haplotype configuration as correct, without regard to uncertainty in the recombination process. Genet Epidemiol 25:106-114, 2003. (C) 2003 Wiley-Liss, Inc.
Abstract:
The identification of signatures of natural selection in genomic surveys has become an area of intense research, stimulated by the increasing ease with which genetic markers can be typed. Loci identified as subject to selection may be functionally important, and hence (weak) candidates for involvement in disease causation. They can also be useful in determining the adaptive differentiation of populations, and exploring hypotheses about speciation. Adaptive differentiation has traditionally been identified from differences in allele frequencies among different populations, summarised by an estimate of F-ST. Low outliers relative to an appropriate neutral population-genetics model indicate loci subject to balancing selection, whereas high outliers suggest adaptive (directional) selection. However, the problem of identifying statistically significant departures from neutrality is complicated by confounding effects on the distribution of F-ST estimates, and current methods have not yet been tested in large-scale simulation experiments. Here, we simulate data from a structured population at many unlinked, diallelic loci that are predominantly neutral but with some loci subject to adaptive or balancing selection. We develop a hierarchical-Bayesian method, implemented via Markov chain Monte Carlo (MCMC), and assess its performance in distinguishing the loci simulated under selection from the neutral loci. We also compare this performance with that of a frequentist method, based on moment-based estimates of F-ST. We find that both methods can identify loci subject to adaptive selection when the selection coefficient is at least five times the migration rate. Neither method could reliably distinguish loci under balancing selection in our simulations, even when the selection coefficient is twenty times the migration rate.
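As a rough illustration of the kind of moment-based F-ST summary mentioned in the abstract above, the following minimal sketch uses Wright's textbook definition, F-ST = Var(p) / (p_bar * (1 - p_bar)), for a single diallelic locus. It is not the estimator used in the paper: the function name `fst_moment` and the example frequencies are hypothetical, and sampling-error corrections are deliberately ignored.

```python
from statistics import fmean, pvariance


def fst_moment(freqs):
    """Wright's F-ST for one diallelic locus.

    freqs: allele frequencies of the same allele in each subpopulation.
    Assumption: frequencies are treated as known exactly, so no
    sample-size correction is applied (hypothetical simplification).
    """
    p_bar = fmean(freqs)
    if p_bar in (0.0, 1.0):
        # The allele is fixed or absent everywhere: nothing to differentiate.
        return 0.0
    # Variance of frequencies among subpopulations, scaled by the
    # maximum variance possible at this mean frequency.
    return pvariance(freqs, mu=p_bar) / (p_bar * (1.0 - p_bar))


differentiated = fst_moment([0.2, 0.8])   # strongly differentiated locus
uniform = fst_moment([0.5, 0.5, 0.5])     # identical frequencies everywhere
```

Under this definition a locus with identical frequencies in every subpopulation scores zero, and larger values indicate stronger differentiation. Outlier detection as described in the abstract would compare such values against a neutral model, which this sketch does not attempt.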
Abstract:
Population subdivision complicates analysis of molecular variation. Even if neutrality is assumed, three evolutionary forces need to be considered: migration, mutation, and drift. Simplification can be achieved by assuming that the process of migration among and drift within subpopulations is occurring fast compared to mutation and drift in the entire population. This allows a two-step approach in the analysis: (i) analysis of population subdivision and (ii) analysis of molecular variation in the migrant pool. We model population subdivision using an infinite island model, where we allow the migration/drift parameter Theta to vary among populations. Thus, central and peripheral populations can be differentiated. For inference of Theta, we use a coalescence approach, implemented via a Markov chain Monte Carlo (MCMC) integration method that allows estimation of allele frequencies in the migrant pool. The second step of this approach (analysis of molecular variation in the migrant pool) uses the estimated allele frequencies in the migrant pool for the study of molecular variation. We apply this method to a Drosophila ananassae sequence data set. We find little indication of isolation by distance, but large differences in the migration parameter among populations. The population as a whole seems to be expanding. A population from Bogor (Java, Indonesia) shows the highest variation and seems closest to the species center.
Abstract:
Undirected graphical models are widely used in statistics, physics and machine vision. However, Bayesian parameter estimation for undirected models is extremely challenging, since evaluation of the posterior typically involves the calculation of an intractable normalising constant. This problem has received much attention, but very little of this has focussed on the important practical case where the data consists of noisy or incomplete observations of the underlying hidden structure. This paper specifically addresses this problem, comparing two alternative methodologies. In the first of these approaches particle Markov chain Monte Carlo (Andrieu et al., 2010) is used to efficiently explore the parameter space, combined with the exchange algorithm (Murray et al., 2006) for avoiding the calculation of the intractable normalising constant (a proof showing that this combination targets the correct distribution is found in a supplementary appendix online). This approach is compared with approximate Bayesian computation (Pritchard et al., 1999). Applications to estimating the parameters of Ising models and exponential random graphs from noisy data are presented. Each algorithm used in the paper targets an approximation to the true posterior due to the use of MCMC to simulate from the latent graphical model, in lieu of being able to do this exactly in general. The supplementary appendix also describes the nature of the resulting approximation.
Abstract:
The political economy literature on agriculture emphasizes influence over political outcomes via lobbying conduits in general, political action committee contributions in particular and the pervasive view that political preferences with respect to agricultural issues are inherently geographic. In this context, ‘interdependence’ in Congressional vote behaviour manifests itself in two dimensions. One dimension is the intensity by which neighboring vote propensities influence one another and the second is the geographic extent of voter influence. We estimate these facets of dependence using data on a Congressional vote on the 2001 Farm Bill using routine Markov chain Monte Carlo procedures and Bayesian model averaging, in particular. In so doing, we develop a novel procedure to examine both the reliability and the consequences of different model representations for measuring both the ‘scale’ and the ‘scope’ of spatial (geographic) co-relations in voting behaviour.
Abstract:
Across Europe, elevated phosphorus (P) concentrations in lowland rivers have made them particularly susceptible to eutrophication. This is compounded in southern and central UK by increasing pressures on water resources, which may be further enhanced by the potential effects of climate change. The EU Water Framework Directive requires an integrated approach to water resources management at the catchment scale and highlights the need for modelling tools that can distinguish relative contributions from multiple nutrient sources and are consistent with the information content of the available data. Two such models are introduced and evaluated within a stochastic framework using daily flow and total phosphorus concentrations recorded in a clay catchment typical of many areas of the lowland UK. Both models disaggregate empirical annual load estimates, derived from land use data, as a function of surface/near surface runoff, generated using a simple conceptual rainfall-runoff model. Estimates of the daily load from agricultural land, together with those from baseflow and point sources, feed into an in-stream routing algorithm. The first model assumes constant concentrations in runoff via surface/near surface pathways and incorporates an additional P store in the river-bed sediments, depleted above a critical discharge, to explicitly simulate resuspension. The second model, which is simpler, simulates P concentrations as a function of surface/near surface runoff, thus emphasising the influence of non-point source loads during flow peaks and mixing of baseflow and point sources during low flows. The temporal consistency of parameter estimates and thus the suitability of each approach is assessed dynamically following a new approach based on Monte-Carlo analysis. (c) 2004 Elsevier B.V. All rights reserved.
Abstract:
Nonlinear adjustment toward long-run price equilibrium relationships in the sugar-ethanol-oil nexus in Brazil is examined. We develop generalized bivariate error correction models that allow for cointegration between sugar, ethanol, and oil prices, where dynamic adjustments are potentially nonlinear functions of the disequilibrium errors. A range of models are estimated using Bayesian Monte Carlo Markov Chain algorithms and compared using Bayesian model selection methods. The results suggest that the long-run drivers of Brazilian sugar prices are oil prices and that there are nonlinearities in the adjustment processes of sugar and ethanol prices to oil price but linear adjustment between ethanol and sugar prices.
Abstract:
The steadily accumulating literature on technical efficiency in fisheries attests to the importance of efficiency as an indicator of fleet condition and as an object of management concern. In this paper, we extend previous work by presenting a Bayesian hierarchical approach that yields both efficiency estimates and, as a byproduct of the estimation algorithm, probabilistic rankings of the relative technical efficiencies of fishing boats. The estimation algorithm is based on recent advances in Markov Chain Monte Carlo (MCMC) methods—Gibbs sampling, in particular—which have not been widely used in fisheries economics. We apply the method to a sample of 10,865 boat trips in the US Pacific hake (or whiting) fishery during 1987–2003. We uncover systematic differences between efficiency rankings based on sample mean efficiency estimates and those that exploit the full posterior distributions of boat efficiencies to estimate the probability that a given boat has the highest true mean efficiency.
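The contrast drawn in the abstract above, between ranking boats by their mean efficiency estimate and ranking them by the posterior probability of being the most efficient, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `rank_by_mean` and `prob_highest` and the toy draws are hypothetical, and the draws stand in for the output of a Gibbs sampler.

```python
from collections import Counter


def rank_by_mean(draws):
    """Order boat indices from highest to lowest posterior mean efficiency."""
    n_boats = len(draws[0])
    means = [sum(d[j] for d in draws) / len(draws) for j in range(n_boats)]
    return sorted(range(n_boats), key=lambda j: means[j], reverse=True)


def prob_highest(draws):
    """Fraction of posterior draws in which each boat is the most efficient."""
    n_boats = len(draws[0])
    # For every draw, record which boat has the largest sampled efficiency.
    winners = Counter(max(range(n_boats), key=lambda j: d[j]) for d in draws)
    return [winners[j] / len(draws) for j in range(n_boats)]


# Hypothetical posterior draws: one row per draw, one column per boat.
draws = [
    [0.60, 0.70, 0.10],
    [0.60, 0.70, 0.20],
    [0.60, 0.70, 0.30],
    [0.60, 0.10, 0.95],
]

mean_order = rank_by_mean(draws)   # boat 0 has the highest mean
top_probs = prob_highest(draws)    # but boat 0 is never the top boat in any draw
```

In this toy data boat 0 ranks first by posterior mean yet has zero estimated probability of being the most efficient boat, which is the kind of systematic difference between the two ranking approaches that the abstract reports.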
Abstract:
The evolutionary history of gains and losses of vegetative reproductive propagules (soredia) in Porpidia s.l., a group of lichen-forming ascomycetes, was clarified using Bayesian Markov chain Monte Carlo (MCMC) approaches to monophyly tests and a combined MCMC and maximum likelihood approach to ancestral character state reconstructions. The MCMC framework provided confidence estimates for the reconstructions of relationships and ancestral character states, which formed the basis for tests of evolutionary hypotheses. Monophyly tests rejected all hypotheses that predicted any clustering of reproductive modes in extant taxa. In addition, a nearest-neighbor statistic could not reject the hypothesis that the vegetative reproductive mode is randomly distributed throughout the group. These results show that transitions between presence and absence of the vegetative reproductive mode within Porpidia s.l. occurred several times and independently of each other. Likelihood reconstructions of ancestral character states at selected nodes suggest that - contrary to previous thought - the ancestor to Porpidia s.l. already possessed the vegetative reproductive mode. Furthermore, transition rates are reconstructed asymmetrically with the vegetative reproductive mode being gained at a much lower rate than it is lost. A cautious note has to be added, because a simulation study showed that the ancestral character state reconstructions were highly dependent on taxon sampling. However, our central conclusions, particularly the higher rate of change from vegetative reproductive mode present to absent than vice versa within Porpidia s.l., were found to be broadly independent of taxon sampling. [Ancestral character state reconstructions; Ascomycota, Bayesian inference; hypothesis testing; likelihood; MCMC; Porpidia; reproductive systems]
Abstract:
Micromorphological characters of the fruiting bodies, such as ascus-type and hymenial amyloidity, and secondary chemistry have been widely employed as key characters in Ascomycota classification. However, the evolution of these characters has not yet been studied using molecular phylogenies. We have used a combined Bayesian and maximum likelihood based approach to trace character evolution on a tree inferred from a combined analysis of nuclear and mitochondrial ribosomal DNA sequences. The maximum likelihood aspect overcomes simplifications inherent in maximum parsimony methods, whereas the Markov chain Monte Carlo aspect renders results independent of any particular phylogenetic tree. The results indicate that the evolution of the two chemical characters is quite different, being stable once developed for the medullary lecanoric acid, whereas the cortical chlorinated xanthones appear to have been lost several times. The current ascus-types and the amyloidity of the hymenial gel in Pertusariaceae appear to have been developed within the family. The basal ascus-type of pertusarialean fungi remains unknown. (c) 2006 The Linnean Society of London, Biological Journal of the Linnean Society, 2006, 89, 615-626.
Abstract:
Biologists frequently attempt to infer the character states at ancestral nodes of a phylogeny from the distribution of traits observed in contemporary organisms. Because phylogenies are normally inferences from data, it is desirable to account for the uncertainty in estimates of the tree and its branch lengths when making inferences about ancestral states or other comparative parameters. Here we present a general Bayesian approach for testing comparative hypotheses across statistically justified samples of phylogenies, focusing on the specific issue of reconstructing ancestral states. The method uses Markov chain Monte Carlo techniques for sampling phylogenetic trees and for investigating the parameters of a statistical model of trait evolution. We describe how to combine information about the uncertainty of the phylogeny with uncertainty in the estimate of the ancestral state. Our approach does not constrain the sample of trees only to those that contain the ancestral node or nodes of interest, and we show how to reconstruct ancestral states of uncertain nodes using a most-recent-common-ancestor approach. We illustrate the methods with data on ribonuclease evolution in the Artiodactyla. Software implementing the methods (BayesMultiState) is available from the authors.
Abstract:
We describe a general likelihood-based 'mixture model' for inferring phylogenetic trees from gene-sequence or other character-state data. The model accommodates cases in which different sites in the alignment evolve in qualitatively distinct ways, but does not require prior knowledge of these patterns or partitioning of the data. We call this qualitative variability in the pattern of evolution across sites "pattern-heterogeneity" to distinguish it from both a homogeneous process of evolution and from one characterized principally by differences in rates of evolution. We present studies to show that the model correctly retrieves the signals of pattern-heterogeneity from simulated gene-sequence data, and we apply the method to protein-coding genes and to a ribosomal 12S data set. The mixture model outperforms conventional partitioning in both these data sets. We implement the mixture model such that it can simultaneously detect rate- and pattern-heterogeneity. The model simplifies to a homogeneous model or a rate-variability model as special cases, and therefore always performs at least as well as these two approaches, and often considerably improves upon them. We make the model available within a Bayesian Markov-chain Monte Carlo framework for phylogenetic inference, as an easy-to-use computer program.
Abstract:
This article introduces a new general method for genealogical inference that samples independent genealogical histories using importance sampling (IS) and then samples other parameters with Markov chain Monte Carlo (MCMC). It is then possible to more easily utilize the advantages of importance sampling in a fully Bayesian framework. The method is applied to the problem of estimating recent changes in effective population size from temporally spaced gene frequency data. The method gives the posterior distribution of effective population size at the time of the oldest sample and at the time of the most recent sample, assuming a model of exponential growth or decline during the interval. The effect of changes in number of alleles, number of loci, and sample size on the accuracy of the method is described using test simulations, and it is concluded that these have an approximately equivalent effect. The method is used on three example data sets and problems in interpreting the posterior densities are highlighted and discussed.
Abstract:
The Danish Eulerian Model (DEM) is a powerful air pollution model, designed to calculate the concentrations of various dangerous species over a large geographical region (e.g. Europe). It takes into account the main physical and chemical processes between these species, the actual meteorological conditions, emissions, etc. This is a huge computational task and requires significant resources of storage and CPU time. Parallel computing is essential for the efficient practical use of the model. Some efficient parallel versions of the model were created over the past several years. A suitable parallel version of DEM using the Message Passing Interface library (MPI) was implemented on two powerful supercomputers of the EPCC - Edinburgh, available via the HPC-Europa programme for transnational access to research infrastructures in EC: a Sun Fire E15K and an IBM HPCx cluster. Although the implementation is, in principle, the same for both supercomputers, a few modifications had to be made for successful porting of the code to the IBM HPCx cluster. Performance analysis and parallel optimization were done next. Results from benchmarking experiments will be presented in this paper. Another set of experiments was carried out in order to investigate the sensitivity of the model to variation of some chemical rate constants in the chemical submodel. Certain modifications of the code were necessary for this task. The obtained results will be used for further sensitivity analysis studies using Monte Carlo simulation.