68 results for Approximate Bayesian computation, Posterior distribution, Quantile distribution, Response time data
Abstract:
Statistical methods of inference typically require the likelihood function to be computable in a reasonable amount of time. The class of “likelihood-free” methods termed Approximate Bayesian Computation (ABC) is able to eliminate this requirement, replacing the evaluation of the likelihood with simulation from it. Likelihood-free methods have gained in efficiency and popularity in the past few years, following their integration with Markov Chain Monte Carlo (MCMC) and Sequential Monte Carlo (SMC) in order to better explore the parameter space. They have been applied primarily to estimating the parameters of a given model, but can also be used to compare models. Here we present novel likelihood-free approaches to model comparison, based upon the independent estimation of the evidence of each model under study. Key advantages of these approaches over previous techniques are that they allow the exploitation of MCMC or SMC algorithms for exploring the parameter space, and that they do not require a sampler able to mix between models. We validate the proposed methods using a simple exponential family problem before applying them to a realistic problem from human population genetics: the comparison of different demographic models based upon genetic data from the Y chromosome.
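As a concrete and heavily simplified illustration of the evidence-by-simulation idea, the sketch below estimates each model's evidence independently by its ABC acceptance rate on a toy exponential-family problem. The models, priors, summary statistic, and tolerance are all assumptions made for illustration; the paper's actual algorithms additionally exploit MCMC/SMC exploration of the parameter space.

```python
import numpy as np

rng = np.random.default_rng(0)

def abc_evidence(simulate, prior_draw, s_obs, eps, n=20_000):
    """Approximate a model's evidence by its ABC acceptance rate:
    draw a parameter from the prior, simulate a summary statistic,
    and count how often it lands within eps of the observed one."""
    accepted = 0
    for _ in range(n):
        theta = prior_draw()
        if abs(simulate(theta) - s_obs) < eps:
            accepted += 1
    return accepted / n

# Toy comparison (illustrative, not the paper's population-genetics
# problem): the summary statistic is the mean of 50 observations.
s_obs = rng.exponential(1 / 2.0, size=50).mean()

# Model 1: exponential data with a Gamma(2, 1) prior on the rate.
ev1 = abc_evidence(lambda th: rng.exponential(1 / th, size=50).mean(),
                   lambda: rng.gamma(2.0, 1.0), s_obs, eps=0.05)
# Model 2: half-normal data with the same prior on the scale.
ev2 = abc_evidence(lambda th: np.abs(rng.normal(0.0, th, size=50)).mean(),
                   lambda: rng.gamma(2.0, 1.0), s_obs, eps=0.05)

# The ratio of independently estimated evidences approximates the
# Bayes factor (guarding against a zero acceptance rate).
bf = ev1 / ev2 if ev2 > 0 else float("inf")
```

No sampler needs to mix between the two models: each evidence is estimated in a separate, independent run, which is the key advantage the abstract describes.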
Abstract:
We have previously placed the solar contribution to recent global warming in context using observations and without recourse to climate models. It was shown that all solar forcings of climate have declined since 1987. The present paper extends that analysis to include the effects of the various time constants with which the Earth’s climate system might react to solar forcing. The solar input waveform over the past 100 years is defined using observed and inferred galactic cosmic ray fluxes, valid for either a direct effect of cosmic rays on climate or an effect via their known correlation with total solar irradiance (TSI), or for a combination of the two. The implications, and the relative merits, of the various TSI composite data series are discussed and independent tests reveal that the PMOD composite used in our previous paper is the most realistic. Use of the ACRIM composite, which shows a rise in TSI over recent decades, is shown to be inconsistent with most published evidence for solar influences on pre-industrial climate. The conclusions of our previous paper, that solar forcing has declined over the past 20 years while surface air temperatures have continued to rise, are shown to apply for the full range of potential time constants for the climate response to the variations in the solar forcings.
Abstract:
The time scale of the response of the high-latitude dayside ionospheric flow to changes in the North-South component of the interplanetary magnetic field (IMF) has been investigated by examining the time delays between corresponding sudden changes. Approximately 40 h of simultaneous IMF and ionospheric flow data have been examined, obtained by the AMPTE-UKS and -IRM spacecraft and the EISCAT “Polar” experiment, respectively, in which 20 corresponding sudden changes have been identified. Ten of these changes were associated with southward turnings of the IMF, and 10 with northward turnings. It has been found that the corresponding flow changes occurred simultaneously over the whole of the “Polar” field-of-view, extending more than 2° in invariant latitude, and that the ionospheric response delay following northward turnings is the same as that following southward turnings, though the form of the response is different in the two cases. The shortest response time, 5.5 ± 3.2 min, is found in the early- to mid-afternoon sector, increasing to 9.5 ± 3.0 min in the mid-morning sector, and to 9.5 ± 3.1 min near to dusk. These times represent the delays in the appearance of perturbed flows in the “Polar” field-of-view following the arrival of IMF changes at the subsolar magnetopause. Overall, the results agree very well with those derived by Etemadi et al. (1988, Planet. Space Sci. 36, 471) from a general cross-correlation analysis of the IMF Bz and “Polar” beam-swinging vector flow data.
Abstract:
Bloom filters are a data structure for storing data in a compressed form. They offer excellent space and time efficiency at the cost of some loss of accuracy (so-called lossy compression). This work presents a yes-no Bloom filter, which is a data structure consisting of two parts: the yes-filter, which is a standard Bloom filter, and the no-filter, which is another Bloom filter whose purpose is to represent those objects that were recognised incorrectly by the yes-filter (that is, to recognise the false positives of the yes-filter). By querying the no-filter after an object has been recognised by the yes-filter, we get a chance of rejecting it, which improves the accuracy of data recognition in comparison with the standard Bloom filter of the same total length. A further increase in accuracy is possible if one chooses the objects to include in the no-filter so that it recognises as many false positives as possible but no true positives, thus producing the most accurate yes-no Bloom filter among all yes-no Bloom filters. This paper studies how optimization techniques can be used to maximize the number of false positives recognised by the no-filter, under the constraint that it recognise no true positives. To achieve this aim, an Integer Linear Program (ILP) is proposed for the optimal selection of false positives. In practice the problem size is normally large, making the optimal solution intractable. Given the similarity of the ILP to the Multidimensional Knapsack Problem, an Approximate Dynamic Programming (ADP) model is developed, making use of a reduced ILP for the value function approximation. Numerical results show that the ADP model performs best in comparison with a number of heuristics as well as the CPLEX built-in branch-and-bound (B&B) solver, and it is what can be recommended for use in yes-no Bloom filters.
In the wider context of the study of lossy compression algorithms, our research is an example showing how the arsenal of optimization methods can be applied to improving the accuracy of compressed data.
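The yes-no construction and its selection constraint can be sketched as follows. The greedy selection below is a simple stand-in for the paper's ILP/ADP optimisation, and the filter sizes, hash scheme, and data are all illustrative assumptions.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k double-hashed positions over m bits."""
    def __init__(self, m, k):
        self.m, self.k, self.bits = m, k, bytearray(m)

    def positions(self, item):
        h = hashlib.sha256(str(item).encode()).digest()
        a = int.from_bytes(h[:8], "big")
        b = int.from_bytes(h[8:16], "big")
        return [(a + i * b) % self.m for i in range(self.k)]

    def add(self, item):
        for p in self.positions(item):
            self.bits[p] = 1

    def __contains__(self, item):
        return all(self.bits[p] for p in self.positions(item))

def build_no_filter(false_positives, members, m_no, k):
    """Greedy stand-in for the paper's ILP/ADP selection: admit a false
    positive into the no-filter only if no true member would then be
    recognised by the no-filter (the paper's hard constraint)."""
    no = BloomFilter(m_no, k)
    member_bits = [set(no.positions(x)) for x in members]
    on = set()
    for fp in false_positives:
        trial = on | set(no.positions(fp))
        if any(mb <= trial for mb in member_bits):
            continue  # adding fp would make the no-filter reject a member
        on = trial
        no.add(fp)
    return no

# Illustrative sizes: 100 members, a 600-bit yes-filter, 200-bit no-filter.
members = [f"obj{i}" for i in range(100)]
universe = [f"obj{i}" for i in range(1000)]
yes = BloomFilter(600, 4)
for x in members:
    yes.add(x)
false_positives = [x for x in universe[100:] if x in yes]
no = build_no_filter(false_positives, members, 200, 4)

def query(item):
    # Accept only if the yes-filter says yes AND the no-filter says no.
    return item in yes and item not in no

fp_plain = len(false_positives)
fp_yesno = sum(1 for x in universe[100:] if query(x))
```

By construction every true member still passes (the no-filter never recognises a member), while the combined query accepts at most as many false positives as the yes-filter alone.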
Abstract:
Geographic distributions of pathogens are the outcome of dynamic processes involving host availability, susceptibility and abundance, suitability of climate conditions, and historical contingency including evolutionary change. Distributions have changed fast and are changing fast in response to many factors, including climatic change. The response time of arable agriculture is intrinsically fast, but perennial crops and especially forests are unlikely to adapt easily. Predictions of many of the variables needed to predict changes in pathogen range are still rather uncertain, and their effects will be profoundly modified by changes elsewhere in the agricultural system, including both economic changes affecting growing systems and hosts and evolutionary changes in pathogens and hosts. Tools to predict changes based on environmental correlations depend on good primary data, which is often absent, and need to be checked against the historical record, which remains very poor for almost all pathogens. We argue that at present the uncertainty in predictions of change is so great that the important adaptive response is to monitor changes and to retain the capacity to innovate, both by access to economic capital with reasonably long-term rates of return and by retaining wide scientific expertise, including currently less fashionable specialisms.
Abstract:
This article presents a statistical method for detecting recombination in DNA sequence alignments, which is based on combining two probabilistic graphical models: (1) a taxon graph (phylogenetic tree) representing the relationship between the taxa, and (2) a site graph (hidden Markov model) representing interactions between different sites in the DNA sequence alignments. We adopt a Bayesian approach and sample the parameters of the model from the posterior distribution with Markov chain Monte Carlo, using a Metropolis-Hastings and Gibbs-within-Gibbs scheme. The proposed method is tested on various synthetic and real-world DNA sequence alignments, and we compare its performance with the established detection methods RECPARS, PLATO, and TOPAL, as well as with two alternative parameter estimation schemes.
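A minimal sketch of the site-graph side of such a model, assuming the per-site emission likelihoods for each candidate topology have already been computed (for example by Felsenstein pruning on each tree; here they are random placeholders): a forward-backward pass over an HMM whose hidden states are topologies gives per-site topology posteriors, and topology switches along the alignment signal putative recombination breakpoints. This is not the paper's full Bayesian MCMC scheme.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hidden states = candidate tree topologies; an HMM over sites lets the
# topology switch along the alignment, signalling recombination.
n_sites, n_topos = 200, 3
log_emis = np.log(rng.dirichlet(np.ones(n_topos), size=n_sites))  # placeholder

stay = 0.98   # probability of keeping the same topology between sites
trans = np.full((n_topos, n_topos), (1 - stay) / (n_topos - 1))
np.fill_diagonal(trans, stay)

def forward_backward(log_emis, trans):
    """Scaled forward-backward pass: posterior probability of each
    hidden topology at each site given the whole alignment."""
    n, k = log_emis.shape
    emis = np.exp(log_emis - log_emis.max(axis=1, keepdims=True))
    alpha = np.empty((n, k))
    beta = np.empty((n, k))
    alpha[0] = emis[0] / k            # uniform initial distribution
    alpha[0] /= alpha[0].sum()
    for s in range(1, n):
        alpha[s] = emis[s] * (alpha[s - 1] @ trans)
        alpha[s] /= alpha[s].sum()    # per-site rescaling for stability
    beta[-1] = 1.0
    for s in range(n - 2, -1, -1):
        beta[s] = trans @ (emis[s + 1] * beta[s + 1])
        beta[s] /= beta[s].sum()
    post = alpha * beta
    return post / post.sum(axis=1, keepdims=True)

post = forward_backward(log_emis, trans)
# Sites where argmax(post) changes are candidate recombination breakpoints.
```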
Abstract:
Models for which the likelihood function can be evaluated only up to a parameter-dependent unknown normalizing constant, such as Markov random field models, are used widely in computer science, statistical physics, spatial statistics, and network analysis. However, Bayesian analysis of these models using standard Monte Carlo methods is not possible due to the intractability of their likelihood functions. Several methods that permit exact, or close to exact, simulation from the posterior distribution have recently been developed. However, estimating the evidence and Bayes factors for these models remains challenging in general. This paper describes new random weight importance sampling and sequential Monte Carlo methods for estimating Bayes factors that use simulation to circumvent the evaluation of the intractable likelihood, and compares them to existing methods. In some cases we observe an advantage in the use of biased weight estimates. An initial investigation into the theoretical and empirical properties of this class of methods is presented. Some support for the use of biased estimates is presented, but we advocate caution in the use of such estimates.
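The random-weight idea can be illustrated on a toy conjugate model where the exact evidence is known in closed form: replacing each importance weight with a noisy but unbiased estimate (here, multiplicative lognormal noise with mean one, standing in for a simulation-based estimate of an intractable weight) leaves the evidence estimate consistent. The model and the noise construction are assumptions for illustration only, not the paper's Markov random field setting.

```python
import numpy as np

rng = np.random.default_rng(1)

# Conjugate toy model: x_i ~ N(theta, 1) with a N(0, 1) prior, so the
# exact evidence is available in closed form for comparison.
x = rng.normal(0.5, 1.0, size=10)
n, s, ss = len(x), x.sum(), (x ** 2).sum()
log_z_exact = (-0.5 * n * np.log(2 * np.pi) - 0.5 * np.log(n + 1)
               - 0.5 * ss + s ** 2 / (2 * (n + 1)))

# Importance sampling from the prior, but with each exact weight
# multiplied by independent noise V with E[V] = 1 -- a stand-in for a
# simulation-based (random-weight) estimate of an intractable weight.
N = 200_000
theta = rng.normal(0.0, 1.0, size=N)                 # proposal = prior
log_w = (-0.5 * n * np.log(2 * np.pi)
         - 0.5 * ((x[None, :] - theta[:, None]) ** 2).sum(axis=1))
sigma = 0.3
noise = rng.lognormal(-0.5 * sigma ** 2, sigma, size=N)   # mean-one noise
z_hat = np.mean(np.exp(log_w) * noise)               # unbiased for the evidence
```

Because the noise has mean one and is independent of the draw, the noisy-weight estimator has the same expectation as the exact-weight one, at the price of extra variance.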
Abstract:
Land cover data derived from satellites are commonly used to prescribe inputs to models of the land surface. Since such data inevitably contain errors, quantifying how uncertainties in the data affect a model’s output is important. To do so, a spatial distribution of possible land cover values is required to propagate through the model’s simulation. However, at large scales, such as those required for climate models, such spatial modelling can be difficult. Also, computer models often require land cover proportions at sites larger than the original map scale as inputs, and it is the uncertainty in these proportions that this article discusses. This paper describes a Monte Carlo sampling scheme that generates realisations of land cover proportions from the posterior distribution as implied by a Bayesian analysis that combines spatial information in the land cover map and its associated confusion matrix. The technique is computationally simple and has been applied previously to the Land Cover Map 2000 for the region of England and Wales. This article demonstrates the ability of the technique to scale up to large (global) satellite-derived land cover maps and reports its application to the GlobCover 2009 data product. The results show that, in general, the GlobCover data possesses only small biases, with the largest belonging to non-vegetated surfaces. In vegetated surfaces, the most prominent area of uncertainty is Southern Africa, which represents a complex heterogeneous landscape. It is also clear from this study that greater resources need to be devoted to the construction of comprehensive confusion matrices.
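One plausible, non-spatial reading of such a sampling scheme can be sketched as follows: for each mapped class, the matching column of the confusion matrix defines a Dirichlet posterior over the true class, and repeated draws reallocate the mapped pixel counts into realisations of true land-cover proportions. The matrix, the counts, and the uniform prior are illustrative assumptions, and the authors' method additionally uses spatial information from the map itself.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical 3-class confusion matrix from map validation:
# rows = true class, columns = class assigned by the map.
confusion = np.array([[80, 10,  5],
                      [ 8, 70, 12],
                      [ 2,  5, 60]])

# Mapped pixel counts of each class within one model grid cell.
mapped_counts = np.array([400, 250, 150])

def sample_true_proportions(confusion, mapped_counts, rng):
    """One posterior realisation of the true land-cover proportions:
    for each mapped class, draw P(true class | mapped class) from a
    Dirichlet posterior (uniform prior) over the matching confusion
    column, then reallocate that class's pixels multinomially."""
    true_counts = np.zeros(confusion.shape[0])
    for j, n_j in enumerate(mapped_counts):
        p_true = rng.dirichlet(confusion[:, j] + 1)
        true_counts += rng.multinomial(n_j, p_true)
    return true_counts / mapped_counts.sum()

realisations = np.array([sample_true_proportions(confusion, mapped_counts, rng)
                         for _ in range(1_000)])
mean_props = realisations.mean(axis=0)   # posterior-mean proportions
```

The spread of `realisations` around `mean_props` is exactly the site-level proportion uncertainty the abstract describes propagating into a land-surface model.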
Abstract:
Bayesian inference has been used to determine rigorous estimates of hydroxyl radical concentrations ([OH]) and air mass dilution rates (K) averaged following air masses between linked observations of nonmethane hydrocarbons (NMHCs) spanning the North Atlantic during the Intercontinental Transport and Chemical Transformation (ITCT)-Lagrangian-2K4 experiment. The Bayesian technique obtains a refined (posterior) distribution of a parameter given data related to the parameter through a model and prior beliefs about the parameter distribution. Here, the model describes hydrocarbon loss through OH reaction and mixing with a background concentration at rate K. The Lagrangian experiment provides direct observations of hydrocarbons at two time points, removing assumptions regarding composition or sources upstream of a single observation. The estimates are sharpened by using many hydrocarbons with different reactivities and accounting for their variability and measurement uncertainty. A novel technique is used to construct prior background distributions of many species, described by the variation of a single parameter. This exploits the high correlation of species, related by the first principal component of many NMHC samples. The Bayesian method obtains posterior estimates of [OH], K, and the background parameter following each air mass. Median [OH] values are typically between 0.5 and 2.0 × 10⁶ molecules cm⁻³, but are elevated to between 2.5 and 3.5 × 10⁶ molecules cm⁻³ in low-level pollution. A comparison of estimates from absolute NMHC concentrations and NMHC ratios assuming zero background (the “photochemical clock” method) shows similar distributions but reveals a systematic high bias in the estimates from ratios. Estimates of K are ∼0.1 day⁻¹ but show more sensitivity to the prior distribution assumed.
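A toy version of this inference, with made-up rate constants, concentrations, and backgrounds, can be sketched as a grid posterior: each hydrocarbon decays toward a mixing-controlled steady state at rate k·[OH] + K, and a lognormal measurement model over several species of different reactivity constrains [OH] and K jointly. Everything below is an illustrative assumption, not the experiment's data or exact model.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical: five NMHCs with different OH rate constants.
k_oh = np.array([2.5e-13, 1.1e-12, 2.3e-12, 8.7e-12, 2.6e-11])  # cm^3 s^-1
c0 = np.array([800.0, 400.0, 300.0, 150.0, 60.0])   # upstream (pptv)
c_bg = np.array([300.0, 100.0, 60.0, 15.0, 3.0])    # background (pptv)
t = 2 * 86400.0                                     # 2 days between obs (s)

def forward(oh, K):
    """Decay toward a mixing-controlled steady state:
    dC/dt = -k*[OH]*C - K*(C - C_bg)."""
    lam = k_oh * oh + K
    c_star = K * c_bg / lam
    return c_star + (c0 - c_star) * np.exp(-lam * t)

truth_oh, truth_K = 1.5e6, 0.1 / 86400.0
obs = forward(truth_oh, truth_K) * rng.lognormal(0.0, 0.05, size=5)

# Grid posterior over [OH] and K with flat priors and a 5% lognormal
# measurement model on the downstream concentrations.
oh_grid = np.linspace(0.2e6, 4.0e6, 120)
K_grid = np.linspace(0.01, 0.5, 100) / 86400.0
logpost = np.empty((oh_grid.size, K_grid.size))
for i, oh in enumerate(oh_grid):
    for j, K in enumerate(K_grid):
        resid = np.log(obs) - np.log(forward(oh, K))
        logpost[i, j] = -0.5 * np.sum((resid / 0.05) ** 2)
post = np.exp(logpost - logpost.max())
post /= post.sum()
oh_marginal = post.sum(axis=1)          # marginal posterior of [OH]
oh_map = oh_grid[np.argmax(oh_marginal)]
```

The spread of reactivities is what separates the two parameters: slow-reacting species mainly constrain K, fast-reacting ones mainly constrain [OH].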
Abstract:
This article introduces a new general method for genealogical inference that samples independent genealogical histories using importance sampling (IS) and then samples other parameters with Markov chain Monte Carlo (MCMC). It is then possible to more easily utilize the advantages of importance sampling in a fully Bayesian framework. The method is applied to the problem of estimating recent changes in effective population size from temporally spaced gene frequency data. The method gives the posterior distribution of effective population size at the time of the oldest sample and at the time of the most recent sample, assuming a model of exponential growth or decline during the interval. The effect of changes in number of alleles, number of loci, and sample size on the accuracy of the method is described using test simulations, and it is concluded that these have an approximately equivalent effect. The method is used on three example data sets and problems in interpreting the posterior densities are highlighted and discussed.
Abstract:
The absorption spectra of phytoplankton in the visible domain hold implicit information on the phytoplankton community structure. Here we use this information to retrieve quantitative estimates of phytoplankton size structure by developing a novel method to compute the exponent of an assumed power-law for their particle-size spectrum. This quantity, in combination with total chlorophyll-a concentration, can be used to estimate the fractional concentration of chlorophyll in any arbitrarily-defined size class of phytoplankton. We further define and derive expressions for two distinct measures of cell size of mixed populations, namely, the average spherical diameter of a bio-optically equivalent homogeneous population of cells of equal size, and the average equivalent spherical diameter of a population of cells that follow a power-law particle-size distribution. The method relies on measurements of two quantities of a phytoplankton sample: the concentration of chlorophyll-a, which is an operational index of phytoplankton biomass, and the total absorption coefficient of phytoplankton in the red peak of the visible spectrum, at 676 nm. A sensitivity analysis confirms that the relative errors in the estimates of the exponent of particle size spectra are reasonably low. The exponents of phytoplankton size spectra, estimated for a large set of in situ data from a variety of oceanic environments (~ 2400 samples), are within a reasonable range, and the estimated fractions of chlorophyll in pico-, nano- and micro-phytoplankton are generally consistent with those obtained by an independent, indirect method based on diagnostic pigments determined using high-performance liquid chromatography. The estimates of cell size for in situ samples dominated by different phytoplankton types (diatoms, prymnesiophytes, Prochlorococcus, other cyanobacteria and green algae) yield nominal sizes consistent with the taxonomic classification.
To estimate the same quantities from satellite-derived ocean-colour data, we combine our method with algorithms for obtaining inherent optical properties from remote sensing. The spatial distribution of the size-spectrum exponent and the chlorophyll fractions of pico-, nano- and micro-phytoplankton estimated from satellite remote sensing are in agreement with the current understanding of the biogeography of phytoplankton functional types in the global oceans. This study contributes to our understanding of the distribution and time evolution of phytoplankton size structure in the global oceans.
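Under the simplifying assumptions that abundance follows a power law N(D) ∝ D^(−ξ) and that chlorophyll per cell scales with cell volume (both assumptions of this sketch, not the paper's exact bio-optical model), the chlorophyll fraction in any size class reduces to a ratio of power-law integrals:

```python
import numpy as np

def chl_fraction(d_lo, d_hi, xi, d_min=0.2, d_max=50.0):
    """Fraction of total chlorophyll held by cells with equivalent
    spherical diameter in [d_lo, d_hi] micrometres, assuming abundance
    N(D) ~ D**(-xi) and chlorophyll per cell ~ D**3 (cell volume)."""
    def integral(a, b):
        p = 4.0 - xi                  # integrand is D**3 * D**(-xi)
        if abs(p) < 1e-12:
            return np.log(b / a)      # limiting case xi = 4
        return (b ** p - a ** p) / p
    return integral(d_lo, d_hi) / integral(d_min, d_max)

# Chlorophyll fractions for pico- (<2 um), nano- (2-20 um) and
# micro-phytoplankton (>20 um) at an example exponent xi = 4.2:
pico = chl_fraction(0.2, 2.0, 4.2)
nano = chl_fraction(2.0, 20.0, 4.2)
micro = chl_fraction(20.0, 50.0, 4.2)
```

Steeper exponents shift the chlorophyll budget toward the picoplankton end of the spectrum, which is the qualitative behaviour the retrieval exploits.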
Abstract:
Inferring the spatial expansion dynamics of invading species from molecular data is notoriously difficult due to the complexity of the processes involved. For these demographic scenarios, genetic data obtained from highly variable markers may be profitably combined with specific sampling schemes and information from other sources using a Bayesian approach. The geographic range of the introduced toad Bufo marinus is still expanding in eastern and northern Australia, in each case from isolates established around 1960. A large amount of demographic and historical information is available on both expansion areas. In each area, samples were collected along a transect representing populations of different ages and genotyped at 10 microsatellite loci. Five demographic models of expansion, differing in the dispersal pattern for migrants and founders and in the number of founders, were considered. Because the demographic history is complex, we used an approximate Bayesian method, based on a rejection-regression algorithm, to formally test the relative likelihoods of the five models of expansion and to infer demographic parameters. A stepwise migration-foundation model with founder events was statistically better supported than the other four models in both expansion areas. Posterior distributions supported different dynamics of expansion in the studied areas. Populations in the eastern expansion area have a lower stable effective population size and have been founded by a smaller number of individuals than those in the northern expansion area. Once demographically stabilized, populations exchange a substantial number of effective migrants per generation in both expansion areas, and such exchanges are larger in northern than in eastern Australia. The effective number of migrants appears to be considerably lower than that of founders in both expansion areas. We found our inferences to be relatively robust to various assumptions on marker, demographic, and historical features.
The method presented here is the only robust, model-based method available so far, which allows inferring complex population dynamics over a short time scale. It also provides the basis for investigating the interplay between population dynamics, drift, and selection in invasive species.
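The rejection-regression mechanism itself can be sketched on a toy problem: accept the simulations whose summary statistic is closest to the observed one, then fit a local linear regression of the parameter on the summary and shift each accepted draw to the observed summary. The exponential model, the prior, and the acceptance fraction are illustrative assumptions; the real application uses many genetic summaries and microsatellite simulations.

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy target: the rate of an exponential model, summarised by the
# sample mean.
n_obs = 100
s_obs = rng.exponential(1 / 3.0, size=n_obs).mean()   # "observed" summary

n_sim, keep = 50_000, 500
theta = rng.uniform(0.1, 10.0, size=n_sim)            # flat prior on the rate
s = np.array([rng.exponential(1 / th, size=n_obs).mean() for th in theta])

# Rejection step: keep the simulations whose summary is closest.
idx = np.argsort(np.abs(s - s_obs))[:keep]
th_acc, s_acc = theta[idx], s[idx]

# Regression step: fit theta ~ a + b*(s - s_obs) on the accepted draws
# and shift each accepted theta to its fitted value at s = s_obs.
X = np.column_stack([np.ones(keep), s_acc - s_obs])
coef, *_ = np.linalg.lstsq(X, th_acc, rcond=None)
th_adj = th_acc - coef[1] * (s_acc - s_obs)

posterior_mean = th_adj.mean()       # summary of the adjusted ABC posterior
```

The regression adjustment lets a relatively loose acceptance tolerance be used without biasing the posterior, which is what makes the approach practical for complex demographic models.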
Abstract:
A means of assessing, monitoring and controlling aggregate emissions from multi-instrument Emissions Trading Schemes is proposed. The approach allows contributions from different instruments with different forms of emissions targets to be integrated. Where Emissions Trading Schemes are helping meet specific national targets, the approach allows the entry requirements of new participants to be calculated and set at a level that will achieve these targets. The approach is multi-levelled, and may be extended downwards to support pooling of participants within instruments, or upwards to embed Emissions Trading Schemes within a wider suite of policies and measures with hard and soft targets. Aggregate emissions from each instrument are treated stochastically. Emissions from the scheme as a whole are then the joint probability distribution formed by integrating the emissions from its instruments. Because a Bayesian approach is adopted, qualitative and semi-qualitative data from expert opinion can be used where quantitative data is not currently available, or is incomplete. This approach helps government retain sufficient control over emissions trading scheme targets to allow them to meet their emissions reduction obligations, while minimising the need for retrospectively adjusting existing participants’ conditions of entry. This maintains participant confidence, while providing the necessary policy levers for good governance.
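A minimal Monte Carlo sketch of the integration step, with made-up expert-elicited distributions: per-instrument emissions are drawn from lognormal beliefs, summed into the scheme's aggregate distribution, and compared against a hard target. The instrument names, distributions, and target are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(6)

# Made-up expert-elicited beliefs about each instrument's annual
# emissions (MtCO2e), encoded as lognormal distributions.
n = 100_000
draws = {
    "cap_and_trade":   rng.lognormal(np.log(40.0), 0.10, n),
    "baseline_credit": rng.lognormal(np.log(25.0), 0.25, n),
    "voluntary_pool":  rng.lognormal(np.log(10.0), 0.40, n),
}

# Aggregate scheme emissions: the joint distribution is formed by
# summing the per-instrument Monte Carlo draws.
total = sum(draws.values())

target = 80.0                          # hypothetical national target
p_exceed = float(np.mean(total > target))

# Headroom available to new entrants if the scheme is to stay under
# target with 95% confidence (negative means entry must be restricted).
headroom = target - float(np.quantile(total, 0.95))
```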
Abstract:
We describe a Bayesian approach to analyzing multilocus genotype or haplotype data to assess departures from gametic (linkage) equilibrium. Our approach employs a Markov chain Monte Carlo (MCMC) algorithm to approximate the posterior probability distributions of disequilibrium parameters. The distributions are computed exactly in some simple settings. Among other advantages, posterior distributions can be presented visually, which allows the uncertainties in parameter estimates to be readily assessed. In addition, background knowledge can be incorporated, where available, to improve the precision of inferences. The method is illustrated by application to previously published datasets; implications for multilocus forensic match probabilities and for simple association-based gene mapping are also discussed.
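A small random-walk Metropolis sketch of this kind of analysis, for two biallelic loci with made-up haplotype counts: the posterior of the gametic disequilibrium coefficient D is approximated by MCMC under a multinomial likelihood and flat priors. This is a simplification of the paper's scheme, which handles multilocus data and richer parameterisations.

```python
import numpy as np

rng = np.random.default_rng(5)

# Made-up haplotype counts for two biallelic loci: AB, Ab, aB, ab.
counts = np.array([55, 20, 15, 10])

def log_post(pA, pB, D):
    """Log posterior (flat priors) of allele frequencies pA, pB and the
    gametic disequilibrium coefficient D under multinomial sampling."""
    probs = np.array([pA * pB + D, pA * (1 - pB) - D,
                      (1 - pA) * pB - D, (1 - pA) * (1 - pB) + D])
    if np.any(probs <= 0):
        return -np.inf                # outside the admissible region
    return float(np.sum(counts * np.log(probs)))

# Random-walk Metropolis over (pA, pB, D).
state = np.array([0.5, 0.5, 0.0])
lp = log_post(*state)
d_samples = []
for it in range(20_000):
    prop = state + rng.normal(0.0, 0.02, size=3)
    lp_prop = log_post(*prop)
    if np.log(rng.uniform()) < lp_prop - lp:    # Metropolis accept/reject
        state, lp = prop, lp_prop
    if it >= 5_000:                             # discard burn-in
        d_samples.append(state[2])
d_samples = np.array(d_samples)
# The posterior for D can now be summarised or plotted directly, e.g.
# d_samples.mean() and np.quantile(d_samples, [0.025, 0.975]).
```

Presenting `d_samples` as a histogram gives exactly the kind of visual uncertainty assessment the abstract highlights as an advantage of the posterior-distribution view.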