109 results for Likelihood Functions
in CentAUR: Central Archive University of Reading - UK
Abstract:
Approximate Bayesian computation (ABC) methods make use of comparisons between simulated and observed summary statistics to overcome the problem of computationally intractable likelihood functions. As the practical implementation of ABC requires computations based on vectors of summary statistics rather than full data sets, a central question is how to derive low-dimensional summary statistics from the observed data with minimal loss of information. In this article we provide a comprehensive review and comparison of the performance of the principal methods of dimension reduction proposed in the ABC literature. The methods are split into three classes, which are not mutually exclusive: best subset selection methods, projection techniques, and regularization. In addition, we introduce two new methods of dimension reduction. The first is a best subset selection method based on the Akaike and Bayesian information criteria, and the second uses ridge regression as a regularization procedure. We illustrate the performance of these dimension reduction techniques through the analysis of three challenging models and data sets.
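As a rough illustration of the kind of pipeline this abstract describes, the sketch below runs rejection ABC on a toy normal-mean model and uses a ridge-regularised regression of parameters on candidate summaries (fitted on a pilot run) to build a one-dimensional projected summary. The model, summaries, penalty, and tolerance are all invented for illustration; this is not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(theta, n=50):
    # toy model: normal data with unknown mean theta, unit variance
    return rng.normal(theta, 1.0, size=n)

def summaries(x):
    # candidate, partly redundant, summary statistics
    return np.array([x.mean(), np.median(x), x.std(), np.percentile(x, 90)])

# pilot run: ridge regression of theta on the summaries gives a
# one-dimensional projected summary (a regularized projection)
pilot_thetas = rng.uniform(-5, 5, size=2000)
S = np.array([summaries(simulate(t)) for t in pilot_thetas])
S1 = np.column_stack([np.ones(len(S)), S])
lam = 1.0  # ridge penalty (illustrative)
beta = np.linalg.solve(S1.T @ S1 + lam * np.eye(S1.shape[1]), S1.T @ pilot_thetas)

def project(s):
    return np.concatenate(([1.0], s)) @ beta

# rejection ABC on the projected summary: keep the closest 1% of draws
x_obs = simulate(1.5)
s_obs = project(summaries(x_obs))
proposals = rng.uniform(-5, 5, size=5000)
dist = np.abs([project(summaries(simulate(t))) - s_obs for t in proposals])
accepted = proposals[dist <= np.quantile(dist, 0.01)]
print("approximate posterior mean:", accepted.mean())
```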
Abstract:
Models for which the likelihood function can be evaluated only up to a parameter-dependent unknown normalizing constant, such as Markov random field models, are used widely in computer science, statistical physics, spatial statistics, and network analysis. However, Bayesian analysis of these models using standard Monte Carlo methods is not possible due to the intractability of their likelihood functions. Several methods that permit exact, or close to exact, simulation from the posterior distribution have recently been developed. However, estimating the evidence and Bayes factors for these models remains challenging in general. This paper describes new random-weight importance sampling and sequential Monte Carlo methods for estimating Bayes factors that use simulation to circumvent the evaluation of the intractable likelihood, and compares them to existing methods. In some cases we observe an advantage in the use of biased weight estimates. We present an initial investigation into the theoretical and empirical properties of this class of methods; while some support for the use of biased estimates emerges, we advocate caution.
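The sketch below shows the random-weight idea on a toy exponential-family model: the normalizing constant in each importance weight is replaced by an unbiased simulation estimate. Because the weight uses 1/Ẑ rather than an unbiased estimate of 1/Z, the weights themselves are biased (by Jensen's inequality), which is exactly the trade-off the abstract raises. The model, priors, and sample sizes are invented for illustration and do not reproduce the paper's methods.

```python
import numpy as np

rng = np.random.default_rng(1)
m = 20  # toy model with support {0, ..., m}

def unnorm(y, theta):
    # unnormalized likelihood q(y | theta) = exp(theta * y)
    return np.exp(theta * y)

def z_hat(theta, k=500):
    # unbiased simulation estimate of Z(theta) = sum_y exp(theta * y),
    # via uniform importance sampling over the support
    y = rng.integers(0, m + 1, size=k)
    return (m + 1) * unnorm(y, theta).mean()

def evidence(y_obs, prior_draw, n=5000):
    # random-weight importance sampling estimate of p(y_obs);
    # using 1/z_hat makes each weight a biased estimate of q/Z
    thetas = prior_draw(n)
    w = unnorm(y_obs, thetas) / np.array([z_hat(t) for t in thetas])
    return w.mean()

y_obs = 15
# Bayes factor of a "positive effect" prior against a "null" prior
bf = evidence(y_obs, lambda n: rng.normal(1.0, 0.5, n)) / \
     evidence(y_obs, lambda n: rng.normal(0.0, 0.5, n))
print("estimated Bayes factor:", bf)
```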
Abstract:
We describe the use of bivariate 3D empirical orthogonal functions (EOFs) in characterising low-frequency variability of the Atlantic thermohaline circulation (THC) in the Hadley Centre global climate model, HadCM3. We find that the leading two modes are well correlated with an index of the meridional overturning circulation (MOC) on decadal timescales, with the leading mode alone accounting for 54% of the decadal variance. Episodes of coherent oscillations in the subspace of the leading EOFs are identified; these episodes are of great interest for the predictability of the THC, and could indicate the existence of different regimes of natural variability. The mechanism identified for the multi-decadal variability is an internal ocean mode, dominated by changes in convection in the Nordic Seas, which lead the changes in the MOC by a few years. Variations in salinity transports from the Arctic and from the North Atlantic are the main feedbacks that control the oscillation. This mode has a weak feedback onto the atmosphere and hence a surface climatic influence. Interestingly, some of these climate impacts lead the changes in the overturning. There are also similarities to observed multi-decadal climate variability.
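For readers unfamiliar with EOF analysis, here is a minimal single-variable illustration on a synthetic space-time field; in the bivariate 3D case one would stack the temperature and salinity fields into a single state vector before the decomposition. The field, the oscillation, and the "MOC index" below are synthetic stand-ins, not HadCM3 output.

```python
import numpy as np

rng = np.random.default_rng(2)
nt, nx = 600, 200  # time steps x grid points (synthetic)

# synthetic anomaly field: one coherent low-frequency mode plus noise
pattern = np.sin(np.linspace(0, np.pi, nx))
signal = np.sin(2 * np.pi * np.arange(nt) / 120.0)  # slow oscillation
field = np.outer(signal, pattern) + 0.5 * rng.normal(size=(nt, nx))

anom = field - field.mean(axis=0)   # remove the time mean
U, s, Vt = np.linalg.svd(anom, full_matrices=False)
eofs = Vt                           # rows are spatial patterns (EOFs)
pcs = U * s                         # principal-component time series
explained = s**2 / (s**2).sum()
print("variance explained by EOF1:", explained[0])

# correlate the leading PC with a (here synthetic) MOC index
moc_index = signal + 0.3 * rng.normal(size=nt)
r = np.corrcoef(pcs[:, 0], moc_index)[0, 1]
print("correlation of PC1 with the MOC index:", r)
```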
Abstract:
The variogram is essential for local estimation and mapping of any variable by kriging. The variogram itself must usually be estimated from sample data. The sampling density is a compromise between precision and cost, but it must be sufficiently dense to encompass the principal spatial sources of variance. A nested, multi-stage sampling scheme with separating distances increasing in geometric progression from stage to stage will do that. The data may then be analyzed by a hierarchical analysis of variance to estimate the components of variance for every stage, and hence for every lag. By accumulating the components, starting from the shortest lag, one obtains a rough variogram for modest effort. For balanced designs the analysis of variance is optimal; for unbalanced ones, however, these estimators are not necessarily the best, and analysis by residual maximum likelihood (REML) will usually be preferable. The paper summarizes the underlying theory and illustrates its application with data from three surveys: one in which the design had four stages and was balanced, and two implemented with unbalanced designs to economize when there were more stages. A Fortran program is available for the analysis of variance, and code for the REML analysis is listed in the paper.
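The accumulation step is simple enough to show directly: given a variance component for each stage of the nested design, summing the components from the shortest separating distance upwards yields the semivariance at each stage's lag. The distances and components below are illustrative numbers only, not values from the surveys in the paper.

```python
import numpy as np

# separating distances for each stage of the nested design (m),
# increasing in geometric progression; purely illustrative values
lags = np.array([1, 10, 100, 1000])

# variance components per stage from the hierarchical ANOVA (or REML),
# finest stage first; again illustrative numbers
components = np.array([0.8, 0.5, 0.3, 0.2])

# accumulating components from the shortest lag upwards gives the
# semivariance at each stage's separating distance: a rough variogram
gamma = np.cumsum(components)
for h, g in zip(lags, gamma):
    print(f"lag {h:>5} m: semivariance {g:.2f}")
```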
Abstract:
1. We compared the baseline phosphorus (P) concentrations inferred by diatom-P transfer functions and export coefficient models at 62 lakes in Great Britain to assess whether the techniques produce similar estimates of historical nutrient status.
2. There was a strong linear relationship between the two sets of values over the whole total P (TP) gradient (2–200 µg TP L⁻¹). However, a systematic bias was observed: the diatom model produced the higher values in 46 lakes (of which 21 differed by more than 10 µg TP L⁻¹), while the export coefficient model gave the higher values in 10 lakes (of which only 4 differed by more than 10 µg TP L⁻¹).
3. The difference between baseline and present-day TP concentrations was calculated to compare the extent of eutrophication inferred by the two sets of model output. There was generally poor agreement between the amounts of change estimated by the two approaches. The discrepancy in both the baseline values and the degree of change inferred by the models was greatest in the shallow and more productive sites.
4. Both approaches were applied to two lakes in the English Lake District where long-term P data exist, to assess how well the models track measured P concentrations since approximately 1850. There was good agreement between the pre-enrichment TP concentrations generated by the models. The diatom model paralleled the steeper rise in maximum soluble reactive P (SRP) more closely than the gradual increase in annual mean TP in both lakes. The export coefficient model produced a closer fit to observed annual mean TP concentrations for both sites, tracking the changes in total external nutrient loading.
5. A combined approach is recommended: the diatom model employed to reflect the nature and timing of the in-lake response to changes in nutrient loading, and the export coefficient model used to establish the origins and extent of changes in the external load and to assess potential reductions in loading under different management scenarios.
6. However, caution must be exercised when applying these models to shallow lakes, where the export coefficient model TP estimate will not include internal P loading from lake sediments and where the diatom TP inferences may over-estimate TP concentrations because of the high abundance of benthic taxa, many of which are poor indicators of trophic state.
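As a hedged sketch of the export-coefficient side of this comparison: the external TP load is a sum of land-use areas multiplied by per-hectare export coefficients, converted here to an in-lake concentration with a Vollenweider-type retention model. All coefficients, areas, and lake dimensions below are invented for illustration and are not values from the study; note, as point 6 warns, that nothing in this calculation accounts for internal loading from sediments.

```python
import math

# export coefficients (kg TP ha^-1 yr^-1) and catchment areas (ha);
# all numbers are invented for illustration
exports = {"arable": 0.65, "pasture": 0.30, "woodland": 0.02}
areas = {"arable": 250.0, "pasture": 400.0, "woodland": 150.0}

load = sum(exports[lu] * areas[lu] for lu in exports)  # kg TP yr^-1

# convert the external load to an in-lake concentration with a
# Vollenweider-type retention model: P = L / (q_s * (1 + sqrt(tau)))
lake_area = 2.0e6           # lake surface area, m^2 (assumed)
volume = 1.0e7              # lake volume, m^3 (assumed)
inflow = 5.0e6              # annual inflow, m^3 yr^-1 (assumed)
tau = volume / inflow       # water residence time, yr
qs = inflow / lake_area     # areal water load, m yr^-1
L = load * 1e6 / lake_area  # areal TP load, mg m^-2 yr^-1
tp = L / (qs * (1 + math.sqrt(tau)))  # mg m^-3, i.e. ug TP L^-1
print(f"predicted baseline TP: {tp:.1f} ug L^-1")
```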
Abstract:
It has been generally accepted that the method of moments (MoM) variogram, which has been widely applied in soil science, requires about 100 sites at an appropriate interval apart to describe the variation adequately. This sample size is often larger than can be afforded for soil surveys of agricultural fields or contaminated sites. Furthermore, it might be a much larger sample size than is needed where the scale of variation is large. A possible alternative in such situations is the residual maximum likelihood (REML) variogram, because fewer data appear to be required. The REML method is parametric and is considered reliable where there is trend in the data because it is based on generalized increments that filter trend out, and only the covariance parameters are estimated. Previous research has suggested that fewer data are needed to compute a reliable variogram using a maximum likelihood approach such as REML; however, the results can vary according to the nature of the spatial variation. There remain issues to examine: how many fewer data can be used, how should the sampling sites be distributed over the site of interest, and how do different degrees of spatial variation affect the data requirements? The soil of four field sites of different size, physiography, parent material and soil type was sampled intensively, and MoM and REML variograms were calculated for clay content. The data were then sub-sampled to give different sample sizes and distributions of sites, and the variograms were computed again. The model parameters for the sets of variograms for each site were used for cross-validation. Predictions based on REML variograms were generally more accurate than those from MoM variograms with fewer than 100 sampling sites. A sample size of around 50 sites at an appropriate distance apart, possibly determined from variograms of ancillary data, appears adequate to compute REML variograms for kriging soil properties for precision agriculture and contaminated sites.
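For reference, the method-of-moments (Matheron) estimator that this abstract contrasts with REML can be computed in a few lines: half the mean squared difference between all pairs of values whose separation falls in each lag bin. The synthetic "clay content" data and bin edges below are illustrative only.

```python
import numpy as np

def mom_variogram(coords, values, bin_edges):
    """Method-of-moments (Matheron) experimental variogram:
    gamma(h) = mean of 0.5 * (z_i - z_j)^2 over pairs whose
    separation distance falls in the bin containing h."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    sq = 0.5 * (values[:, None] - values[None, :]) ** 2
    i, j = np.triu_indices(len(values), k=1)   # each pair once
    h, g = d[i, j], sq[i, j]
    centres, gamma = [], []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        mask = (h >= lo) & (h < hi)
        if mask.any():
            centres.append(h[mask].mean())
            gamma.append(g[mask].mean())
    return np.array(centres), np.array(gamma)

# synthetic clay-content-like data at ~80 scattered sites (illustrative)
rng = np.random.default_rng(3)
coords = rng.uniform(0, 500, size=(80, 2))   # site coordinates, m
values = np.sin(coords[:, 0] / 80.0) + 0.3 * rng.normal(size=80)
h, gamma = mom_variogram(coords, values, np.linspace(0, 250, 11))
print(np.c_[h, gamma])
```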
Abstract:
An unbalanced nested sampling design was used to investigate the spatial scale of soil and herbicide interactions at the field scale. A hierarchical analysis of variance based on residual maximum likelihood (REML) was used to analyse the data and provide a first estimate of the variogram. Soil samples were taken at 108 locations at a range of separating distances in a 9 ha field to explore small- and medium-scale spatial variation. Soil organic matter content, pH, particle size distribution, microbial biomass, and the degradation and sorption of the herbicide isoproturon were determined for each soil sample. A large proportion of the spatial variation in isoproturon degradation and sorption occurred at sampling intervals of less than 60 m; however, the sampling design did not resolve the variation present at scales greater than this. A sampling interval of 20–25 m should ensure that the main spatial structures are identified for isoproturon degradation rate and sorption without too great a loss of information in this field.
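As a sketch of what REML estimation on an unbalanced design involves, the code below fits the two variance components of a one-way nested random-effects model (a two-stage stand-in for the multi-stage field design) by maximising the REML log-likelihood directly. The group sizes and variances are invented for illustration; a real analysis would have one component per sampling stage.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)

# unbalanced one-way nested design: groups of unequal size
sizes = [3, 5, 2, 7, 4, 6]
g = np.repeat(np.arange(len(sizes)), sizes)
n = len(g)
true_a, true_e = 1.0, 0.5  # between- and within-group variances
y = rng.normal(0, np.sqrt(true_a), len(sizes))[g] \
    + rng.normal(0, np.sqrt(true_e), n)

Z = (g[:, None] == np.arange(len(sizes))[None, :]).astype(float)
X = np.ones((n, 1))  # fixed effects: intercept only

def neg_reml(log_s2):
    # negative REML log-likelihood for V = s2a * ZZ' + s2e * I
    s2a, s2e = np.exp(log_s2)
    V = s2a * Z @ Z.T + s2e * np.eye(n)
    Vi = np.linalg.inv(V)
    XtViX = X.T @ Vi @ X
    P = Vi - Vi @ X @ np.linalg.inv(XtViX) @ X.T @ Vi
    _, ldV = np.linalg.slogdet(V)
    _, ldX = np.linalg.slogdet(XtViX)
    return 0.5 * (ldV + ldX + y @ P @ y)

res = minimize(neg_reml, np.log([1.0, 1.0]), method="Nelder-Mead")
print("REML variance components (between, within):", np.exp(res.x))
```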