988 resultados para model averaging
Resumo:
Harmful Algal Blooms (HABs) are a worldwide problem that have been increasing in frequency and extent over the past several decades. HABs severely damage aquatic ecosystems by destroying benthic habitat, reducing invertebrate and fish populations and affecting larger species such as dugong that rely on seagrasses for food. Few statistical models for predicting HAB occurrences have been developed, and in common with most predictive models in ecology, those that have been developed do not fully account for uncertainties in parameters and model structure. This makes management decisions based on these predictions more risky than might be supposed. We used a probit time series model and Bayesian Model Averaging (BMA) to predict occurrences of blooms of Lyngbya majuscula, a toxic cyanophyte, in Deception Bay, Queensland, Australia. We found a suite of useful predictors for HAB occurrence, with Temperature figuring prominently in models with the majority of posterior support, and a model consisting of the single covariate average monthly minimum temperature showed by far the greatest posterior support. A comparison of alternative model averaging strategies was made with one strategy using the full posterior distribution and a simpler approach that utilised the majority of the posterior distribution for predictions but with vastly fewer models. Both BMA approaches showed excellent predictive performance with little difference in their predictive capacity. Applications of BMA are still rare in ecology, particularly in management settings. This study demonstrates the power of BMA as an important management tool that is capable of high predictive performance while fully accounting for both parameter and model uncertainty.
Resumo:
This study considered the problem of predicting survival, based on three alternative models: a single Weibull, a mixture of Weibulls and a cure model. Instead of the common procedure of choosing a single “best” model, where “best” is defined in terms of goodness of fit to the data, a Bayesian model averaging (BMA) approach was adopted to account for model uncertainty. This was illustrated using a case study in which the aim was the description of lymphoma cancer survival with covariates given by phenotypes and gene expression. The results of this study indicate that if the sample size is sufficiently large, one of the three models emerge as having highest probability given the data, as indicated by the goodness of fit measure; the Bayesian information criterion (BIC). However, when the sample size was reduced, no single model was revealed as “best”, suggesting that a BMA approach would be appropriate. Although a BMA approach can compromise on goodness of fit to the data (when compared to the true model), it can provide robust predictions and facilitate more detailed investigation of the relationships between gene expression and patient survival. Keywords: Bayesian modelling; Bayesian model averaging; Cure model; Markov Chain Monte Carlo; Mixture model; Survival analysis; Weibull distribution
Resumo:
Nitrous oxide (N2O) is one of the greenhouse gases that can contribute to global warming. Spatial variability of N2O can lead to large uncertainties in prediction. However, previous studies have often ignored the spatial dependency to quantify the N2O - environmental factors relationships. Few researches have examined the impacts of various spatial correlation structures (e.g. independence, distance-based and neighbourhood based) on spatial prediction of N2O emissions. This study aimed to assess the impact of three spatial correlation structures on spatial predictions and calibrate the spatial prediction using Bayesian model averaging (BMA) based on replicated, irregular point-referenced data. The data were measured in 17 chambers randomly placed across a 271 m(2) field between October 2007 and September 2008 in the southeast of Australia. We used a Bayesian geostatistical model and a Bayesian spatial conditional autoregressive (CAR) model to investigate and accommodate spatial dependency, and to estimate the effects of environmental variables on N2O emissions across the study site. We compared these with a Bayesian regression model with independent errors. The three approaches resulted in different derived maps of spatial prediction of N2O emissions. We found that incorporating spatial dependency in the model not only substantially improved predictions of N2O emission from soil, but also better quantified uncertainties of soil parameters in the study. The hybrid model structure obtained by BMA improved the accuracy of spatial prediction of N2O emissions across this study region.
Resumo:
This chapter presents a model averaging approach in the M-open setting using sample re-use methods to approximate the predictive distribution of future observations. It first reviews the standard M-closed Bayesian Model Averaging approach and decision-theoretic methods for producing inferences and decisions. It then reviews model selection from the M-complete and M-open perspectives, before formulating a Bayesian solution to model averaging in the M-open perspective. It constructs optimal weights for MOMA:M-open Model Averaging using a decision-theoretic framework, where models are treated as part of the ‘action space’ rather than unknown states of nature. Using ‘incompatible’ retrospective and prospective models for data from a case-control study, the chapter demonstrates that MOMA gives better predictive accuracy than the proxy models. It concludes with open questions and future directions.
Resumo:
A benefit function transfer obtains estimates of willingness-to-pay (WTP) for the evaluation of a given policy at a site by combining existing information from different study sites. This has the advantage that more efficient estimates are obtained, but it relies on the assumption that the heterogeneity between sites is appropriately captured in the benefit transfer model. A more expensive alternative to estimate WTP is to analyze only data from the policy site in question while ignoring information from other sites. We make use of the fact that these two choices can be viewed as a model selection problem and extend the set of models to allow for the hypothesis that the benefit function is only applicable to a subset of sites. We show how Bayesian model averaging (BMA) techniques can be used to optimally combine information from all models. The Bayesian algorithm searches for the set of sites that can form the basis for estimating a benefit function and reveals whether such information can be transferred to new sites for which only a small data set is available. We illustrate the method with a sample of 42 forests from U.K. and Ireland. We find that BMA benefit function transfer produces reliable estimates and can increase about 8 times the information content of a small sample when the forest is 'poolable'. © 2008 Elsevier Inc. All rights reserved.
Resumo:
A Bayesian Model Averaging approach to the estimation of lag structures is introduced, and applied to assess the impact of R&D on agricultural productivity in the US from 1889 to 1990. Lag and structural break coefficients are estimated using a reversible jump algorithm that traverses the model space. In addition to producing estimates and standard deviations for the coe¢ cients, the probability that a given lag (or break) enters the model is estimated. The approach is extended to select models populated with Gamma distributed lags of di¤erent frequencies. Results are consistent with the hypothesis that R&D positively drives productivity. Gamma lags are found to retain their usefulness in imposing a plausible structure on lag coe¢ cients, and their role is enhanced through the use of model averaging.
Resumo:
Bayesian Model Averaging (BMA) is used for testing for multiple break points in univariate series using conjugate normal-gamma priors. This approach can test for the number of structural breaks and produce posterior probabilities for a break at each point in time. Results are averaged over specifications including: stationary; stationary around trend and unit root models, each containing different types and number of breaks and different lag lengths. The procedures are used to test for structural breaks on 14 annual macroeconomic series and 11 natural resource price series. The results indicate that there are structural breaks in all of the natural resource series and most of the macroeconomic series. Many of the series had multiple breaks. Our findings regarding the existence of unit roots, having allowed for structural breaks in the data, are largely consistent with previous work.
Resumo:
Multi-model ensembles are frequently used to assess understanding of the response of ozone and methane lifetime to changes in emissions of ozone precursors such as NOx, VOCs (volatile organic compounds) and CO. When these ozone changes are used to calculate radiative forcing (RF) (and climate metrics such as the global warming potential (GWP) and global temperature-change potential (GTP)) there is a methodological choice, determined partly by the available computing resources, as to whether the mean ozone (and methane) concentration changes are input to the radiation code, or whether each model's ozone and methane changes are used as input, with the average RF computed from the individual model RFs. We use data from the Task Force on Hemispheric Transport of Air Pollution source–receptor global chemical transport model ensemble to assess the impact of this choice for emission changes in four regions (East Asia, Europe, North America and South Asia). We conclude that using the multi-model mean ozone and methane responses is accurate for calculating the mean RF, with differences up to 0.6% for CO, 0.7% for VOCs and 2% for NOx. Differences of up to 60% for NOx 7% for VOCs and 3% for CO are introduced into the 20 year GWP. The differences for the 20 year GTP are smaller than for the GWP for NOx, and similar for the other species. However, estimates of the standard deviation calculated from the ensemble-mean input fields (where the standard deviation at each point on the model grid is added to or subtracted from the mean field) are almost always substantially larger in RF, GWP and GTP metrics than the true standard deviation, and can be larger than the model range for short-lived ozone RF, and for the 20 and 100 year GWP and 100 year GTP. The order of averaging has most impact on the metrics for NOx, as the net values for these quantities is the residual of the sum of terms of opposing signs. For example, the standard deviation for the 20 year GWP is 2–3 times larger using the ensemble-mean fields than using the individual models to calculate the RF. The source of this effect is largely due to the construction of the input ozone fields, which overestimate the true ensemble spread. Hence, while the average of multi-model fields are normally appropriate for calculating mean RF, GWP and GTP, they are not a reliable method for calculating the uncertainty in these fields, and in general overestimate the uncertainty.
Resumo:
Technological advances in genotyping have given rise to hypothesis-based association studies of increasing scope. As a result, the scientific hypotheses addressed by these studies have become more complex and more difficult to address using existing analytic methodologies. Obstacles to analysis include inference in the face of multiple comparisons, complications arising from correlations among the SNPs (single nucleotide polymorphisms), choice of their genetic parametrization and missing data. In this paper we present an efficient Bayesian model search strategy that searches over the space of genetic markers and their genetic parametrization. The resulting method for Multilevel Inference of SNP Associations, MISA, allows computation of multilevel posterior probabilities and Bayes factors at the global, gene and SNP level, with the prior distribution on SNP inclusion in the model providing an intrinsic multiplicity correction. We use simulated data sets to characterize MISA's statistical power, and show that MISA has higher power to detect association than standard procedures. Using data from the North Carolina Ovarian Cancer Study (NCOCS), MISA identifies variants that were not identified by standard methods and have been externally "validated" in independent studies. We examine sensitivity of the NCOCS results to prior choice and method for imputing missing data. MISA is available in an R package on CRAN.
Resumo:
We consider model selection uncertainty in linear regression. We study theoretically and by simulation the approach of Buckland and co-workers, who proposed estimating a parameter common to all models under study by taking a weighted average over the models, using weights obtained from information criteria or the bootstrap. This approach is compared with the usual approach in which the 'best' model is used, and with Bayesian model averaging. The weighted predictor behaves similarly to model averaging, with generally more realistic mean-squared errors than the usual model-selection-based estimator.
Resumo:
Genetic research of complex diseases is a challenging, but exciting, area of research. The early development of the research was limited, however, until the completion of the Human Genome and HapMap projects, along with the reduction in the cost of genotyping, which paves the way for understanding the genetic composition of complex diseases. In this thesis, we focus on the statistical methods for two aspects of genetic research: phenotype definition for diseases with complex etiology and methods for identifying potentially associated Single Nucleotide Polymorphisms (SNPs) and SNP-SNP interactions. With regard to phenotype definition for diseases with complex etiology, we firstly investigated the effects of different statistical phenotyping approaches on the subsequent analysis. In light of the findings, and the difficulties in validating the estimated phenotype, we proposed two different methods for reconciling phenotypes of different models using Bayesian model averaging as a coherent mechanism for accounting for model uncertainty. In the second part of the thesis, the focus is turned to the methods for identifying associated SNPs and SNP interactions. We review the use of Bayesian logistic regression with variable selection for SNP identification and extended the model for detecting the interaction effects for population based case-control studies. In this part of study, we also develop a machine learning algorithm to cope with the large scale data analysis, namely modified Logic Regression with Genetic Program (MLR-GEP), which is then compared with the Bayesian model, Random Forests and other variants of logic regression.
Resumo:
This thesis developed and applied Bayesian models for the analysis of survival data. The gene expression was considered as explanatory variables within the Bayesian survival model which can be considered the new contribution in the analysis of such data. The censoring factor that is inherent of survival data has also been addressed in terms of its impact on the fitting of a finite mixture of Weibull distribution with and without covariates. To investigate this, simulation study were carried out under several censoring percentages. Censoring percentage as high as 80% is acceptable here as the work involved high dimensional data. Lastly the Bayesian model averaging approach was developed to incorporate model uncertainty in the prediction of survival.
Resumo:
Catchment and riparian degradation has resulted in declining ecosystem health of streams worldwide. With restoration a priority in many regions, there is an increasing interest in the scale at which land use influences stream ecosystem health. Our goal was to use a substantial data set collected as part of a monitoring program (the Southeast Queensland, Australia, Ecological Health Monitoring Program data set, collected at 116 sites over six years) to identify the spatial scale of land use, or the combination of spatial scales, that most strongly influences overall ecosystem health. In addition, we aimed to determine whether the most influential scale differed for different aspects of ecosystem health. We used linear-mixed models and a Bayesian model-averaging approach to generate models for the overall aggregated ecosystem health score and for each of the five component indicators (fish, macroinvertebrates, water quality, nutrients, and ecosystem processes) that make up the score. Dense forest close to the survey site, mid-dense forest in the hydrologically active nearstream areas of the catchment, urbanization in the riparian buffer, and tree cover at the reach scale were all significant in explaining ecosystem health, suggesting an overriding influence of forest cover, particularly close to the stream. Season and antecedent rainfall were also important explanatory variables, with some land-use variables showing significant seasonal interactions. There were also differential influences of land use for each of the component indicators. Our approach is useful given that restoring general ecosystem health is the focus of many stream restoration projects; it allowed us to predict the scale and catchment position of restoration that would result in the greatest improvement of ecosystem health in the regions streams and rivers. The models we generated suggested that good ecosystem health can be maintained in catchments where 80% of hydrologically active areas in close proximity to the stream have mid-dense forest cover and moderate health can be obtained with 60% cover.