166 resultados para Bayesian ridge regression
em CentAUR: Central Archive University of Reading - UK
Resumo:
Approximate Bayesian computation (ABC) methods make use of comparisons between simulated and observed summary statistics to overcome the problem of computationally intractable likelihood functions. As the practical implementation of ABC requires computations based on vectors of summary statistics, rather than full data sets, a central question is how to derive low-dimensional summary statistics from the observed data with minimal loss of information. In this article we provide a comprehensive review and comparison of the performance of the principal methods of dimension reduction proposed in the ABC literature. The methods are split into three nonmutually exclusive classes consisting of best subset selection methods, projection techniques and regularization. In addition, we introduce two new methods of dimension reduction. The first is a best subset selection method based on Akaike and Bayesian information criteria, and the second uses ridge regression as a regularization procedure. We illustrate the performance of these dimension reduction techniques through the analysis of three challenging models and data sets.
Resumo:
In this paper, Bayesian decision procedures are developed for dose-escalation studies based on bivariate observations of undesirable events and signs of therapeutic benefit. The methods generalize earlier approaches taking into account only the undesirable outcomes. Logistic regression models are used to model the two responses, which are both assumed to take a binary form. A prior distribution for the unknown model parameters is suggested and an optional safety constraint can be included. Gain functions to be maximized are formulated in terms of accurate estimation of the limits of a therapeutic window or optimal treatment of the next cohort of subjects, although the approach could be applied to achieve any of a wide variety of objectives. The designs introduced are illustrated through simulation and retrospective implementation to a completed dose-escalation study. Copyright © 2006 John Wiley & Sons, Ltd.
Resumo:
Recently, various approaches have been suggested for dose escalation studies based on observations of both undesirable events and evidence of therapeutic benefit. This article concerns a Bayesian approach to dose escalation that requires the user to make numerous design decisions relating to the number of doses to make available, the choice of the prior distribution, the imposition of safety constraints and stopping rules, and the criteria by which the design is to be optimized. Results are presented of a substantial simulation study conducted to investigate the influence of some of these factors on the safety and the accuracy of the procedure with a view toward providing general guidance for investigators conducting such studies. The Bayesian procedures evaluated use logistic regression to model the two responses, which are both assumed to be binary. The simulation study is based on features of a recently completed study of a compound with potential benefit to patients suffering from inflammatory diseases of the lung.
A hierarchical Bayesian model for predicting the functional consequences of amino-acid polymorphisms
Resumo:
Genetic polymorphisms in deoxyribonucleic acid coding regions may have a phenotypic effect on the carrier, e.g. by influencing susceptibility to disease. Detection of deleterious mutations via association studies is hampered by the large number of candidate sites; therefore methods are needed to narrow down the search to the most promising sites. For this, a possible approach is to use structural and sequence-based information of the encoded protein to predict whether a mutation at a particular site is likely to disrupt the functionality of the protein itself. We propose a hierarchical Bayesian multivariate adaptive regression spline (BMARS) model for supervised learning in this context and assess its predictive performance by using data from mutagenesis experiments on lac repressor and lysozyme proteins. In these experiments, about 12 amino-acid substitutions were performed at each native amino-acid position and the effect on protein functionality was assessed. The training data thus consist of repeated observations at each position, which the hierarchical framework is needed to account for. The model is trained on the lac repressor data and tested on the lysozyme mutations and vice versa. In particular, we show that the hierarchical BMARS model, by allowing for the clustered nature of the data, yields lower out-of-sample misclassification rates compared with both a BMARS and a frequen-tist MARS model, a support vector machine classifier and an optimally pruned classification tree.
Resumo:
This study presents a new simple approach for combining empirical with raw (i.e., not bias corrected) coupled model ensemble forecasts in order to make more skillful interval forecasts of ENSO. A Bayesian normal model has been used to combine empirical and raw coupled model December SST Niño-3.4 index forecasts started at the end of the preceding July (5-month lead time). The empirical forecasts were obtained by linear regression between December and the preceding July Niño-3.4 index values over the period 1950–2001. Coupled model ensemble forecasts for the period 1987–99 were provided by ECMWF, as part of the Development of a European Multimodel Ensemble System for Seasonal to Interannual Prediction (DEMETER) project. Empirical and raw coupled model ensemble forecasts alone have similar mean absolute error forecast skill score, compared to climatological forecasts, of around 50% over the period 1987–99. The combined forecast gives an increased skill score of 74% and provides a well-calibrated and reliable estimate of forecast uncertainty.
Resumo:
In this paper, Bayesian decision procedures are developed for dose-escalation studies based on binary measures of undesirable events and continuous measures of therapeutic benefit. The methods generalize earlier approaches where undesirable events and therapeutic benefit are both binary. A logistic regression model is used to model the binary responses, while a linear regression model is used to model the continuous responses. Prior distributions for the unknown model parameters are suggested. A gain function is discussed and an optional safety constraint is included. Copyright (C) 2006 John Wiley & Sons, Ltd.
Resumo:
We have developed a new Bayesian approach to retrieve oceanic rain rate from the Tropical Rainfall Measuring Mission (TRMM) Microwave Imager (TMI), with an emphasis on typhoon cases in the West Pacific. Retrieved rain rates are validated with measurements of rain gauges located on Japanese islands. To demonstrate improvement, retrievals are also compared with those from the TRMM/Precipitation Radar (PR), the Goddard Profiling Algorithm (GPROF), and a multi-channel linear regression statistical method (MLRS). We have found that qualitatively, all methods retrieved similar horizontal distributions in terms of locations of eyes and rain bands of typhoons. Quantitatively, our new Bayesian retrievals have the best linearity and the smallest root mean square (RMS) error against rain gauge data for 16 typhoon overpasses in 2004. The correlation coefficient and RMS of our retrievals are 0.95 and ~2 mm hr-1, respectively. In particular, at heavy rain rates, our Bayesian retrievals outperform those retrieved from GPROF and MLRS. Overall, the new Bayesian approach accurately retrieves surface rain rate for typhoon cases. Accurate rain rate estimates from this method can be assimilated in models to improve forecast and prevent potential damages in Taiwan during typhoon seasons.
Resumo:
We present a model of market participation in which the presence of non-negligible fixed costs leads to random censoring of the traditional double-hurdle model. Fixed costs arise when household resources must be devoted a priori to the decision to participate in the market. These costs, usually of time, are manifested in non-negligible minimum-efficient supplies and supply correspondence that requires modification of the traditional Tobit regression. The costs also complicate econometric estimation of household behavior. These complications are overcome by application of the Gibbs sampler. The algorithm thus derived provides robust estimates of the fixed-costs, double-hurdle model. The model and procedures are demonstrated in an application to milk market participation in the Ethiopian highlands.
Resumo:
Many applications, such as intermittent data assimilation, lead to a recursive application of Bayesian inference within a Monte Carlo context. Popular data assimilation algorithms include sequential Monte Carlo methods and ensemble Kalman filters (EnKFs). These methods differ in the way Bayesian inference is implemented. Sequential Monte Carlo methods rely on importance sampling combined with a resampling step, while EnKFs utilize a linear transformation of Monte Carlo samples based on the classic Kalman filter. While EnKFs have proven to be quite robust even for small ensemble sizes, they are not consistent since their derivation relies on a linear regression ansatz. In this paper, we propose another transform method, which does not rely on any a priori assumptions on the underlying prior and posterior distributions. The new method is based on solving an optimal transportation problem for discrete random variables. © 2013, Society for Industrial and Applied Mathematics
Resumo:
Sensible and latent heat fluxes are often calculated from bulk transfer equations combined with the energy balance. For spatial estimates of these fluxes, a combination of remotely sensed and standard meteorological data from weather stations is used. The success of this approach depends on the accuracy of the input data and on the accuracy of two variables in particular: aerodynamic and surface conductance. This paper presents a Bayesian approach to improve estimates of sensible and latent heat fluxes by using a priori estimates of aerodynamic and surface conductance alongside remote measurements of surface temperature. The method is validated for time series of half-hourly measurements in a fully grown maize field, a vineyard and a forest. It is shown that the Bayesian approach yields more accurate estimates of sensible and latent heat flux than traditional methods.
Correlating Bayesian date estimates with climatic events and domestication using a bovine case study
Resumo:
The tribe Bovini contains a number of commercially and culturally important species, such as cattle. Understanding their evolutionary time scale is important for distinguishing between post-glacial and domestication-associated population expansions, but estimates of bovine divergence times have been hindered by a lack of reliable calibration points. We present a Bayesian phylogenetic analysis of 481 mitochondrial D-loop sequences, including 228 radiocarbon-dated ancient DNA sequences, using a multi-demographic coalescent model. By employing the radiocarbon dates as internal calibrations, we co-estimate the bovine phylogeny and divergence times in a relaxed-clock framework. The analysis yields evidence for significant population expansions in both taurine and zebu cattle, European aurochs and yak clades. The divergence age estimates support domestication-associated expansion times (less than 12 kyr) for the major haplogroups of cattle. We compare the molecular and palaeontological estimates for the Bison-Bos divergence.
Resumo:
This study compares the infant mortality profiles of 128 infants from two urban and two rural cemetery sites in medieval England. The aim of this paper is to assess the impact of urbanization and industrialization in terms of endogenous or exogenous causes of death. In order to undertake this analysis, two different methods of estimating gestational age from long bone lengths were used: a traditional regression method and a Bayesian method. The regression method tended to produce more marked peaks at 38 weeks, while the Bayesian method produced a broader range of ages and were more comparable with the expected "natural" mortality profiles. At all the sites, neonatal mortality (28-40 weeks) outweighed post-neonatal mortality (41-48 weeks) with rural Raunds Furnells in Northamptonshire, showing the highest number of neonatal deaths and post-medieval Spitalfields, London, showing a greater proportion of deaths due to exogenous or environmental factors. Of the four sites under study, Wharram Percy in Yorkshire showed the most convincing "natural" infant mortality profile, suggesting the inclusion of all births (i.e., stillbirths and unbaptised infants).