10 results for Model selection criteria
at Duke University
Abstract:
This paper studies the multiplicity-correction effect of standard Bayesian variable-selection priors in linear regression. Our first goal is to clarify when, and how, multiplicity correction happens automatically in Bayesian analysis, and to distinguish this correction from the Bayesian Ockham's-razor effect. Our second goal is to contrast empirical-Bayes and fully Bayesian approaches to variable selection through examples, theoretical results and simulations. Considerable differences between the two approaches are found. In particular, we prove a theorem that characterizes a surprising asymptotic discrepancy between fully Bayes and empirical Bayes. This discrepancy arises from a different source than the failure to account for hyperparameter uncertainty in the empirical-Bayes estimate. Indeed, even at the extreme, when the empirical-Bayes estimate converges asymptotically to the true variable-inclusion probability, the potential for a serious difference remains. © Institute of Mathematical Statistics, 2010.
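For reference, the standard setup behind this automatic multiplicity correction (a sketch in generic notation; the paper's own formulation may differ in detail) places a common prior inclusion probability on the variable-inclusion indicators and averages over it:

    \[ \gamma_j \mid p \sim \mathrm{Bernoulli}(p), \quad j = 1, \dots, m, \qquad p \sim \mathrm{Beta}(a, b), \]
    \[ \Pr(\gamma) = \int_0^1 p^{k_\gamma} (1 - p)^{m - k_\gamma} \, \pi(p) \, dp = \frac{B(a + k_\gamma,\; b + m - k_\gamma)}{B(a, b)}, \]

where k_γ is the number of variables included in model γ. Because this marginal prior probability falls as the number of candidate variables m grows, adding spurious candidates is penalized automatically. Empirical Bayes instead plugs a point estimate p̂ into the Bernoulli prior rather than averaging over p; as the abstract notes, the discrepancy it studies persists even when p̂ converges to the true inclusion probability.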
Abstract:
This chapter presents a model averaging approach in the M-open setting, using sample re-use methods to approximate the predictive distribution of future observations. It first reviews the standard M-closed Bayesian Model Averaging approach and decision-theoretic methods for producing inferences and decisions. It then reviews model selection from the M-complete and M-open perspectives, before formulating a Bayesian solution to model averaging in the M-open perspective. It constructs optimal weights for M-open model averaging (MOMA) using a decision-theoretic framework, in which models are treated as part of the ‘action space’ rather than as unknown states of nature. Using ‘incompatible’ retrospective and prospective models for data from a case-control study, the chapter demonstrates that MOMA gives better predictive accuracy than the proxy models. It concludes with open questions and future directions.
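In broad strokes (a generic sketch of the decision-theoretic idea, not the chapter's exact construction), weights of this kind are chosen to minimize expected predictive loss over the simplex, rather than computed as posterior model probabilities:

    \[ \hat{w} = \operatorname*{arg\,min}_{w \in \Delta^{K-1}} \; \mathbb{E}\left[ L\!\left( y_{\mathrm{new}}, \; \sum_{k=1}^{K} w_k \, p_k(y_{\mathrm{new}} \mid \mathrm{data}) \right) \right], \]

with the expectation approximated by sample re-use (e.g., cross-validation-style resampling), since in the M-open setting none of the candidate models is assumed to be true.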
Abstract:
Carbon Capture and Storage may use deep saline aquifers for CO₂ sequestration, but small CO₂ leakage could pose a risk to overlying fresh groundwater. We performed laboratory incubations of CO₂ infiltration under oxidizing conditions for >300 days on samples from four freshwater aquifers to 1) understand how CO₂ leakage affects freshwater quality; 2) develop selection criteria for deep sequestration sites based on inorganic metal contamination caused by CO₂ leaks to shallow aquifers; and 3) identify geochemical signatures for early detection criteria. After exposure to CO₂, water pH declines of 1-2 units were apparent in all aquifer samples. CO₂ caused concentrations of the alkali and alkaline earths and manganese, cobalt, nickel, and iron to increase by more than 2 orders of magnitude. Potentially dangerous uranium and barium increased throughout the entire experiment in some samples. Solid-phase metal mobility, carbonate buffering capacity, and redox state in the shallow overlying aquifers influence the impact of CO₂ leakage and should be considered when selecting deep geosequestration sites. Manganese, iron, calcium, and pH could be used as geochemical markers of a CO₂ leak, as their concentrations increase within 2 weeks of exposure to CO₂.
Indication of electron neutrino appearance from an accelerator-produced off-axis muon neutrino beam.
Abstract:
The T2K experiment observes indications of ν_μ → ν_e appearance in data accumulated with 1.43×10²⁰ protons on target. Six events pass all selection criteria at the far detector. In a three-flavor neutrino oscillation scenario with |Δm²₂₃| = 2.4×10⁻³ eV², sin²2θ₂₃ = 1 and sin²2θ₁₃ = 0, the expected number of such events is 1.5±0.3(syst). Under this hypothesis, the probability to observe six or more candidate events is 7×10⁻³, equivalent to 2.5σ significance. At 90% C.L., the data are consistent with 0.03 (0.04) < sin²2θ₁₃ < 0.28 (0.34) for δ_CP = 0 and a normal (inverted) hierarchy.
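As a rough cross-check (a simplified sketch; the experiment's full treatment of systematic uncertainties is more involved), the quoted significance can be approximated as a Poisson tail probability for observing six or more events when 1.5 are expected, smearing the expectation by its ±0.3 systematic uncertainty:

    # Approximate p-value for >= 6 observed events with ~1.5 expected.
    # Simplified sketch: the real T2K analysis treats systematics in more detail.
    import numpy as np
    from scipy import stats

    # Plain Poisson tail: P(N >= 6 | lambda = 1.5)
    p_plain = stats.poisson.sf(5, mu=1.5)

    # Fold in a Gaussian systematic on the expectation (assumed ~N(1.5, 0.3),
    # truncated at zero) by Monte Carlo marginalization.
    rng = np.random.default_rng(0)
    lam = np.clip(rng.normal(1.5, 0.3, size=1_000_000), 0.0, None)
    p_smeared = stats.poisson.sf(5, mu=lam).mean()

    print(f"plain Poisson tail: {p_plain:.4f}")      # ~4e-3
    print(f"with systematic:    {p_smeared:.4f}")    # closer to the quoted 7e-3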
Abstract:
• PREMISE OF THE STUDY: Understanding fern (monilophyte) phylogeny and its evolutionary timescale is critical for broad investigations of the evolution of land plants, and for providing the point of comparison necessary for studying the evolution of the fern sister group, seed plants. Molecular phylogenetic investigations have revolutionized our understanding of fern phylogeny; however, to date, these studies have relied almost exclusively on plastid data.
• METHODS: Here we take a curated phylogenomics approach to infer the first broad fern phylogeny from multiple nuclear loci, by combining broad taxon sampling (73 ferns and 12 outgroup species) with focused character sampling (25 loci comprising 35,877 bp), along with rigorous alignment, orthology inference and model selection.
• KEY RESULTS: Our phylogeny corroborates some earlier inferences and provides novel insights; in particular, we find strong support for Equisetales as sister to the rest of ferns, Marattiales as sister to leptosporangiate ferns, and Dennstaedtiaceae as sister to the eupolypods. Our divergence-time analyses reveal that divergences among the extant fern orders all occurred prior to ∼200 MYA. Finally, our species-tree inferences are congruent with analyses of concatenated data, but generally with lower support. Those cases where species-tree support values are higher than expected involve relationships that have been supported by smaller plastid datasets, suggesting that deep coalescence may be reducing support from the concatenated nuclear data.
• CONCLUSIONS: Our study demonstrates the utility of a curated phylogenomics approach to inferring fern phylogeny, and highlights the need to consider underlying data characteristics, along with data quantity, in phylogenetic studies.
Abstract:
An abstract of a thesis devoted to using helix-coil models to study unfolded states.
Research on polypeptide unfolded states has received far more attention in the past decade than it did previously. Unfolded states are thought to be implicated in various misfolding diseases and likely play crucial roles in protein folding equilibria and folding rates. Structural characterization of unfolded states has proven to be much more difficult than the now well-established practice of determining the structures of folded proteins, largely because many core assumptions underlying folded-structure determination methods are invalid for unfolded states. This has led to a dearth of knowledge concerning the nature of unfolded-state conformational distributions. While many aspects of unfolded-state structure are not well known, there does exist a significant body of work, stretching back half a century, focused on the structural characterization of marginally stable polypeptide systems. This body of work represents an extensive collection of experimental data and biophysical models describing helix-coil equilibria in polypeptide systems. Much of the work on unfolded states in the last decade has not been devoted specifically to improving our understanding of helix-coil equilibria, which is arguably the best characterized of the various conformational equilibria that likely contribute to unfolded-state conformational distributions. This thesis provides a deeper investigation of helix-coil equilibria using modern statistical data analysis and biophysical modeling techniques. The studies contained within seek to provide deeper insights and new perspectives on what we presumably already know well about protein unfolded states.
Chapter 1 gives an overview of recent and historical work on protein unfolded states. The study of helix-coil equilibria is placed in the context of the general field of unfolded-state research, and the basics of helix-coil models are introduced.
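For readers new to these basics, a minimal Zimm-Bragg-style calculation (a generic illustrative sketch with made-up parameter values, not the thesis's own model) obtains the chain partition function from a transfer matrix over per-residue helix/coil states:

    # Minimal Zimm-Bragg helix-coil sketch (illustrative parameters only):
    # the transfer matrix entry M[prev, curr] weights residue-state
    # transitions, with coil = 0 and helix = 1.
    import numpy as np

    def zb_partition(n_res, s, sigma):
        M = np.array([[1.0, sigma * s],   # coil->coil, coil->helix (nucleation)
                      [1.0, s]])          # helix->coil, helix->helix (propagation)
        v0 = np.array([1.0, 0.0])         # the chain enters in the coil state
        return v0 @ np.linalg.matrix_power(M, n_res) @ np.ones(2)

    def mean_helicity(n_res, s=1.3, sigma=0.003):
        # Fraction of helical residues: theta = (s / n) * d ln Z / d s,
        # estimated here with a central finite difference.
        ds = 1e-6 * s
        dlnZ = (np.log(zb_partition(n_res, s + ds, sigma))
                - np.log(zb_partition(n_res, s - ds, sigma)))
        return (s / n_res) * dlnZ / (2 * ds)

    print(f"mean helicity of a 20-residue chain: {mean_helicity(20):.3f}")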
Chapter 2 introduces the newest incarnation of a sophisticated helix-coil model. State-of-the-art statistical techniques are employed to estimate the energies of various physical interactions that influence helix-coil equilibria. A new Bayesian model selection approach is used to test many long-standing hypotheses concerning the physical nature of the helix-coil transition. Some assumptions made in previous models are shown to be invalid, and the new model exhibits greatly improved predictive performance relative to its predecessor.
Chapter 3 introduces a new statistical model for interpreting amide exchange measurements. Because amide exchange can serve as a probe of residue-specific properties of helix-coil ensembles, the new model provides a novel and robust way to use these measurements to characterize helix-coil ensembles experimentally and to test the position-specific predictions of helix-coil models. The statistical model is shown to perform considerably better than the most commonly used method for interpreting amide exchange data. Estimates obtained from amide exchange measurements on an example helical peptide also show remarkable consistency with the predictions of the helix-coil model.
Chapter 4 studies helix-coil ensembles through the enumeration of helix-coil configurations. Aside from providing new insights into helix-coil ensembles, this chapter also introduces a new method by which helix-coil models can be extended to calculate new types of observables. Future work on this approach could allow helix-coil models to move into domains of use that were previously inaccessible, reserved for the other types of unfolded-state models introduced in Chapter 1.
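As a concrete illustration of what configuration enumeration involves (a generic sketch reusing the illustrative Zimm-Bragg weights from the sketch above, not the thesis's method), one can enumerate all 2^N helix/coil strings for a short chain, check that the summed weights reproduce the transfer-matrix partition function, and extract observables such as the distribution of the longest helix length:

    # Brute-force enumeration of helix/coil configurations for a short chain,
    # using the same illustrative Zimm-Bragg weights as the sketch above.
    from itertools import product

    def config_weight(config, s=1.3, sigma=0.003):
        # config is a tuple of 0 (coil) / 1 (helix) states, one per residue
        w, prev = 1.0, 0
        for state in config:
            if state == 1:
                w *= s if prev == 1 else sigma * s  # propagation vs nucleation
            prev = state
        return w

    n = 12
    Z = sum(config_weight(c) for c in product((0, 1), repeat=n))

    # Enumeration gives direct access to observables that are awkward to pull
    # from the transfer matrix alone, e.g. the longest-helix-length distribution.
    longest = {}
    for c in product((0, 1), repeat=n):
        runs = ''.join(map(str, c)).split('0')
        k = max((len(r) for r in runs if r), default=0)
        longest[k] = longest.get(k, 0.0) + config_weight(c)

    print(f"Z by enumeration: {Z:.6f}")
    print("P(longest helix = k):",
          {k: round(v / Z, 4) for k, v in sorted(longest.items())})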
Abstract:
The work presented in this dissertation focuses on applying engineering methods to develop and explore probabilistic survival models for the prediction of decompression sickness in U.S. Navy divers. Mathematical modeling, computational model development, and numerical optimization techniques were employed to formulate and evaluate the predictive quality of models fitted to empirical data. Chapters 1 and 2 present general background information relevant to the development of probabilistic models for predicting the incidence of decompression sickness. The remainder of the dissertation introduces techniques developed to improve the predictive quality of probabilistic decompression models and to reduce the difficulty of model parameter optimization.
The first project explored seventeen variations of the hazard function using a well-perfused parallel compartment model. Models were parametrically optimized using the maximum likelihood technique, and model performance was evaluated using both classical statistical methods and model selection techniques based on information theory. Optimized model parameters were overall similar to previously published values. Results favored a novel hazard function definition that included both ambient pressure scaling and individually fitted compartment exponent scaling terms.
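For orientation (a generic sketch of the named techniques, not the dissertation's actual models: the hazard forms, parameters, and data below are hypothetical), maximum likelihood fitting of competing parametric survival models and comparison by an information criterion such as AIC look like this:

    # Generic sketch: maximum-likelihood fits of two hypothetical hazard models
    # to right-censored survival data, compared by AIC.
    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(1)
    t = rng.weibull(1.5, 200) * 10.0          # hypothetical onset times (hours)
    event = (t < 12.0).astype(float)          # observations censored at 12 h
    t = np.minimum(t, 12.0)

    def nll_exponential(theta):
        lam = np.exp(theta[0])                # log-parameterized rate, lam > 0
        # log-likelihood: sum(d_i * log h(t_i) - H(t_i)), with H(t) = lam * t
        return -(np.sum(event * np.log(lam)) - np.sum(lam * t))

    def nll_weibull(theta):
        lam, k = np.exp(theta)                # rate-scale and shape
        logh = np.log(lam) + np.log(k) + (k - 1) * np.log(t)  # log hazard
        H = lam * t**k                        # cumulative hazard
        return -(np.sum(event * logh) - np.sum(H))

    def aic(nll, x0):
        fit = minimize(nll, x0, method="Nelder-Mead")
        return 2 * len(x0) + 2 * fit.fun      # AIC = 2p - 2 ln L

    print(f"AIC exponential: {aic(nll_exponential, [0.0]):.1f}")
    print(f"AIC Weibull:     {aic(nll_weibull, [0.0, 0.0]):.1f}")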
We developed ten pharmacokinetic compartmental models that included explicit delay mechanics, to determine whether predictive quality could be improved through the inclusion of material transfer lags. A fitted discrete delay parameter augmented the inflow to the compartment systems from the environment. Based on the observation that, for many of our models, symptoms are often reported after risk accumulation begins, we hypothesized that the inclusion of delays might improve the correlation between model predictions and observed data. Model selection techniques identified two models as having the best overall performance, but comparison to the best-performing model without delay, and model selection against our best identified no-delay pharmacokinetic model, both indicated that the delay mechanism was not statistically justified and did not substantially improve model predictions.
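A minimal version of such a delayed-inflow compartment (a hypothetical single-compartment sketch with illustrative constants, not one of the dissertation's ten models) simply evaluates the ambient driving pressure at a lagged time:

    # Hypothetical single-compartment gas kinetics with a discrete inflow delay:
    # dP/dt = (P_amb(t - tau) - P(t)) / k, integrated with forward Euler.
    import numpy as np

    def simulate(p_amb, tau=2.0, k=8.0, dt=0.01, t_end=60.0):
        n = int(t_end / dt)
        t = np.arange(n) * dt
        P = np.empty(n)
        P[0] = p_amb(0.0)
        for i in range(1, n):
            drive = p_amb(t[i] - tau)         # inflow lagged by tau minutes
            P[i] = P[i - 1] + dt * (drive - P[i - 1]) / k
        return t, P

    # Illustrative square dive profile: 4 atm from t = 0 until surfacing at 30 min
    profile = lambda t: 4.0 if 0.0 < t <= 30.0 else 1.0
    t, P = simulate(profile)
    print(f"compartment pressure at surfacing: {P[int(30.0 / 0.01)]:.2f} atm")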
Our final investigation explored parameter bounding techniques to identify parameter regions in which statistical model failure cannot occur. Statistical model failure occurs when a model predicts zero probability of a diver experiencing decompression sickness for an exposure that is known to produce symptoms. Using a metric related to the instantaneous risk, we successfully identified regions where model failure will not occur, and located the boundaries of those regions using a root bounding technique. Several models are used to demonstrate the techniques, which may be employed to reduce the difficulty of model optimization in future investigations.
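A bare-bones version of that boundary search (a generic sketch: the risk metric and its parameter dependence below are hypothetical stand-ins) brackets a sign change in the risk metric and refines the boundary with a standard root finder:

    # Generic sketch of locating a failure-region boundary by root bracketing;
    # the risk metric and its parameter dependence are hypothetical.
    from scipy.optimize import brentq

    def min_instantaneous_risk(theta):
        # Stand-in metric: the smallest risk the model assigns, over a known
        # symptom-producing exposure, as a function of one model parameter.
        return 0.05 - 0.1 * (theta - 1.0) ** 2

    # Model failure occurs where the metric drops to zero or below; bracket a
    # sign change, then refine the boundary.
    lo, hi = 1.0, 3.0
    assert min_instantaneous_risk(lo) > 0 > min_instantaneous_risk(hi)
    boundary = brentq(min_instantaneous_risk, lo, hi)
    print(f"failure-region boundary near theta = {boundary:.4f}")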
Abstract:
Mixtures of Zellner's g-priors have been studied extensively in linear models and have been shown to have numerous desirable properties for Bayesian variable selection and model averaging. Several extensions of g-priors to Generalized Linear Models (GLMs) have been proposed in the literature; however, the choice of prior distribution of g and the resulting properties for inference have received considerably less attention. In this paper, we extend mixtures of g-priors to GLMs by assigning the truncated Compound Confluent Hypergeometric (tCCH) distribution to 1/(1+g), and illustrate how this prior distribution encompasses several special cases of mixtures of g-priors in the literature, such as the Hyper-g, truncated Gamma, Beta-prime, and Robust priors. Under an integrated Laplace approximation to the likelihood, the posterior distribution of 1/(1+g) is in turn a tCCH distribution, and approximate marginal likelihoods are thus available analytically. We discuss the local geometric properties of the g-prior in GLMs and show that specific choices of the hyper-parameters satisfy the various desiderata for model selection proposed by Bayarri et al., such as asymptotic model selection consistency, information consistency, intrinsic consistency, and measurement invariance. We also illustrate inference using these priors and contrast them with others in the literature via simulation and real examples.
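For context (a sketch of the standard linear-model setup that these GLM extensions generalize; the paper's exact hyperparameterization is not reproduced here), Zellner's g-prior and its mixture form are

    \[ \beta \mid g \sim \mathrm{N}\!\left(0, \; g \, \sigma^2 (X^\top X)^{-1}\right), \qquad u = \frac{1}{1+g} \sim \pi(u), \]

and placing a tCCH distribution on u nests familiar special cases; for instance, the hyper-g prior of Liang et al. corresponds to u ~ Beta(a/2 − 1, 1).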
Abstract:
The dynamics of a population undergoing selection is a central topic in evolutionary biology. This question is particularly intriguing in the case where selective forces act in opposing directions at two population scales. For example, a fast-replicating virus strain outcompetes slower-replicating strains at the within-host scale. However, if the fast-replicating strain causes host morbidity and is less frequently transmitted, it can be outcompeted by slower-replicating strains at the between-host scale. Here we consider a stochastic ball-and-urn process which models this type of phenomenon. We prove the weak convergence of this process under two natural scalings. The first scaling leads to a deterministic nonlinear integro-partial differential equation on the interval $[0,1]$ with dependence on a single parameter, $\lambda$. We show that the fixed points of this differential equation are Beta distributions and that their stability depends on $\lambda$ and the behavior of the initial data around $1$. The second scaling leads to a measure-valued Fleming-Viot process, an infinite-dimensional stochastic process frequently associated with population genetics.