970 results for Bayesian Modeling Averaging
Abstract:
We present a new haplotype-based approach for inferring local genetic ancestry of individuals in an admixed population. Most existing approaches for local ancestry estimation ignore the latent genetic relatedness between ancestral populations and treat them as independent. In this article, we exploit such information by building an inheritance model that describes both the ancestral populations and the admixed population jointly in a unified framework. Based on an assumption that the common hypothetical founder haplotypes give rise to both the ancestral and the admixed population haplotypes, we employ an infinite hidden Markov model to characterize each ancestral population and further extend it to generate the admixed population. Through an effective utilization of the population structural information under a principled nonparametric Bayesian framework, the resulting model is significantly less sensitive to the choice and the amount of training data for ancestral populations than state-of-the-art algorithms. We also improve the robustness under deviation from common modeling assumptions by incorporating population-specific scale parameters that allow variable recombination rates in different populations. Our method is applicable to an admixed population from an arbitrary number of ancestral populations and also performs competitively in terms of spurious ancestry proportions under a general multiway admixture assumption. We validate the proposed method by simulation under various admixing scenarios and present empirical analysis results from a worldwide-distributed dataset from the Human Genome Diversity Project.
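To make the inference task concrete, here is a deliberately simplified, hypothetical sketch: it replaces the paper's infinite hidden Markov model over shared founder haplotypes with a finite two-state HMM in which ancestry switches along the chromosome and each ancestral population is summarized only by known allele frequencies. A forward-backward pass then yields per-locus ancestry posteriors; all constants are invented.

```python
import numpy as np

rng = np.random.default_rng(6)

L = 50                                           # loci along the chromosome
freq = rng.uniform(0.05, 0.95, size=(2, L))      # allele frequency, 2 populations
anc_true = np.cumsum(rng.random(L) < 0.05) % 2   # ancestry switches occasionally
geno = (rng.random(L) < freq[anc_true, np.arange(L)]).astype(int)

switch = 0.05                                    # per-locus ancestry switch prob.
T = np.array([[1 - switch, switch], [switch, 1 - switch]])
emit = np.where(geno == 1, freq, 1 - freq)       # P(genotype | ancestry), (2, L)

# Rescaled forward-backward pass over the two ancestry states.
fwd = np.zeros((L, 2))
bwd = np.ones((L, 2))
fwd[0] = 0.5 * emit[:, 0]
fwd[0] /= fwd[0].sum()
for t in range(1, L):
    fwd[t] = (fwd[t - 1] @ T) * emit[:, t]
    fwd[t] /= fwd[t].sum()
for t in range(L - 2, -1, -1):
    bwd[t] = T @ (emit[:, t + 1] * bwd[t + 1])
    bwd[t] /= bwd[t].sum()
post = fwd * bwd
post /= post.sum(axis=1, keepdims=True)          # per-locus ancestry posterior
print(f"per-locus accuracy: {np.mean((post[:, 1] > 0.5) == anc_true):.0%}")
```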
Abstract:
We consider the inverse reinforcement learning problem, that is, the problem of learning from, and then predicting or mimicking, a controller based on state/action data. We propose a statistical model for such data, derived from the structure of a Markov decision process. Adopting a Bayesian approach to inference, we show how latent variables of the model can be estimated, and how predictions about actions can be made, in a unified framework. A new Markov chain Monte Carlo (MCMC) sampler is devised for simulation from the posterior distribution. It includes a parameter expansion step, which is shown to be essential for good convergence properties of the sampler. As an illustration, the method is applied to learning a human controller.
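As a hedged illustration of this setup (not the paper's model or sampler), the sketch below runs Bayesian inverse reinforcement learning on a small random MDP: a softmax action likelihood built from Q-values, a standard normal prior on state rewards to pin down the otherwise non-identified scale, and a plain random-walk Metropolis sampler in place of the parameter-expanded MCMC scheme.

```python
import numpy as np
from scipy.special import logsumexp

rng = np.random.default_rng(0)

# Toy MDP: n_s states, n_a actions, known random transitions P[a, s, s'].
n_s, n_a, gamma, beta = 5, 2, 0.9, 2.0
P = rng.dirichlet(np.ones(n_s), size=(n_a, n_s))

def q_values(r, iters=100):
    """Value iteration for Q(s, a) under state rewards r."""
    Q = np.zeros((n_s, n_a))
    for _ in range(iters):
        V = Q.max(axis=1)
        Q = r[:, None] + gamma * np.einsum('ast,t->sa', P, V)
    return Q

def log_post(r, states, actions):
    """Softmax action log-likelihood plus a standard normal prior on r."""
    logp = beta * q_values(r)
    logp -= logsumexp(logp, axis=1, keepdims=True)
    return logp[states, actions].sum() - 0.5 * r @ r

# Demonstrations simulated from a 'true' reward (the observed controller).
r_true = rng.normal(size=n_s)
pi = np.exp(beta * q_values(r_true))
pi /= pi.sum(axis=1, keepdims=True)
states = rng.integers(n_s, size=300)
actions = np.array([rng.choice(n_a, p=pi[s]) for s in states])

# Plain random-walk Metropolis over the reward vector.
r, lp, samples = np.zeros(n_s), -np.inf, []
for _ in range(3000):
    prop = r + 0.1 * rng.normal(size=n_s)
    lp_prop = log_post(prop, states, actions)
    if np.log(rng.random()) < lp_prop - lp:
        r, lp = prop, lp_prop
    samples.append(r.copy())
print("posterior mean reward:", np.mean(samples[1500:], axis=0).round(2))
```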
Abstract:
Vibration and acoustic analysis at higher frequencies faces two challenges: computing the response without using an excessive number of degrees of freedom, and quantifying its uncertainty due to small spatial variations in geometry, material properties and boundary conditions. Efficient models make use of the observation that when the response of a decoupled vibro-acoustic subsystem is sufficiently sensitive to uncertainty in such spatial variations, the local statistics of its natural frequencies and mode shapes saturate to universal probability distributions. This holds irrespective of the causes that underlie these spatial variations and thus leads to a nonparametric description of uncertainty. This work deals with the identification of uncertain parameters in such models by using experimental data. One of the difficulties is that both experimental errors and modeling errors, due to the nonparametric uncertainty that is inherent to the model type, are present. This is tackled by employing a Bayesian inference strategy. The prior probability distribution of the uncertain parameters is constructed using the maximum entropy principle. The likelihood function that is subsequently computed takes the experimental information, the experimental errors and the modeling errors into account. The posterior probability distribution, which is computed with the Markov Chain Monte Carlo method, provides a full uncertainty quantification of the identified parameters, and indicates how well their uncertainty is reduced, with respect to the prior information, by the experimental data. © 2013 Taylor & Francis Group, London.
Abstract:
We consider the problem of variable selection in regression modeling in high-dimensional spaces where there is known structure among the covariates. This is an unconventional variable selection problem for two reasons: (1) The dimension of the covariate space is comparable to, and often much larger than, the number of subjects in the study, and (2) the covariate space is highly structured, and in some cases it is desirable to incorporate this structural information into the model building process. We approach this problem through the Bayesian variable selection framework, where we assume that the covariates lie on an undirected graph and formulate an Ising prior on the model space for incorporating structural information. Certain computational and statistical problems arise that are unique to such high-dimensional, structured settings, the most interesting being the phenomenon of phase transitions. We propose theoretical and computational schemes to mitigate these problems. We illustrate our methods on two different graph structures: the linear chain and the regular graph of degree k. Finally, we use our methods to study a specific application in genomics: the modeling of transcription factor binding sites in DNA sequences. © 2010 American Statistical Association.
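The Ising prior itself is compact enough to write down directly. The sketch below evaluates the unnormalized log prior over inclusion indicators on a linear-chain graph, one of the two structures the paper studies; the hyperparameters a and b are illustrative, with a < 0 encoding sparsity and b > 0 rewarding the joint selection of neighboring covariates.

```python
import numpy as np

def log_ising_prior(gamma, edges, a=-2.0, b=1.0):
    """Unnormalized log Ising prior on inclusion indicators gamma in {0,1}^p."""
    gamma = np.asarray(gamma)
    pairwise = sum(gamma[i] * gamma[j] for i, j in edges)
    return a * gamma.sum() + b * pairwise

# Linear-chain graph on p = 6 covariates.
p = 6
chain_edges = [(i, i + 1) for i in range(p - 1)]

# A contiguous selection is favored over a scattered one of the same size.
print(log_ising_prior([1, 1, 1, 0, 0, 0], chain_edges))  # -4.0
print(log_ising_prior([1, 0, 1, 0, 1, 0], chain_edges))  # -6.0
```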
Abstract:
We discuss a general approach to dynamic sparsity modeling in multivariate time series analysis. Time-varying parameters are linked to latent processes that are thresholded to induce zero values adaptively, providing natural mechanisms for dynamic variable inclusion/selection. We discuss Bayesian model specification, analysis and prediction in dynamic regressions, time-varying vector autoregressions, and multivariate volatility models using latent thresholding. Application to a topical macroeconomic time series problem illustrates some of the benefits of the approach in terms of statistical and economic interpretations as well as improved predictions. Supplementary materials for this article are available online. © 2013 Copyright Taylor and Francis Group, LLC.
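A minimal simulation conveys the latent-thresholding mechanism: a time-varying regression coefficient follows a latent AR(1) process and is zeroed whenever it falls inside a threshold band, which is what induces dynamic variable inclusion. All parameter values below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Latent AR(1) coefficient process and threshold band (values illustrative).
T, phi, mu, sigma, d = 500, 0.98, 0.3, 0.05, 0.2
beta = np.empty(T)
beta[0] = mu
for t in range(1, T):
    beta[t] = mu + phi * (beta[t - 1] - mu) + sigma * rng.normal()

# Latent thresholding: the effective coefficient is exactly zero whenever
# the latent process sits inside the band |beta_t| < d.
b = np.where(np.abs(beta) >= d, beta, 0.0)

# Dynamic regression with time-varying inclusion of the predictor.
x = rng.normal(size=T)
y = b * x + 0.1 * rng.normal(size=T)
print(f"predictor active in {np.mean(b != 0.0):.0%} of periods")
```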
Abstract:
Bagging is a method of obtaining more robust predictions when the model class under consideration is unstable with respect to the data, i.e., small changes in the data can cause the predicted values to change significantly. In this paper, we introduce a Bayesian version of bagging based on the Bayesian bootstrap. The Bayesian bootstrap resolves a theoretical problem with ordinary bagging and often results in more efficient estimators. We show how model averaging can be combined with the Bayesian bootstrap and illustrate the procedure with several examples.
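The contrast with ordinary bagging is easy to show in code. In this hedged sketch, each replicate reweights the observations with a continuous Dirichlet(1, ..., 1) draw (the Bayesian bootstrap) rather than the integer multinomial counts of ordinary bagging, and a weighted least-squares fit stands in for the unstable base learner; the data and model are placeholders.

```python
import numpy as np

rng = np.random.default_rng(2)

# Placeholder regression data and base learner.
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

def weighted_ls(w):
    """Weighted least squares: solve (X'WX) b = X'Wy."""
    Xw = X * w[:, None]
    return np.linalg.solve(Xw.T @ X, Xw.T @ y)

# Bayesian bagging: smooth Dirichlet(1, ..., 1) observation weights per
# replicate, so no observation is ever dropped outright, then average.
B = 500
draws = np.array([weighted_ls(rng.dirichlet(np.ones(n))) for _ in range(B)])
print("averaged coefficients:", draws.mean(axis=0).round(3))
print("uncertainty (sd):     ", draws.std(axis=0).round(3))
```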
Abstract:
Solder constitutive models are important as they are widely used in FEA simulations to predict the lifetime of soldered assemblies. This paper briefly reviews some common constitutive laws to capture creep in solder and presents work on laws capturing both kinematic hardening and damage. Inverse analysis is used to determine constants for the kinematic hardening law which match experimental creep curves. The mesh dependence of the damage law is overcome by volume averaging, and the law is applied to predict the crack path in a thermally cycled resistor component.
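As a toy stand-in for the inverse analysis step, the sketch below recovers the constants of a Norton power-law creep model from synthetic steady-state creep rates via a log-linear fit; the kinematic hardening and damage laws treated in the paper are considerably richer, and all numbers here are invented.

```python
import numpy as np

rng = np.random.default_rng(3)

# Norton power-law creep, strain_rate = A * sigma**n, a common baseline
# among the constitutive laws reviewed.
sigma = np.array([10.0, 15.0, 20.0, 25.0, 30.0])         # stress, MPa
rate = 1e-8 * sigma**4.2 * rng.lognormal(0.0, 0.05, 5)   # measured rates, 1/s

# Inverse analysis reduces to a linear fit in log space:
# log(rate) = log(A) + n * log(sigma).
n_fit, logA_fit = np.polyfit(np.log(sigma), np.log(rate), 1)
print(f"recovered constants: A = {np.exp(logA_fit):.2e}, n = {n_fit:.2f}")
```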
Abstract:
The paper introduces a new modeling approach that represents the waiting times in an Accident and Emergency (A&E) Department in a UK based National Health Service (NHS) hospital. The technique uses Bayesian networks to capture the heterogeneity of arriving patients by representing how patient covariates interact to influence their waiting times in the department. Such waiting times have been reviewed by the NHS as a means of investigating the efficiency of A&E departments (Emergency Rooms) and how they operate. As a result, activity targets are now established based on patient total waiting times, with much emphasis on trolley waits.
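The flavor of such a network can be conveyed with a hand-rolled miniature: two patient covariates influence a discretized waiting-time band, and a conditional query is answered by enumeration. The structure and probabilities are illustrative, not the fitted NHS model.

```python
import numpy as np

# Illustrative discrete network: P(Triage), P(Arrival), P(Wait | Triage, Arrival).
p_triage = np.array([0.2, 0.5, 0.3])        # urgent, standard, minor
p_arrival = np.array([0.4, 0.6])            # ambulance, walk-in
p_wait = np.array([                         # P(short, long | triage, arrival)
    [[0.8, 0.2], [0.7, 0.3]],               # urgent
    [[0.5, 0.5], [0.4, 0.6]],               # standard
    [[0.3, 0.7], [0.2, 0.8]],               # minor
])

# Query by enumeration: P(Wait = long | Arrival = ambulance).
joint = p_triage[:, None, None] * p_arrival[None, :, None] * p_wait
ambulance = joint[:, 0, :]                  # slice on the evidence
print("P(long wait | ambulance) =",
      round(ambulance[:, 1].sum() / ambulance.sum(), 3))
```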
Abstract:
We propose the inverse Gaussian distribution, as a less complex alternative to the classical log-normal model, to describe turbulence-induced fading in free-space optical (FSO) systems operating in weak turbulence conditions and/or in the presence of aperture averaging effects. By conducting goodness of fit tests, we define the range of values of the scintillation index for various multiple-input multiple-output (MIMO) FSO configurations, where the two distributions approximate each other with a certain significance level. Furthermore, the bit error rate performance of two typical MIMO FSO systems is investigated over the new turbulence model; an intensity-modulation/direct detection MIMO FSO system with Q-ary pulse position modulation that employs repetition coding at the transmitter and equal gain combining at the receiver, and a heterodyne MIMO FSO system with differential phase-shift keying and maximal ratio combining at the receiver. Finally, numerical results are presented that validate the theoretical analysis and provide useful insights into the implications of the model parameters on the overall system performance. © 2011 IEEE.
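A goodness-of-fit check in the spirit of the paper can be scripted directly: draw irradiance samples from a unit-mean log-normal model at a given scintillation index, fit an inverse Gaussian, and run a Kolmogorov-Smirnov test. The values are illustrative, and estimating the parameters from the same sample makes the KS p-value optimistic.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

# Unit-mean log-normal irradiance at a small scintillation index sigma_I^2
# (weak turbulence and/or aperture averaging); value is illustrative.
scint_index = 0.1
s2 = np.log(1.0 + scint_index)                 # log-normal log-variance
I = rng.lognormal(mean=-s2 / 2, sigma=np.sqrt(s2), size=2000)

# Fit an inverse Gaussian (location fixed at 0) and test the fit.
mu, loc, scale = stats.invgauss.fit(I, floc=0)
D, p = stats.kstest(I, 'invgauss', args=(mu, loc, scale))
print(f"KS statistic = {D:.3f}, p-value = {p:.3f}")
```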
Abstract:
Age-depth modeling using Bayesian statistics requires well-informed prior information about the behavior of sediment accumulation. Here we present average sediment accumulation rates (represented as deposition times, DT, in yr/cm) for lakes in an Arctic setting, and we examine the variability across space (intra- and inter-lake) and time (late Holocene). The dataset includes over 100 radiocarbon dates, primarily on bulk sediment, from 22 sediment cores obtained from 18 lakes spanning the boreal to tundra ecotone gradients in subarctic Canada. There are four to twenty-five radiocarbon dates per core, depending on the length and character of the sediment records. Deposition times were calculated at 100-year intervals from age-depth models constructed using the ‘classical’ age-depth modeling software Clam. Lakes in boreal settings have the most rapid accumulation (mean DT 20 ± 10 yr/cm), whereas lakes in tundra settings accumulate at moderate (mean DT 70 ± 10 yr/cm) to very slow rates (>100 yr/cm). Many of the age-depth models demonstrate fluctuations in accumulation that coincide with lake evolution and post-glacial climate change. Ten of our sediment cores yielded sediments as old as c. 9,000 cal BP (BP = years before AD 1950). Between c. 9,000 and c. 6,000 cal BP, sediment accumulation was relatively rapid (DT of 20 to 60 yr/cm). Accumulation slowed between c. 5,500 and c. 4,000 cal BP as vegetation expanded northward in response to warming. A short period of rapid accumulation occurred near 1,200 cal BP at three lakes. Our research will help inform priors in Bayesian age modeling.
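The deposition-time calculation is straightforward to reproduce for a hypothetical core: invert an age-depth relation onto a 100-year grid and convert the depth accumulated in each interval to DT in yr/cm. Clam fits smoother curves than the linear interpolation used here, and the dates below are invented.

```python
import numpy as np

# Invented calibrated ages (cal BP) at dated depths (cm) for one core.
depth = np.array([0, 50, 120, 200, 310])        # cm
age = np.array([-50, 1200, 4000, 6000, 9000])   # cal BP

# Interpolate depth onto a 100-year age grid; the deposition time over each
# interval is then 100 yr divided by the depth accumulated in that interval.
age_grid = np.arange(0, 9001, 100)
depth_grid = np.interp(age_grid, age, depth)
dt = 100.0 / np.diff(depth_grid)                # yr/cm per 100-yr interval
for a, d in zip(age_grid[1:][::20], dt[::20]):
    print(f"{a:5d} cal BP: DT = {d:5.1f} yr/cm")
```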
Abstract:
Retrospective clinical datasets are often characterized by a relatively small sample size and many missing data. In this case, a common way of handling the missingness consists in discarding patients with missing covariates from the analysis, further reducing the sample size. Alternatively, if the mechanism that generated the missing data allows, incomplete data can be imputed on the basis of the observed data, avoiding the reduction of the sample size and allowing complete-data methods to be applied later on. Moreover, methodologies for data imputation might depend on the particular purpose and might achieve better results by considering specific characteristics of the domain. The problem of missing data treatment is studied in the context of survival tree analysis for the estimation of a prognostic patient stratification. Survival tree methods usually address this problem by using surrogate splits, that is, splitting rules that use other variables yielding similar results to the original ones. Instead, our methodology consists in modeling the dependencies among the clinical variables with a Bayesian network, which is then used to perform data imputation, thus allowing the survival tree to be applied to the completed dataset. The Bayesian network is learned directly from the incomplete data using a structural expectation–maximization (EM) procedure in which the maximization step is performed with an exact anytime method, so that the only source of approximation is the EM formulation itself. On both simulated and real data, our proposed methodology usually outperformed several existing methods for data imputation, and the imputation so obtained improved the stratification estimated by the survival tree (especially with respect to using surrogate splits).
Abstract:
Common Loon (Gavia immer) is considered an emblematic and ecologically important example of aquatic-dependent wildlife in North America. The northern breeding range of Common Loon has contracted over the last century as a result of habitat degradation from human disturbance and lakeshore development. We focused on the state of New Hampshire, USA, where a long-term monitoring program conducted by the Loon Preservation Committee has been collecting biological data on Common Loon since 1976. The Common Loon population in New Hampshire is distributed throughout the state across a wide range of lake-specific habitats, water quality conditions, and levels of human disturbance. We used a multiscale approach to evaluate the association of Common Loon and breeding habitat within three natural physiographic ecoregions of New Hampshire. These multiple scales reflect Common Loon-specific extents such as territories, home ranges, and lake-landscape influences. We developed ecoregional multiscale models and compared them to single-scale models to evaluate model performance in distinguishing Common Loon breeding habitat. Based on information-theoretic criteria, there is empirical support for both multiscale and single-scale models across all three ecoregions, warranting a model-averaging approach. Our results suggest that the Common Loon responds to both ecological and anthropogenic factors at multiple scales when selecting breeding sites. These multiscale models can be used to identify and prioritize the conservation of preferred nesting habitat for Common Loon populations.
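The model-averaging step motivated by the information-theoretic comparison typically uses Akaike weights; below is a minimal sketch with invented AIC values and per-model predictions.

```python
import numpy as np

# Invented AIC values, e.g. single-scale vs. multiscale candidates in one
# ecoregion; Akaike weights follow from the AIC differences.
aic = np.array([412.3, 410.1, 409.8, 415.0])
delta = aic - aic.min()
weights = np.exp(-0.5 * delta)
weights /= weights.sum()
print("Akaike weights:", weights.round(3))

# Model-averaged prediction of, say, breeding-site occupancy (illustrative).
per_model = np.array([0.62, 0.55, 0.58, 0.70])
print("model-averaged prediction:", round(float(weights @ per_model), 3))
```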
Abstract:
Bayesian decision procedures have already been proposed for and implemented in Phase I dose-escalation studies in healthy volunteers. The procedures have been based on pharmacokinetic responses reflecting the concentration of the drug in blood plasma and are conducted to learn about the dose-response relationship while avoiding excessive concentrations. However, in many dose-escalation studies, pharmacodynamic endpoints such as heart rate or blood pressure are observed, and it is these that should be used to control dose-escalation. These endpoints introduce additional complexity into the modeling of the problem relative to pharmacokinetic responses. Firstly, there are responses available following placebo administrations. Secondly, the pharmacodynamic responses are related directly to measurable plasma concentrations, which in turn are related to dose. Motivated by experience of data from a real study conducted in a conventional manner, this paper presents and evaluates a Bayesian procedure devised for the simultaneous monitoring of pharmacodynamic and pharmacokinetic responses. Account is also taken of the incidence of adverse events. Following logarithmic transformations, a linear model is used to relate dose to the pharmacokinetic endpoint and a quadratic model to relate the latter to the pharmacodynamic endpoint. A logistic model is used to relate the pharmacokinetic endpoint to the risk of an adverse event.
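The three-layer structure is concrete enough to simulate. The sketch below chains a log-linear dose-to-pharmacokinetic model, a quadratic pharmacokinetic-to-pharmacodynamic model, and a logistic model for adverse-event risk; all parameter values are invented, and the Bayesian updating of these parameters across dose-escalation cohorts is omitted.

```python
import numpy as np

rng = np.random.default_rng(5)

# Invented parameters: the Bayesian procedure places priors on these and
# updates them as each cohort's data arrive.
a0, a1 = 0.5, 0.9             # log AUC = a0 + a1 * log(dose) + noise
b0, b1, b2 = 60.0, 8.0, -1.5  # heart rate = quadratic in log AUC + noise
g0, g1 = -6.0, 1.2            # logit P(adverse event) = g0 + g1 * log AUC

def simulate_subject(dose):
    log_auc = a0 + a1 * np.log(dose) + 0.2 * rng.normal()
    heart_rate = b0 + b1 * log_auc + b2 * log_auc**2 + 2.0 * rng.normal()
    p_ae = 1.0 / (1.0 + np.exp(-(g0 + g1 * log_auc)))
    return log_auc, heart_rate, rng.random() < p_ae

for dose in (5, 10, 20, 40):
    log_auc, hr, ae = simulate_subject(dose)
    print(f"dose {dose:3d}: log AUC {log_auc:5.2f}, heart rate {hr:5.1f}, AE {ae}")
```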
Abstract:
We investigate the impact of past climates on plant diversification by tracking the "footprint" of climate change on a phylogenetic tree. Diversity within the cosmopolitan carnivorous plant genus Drosera (Droseraceae) is focused within Mediterranean climate regions. We explore whether this diversity is temporally linked to Mediterranean-type climatic shifts of the mid-Miocene and whether climate preferences are conservative over phylogenetic timescales. Phyloclimatic modeling combines environmental niche (bioclimatic) modeling with phylogenetics in order to study evolutionary patterns in relation to climate change. We present the largest and most complete such example to date using Drosera. The bioclimatic models of extant species demonstrate clear phylogenetic patterns; this is particularly evident for the tuberous sundews from southwestern Australia (subgenus Ergaleium). We employ a method for establishing confidence intervals of node ages on a phylogeny using replicates from a Bayesian phylogenetic analysis. This chronogram shows that many clades, including subgenus Ergaleium and section Bryastrum, diversified during the establishment of the Mediterranean-type climate. Ancestral reconstructions of bioclimatic models demonstrate a pattern of preference for this climate type within these groups. Ancestral bioclimatic models are projected into palaeo-climate reconstructions for the time periods indicated by the chronogram. We present two such examples that each generate plausible estimates of ancestral lineage distribution, which are similar to their current distributions. This is the first study to attempt bioclimatic projections on evolutionary time scales. The sundews appear to have diversified in response to local climate development. Some groups are specialized for Mediterranean climates; others show wide-ranging generalism. This demonstrates that phyloclimatic modeling could be repeated for other plant groups and is fundamental to the understanding of evolutionary responses to climate change.