921 resultados para Bayesian nonparametric
Resumo:
We present a novel approach to the inference of spectral functions from Euclidean time correlator data that makes close contact with modern Bayesian concepts. Our method differs significantly from the maximum entropy method (MEM). A new set of axioms is postulated for the prior probability, leading to an improved expression, which is devoid of the asymptotically flat directions present in the Shanon-Jaynes entropy. Hyperparameters are integrated out explicitly, liberating us from the Gaussian approximations underlying the evidence approach of the maximum entropy method. We present a realistic test of our method in the context of the nonperturbative extraction of the heavy quark potential. Based on hard-thermal-loop correlator mock data, we establish firm requirements in the number of data points and their accuracy for a successful extraction of the potential from lattice QCD. Finally we reinvestigate quenched lattice QCD correlators from a previous study and provide an improved potential estimation at T2.33TC.
Resumo:
This dissertation explores phase I dose-finding designs in cancer trials from three perspectives: the alternative Bayesian dose-escalation rules, a design based on a time-to-dose-limiting toxicity (DLT) model, and a design based on a discrete-time multi-state (DTMS) model. We list alternative Bayesian dose-escalation rules and perform a simulation study for the intra-rule and inter-rule comparisons based on two statistical models to identify the most appropriate rule under certain scenarios. We provide evidence that all the Bayesian rules outperform the traditional ``3+3'' design in the allocation of patients and selection of the maximum tolerated dose. The design based on a time-to-DLT model uses patients' DLT information over multiple treatment cycles in estimating the probability of DLT at the end of treatment cycle 1. Dose-escalation decisions are made whenever a cycle-1 DLT occurs, or two months after the previous check point. Compared to the design based on a logistic regression model, the new design shows more safety benefits for trials in which more late-onset toxicities are expected. As a trade-off, the new design requires more patients on average. The design based on a discrete-time multi-state (DTMS) model has three important attributes: (1) Toxicities are categorized over a distribution of severity levels, (2) Early toxicity may inform dose escalation, and (3) No suspension is required between accrual cohorts. The proposed model accounts for the difference in the importance of the toxicity severity levels and for transitions between toxicity levels. We compare the operating characteristics of the proposed design with those from a similar design based on a fully-evaluated model that directly models the maximum observed toxicity level within the patients' entire assessment window. We describe settings in which, under comparable power, the proposed design shortens the trial. The proposed design offers more benefit compared to the alternative design as patient accrual becomes slower.
Resumo:
In 2011, there will be an estimated 1,596,670 new cancer cases and 571,950 cancer-related deaths in the US. With the ever-increasing applications of cancer genetics in epidemiology, there is great potential to identify genetic risk factors that would help identify individuals with increased genetic susceptibility to cancer, which could be used to develop interventions or targeted therapies that could hopefully reduce cancer risk and mortality. In this dissertation, I propose to develop a new statistical method to evaluate the role of haplotypes in cancer susceptibility and development. This model will be flexible enough to handle not only haplotypes of any size, but also a variety of covariates. I will then apply this method to three cancer-related data sets (Hodgkin Disease, Glioma, and Lung Cancer). I hypothesize that there is substantial improvement in the estimation of association between haplotypes and disease, with the use of a Bayesian mathematical method to infer haplotypes that uses prior information from known genetics sources. Analysis based on haplotypes using information from publically available genetic sources generally show increased odds ratios and smaller p-values in both the Hodgkin, Glioma, and Lung data sets. For instance, the Bayesian Joint Logistic Model (BJLM) inferred haplotype TC had a substantially higher estimated effect size (OR=12.16, 95% CI = 2.47-90.1 vs. 9.24, 95% CI = 1.81-47.2) and more significant p-value (0.00044 vs. 0.008) for Hodgkin Disease compared to a traditional logistic regression approach. Also, the effect sizes of haplotypes modeled with recessive genetic effects were higher (and had more significant p-values) when analyzed with the BJLM. Full genetic models with haplotype information developed with the BJLM resulted in significantly higher discriminatory power and a significantly higher Net Reclassification Index compared to those developed with haplo.stats for lung cancer. Future analysis for this work could be to incorporate the 1000 Genomes project, which offers a larger selection of SNPs can be incorporated into the information from known genetic sources as well. Other future analysis include testing non-binary outcomes, like the levels of biomarkers that are present in lung cancer (NNK), and extending this analysis to full GWAS studies.
Resumo:
Previous research has shown that motion imagery draws on the same neural circuits that are involved in perception of motion, thus leading to a motion aftereffect (Winawer et al., 2010). Imagined stimuli can induce a similar shift in participants’ psychometric functions as neural adaptation due to a perceived stimulus. However, these studies have been criticized on the grounds that they fail to exclude the possibility that the subjects might have guessed the experimental hypothesis, and behaved accordingly (Morgan et al., 2012). In particular, the authors claim that participants can adopt arbitrary response criteria, which results in similar changes of the central tendency μ of psychometric curves as those shown by Winawer et al. (2010).
Resumo:
Most statistical analysis, theory and practice, is concerned with static models; models with a proposed set of parameters whose values are fixed across observational units. Static models implicitly assume that the quantified relationships remain the same across the design space of the data. While this is reasonable under many circumstances this can be a dangerous assumption when dealing with sequentially ordered data. The mere passage of time always brings fresh considerations and the interrelationships among parameters, or subsets of parameters, may need to be continually revised. ^ When data are gathered sequentially dynamic interim monitoring may be useful as new subject-specific parameters are introduced with each new observational unit. Sequential imputation via dynamic hierarchical models is an efficient strategy for handling missing data and analyzing longitudinal studies. Dynamic conditional independence models offers a flexible framework that exploits the Bayesian updating scheme for capturing the evolution of both the population and individual effects over time. While static models often describe aggregate information well they often do not reflect conflicts in the information at the individual level. Dynamic models prove advantageous over static models in capturing both individual and aggregate trends. Computations for such models can be carried out via the Gibbs sampler. An application using a small sample repeated measures normally distributed growth curve data is presented. ^
Resumo:
A non-parametric method was developed and tested to compare the partial areas under two correlated Receiver Operating Characteristic curves. Based on the theory of generalized U-statistics the mathematical formulas have been derived for computing ROC area, and the variance and covariance between the portions of two ROC curves. A practical SAS application also has been developed to facilitate the calculations. The accuracy of the non-parametric method was evaluated by comparing it to other methods. By applying our method to the data from a published ROC analysis of CT image, our results are very close to theirs. A hypothetical example was used to demonstrate the effects of two crossed ROC curves. The two ROC areas are the same. However each portion of the area between two ROC curves were found to be significantly different by the partial ROC curve analysis. For computation of ROC curves with large scales, such as a logistic regression model, we applied our method to the breast cancer study with Medicare claims data. It yielded the same ROC area computation as the SAS Logistic procedure. Our method also provides an alternative to the global summary of ROC area comparison by directly comparing the true-positive rates for two regression models and by determining the range of false-positive values where the models differ. ^
Resumo:
Ecosystems are faced with high rates of species loss which has consequences for their functions and services. To assess the effects of plant species diversity on the nitrogen (N) cycle, we developed a model for monthly mean nitrate (NO3-N) concentrations in soil solution in 0-30 cm mineral soil depth using plant species and functional group richness and functional composition as drivers and assessing the effects of conversion of arable land to grassland, spatially heterogeneous soil properties, and climate. We used monthly mean NO3-N concentrations from 62 plots of a grassland plant diversity experiment from 2003 to 2006. Plant species richness (1-60) and functional group composition (1-4 functional groups: legumes, grasses, non-leguminous tall herbs, non-leguminous small herbs) were manipulated in a factorial design. Plant community composition, time since conversion from arable land to grassland, soil texture, and climate data (precipitation, soil moisture, air and soil temperature) were used to develop one general Bayesian multiple regression model for the 62 plots to allow an in-depth evaluation using the experimental design. The model simulated NO3-N concentrations with an overall Bayesian coefficient of determination of 0.48. The temporal course of NO3-N concentrations was simulated differently well for the individual plots with a maximum plot-specific Nash-Sutcliffe Efficiency of 0.57. The model shows that NO3-N concentrations decrease with species richness, but this relation reverses if more than approx. 25 % of legume species are included in the mixture. Presence of legumes increases and presence of grasses decreases NO3-N concentrations compared to mixtures containing only small and tall herbs. Altogether, our model shows that there is a strong influence of plant community composition on NO3-N concentrations.