926 results for Approximate Bayesian Computation
Abstract:
Brain tumor is one of the most aggressive types of cancer in humans, with an estimated median survival time of 12 months and only 4% of patients surviving more than 5 years after diagnosis. Until recently, brain tumor prognosis has been based only on clinical information such as tumor grade and patient age, but there are reports indicating that molecular profiling of gliomas can reveal subgroups of patients with distinct survival rates. We hypothesize that coupling molecular profiling of brain tumors with clinical information might improve predictions of patient survival time and, consequently, better guide future treatment decisions. To evaluate this hypothesis, the general goal of this research is to build models for survival prediction of glioma patients using DNA molecular profiles (U133 Affymetrix gene expression microarrays) along with clinical information. First, a predictive Random Forest model is built for binary outcomes (i.e. short vs. long-term survival), and a small subset of genes whose expression values can be used to predict survival time is selected. Next, a new statistical methodology is developed for predicting time-to-death outcomes using Bayesian ensemble trees. Because of the large heterogeneity observed within the prognostic classes obtained by the Random Forest model, prediction can be improved by relating time-to-death directly to the gene expression profile. We propose a Bayesian ensemble model for survival prediction that is appropriate for high-dimensional data such as gene expression data. Our approach is based on the ensemble "sum-of-trees" model, which is flexible enough to incorporate additive and interaction effects between genes. We specify a fully Bayesian hierarchical approach and illustrate our methodology for the CPH, Weibull, and AFT survival models. We overcome the lack of conjugacy by using a latent variable formulation to model the covariate effects, which decreases computation time for model fitting.
Also, our proposed models provide a model-free way to select important predictive prognostic markers based on controlling false discovery rates. We compare the performance of our methods with baseline reference survival methods and apply our methodology to an unpublished data set of brain tumor survival times and gene expression data, selecting genes potentially related to the development of the disease under study. A closing discussion compares the results obtained by the Random Forest and Bayesian ensemble methods from the biological and clinical perspectives and highlights the statistical advantages and disadvantages of the new methodology in the context of DNA microarray data analysis.
Abstract:
The responses of carbon dioxide (CO2) and other climate variables to an emission pulse of CO2 into the atmosphere are often used to compute the Global Warming Potential (GWP) and Global Temperature change Potential (GTP), to characterize the response timescales of Earth System models, and to build reduced-form models. In this carbon cycle-climate model intercomparison project, which spans the full model hierarchy, we quantify responses to emission pulses of different magnitudes injected under different conditions. The CO2 response shows the known rapid decline in the first few decades followed by a millennium-scale tail. For a 100 Gt-C emission pulse added to a constant CO2 concentration of 389 ppm, 25 ± 9% is still found in the atmosphere after 1000 yr; the ocean has absorbed 59 ± 12% and the land the remainder (16 ± 14%). The response in global mean surface air temperature is an increase by 0.20 ± 0.12 °C within the first twenty years; thereafter and until year 1000, temperature decreases only slightly, whereas ocean heat content and sea level continue to rise. Our best estimate for the Absolute Global Warming Potential, given by the time-integrated response in CO2 at year 100 multiplied by its radiative efficiency, is 92.5 × 10^-15 yr W m^-2 per kg-CO2. This value very likely (5 to 95% confidence) lies within the range of (68 to 117) × 10^-15 yr W m^-2 per kg-CO2. Estimates for time-integrated response in CO2 published in the IPCC First, Second, and Fourth Assessment and our multi-model best estimate all agree within 15% during the first 100 yr. The integrated CO2 response, normalized by the pulse size, is lower for pre-industrial conditions, compared to present day, and lower for smaller pulses than larger pulses. In contrast, the response in temperature, sea level and ocean heat content is less sensitive to these choices.
Although choices in pulse size, background concentration, and model lead to uncertainties, the most important and subjective choice in determining the AGWP of CO2 and the GWP is the time horizon.
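The definition of AGWP stated in this abstract (time-integrated CO2 response multiplied by radiative efficiency) can be written compactly as follows. This is a sketch of the standard form only. The symbols H (time horizon), A (radiative efficiency per kg) and IRF (fraction of the pulse remaining airborne at time t) are notation assumed here, not taken from the abstract.

```latex
% AGWP of CO2 for a time horizon H (H = 100 yr in the abstract's best estimate)
\mathrm{AGWP}_{\mathrm{CO_2}}(H) \;=\; A_{\mathrm{CO_2}} \int_{0}^{H} \mathrm{IRF}_{\mathrm{CO_2}}(t)\,\mathrm{d}t

% GWP of a gas x is its AGWP relative to that of CO2 over the same horizon
\mathrm{GWP}_{x}(H) \;=\; \frac{\mathrm{AGWP}_{x}(H)}{\mathrm{AGWP}_{\mathrm{CO_2}}(H)}
```

Because H appears as the upper limit of the integral, and the CO2 response has a millennium-scale tail, the result depends strongly on the chosen time horizon.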
Abstract:
This dissertation explores phase I dose-finding designs in cancer trials from three perspectives: the alternative Bayesian dose-escalation rules, a design based on a time-to-dose-limiting toxicity (DLT) model, and a design based on a discrete-time multi-state (DTMS) model. We list alternative Bayesian dose-escalation rules and perform a simulation study for the intra-rule and inter-rule comparisons based on two statistical models to identify the most appropriate rule under certain scenarios. We provide evidence that all the Bayesian rules outperform the traditional "3+3" design in the allocation of patients and selection of the maximum tolerated dose. The design based on a time-to-DLT model uses patients' DLT information over multiple treatment cycles in estimating the probability of DLT at the end of treatment cycle 1. Dose-escalation decisions are made whenever a cycle-1 DLT occurs, or two months after the previous check point. Compared to the design based on a logistic regression model, the new design shows more safety benefits for trials in which more late-onset toxicities are expected. As a trade-off, the new design requires more patients on average. The design based on a discrete-time multi-state (DTMS) model has three important attributes: (1) Toxicities are categorized over a distribution of severity levels, (2) Early toxicity may inform dose escalation, and (3) No suspension is required between accrual cohorts. The proposed model accounts for the difference in the importance of the toxicity severity levels and for transitions between toxicity levels. We compare the operating characteristics of the proposed design with those from a similar design based on a fully-evaluated model that directly models the maximum observed toxicity level within the patients' entire assessment window. We describe settings in which, under comparable power, the proposed design shortens the trial.
The proposed design offers more benefit compared to the alternative design as patient accrual becomes slower.
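For readers unfamiliar with the traditional "3+3" comparator mentioned above, one common textbook variant of its decision rule can be sketched as follows. This is an illustrative sketch only: the abstract does not state which variant the dissertation uses, and the function name and return labels are hypothetical.

```python
def three_plus_three_decision(n_treated: int, n_dlt: int) -> str:
    """Decision at the current dose level under a common 3+3 variant.

    n_treated: patients evaluated at this dose so far (3 or 6).
    n_dlt: how many of them had a dose-limiting toxicity (DLT).
    Returns "escalate", "expand" (treat 3 more at the same dose) or "stop".
    """
    if n_dlt < 0 or n_dlt > n_treated:
        raise ValueError("n_dlt must be between 0 and n_treated")
    if n_treated == 3:
        if n_dlt == 0:
            return "escalate"   # 0/3 DLTs: move to the next dose level
        if n_dlt == 1:
            return "expand"     # 1/3 DLTs: enroll 3 more at this dose
        return "stop"           # 2 or more of 3: dose is too toxic
    if n_treated == 6:
        # After expansion, at most 1 DLT in 6 patients allows escalation.
        return "escalate" if n_dlt <= 1 else "stop"
    raise ValueError("this variant evaluates cohorts of 3 or 6 patients")
```

The rule looks only at DLT counts at the current dose, whereas the model-based designs described in the abstract use the accumulated toxicity information.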
Abstract:
In 2011, there will be an estimated 1,596,670 new cancer cases and 571,950 cancer-related deaths in the US. With the ever-increasing applications of cancer genetics in epidemiology, there is great potential to identify genetic risk factors that would help identify individuals with increased genetic susceptibility to cancer, which could be used to develop interventions or targeted therapies that could hopefully reduce cancer risk and mortality. In this dissertation, I propose to develop a new statistical method to evaluate the role of haplotypes in cancer susceptibility and development. This model will be flexible enough to handle not only haplotypes of any size, but also a variety of covariates. I will then apply this method to three cancer-related data sets (Hodgkin Disease, Glioma, and Lung Cancer). I hypothesize that there is substantial improvement in the estimation of association between haplotypes and disease with the use of a Bayesian mathematical method to infer haplotypes that uses prior information from known genetics sources. Analyses based on haplotypes using information from publicly available genetic sources generally show increased odds ratios and smaller p-values in all three data sets (Hodgkin Disease, Glioma, and Lung Cancer). For instance, the Bayesian Joint Logistic Model (BJLM) inferred haplotype TC had a substantially higher estimated effect size (OR=12.16, 95% CI = 2.47-90.1 vs. 9.24, 95% CI = 1.81-47.2) and more significant p-value (0.00044 vs. 0.008) for Hodgkin Disease compared to a traditional logistic regression approach. Also, the effect sizes of haplotypes modeled with recessive genetic effects were higher (and had more significant p-values) when analyzed with the BJLM. Full genetic models with haplotype information developed with the BJLM resulted in significantly higher discriminatory power and a significantly higher Net Reclassification Index compared to those developed with haplo.stats for lung cancer.
Future work could incorporate the 1000 Genomes Project, which offers a larger selection of SNPs, into the information from known genetic sources. Other future analyses include testing non-binary outcomes, such as the levels of biomarkers that are present in lung cancer (NNK), and extending this analysis to full GWAS studies.
Abstract:
Previous research has shown that motion imagery draws on the same neural circuits that are involved in perception of motion, thus leading to a motion aftereffect (Winawer et al., 2010). Imagined stimuli can induce a similar shift in participants’ psychometric functions as neural adaptation due to a perceived stimulus. However, these studies have been criticized on the grounds that they fail to exclude the possibility that the subjects might have guessed the experimental hypothesis, and behaved accordingly (Morgan et al., 2012). In particular, the authors claim that participants can adopt arbitrary response criteria, which results in similar changes of the central tendency μ of psychometric curves as those shown by Winawer et al. (2010).
Abstract:
Most statistical analysis, theory and practice, is concerned with static models: models with a proposed set of parameters whose values are fixed across observational units. Static models implicitly assume that the quantified relationships remain the same across the design space of the data. While this is reasonable under many circumstances, it can be a dangerous assumption when dealing with sequentially ordered data. The mere passage of time always brings fresh considerations, and the interrelationships among parameters, or subsets of parameters, may need to be continually revised. When data are gathered sequentially, dynamic interim monitoring may be useful as new subject-specific parameters are introduced with each new observational unit. Sequential imputation via dynamic hierarchical models is an efficient strategy for handling missing data and analyzing longitudinal studies. Dynamic conditional independence models offer a flexible framework that exploits the Bayesian updating scheme for capturing the evolution of both the population and individual effects over time. While static models often describe aggregate information well, they often do not reflect conflicts in the information at the individual level. Dynamic models prove advantageous over static models in capturing both individual and aggregate trends. Computations for such models can be carried out via the Gibbs sampler. An application using small-sample, repeated-measures, normally distributed growth curve data is presented.
Abstract:
Ecosystems are faced with high rates of species loss, which has consequences for their functions and services. To assess the effects of plant species diversity on the nitrogen (N) cycle, we developed a model for monthly mean nitrate (NO3-N) concentrations in soil solution in 0-30 cm mineral soil depth, using plant species and functional group richness and functional composition as drivers and assessing the effects of conversion of arable land to grassland, spatially heterogeneous soil properties, and climate. We used monthly mean NO3-N concentrations from 62 plots of a grassland plant diversity experiment from 2003 to 2006. Plant species richness (1-60) and functional group composition (1-4 functional groups: legumes, grasses, non-leguminous tall herbs, non-leguminous small herbs) were manipulated in a factorial design. Plant community composition, time since conversion from arable land to grassland, soil texture, and climate data (precipitation, soil moisture, air and soil temperature) were used to develop one general Bayesian multiple regression model for the 62 plots to allow an in-depth evaluation using the experimental design. The model simulated NO3-N concentrations with an overall Bayesian coefficient of determination of 0.48. How well the temporal course of NO3-N concentrations was simulated varied among the individual plots, with a maximum plot-specific Nash-Sutcliffe Efficiency of 0.57. The model shows that NO3-N concentrations decrease with species richness, but this relation reverses if more than approx. 25 % of legume species are included in the mixture. Presence of legumes increases and presence of grasses decreases NO3-N concentrations compared to mixtures containing only small and tall herbs. Altogether, our model shows that there is a strong influence of plant community composition on NO3-N concentrations.
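The plot-specific Nash-Sutcliffe Efficiency (NSE) reported above compares the model's squared errors with the variance of the observations. A minimal sketch of the standard formula follows. The function name and the example values are hypothetical and are not data from the study.

```python
def nash_sutcliffe_efficiency(observed, simulated):
    """NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2).

    1.0 means a perfect fit. 0.0 means the model is no better than
    predicting the mean of the observations. Negative values are worse.
    """
    observed = [float(v) for v in observed]
    simulated = [float(v) for v in simulated]
    if len(observed) != len(simulated) or not observed:
        raise ValueError("need two non-empty series of equal length")
    mean_obs = sum(observed) / len(observed)
    residual_ss = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    total_ss = sum((o - mean_obs) ** 2 for o in observed)
    if total_ss == 0:
        raise ValueError("NSE is undefined when observations are constant")
    return 1.0 - residual_ss / total_ss


# Hypothetical monthly values for one plot (illustration only).
example_nse = nash_sutcliffe_efficiency([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
```

In the example the simulated series equals the observed mean at every time step, so the NSE is 0. On this scale the reported maximum of 0.57 means that the best-simulated plot's squared errors were 43 % of the observed variance.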
Abstract:
There is great demand for easily accessible, user-friendly dietary self-management applications. Yet accurate, fully automatic estimation of nutritional intake using computer vision methods remains an open research problem. One key element of this problem is volume estimation, which can be computed from 3D models obtained using multi-view geometry. The paper presents a computational system for volume estimation based on the processing of two meal images. A 3D model of the served meal is reconstructed from the acquired images and the volume is computed from the shape. The algorithm was tested on food models (dummy foods) with known volume and on real served food. Volume accuracy was on the order of 90 %, while the total execution time was below 15 seconds per image pair. The proposed system combines simple and computationally affordable methods for 3D reconstruction, remained stable throughout the experiments, operates in near real time, and places minimum constraints on users.
Abstract:
We present a novel approach for the reconstruction of spectra from Euclidean correlator data that makes close contact with modern Bayesian concepts. It is based upon an axiomatically justified dimensionless prior distribution, which in the case of a constant prior function m(ω) only imprints smoothness on the reconstructed spectrum. In addition, we are able to analytically integrate out the only relevant overall hyper-parameter α in the prior, removing the necessity for Gaussian approximations found e.g. in the Maximum Entropy Method (MEM). Using a quasi-Newton minimizer and high-precision arithmetic, we are then able to find the unique global extremum of P[ρ|D] in the full Nω ≫ Nτ dimensional search space. The method yields gradually improving reconstruction results as the quality of the supplied input data increases, without introducing the artificial peak structures often encountered in the MEM. To support these statements we present mock data analyses for the case of zero-width delta peaks and more realistic scenarios, based on the perturbative Euclidean Wilson Loop as well as the Wilson Line correlator in Coulomb gauge.
Abstract:
The extraction of the finite temperature heavy quark potential from lattice QCD relies on a spectral analysis of the real-time Wilson loop. Through its position and shape, the lowest lying spectral peak encodes the real and imaginary parts of this complex potential. We benchmark this extraction strategy using leading order hard-thermal loop (HTL) calculations; that is, we analytically calculate the Wilson loop and determine the corresponding spectrum. By fitting its lowest lying peak we obtain the real and imaginary parts and confirm that knowledge of the lowest peak alone is sufficient for obtaining the potential. We apply a novel Bayesian approach for the reconstruction of spectral functions to HTL correlators in Euclidean time and observe how well the known spectral function and the values for the real and imaginary parts are reproduced. Finally, we apply the method to quenched lattice QCD data and obtain an improved estimate of both the real and imaginary parts of the non-perturbative heavy QQ̄ potential.
Abstract:
Pre-combined SLR-GNSS solutions are studied and the impact of different types of datum definition on the estimated parameters is assessed. It is found that the origin is realized best by using only the SLR core network for defining the geodetic datum, and that the inclusion of the GNSS core sites degrades the origin. The orientation, however, requires a dense and continuous network; thus, the inclusion of the GNSS core network is absolutely needed.
Abstract:
The direct Bayesian admissible region approach is an a priori state free measurement association and initial orbit determination technique for optical tracks. In this paper, we test a hybrid approach that appends a least squares estimator to the direct Bayesian method on measurements taken at the Zimmerwald Observatory of the Astronomical Institute at the University of Bern. Over half of the association pairs agreed with conventional geometric track correlation and least squares techniques. The remaining pairs cast light on the fundamental limits of conducting tracklet association based solely on dynamical and geometrical information.