942 results for hierarchical Bayesian models
Abstract:
Integrated choice and latent variable (ICLV) models represent a promising new class of models that merges classic choice models with the structural equation modeling (SEM) approach for latent variables. Despite their conceptual appeal, applications of ICLV models in marketing remain rare. We extend previous ICLV applications by, first, estimating a multinomial choice model and, second, estimating hierarchical relations between latent variables. An empirical study on travel mode choice clearly demonstrates the value of ICLV models for enhancing the understanding of choice processes. In addition to the usually studied, directly observable variables such as travel time, we show how abstract motivations such as power and hedonism, as well as attitudes such as a desire for flexibility, influence travel mode choice. Furthermore, we show that it is possible to estimate such a complex ICLV model with the widely available structural equation modeling package Mplus. This finding is likely to encourage more widespread application of this appealing model class in the marketing field.
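To make the model structure concrete, the following is a minimal sketch of an ICLV-style data-generating process: a latent attitude, measured through its own indicators, enters the utilities of a multinomial logit. All variable names and parameter values are hypothetical; this is not the authors' Mplus specification.

```python
# Hypothetical ICLV-style data-generating process (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
n, J = 1000, 3                                  # individuals, travel modes

# Structural part: a latent attitude (e.g., desire for flexibility)
# driven by an observed covariate.
age = rng.normal(0, 1, n)
latent = 0.5 * age + rng.normal(0, 1, n)

# Measurement part: Likert-style indicators loading on the latent variable.
loadings = np.array([1.0, 0.8, 0.6])
indicators = latent[:, None] * loadings + rng.normal(0, 0.5, (n, 3))

# Choice part: multinomial logit utilities that include the latent variable.
travel_time = rng.uniform(10, 60, (n, J))
beta_time, lam = -0.05, 0.9
v = beta_time * travel_time
v[:, 2] += lam * latent                         # attitude shifts utility of mode 3
p = np.exp(v) / np.exp(v).sum(axis=1, keepdims=True)
choice = (p.cumsum(axis=1) > rng.uniform(size=(n, 1))).argmax(axis=1)
```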
Abstract:
Brain tumor is one of the most aggressive types of cancer in humans, with an estimated median survival time of 12 months and only 4% of patients surviving more than 5 years after diagnosis. Until recently, brain tumor prognosis has been based only on clinical information such as tumor grade and patient age, but there are reports indicating that molecular profiling of gliomas can reveal subgroups of patients with distinct survival rates. We hypothesize that coupling molecular profiling of brain tumors with clinical information might improve predictions of patient survival time and, consequently, better guide future treatment decisions. To evaluate this hypothesis, the general goal of this research is to build models for survival prediction of glioma patients using DNA molecular profiles (U133 Affymetrix gene expression microarrays) along with clinical information. First, a predictive Random Forest model is built for binary outcomes (i.e., short- vs. long-term survival) and a small subset of genes whose expression values can be used to predict survival time is selected. Next, a new statistical methodology is developed for predicting time-to-death outcomes using Bayesian ensemble trees. Due to the large heterogeneity observed within the prognostic classes obtained by the Random Forest model, prediction can be improved by relating time-to-death directly to the gene expression profile. We propose a Bayesian ensemble model for survival prediction that is appropriate for high-dimensional data such as gene expression data. Our approach is based on the ensemble "sum-of-trees" model, which is flexible enough to incorporate additive and interaction effects between genes. We specify a fully Bayesian hierarchical approach and illustrate our methodology for the Cox proportional hazards (CPH), Weibull, and accelerated failure time (AFT) survival models. We overcome the lack of conjugacy using a latent variable formulation to model the covariate effects, which decreases computation time for model fitting. Our proposed models also provide a model-free way to select important predictive prognostic markers based on controlling false discovery rates. We compare the performance of our methods with baseline reference survival methods and apply our methodology to an unpublished data set of brain tumor survival times and gene expression data, selecting genes potentially related to the development of the disease under study. A closing discussion compares the results obtained by the Random Forest and Bayesian ensemble methods from biological/clinical perspectives and highlights the statistical advantages and disadvantages of the new methodology in the context of DNA microarray data analysis.
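One standard latent-variable device for censored survival data is to impute censored log survival times from a truncated normal given the current model fit, which restores conjugacy in Gibbs-style updates. The sketch below shows that single step with a plain linear fit standing in for the sum-of-trees model; details may differ from the authors' formulation.

```python
# A single data-augmentation step for a log-normal AFT model: censored
# log-times are imputed from a truncated normal given the current mean fit.
# (Illustrative; a sum-of-trees fit would replace mu_fit below.)
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=n)
true_log_t = 1.0 + 0.5 * x + rng.normal(scale=0.7, size=n)   # latent log survival times
cens_log_t = rng.normal(loc=1.2, scale=0.5, size=n)          # log censoring times
y = np.minimum(true_log_t, cens_log_t)                       # observed log times
delta = (true_log_t <= cens_log_t).astype(int)               # 1 = event, 0 = censored

def impute_censored(y, delta, mu, sigma, rng):
    """Replace censored entries with draws from N(mu, sigma^2) truncated below at y."""
    z = y.copy()
    c = delta == 0
    a = (y[c] - mu[c]) / sigma            # lower truncation point in standard units
    z[c] = truncnorm.rvs(a, np.inf, loc=mu[c], scale=sigma, random_state=rng)
    return z

mu_fit = 1.0 + 0.5 * x                    # stand-in for the current sum-of-trees fit
z = impute_censored(y, delta, mu_fit, 0.7, rng)
```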
Abstract:
This dissertation explores phase I dose-finding designs in cancer trials from three perspectives: alternative Bayesian dose-escalation rules, a design based on a time-to-dose-limiting-toxicity (DLT) model, and a design based on a discrete-time multi-state (DTMS) model. We list alternative Bayesian dose-escalation rules and perform a simulation study with intra-rule and inter-rule comparisons based on two statistical models to identify the most appropriate rule under certain scenarios. We provide evidence that all the Bayesian rules outperform the traditional "3+3" design in the allocation of patients and selection of the maximum tolerated dose. The design based on a time-to-DLT model uses patients' DLT information over multiple treatment cycles in estimating the probability of DLT at the end of treatment cycle 1. Dose-escalation decisions are made whenever a cycle-1 DLT occurs, or two months after the previous checkpoint. Compared to the design based on a logistic regression model, the new design shows more safety benefits for trials in which more late-onset toxicities are expected. As a trade-off, the new design requires more patients on average. The design based on a discrete-time multi-state (DTMS) model has three important attributes: (1) toxicities are categorized over a distribution of severity levels, (2) early toxicity may inform dose escalation, and (3) no suspension is required between accrual cohorts. The proposed model accounts for the difference in importance of the toxicity severity levels and for transitions between toxicity levels. We compare the operating characteristics of the proposed design with those of a similar design based on a fully-evaluated model that directly models the maximum observed toxicity level within the patients' entire assessment window. We describe settings in which, under comparable power, the proposed design shortens the trial. The proposed design offers more benefit compared to the alternative design as patient accrual becomes slower.
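As one concrete example of the kind of Bayesian dose-escalation rule compared in such simulation studies, the sketch below computes the posterior probability, under a Beta-binomial model, that the DLT rate at the current dose exceeds the target; the prior and thresholds are illustrative, not taken from the dissertation.

```python
# A simple Bayesian escalation rule: escalate only if the posterior
# probability of exceeding the target DLT rate is small (illustrative values).
import numpy as np
from scipy.stats import beta

target, overdose_cut = 0.30, 0.25
n_tox, n_treated = 1, 6                       # DLTs observed at the current dose
a, b = 1 + n_tox, 1 + (n_treated - n_tox)     # Beta(1,1) prior + binomial data

p_over = 1 - beta.cdf(target, a, b)           # P(DLT rate > target | data)
decision = "escalate" if p_over < overdose_cut else "stay or de-escalate"
print(f"P(p_DLT > {target}) = {p_over:.3f} -> {decision}")
```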
Abstract:
In 2011, there will be an estimated 1,596,670 new cancer cases and 571,950 cancer-related deaths in the US. With the ever-increasing applications of cancer genetics in epidemiology, there is great potential to identify genetic risk factors that would help identify individuals with increased genetic susceptibility to cancer, which could be used to develop interventions or targeted therapies that could hopefully reduce cancer risk and mortality. In this dissertation, I propose to develop a new statistical method to evaluate the role of haplotypes in cancer susceptibility and development. This model will be flexible enough to handle not only haplotypes of any size but also a variety of covariates. I will then apply this method to three cancer-related data sets (Hodgkin Disease, Glioma, and Lung Cancer). I hypothesize that there is substantial improvement in the estimation of association between haplotypes and disease when using a Bayesian mathematical method to infer haplotypes that draws on prior information from known genetics sources. Analyses based on haplotypes using information from publicly available genetic sources generally show increased odds ratios and smaller p-values in the Hodgkin, Glioma, and Lung data sets. For instance, the Bayesian Joint Logistic Model (BJLM) inferred haplotype TC had a substantially higher estimated effect size (OR = 12.16, 95% CI = 2.47-90.1 vs. 9.24, 95% CI = 1.81-47.2) and a more significant p-value (0.00044 vs. 0.008) for Hodgkin Disease compared to a traditional logistic regression approach. Also, the effect sizes of haplotypes modeled with recessive genetic effects were higher (and had more significant p-values) when analyzed with the BJLM. Full genetic models with haplotype information developed with the BJLM resulted in significantly higher discriminatory power and a significantly higher Net Reclassification Index compared to those developed with haplo.stats for lung cancer. Future work could incorporate the 1000 Genomes Project, which offers a larger selection of SNPs that can be added to the information from known genetic sources. Other future analyses include testing non-binary outcomes, such as the levels of biomarkers present in lung cancer (e.g., NNK), and extending this analysis to full genome-wide association studies (GWAS).
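For reference, the traditional logistic-regression comparator mentioned above yields odds ratios and confidence intervals along the following lines; the data, haplotype coding, and covariate are synthetic and hypothetical.

```python
# Odds ratio and 95% CI for a haplotype effect from ordinary logistic
# regression (synthetic stand-in data, not the dissertation's data sets).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 500
hap = rng.binomial(2, 0.2, n)            # copies of a risk haplotype (0, 1, or 2)
age = rng.normal(50, 10, n)
lin = -2.0 + 0.8 * hap + 0.01 * (age - 50)
y = rng.binomial(1, 1 / (1 + np.exp(-lin)))

X = sm.add_constant(np.column_stack([hap, age]))
fit = sm.Logit(y, X).fit(disp=0)
or_hap = np.exp(fit.params[1])           # odds ratio per haplotype copy
lo, hi = np.exp(fit.conf_int()[1])       # 95% CI on the odds-ratio scale
print(f"OR = {or_hap:.2f}, 95% CI = {lo:.2f}-{hi:.2f}")
```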
Abstract:
How do probabilistic models represent their targets and how do they allow us to learn about them? The answer to this question depends on a number of details, in particular on the meaning of the probabilities involved. To classify the options, a minimalist conception of representation (Suárez 2004) is adopted: modelers devise substitutes ("sources") of their targets and investigate them to infer something about the target. Probabilistic models allow us to infer probabilities about the target from probabilities about the source. This leads to a framework in which we can systematically distinguish between different models of probabilistic modeling. I develop a fully Bayesian view of probabilistic modeling, but I argue that, as an alternative, Bayesian degrees of belief about the target may be derived from ontic probabilities about the source. Remarkably, some accounts of ontic probabilities can avoid problems if they are supposed to apply to sources only.
Abstract:
In numerous intervention studies and education field trials, random assignment to treatment occurs in clusters rather than at the level of observation. This departure from unit-level random assignment may be due to logistics, political feasibility, or ecological validity. Data within the same cluster or grouping are often correlated. Applying traditional regression techniques, which assume independence between observations, to clustered data produces consistent parameter estimates; however, such estimators are often inefficient compared to methods that incorporate the clustered nature of the data into the estimation procedure (Neuhaus 1993). Multilevel models, also known as random effects or random components models, can account for the clustering of data by estimating higher-level (group) as well as lower-level (individual) variation. Designing a study in which the unit of observation is nested within higher-level groupings requires determining sample sizes at each level. This study investigates the design and analysis of various sampling strategies for a 3-level repeated measures design, and their effect on the parameter estimates, when the outcome variable of interest follows a Poisson distribution. Results of the study suggest that second-order penalized quasi-likelihood (PQL) estimation produces the least biased estimates in the 3-level multilevel Poisson model, followed by first-order PQL and then second- and first-order marginal quasi-likelihood (MQL). The MQL estimates of both fixed and random parameters are generally satisfactory when the level-2 and level-3 variation is less than 0.10. However, as the higher-level error variance increases, the MQL estimates become increasingly biased. If the PQL estimation algorithm fails to converge and the higher-level error variance is large, the estimates may be significantly biased; in this case, bias-correction techniques such as bootstrapping should be considered as an alternative procedure. For larger sample sizes, structures with 20 or more units sampled at the levels with normally distributed random errors produced more stable estimates with less sampling variance than structures with an increased number of level-1 units. For small sample sizes, sampling fewer units at the level with Poisson variation produces less sampling variation; however, this criterion is no longer important when sample sizes are large. Neuhaus J (1993). "Estimation Efficiency and Tests of Covariate Effects with Clustered Binary Data". Biometrics, 49, 989-996.
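A minimal sketch of the 3-level Poisson data structure under study, with hypothetical variance components set at the 0.10 threshold discussed above; the PQL/MQL fitting itself requires specialized software and is not reproduced here.

```python
# Simulating a 3-level Poisson design: random intercepts at levels 2 and 3,
# Poisson counts at level 1 (illustrative parameter values).
import numpy as np

rng = np.random.default_rng(3)
n3, n2, n1 = 20, 20, 10              # level-3 units, level-2 per level-3, obs per level-2
var2 = var3 = 0.10                   # higher-level variances at the reported threshold

u3 = rng.normal(0.0, np.sqrt(var3), n3)          # level-3 random intercepts
u2 = rng.normal(0.0, np.sqrt(var2), (n3, n2))    # level-2 random intercepts
beta0 = 0.5                                      # fixed intercept
log_mu = beta0 + u3[:, None, None] + u2[:, :, None] + np.zeros((n3, n2, n1))
y = rng.poisson(np.exp(log_mu))                  # level-1 Poisson counts
print(y.shape, y.mean())
```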
Abstract:
Seizure freedom in patients suffering from pharmacoresistant epilepsies is still not achieved in 20–30% of all cases. Hence, current therapies need to be improved, based on a more complete understanding of ictogenesis. In this respect, the analysis of functional networks derived from intracranial electroencephalographic (iEEG) data has recently become a standard tool. Functional networks, however, are purely descriptive models and thus are conceptually unable to predict fundamental features of iEEG time series, e.g., in the context of therapeutic brain stimulation. In this paper we present some first steps towards overcoming the limitations of functional network analysis, by showing that its results are implied by a simple predictive model of time-sliced iEEG time series. More specifically, we learn distinct graphical models (so-called Chow–Liu (CL) trees) as models for the spatial dependencies between iEEG signals. Bayesian inference is then applied to the CL trees, allowing for an analytic derivation/prediction of functional networks, based on thresholding the absolute values of the Pearson correlation coefficient (CC) matrix. Using various measures, the networks thus obtained are then compared to those derived in the classical way from the empirical CC matrix. In the high-threshold limit we find (a) an excellent agreement between the two networks and (b) key features of periictal networks as previously reported in the literature. Apart from functional networks, both matrices are also compared element-wise, showing that the CL approach leads to a sparse representation, by setting small correlations to values close to zero while preserving the larger ones. Overall, this paper shows the validity of CL trees as simple, spatially predictive models for periictal iEEG data. Moreover, we suggest straightforward generalizations of the CL approach for also modeling the temporal features of iEEG signals.
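The Chow–Liu step is straightforward to sketch: for roughly Gaussian signals, pairwise mutual information is monotone in |r|, so the CL tree is the maximum-weight spanning tree of the absolute correlation matrix. The following uses synthetic signals in place of iEEG recordings.

```python
# Chow-Liu tree as a maximum-weight spanning tree of |r|
# (synthetic correlated signals standing in for iEEG channels).
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

rng = np.random.default_rng(4)
n_ch, n_t = 8, 2000
common = rng.normal(size=n_t)                       # shared component -> correlations
x = 0.6 * common + rng.normal(size=(n_ch, n_t))     # one row per "channel"

r = np.corrcoef(x)                                  # empirical CC matrix
w = -np.abs(r)                                      # negate so a MIN spanning tree maximizes |r|
np.fill_diagonal(w, 0.0)
tree = minimum_spanning_tree(w)                     # edges of the Chow-Liu tree
edges = np.array(tree.nonzero()).T                  # (n_ch - 1) channel pairs
print(edges)
```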
Abstract:
The spatial context is critical when assessing present-day climate anomalies, attributing them to potential forcings and making statements regarding their frequency and severity in a long-term perspective. Recent international initiatives have expanded the number of high-quality proxy records and developed new statistical reconstruction methods. These advances allow more rigorous regional past-temperature reconstructions and, in turn, the possibility of evaluating climate models on policy-relevant, spatio-temporal scales. Here we provide a new proxy-based, annually resolved, spatial reconstruction of European summer (June–August) temperature fields back to 755 CE based on Bayesian hierarchical modelling (BHM), together with estimates of the European mean temperature variation since 138 BCE based on BHM and composite-plus-scaling (CPS). Our reconstructions compare well with independent instrumental and proxy-based temperature estimates, but suggest a larger amplitude in summer temperature variability than previously reported. Both the CPS and BHM reconstructions indicate that the mean 20th-century European summer temperature was not significantly different from that of some earlier centuries, including the 1st, 2nd, 8th and 10th centuries CE. The 1st century (in BHM also the 10th century) may even have been slightly warmer than the 20th century, but the difference is not statistically significant. Comparing each 50-year period with the 1951–2000 period reveals a similar pattern. Recent summers, however, have been unusually warm in the context of the last two millennia, and there are no 30-year periods in either reconstruction that exceed the mean European summer temperature of the last three decades (1986–2015 CE). A comparison with an ensemble of climate model simulations suggests that the reconstructed European summer temperature variability over the period 850–2000 CE reflects changes in both internal variability and external forcing on multi-decadal time scales. For pan-European temperatures we find slightly better agreement between the reconstruction and the model simulations with high-end estimates for total solar irradiance. Temperature differences between the medieval period, the recent period and the Little Ice Age are larger in the reconstructions than in the simulations. This may indicate inflated variability in the reconstructions, a lack of sensitivity (or of relevant processes) in the simulated European climate's response to changes in external forcing, and/or an underestimation of internal variability on centennial and longer time scales.
Abstract:
Vestibular cognition has recently gained attention. Despite numerous experimental and clinical demonstrations, it is not yet clear what vestibular cognition really is. For future research in vestibular cognition, adopting a computational approach will make it easier to explore the underlying mechanisms. Indeed, most modeling approaches in vestibular science include a top-down or a priori component. We review recent Bayesian optimal observer models, and discuss in detail the conceptual value of prior assumptions, likelihoods, and posterior estimates for research in vestibular cognition. We then consider forward models in vestibular processing, which are required in order to distinguish between sensory input induced by active self-motion and sensory input due to passive self-motion. We suggest that forward models are used not only in the service of estimating sensory states but can also be drawn upon in an offline mode (e.g., spatial perspective transformations), in which interaction with sensory input is not desired. A computational approach to vestibular cognition will help to discover connections across studies, and it will provide a more coherent framework for investigating vestibular cognition.
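In the simplest Gaussian case, the optimal-observer logic reviewed above reduces to a reliability-weighted combination of prior and vestibular likelihood; a minimal sketch with illustrative numbers follows.

```python
# Conjugate Gaussian prior-likelihood combination for a self-motion estimate
# (illustrative values, not from any particular study).
import numpy as np

mu_prior, var_prior = 0.0, 1.0      # prior: slow or no self-motion
meas, var_like = 2.0, 0.5           # noisy vestibular measurement and its variance

w = var_prior / (var_prior + var_like)           # weight on the measurement
mu_post = w * meas + (1 - w) * mu_prior          # posterior mean
var_post = 1 / (1 / var_prior + 1 / var_like)    # posterior variance
print(f"posterior: mean = {mu_post:.2f}, var = {var_post:.2f}")
```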
Abstract:
Mental imagery and perception are thought to rely on similar neural circuits, and many recent behavioral studies have attempted to demonstrate interactions between actual physical stimulation and sensory imagery in the corresponding sensory modality. However, there has been a lack of theoretical understanding of the nature of these interactions, and both interferential and facilitatory effects have been found. Facilitatory effects appear strikingly similar to those that arise due to experimental manipulations of expectation. Using a self-motion discrimination task, we try to disentangle the effects of mental imagery from those of expectation by using a hierarchical drift diffusion model to investigate both choice data and response times. Manipulations of expectation are reasonably well understood in terms of their selective influence on parameters of the drift diffusion model, and in this study, we make the first attempt to similarly characterize the effects of mental imagery. We investigate mental imagery within the computational framework of control theory and state estimation.
• Mental imagery and perception are thought to rely on similar neural circuits; however, on more theoretical grounds, imagery seems to be closely related to the output of forward models (sensory predictions).
• We reanalyzed data from a study of imagined self-motion.
• Bayesian modeling of response times may allow us to disentangle the effects of mental imagery on behavior from other cognitive (top-down) effects, such as expectation.
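For orientation, a single trial of the (non-hierarchical) drift diffusion process underlying such models can be simulated as below; the parameters are illustrative, not fitted values from the study.

```python
# One trial of a two-bound drift diffusion process (illustrative parameters).
import numpy as np

def ddm_trial(v=0.8, a=1.0, z=0.5, dt=0.001, sigma=1.0, rng=None):
    """Accumulate evidence with drift v until hitting bound 0 or a; z*a is the start point."""
    if rng is None:
        rng = np.random.default_rng()
    x, t = z * a, 0.0
    while 0.0 < x < a:
        x += v * dt + sigma * np.sqrt(dt) * rng.normal()
        t += dt
    return (1 if x >= a else 0), t                  # (choice, response time)

rng = np.random.default_rng(5)
choices, rts = map(np.array, zip(*(ddm_trial(rng=rng) for _ in range(1000))))
print(f"P(upper) = {choices.mean():.2f}, mean RT = {rts.mean():.3f} s")
```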
Abstract:
Dua and Miller (1996) created leading and coincident employment indexes for the state of Connecticut, following Moore's (1981) work at the national level. The performance of the Dua-Miller indexes following the recession of the early 1990s fell short of expectations. This paper performs two tasks. First, it describes the process of revising the Connecticut Coincident and Leading Employment Indexes. Second, it analyzes the statistical properties and performance of the new indexes by comparing the lead profiles of the new and old indexes as well as their out-of-sample forecasting performance, using the Bayesian vector autoregressive (BVAR) method. The new indexes show improved performance in dating employment cycle chronologies. The lead profile test demonstrates this superiority in a rigorous, non-parametric statistical fashion. The mixed evidence from the BVAR forecasting experiments illustrates the truth in the Granger and Newbold (1986) caution that leading indexes properly predict cycle turning points but do not necessarily provide accurate forecasts except at turning points, a view that our results support.
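As a sketch of the BVAR machinery involved, the posterior mean of the VAR coefficients under a simple Minnesota-style shrinkage prior has a closed form; here the prior shrinks all coefficients toward zero for simplicity (the classic Minnesota prior centers own first lags on a random walk), and synthetic data stand in for the Connecticut series.

```python
# Posterior mean of a Bayesian VAR under a zero-mean Gaussian shrinkage prior
# (ridge-style closed form; illustrative data and tightness value).
import numpy as np

rng = np.random.default_rng(6)
T, k, p = 200, 2, 2                        # observations, variables, lags
y = rng.normal(size=(T, k)).cumsum(axis=0)

# Regressors: intercept plus p lags of all variables.
X = np.column_stack([np.ones(T - p)] + [y[p - l:T - l] for l in range(1, p + 1)])
Y = y[p:]

lam = 10.0                                 # prior tightness (assumed value)
prior_prec = lam * np.eye(X.shape[1])
prior_prec[0, 0] = 1e-6                    # effectively diffuse prior on the intercept
B_post = np.linalg.solve(X.T @ X + prior_prec, X.T @ Y)   # posterior mean
print(B_post.round(3))
```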
Abstract:
Bayesian phylogenetic analyses are now very popular in systematics and molecular evolution because they allow the use of much more realistic models than currently possible with maximum likelihood methods. There are, however, a growing number of examples in which large Bayesian posterior clade probabilities are associated with very short edge lengths and low values for non-Bayesian measures of support such as nonparametric bootstrapping. For the four-taxon case when the true tree is the star phylogeny, Bayesian analyses become increasingly unpredictable in their preference for one of the three possible resolved tree topologies as data set size increases. This leads to the prediction that hard (or near-hard) polytomies in nature will cause unpredictable behavior in Bayesian analyses, with arbitrary resolutions of the polytomy receiving very high posterior probabilities in some cases. We present a simple solution to this problem involving a reversible-jump Markov chain Monte Carlo (MCMC) algorithm that allows exploration of all of tree space, including unresolved tree topologies with one or more polytomies. The reversible-jump MCMC approach allows prior distributions to place some weight on less-resolved tree topologies, which eliminates misleadingly high posteriors associated with arbitrary resolutions of hard polytomies. Fortunately, assigning some prior probability to polytomous tree topologies does not appear to come with a significant cost in terms of the ability to assess the level of support for edges that do exist in the true tree. Methods are discussed for applying arbitrary prior distributions to tree topologies of varying resolution, and an empirical example showing evidence of polytomies is analyzed and discussed.
Abstract:
We present a framework for fitting multiple random walks to animal movement paths consisting of ordered sets of step lengths and turning angles. Each step and turn is assigned to one of a number of random walks, each characteristic of a different behavioral state. Behavioral state assignments may be inferred purely from movement data or may include the habitat type in which the animals are located. Switching between different behavioral states may be modeled explicitly using a state transition matrix estimated directly from data, or switching probabilities may take into account the proximity of animals to landscape features. Model fitting is undertaken within a Bayesian framework using the WinBUGS software. These methods allow for identification of different movement states using several properties of observed paths and lead naturally to the formulation of movement models. Analysis of relocation data from elk released in east-central Ontario, Canada, suggests a biphasic movement behavior: elk are either in an "encamped" state, in which step lengths are small and turning angles are high, or in an "exploratory" state, in which daily step lengths are several kilometers and turning angles are small. Animals encamp in open habitat (agricultural fields and open forest), but the exploratory state is not associated with any particular habitat type.
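A minimal sketch of the biphasic structure described above: a two-state Markov chain switches between "encamped" and "exploratory" states, each with its own step-length and turning-angle distributions. Parameters are illustrative, not the fitted elk values (the paper fits such models in WinBUGS).

```python
# Two-state movement simulation: Markov switching between "encamped" (0)
# and "exploratory" (1) behavior (illustrative parameters).
import numpy as np

rng = np.random.default_rng(7)
P = np.array([[0.9, 0.1],                # row s: transition probabilities out of state s
              [0.2, 0.8]])
step_scale = [0.05, 2.0]                 # km: short steps vs multi-km steps
turn_mean = [np.pi, 0.0]                 # encamped turns back; exploratory goes straight
turn_kappa = [0.5, 4.0]                  # von Mises concentration of turning angles

s, path = 0, []
for _ in range(500):
    s = rng.choice(2, p=P[s])
    step = rng.gamma(1.5, step_scale[s])              # step length
    turn = rng.vonmises(turn_mean[s], turn_kappa[s])  # turning angle
    path.append((s, step, turn))
```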
Abstract:
This paper uses Bayesian vector autoregressive models to examine the usefulness of leading indicators in predicting US home sales. The benchmark Bayesian model includes home sales, the price of homes, the mortgage rate, real personal disposable income, and the unemployment rate. We evaluate the forecasting performance of six alternative leading indicators by adding each, in turn, to the benchmark model. Out-of-sample forecast performance over three periods shows that the model that includes building permits authorized consistently produces the most accurate forecasts. Thus, the intention to build in the future provides good information with which to predict home sales. Another finding suggests that leading indicators with longer leads outperform short-leading indicators.
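The out-of-sample design described above can be sketched with an expanding-window forecast comparison; here plain least squares stands in for the BVAR, and synthetic series stand in for home sales and permits.

```python
# Expanding-window out-of-sample RMSE with and without a leading indicator
# (synthetic data; OLS standing in for the BVAR).
import numpy as np

rng = np.random.default_rng(8)
T = 160
permits = rng.normal(size=T).cumsum()
noise = rng.normal(scale=0.5, size=T)
sales = np.zeros(T)
sales[2:] = 0.6 * permits[:-2] + noise[2:]       # permits lead sales by 2 periods

def rmse_expanding(X, y, start=100):
    """Refit on data up to t-1, forecast y[t], accumulate errors."""
    errs = []
    for t in range(start, len(y)):
        b, *_ = np.linalg.lstsq(X[:t], y[:t], rcond=None)
        errs.append(y[t] - X[t] @ b)
    return np.sqrt(np.mean(np.square(errs)))

ones = np.ones((T, 1))
lag_sales = np.r_[0, sales[:-1]].reshape(-1, 1)
lag_permits = np.r_[0, 0, permits[:-2]].reshape(-1, 1)

base = np.hstack([ones, lag_sales])                    # benchmark model
augmented = np.hstack([ones, lag_sales, lag_permits])  # + leading indicator
print("benchmark RMSE:   ", rmse_expanding(base, sales))
print("with permits RMSE:", rmse_expanding(augmented, sales))
```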