990 resultados para Bayesian semiparametric selection


Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper considers Bayesian variable selection in regressions with a large number of possibly highly correlated macroeconomic predictors. I show that by acknowledging the correlation structure in the predictors can improve forecasts over existing popular Bayesian variable selection algorithms.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this article, we introduce a semi-parametric Bayesian approach based on Dirichlet process priors for the discrete calibration problem in binomial regression models. An interesting topic is the dosimetry problem related to the dose-response model. A hierarchical formulation is provided so that a Markov chain Monte Carlo approach is developed. The methodology is applied to simulated and real data.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Road accidents are a very relevant issue in many countries and macroeconomic models are very frequently applied by academia and administrations to reduce their frequency and consequences. The selection of explanatory variables and response transformation parameter within the Bayesian framework for the selection of the set of explanatory variables a TIM and 3IM (two input and three input models) procedures are proposed. The procedure also uses the DIC and pseudo -R2 goodness of fit criteria. The model to which the methodology is applied is a dynamic regression model with Box-Cox transformation (BCT) for the explanatory variables and autorgressive (AR) structure for the response. The initial set of 22 explanatory variables are identified. The effects of these factors on the fatal accident frequency in Spain, during 2000-2012, are estimated. The dependent variable is constructed considering the stochastic trend component.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this study we are proposing a Bayesian model selection methodology, where the best model from the list of candidate structural explanatory models is selected. The model structure is based on the Zellner's (1971)explanatory model with autoregressive errors. For the selection technique we are using a parsimonious model, where the model variables are transformed using Box and Cox (1964) class of transformations.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Using the Bayesian approach as the model selection criteria, the main purpose in this study is to establish a practical road accident model that can provide a better interpretation and prediction performance. For this purpose we are using a structural explanatory model with autoregressive error term. The model estimation is carried out through Bayesian inference and the best model is selected based on the goodness of fit measures. To cross validate the model estimation further prediction analysis were done. As the road safety measures the number of fatal accidents in Spain, during 2000-2011 were employed. The results of the variable selection process show that the factors explaining fatal road accidents are mainly exposure, economic factors, and surveillance and legislative measures. The model selection shows that the impact of economic factors on fatal accidents during the period under study has been higher compared to surveillance and legislative measures.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A practical Bayesian approach for inference in neural network models has been available for ten years, and yet it is not used frequently in medical applications. In this chapter we show how both regularisation and feature selection can bring significant benefits in diagnostic tasks through two case studies: heart arrhythmia classification based on ECG data and the prognosis of lupus. In the first of these, the number of variables was reduced by two thirds without significantly affecting performance, while in the second, only the Bayesian models had an acceptable accuracy. In both tasks, neural networks outperformed other pattern recognition approaches.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Nonlinear adjustment toward long-run price equilibrium relationships in the sugar-ethanol-oil nexus in Brazil is examined. We develop generalized bivariate error correction models that allow for cointegration between sugar, ethanol, and oil prices, where dynamic adjustments are potentially nonlinear functions of the disequilibrium errors. A range of models are estimated using Bayesian Monte Carlo Markov Chain algorithms and compared using Bayesian model selection methods. The results suggest that the long-run drivers of Brazilian sugar prices are oil prices and that there are nonlinearities in the adjustment processes of sugar and ethanol prices to oil price but linear adjustment between ethanol and sugar prices.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This work presents a Bayesian semiparametric approach for dealing with regression models where the covariate is measured with error. Given that (1) the error normality assumption is very restrictive, and (2) assuming a specific elliptical distribution for errors (Student-t for example), may be somewhat presumptuous; there is need for more flexible methods, in terms of assuming only symmetry of errors (admitting unknown kurtosis). In this sense, the main advantage of this extended Bayesian approach is the possibility of considering generalizations of the elliptical family of models by using Dirichlet process priors in dependent and independent situations. Conditional posterior distributions are implemented, allowing the use of Markov Chain Monte Carlo (MCMC), to generate the posterior distributions. An interesting result shown is that the Dirichlet process prior is not updated in the case of the dependent elliptical model. Furthermore, an analysis of a real data set is reported to illustrate the usefulness of our approach, in dealing with outliers. Finally, semiparametric proposed models and parametric normal model are compared, graphically with the posterior distribution density of the coefficients. (C) 2009 Elsevier Inc. All rights reserved.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

This paper develops stochastic search variable selection (SSVS) for zero-inflated count models which are commonly used in health economics. This allows for either model averaging or model selection in situations with many potential regressors. The proposed techniques are applied to a data set from Germany considering the demand for health care. A package for the free statistical software environment R is provided.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

There is a vast literature that specifies Bayesian shrinkage priors for vector autoregressions (VARs) of possibly large dimensions. In this paper I argue that many of these priors are not appropriate for multi-country settings, which motivates me to develop priors for panel VARs (PVARs). The parametric and semi-parametric priors I suggest not only perform valuable shrinkage in large dimensions, but also allow for soft clustering of variables or countries which are homogeneous. I discuss the implications of these new priors for modelling interdependencies and heterogeneities among different countries in a panel VAR setting. Monte Carlo evidence and an empirical forecasting exercise show clear and important gains of the new priors compared to existing popular priors for VARs and PVARs.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)

Relevância:

90.00% 90.00%

Publicador:

Resumo:

When conducting a randomized comparative clinical trial, ethical, scientific or economic considerations often motivate the use of interim decision rules after successive groups of patients have been treated. These decisions may pertain to the comparative efficacy or safety of the treatments under study, cost considerations, the desire to accelerate the drug evaluation process, or the likelihood of therapeutic benefit for future patients. At the time of each interim decision, an important question is whether patient enrollment should continue or be terminated; either due to a high probability that one treatment is superior to the other, or a low probability that the experimental treatment will ultimately prove to be superior. The use of frequentist group sequential decision rules has become routine in the conduct of phase III clinical trials. In this dissertation, we will present a new Bayesian decision-theoretic approach to the problem of designing a randomized group sequential clinical trial, focusing on two-arm trials with time-to-failure outcomes. Forward simulation is used to obtain optimal decision boundaries for each of a set of possible models. At each interim analysis, we use Bayesian model selection to adaptively choose the model having the largest posterior probability of being correct, and we then make the interim decision based on the boundaries that are optimal under the chosen model. We provide a simulation study to compare this method, which we call Bayesian Doubly Optimal Group Sequential (BDOGS), to corresponding frequentist designs using either O'Brien-Fleming (OF) or Pocock boundaries, as obtained from EaSt 2000. Our simulation results show that, over a wide variety of different cases, BDOGS either performs at least as well as both OF and Pocock, or on average provides a much smaller trial. ^

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Complex diseases such as cancer result from multiple genetic changes and environmental exposures. Due to the rapid development of genotyping and sequencing technologies, we are now able to more accurately assess causal effects of many genetic and environmental factors. Genome-wide association studies have been able to localize many causal genetic variants predisposing to certain diseases. However, these studies only explain a small portion of variations in the heritability of diseases. More advanced statistical models are urgently needed to identify and characterize some additional genetic and environmental factors and their interactions, which will enable us to better understand the causes of complex diseases. In the past decade, thanks to the increasing computational capabilities and novel statistical developments, Bayesian methods have been widely applied in the genetics/genomics researches and demonstrating superiority over some regular approaches in certain research areas. Gene-environment and gene-gene interaction studies are among the areas where Bayesian methods may fully exert its functionalities and advantages. This dissertation focuses on developing new Bayesian statistical methods for data analysis with complex gene-environment and gene-gene interactions, as well as extending some existing methods for gene-environment interactions to other related areas. It includes three sections: (1) Deriving the Bayesian variable selection framework for the hierarchical gene-environment and gene-gene interactions; (2) Developing the Bayesian Natural and Orthogonal Interaction (NOIA) models for gene-environment interactions; and (3) extending the applications of two Bayesian statistical methods which were developed for gene-environment interaction studies, to other related types of studies such as adaptive borrowing historical data. We propose a Bayesian hierarchical mixture model framework that allows us to investigate the genetic and environmental effects, gene by gene interactions (epistasis) and gene by environment interactions in the same model. It is well known that, in many practical situations, there exists a natural hierarchical structure between the main effects and interactions in the linear model. Here we propose a model that incorporates this hierarchical structure into the Bayesian mixture model, such that the irrelevant interaction effects can be removed more efficiently, resulting in more robust, parsimonious and powerful models. We evaluate both of the 'strong hierarchical' and 'weak hierarchical' models, which specify that both or one of the main effects between interacting factors must be present for the interactions to be included in the model. The extensive simulation results show that the proposed strong and weak hierarchical mixture models control the proportion of false positive discoveries and yield a powerful approach to identify the predisposing main effects and interactions in the studies with complex gene-environment and gene-gene interactions. We also compare these two models with the 'independent' model that does not impose this hierarchical constraint and observe their superior performances in most of the considered situations. The proposed models are implemented in the real data analysis of gene and environment interactions in the cases of lung cancer and cutaneous melanoma case-control studies. The Bayesian statistical models enjoy the properties of being allowed to incorporate useful prior information in the modeling process. Moreover, the Bayesian mixture model outperforms the multivariate logistic model in terms of the performances on the parameter estimation and variable selection in most cases. Our proposed models hold the hierarchical constraints, that further improve the Bayesian mixture model by reducing the proportion of false positive findings among the identified interactions and successfully identifying the reported associations. This is practically appealing for the study of investigating the causal factors from a moderate number of candidate genetic and environmental factors along with a relatively large number of interactions. The natural and orthogonal interaction (NOIA) models of genetic effects have previously been developed to provide an analysis framework, by which the estimates of effects for a quantitative trait are statistically orthogonal regardless of the existence of Hardy-Weinberg Equilibrium (HWE) within loci. Ma et al. (2012) recently developed a NOIA model for the gene-environment interaction studies and have shown the advantages of using the model for detecting the true main effects and interactions, compared with the usual functional model. In this project, we propose a novel Bayesian statistical model that combines the Bayesian hierarchical mixture model with the NOIA statistical model and the usual functional model. The proposed Bayesian NOIA model demonstrates more power at detecting the non-null effects with higher marginal posterior probabilities. Also, we review two Bayesian statistical models (Bayesian empirical shrinkage-type estimator and Bayesian model averaging), which were developed for the gene-environment interaction studies. Inspired by these Bayesian models, we develop two novel statistical methods that are able to handle the related problems such as borrowing data from historical studies. The proposed methods are analogous to the methods for the gene-environment interactions on behalf of the success on balancing the statistical efficiency and bias in a unified model. By extensive simulation studies, we compare the operating characteristics of the proposed models with the existing models including the hierarchical meta-analysis model. The results show that the proposed approaches adaptively borrow the historical data in a data-driven way. These novel models may have a broad range of statistical applications in both of genetic/genomic and clinical studies.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Fitting statistical models is computationally challenging when the sample size or the dimension of the dataset is huge. An attractive approach for down-scaling the problem size is to first partition the dataset into subsets and then fit using distributed algorithms. The dataset can be partitioned either horizontally (in the sample space) or vertically (in the feature space), and the challenge arise in defining an algorithm with low communication, theoretical guarantees and excellent practical performance in general settings. For sample space partitioning, I propose a MEdian Selection Subset AGgregation Estimator ({\em message}) algorithm for solving these issues. The algorithm applies feature selection in parallel for each subset using regularized regression or Bayesian variable selection method, calculates the `median' feature inclusion index, estimates coefficients for the selected features in parallel for each subset, and then averages these estimates. The algorithm is simple, involves very minimal communication, scales efficiently in sample size, and has theoretical guarantees. I provide extensive experiments to show excellent performance in feature selection, estimation, prediction, and computation time relative to usual competitors.

While sample space partitioning is useful in handling datasets with large sample size, feature space partitioning is more effective when the data dimension is high. Existing methods for partitioning features, however, are either vulnerable to high correlations or inefficient in reducing the model dimension. In the thesis, I propose a new embarrassingly parallel framework named {\em DECO} for distributed variable selection and parameter estimation. In {\em DECO}, variables are first partitioned and allocated to m distributed workers. The decorrelated subset data within each worker are then fitted via any algorithm designed for high-dimensional problems. We show that by incorporating the decorrelation step, DECO can achieve consistent variable selection and parameter estimation on each subset with (almost) no assumptions. In addition, the convergence rate is nearly minimax optimal for both sparse and weakly sparse models and does NOT depend on the partition number m. Extensive numerical experiments are provided to illustrate the performance of the new framework.

For datasets with both large sample sizes and high dimensionality, I propose a new "divided-and-conquer" framework {\em DEME} (DECO-message) by leveraging both the {\em DECO} and the {\em message} algorithm. The new framework first partitions the dataset in the sample space into row cubes using {\em message} and then partition the feature space of the cubes using {\em DECO}. This procedure is equivalent to partitioning the original data matrix into multiple small blocks, each with a feasible size that can be stored and fitted in a computer in parallel. The results are then synthezied via the {\em DECO} and {\em message} algorithm in a reverse order to produce the final output. The whole framework is extremely scalable.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Cannabis sativa, the most widely used illicit drug, has profound effects on levels of anxiety in animals and humans. Although recent studies have helped provide a better understanding of the neurofunctional correlates of these effects, indicating the involvement of the amygdala and cingulate cortex, their reciprocal influence is still mostly unknown. In this study dynamic causal modelling (DCM) and Bayesian model selection (BMS) were used to explore the effects of pure compounds of C. sativa [600 mg of cannabidiol (CBD) and 10 mg Delta(9)-tetrahydrocannabinol (Delta(9)-THC)] on prefrontal-subcortical effective connectivity in 15 healthy subjects who underwent a double-blind randomized, placebo-controlled fMRI paradigm while viewing faces which elicited different levels of anxiety. In the placebo condition, BMS identified a model with driving inputs entering via the anterior cingulate and forward intrinsic connectivity between the amygdala and the anterior cingulate as the best fit. CBD but not Delta(9)-THC disrupted forward connectivity between these regions during the neural response to fearful faces. This is the first study to show that the disruption of prefrontal-subocrtical connectivity by CBD may represent neurophysiological correlates of its anxiolytic properties.