885 results for Bayesian ridge regression


Relevance: 30.00%

Abstract:

A total of 152,145 weekly test-day milk yield records from 7,317 first lactations of Holstein cows distributed in 93 herds in southeastern Brazil were analyzed. Test-day milk yields were classified into 44 weekly classes of days in milk (DIM). The contemporary groups were defined as herd-year-week of test-day. The model included direct additive genetic, permanent environmental and residual effects as random effects, and contemporary group and age of cow at calving (as a covariable, with linear and quadratic effects) as fixed effects. Mean trends were modeled by a cubic regression on orthogonal polynomials of DIM. Additive genetic and permanent environmental random effects were estimated by random regression on orthogonal Legendre polynomials. Residual variances were modeled using third- to seventh-order variance functions or a step function with 1, 6, 13, 17 and 44 variance classes. Results from Akaike's information criterion and Schwarz's Bayesian information criterion suggested that a model with a 7th-order Legendre polynomial for the additive effect, a 12th-order polynomial for the permanent environmental effect and a step function with 6 classes for residual variances fitted best. However, a parsimonious model, with a 6th-order Legendre polynomial for additive effects and a 7th-order polynomial for permanent environmental effects, yielded very similar genetic parameter estimates. (C) 2008 Elsevier B.V. All rights reserved.
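The random regressions on orthogonal Legendre polynomials mentioned in this abstract rest on a polynomial basis evaluated at standardized DIM. The following is a minimal sketch of such a basis, assuming NumPy. The helper name `legendre_basis`, the rescaling of the 44 weekly classes to [-1, 1] and the normalization factor sqrt((2k + 1) / 2) are illustrative assumptions, not details taken from the abstract. The argument is a polynomial degree; the "order" quoted in the abstract may follow a different convention.

```python
import numpy as np
from numpy.polynomial import legendre


def legendre_basis(dim, degree):
    """Normalized Legendre basis evaluated at days-in-milk classes.

    Hypothetical helper: rescales `dim` to [-1, 1] and returns one column
    per polynomial of degree 0..degree.
    """
    dim = np.asarray(dim, dtype=float)
    lo, hi = dim.min(), dim.max()
    x = 2.0 * (dim - lo) / (hi - lo) - 1.0       # standardize to [-1, 1]
    basis = legendre.legvander(x, degree)        # columns P_0(x) .. P_degree(x)
    norm = np.sqrt((2 * np.arange(degree + 1) + 1) / 2.0)
    return basis * norm                          # assumed normalization


weeks = np.arange(1, 45)                         # 44 weekly DIM classes
Phi = legendre_basis(weeks, degree=3)
```

Each row of `Phi` holds the covariates that would multiply an animal's random regression coefficients for that week.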

Relevance: 30.00%

Abstract:

A significant problem in the collection of responses to potentially sensitive questions, such as those relating to illegal, immoral or embarrassing activities, is non-sampling error due to refusal to respond or false responses. Eichhorn & Hayre (1983) suggested the use of scrambled responses to reduce this form of bias. This paper considers a linear regression model in which the dependent variable is unobserved but its sum or product with a scrambling random variable of known distribution is known. The performance of two likelihood-based estimators is investigated: a Bayesian estimator obtained through a Markov chain Monte Carlo (MCMC) sampling scheme, and a classical maximum-likelihood estimator. These two estimators and an estimator suggested by Singh, Joarder & King (1996) are compared. Monte Carlo results show that the Bayesian estimator outperforms the classical estimators in almost all cases, and the relative performance of the Bayesian estimator improves as the responses become more scrambled.

Relevance: 30.00%

Abstract:

Dermatoglyphic measures are of interest to schizophrenia research because they serve as persistent markers of deviant development in foetal life. Several studies have reported alterations in A–B ridge counts, total finger ridge counts and measures related to asymmetry in schizophrenia. The aim of this study was to assess these measures in an Australian catchment-area, case-control study. Individuals with psychosis (n = 246) were drawn from a catchment-area prevalence study, and well controls (n = 229) were recruited from the same area. Finger and palm prints were taken using an inkless technique and all dermatoglyphic measures were assessed by a trained rater blind to case status. The dermatoglyphic measures (finger ridge count, A–B ridge count, and their derived asymmetry measures) were divided into quartiles based on the distribution of these variables in controls. The main analysis (logistic regression controlled for age and sex) examined all psychotic disorders, with planned subgroup analyses comparing controls with (1) nonaffective psychosis (schizophrenia, delusional disorder, schizophreniform psychosis, atypical psychosis) and (2) affective psychosis (depression with psychosis, bipolar disorder, schizoaffective psychosis). There were no statistically significant alterations in the odds of having a psychotic disorder for any of the dermatoglyphic measures. The results did not change when we examined affective and nonaffective psychosis separately. The dermatoglyphic features that distinguish schizophrenia/psychosis in other studies were not identified in this Australian study. Regional variations in these findings may provide clues to differential ethnic/genetic and environmental factors that are associated with schizophrenia. The Stanley Foundation supported this project.

Relevance: 30.00%

Abstract:

INTRODUCTION: Malaria is a serious problem in the Brazilian Amazon region, and the detection of possible risk factors could be of great interest for public health authorities. The objective of this article was to investigate the association between environmental variables and the yearly registers of malaria in the Amazon region using Bayesian spatiotemporal methods. METHODS: We used Poisson spatiotemporal regression models to analyze the Brazilian Amazon forest malaria count for the period from 1999 to 2008. In this study, we included some covariates that could be important in the yearly prediction of malaria, such as deforestation rate. We obtained the inferences using a Bayesian approach and Markov Chain Monte Carlo (MCMC) methods to simulate samples for the joint posterior distribution of interest. The discrimination of different models was also discussed. RESULTS: The model proposed here suggests that deforestation rate, the number of inhabitants per km², and the human development index (HDI) are important in the prediction of malaria cases. CONCLUSIONS: It is possible to conclude that human development, population growth, deforestation, and their associated ecological alterations are conducive to increasing malaria risk. We conclude that the use of Poisson regression models that capture the spatial and temporal effects under the Bayesian paradigm is a good strategy for modeling malaria counts.
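The models in this abstract are Bayesian spatiotemporal Poisson regressions fitted by MCMC. As a much simpler illustration of the Poisson log-linear likelihood they build on, the sketch below fits a plain Poisson regression by Newton's method, assuming NumPy. It has no spatial or temporal effects and no priors, so it is not the authors' method; the function name `fit_poisson`, the synthetic covariate and the coefficient values are all hypothetical.

```python
import numpy as np


def fit_poisson(X, y, n_iter=50, tol=1e-10):
    """Maximum-likelihood Poisson regression with a log link (Newton steps).

    Assumes column 0 of X is an intercept column of ones.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())                   # start at the mean count
    for _ in range(n_iter):
        mu = np.exp(X @ beta)                    # expected counts
        grad = X.T @ (y - mu)                    # score vector
        hess = X.T @ (X * mu[:, None])           # Fisher information
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Synthetic data: one standardized covariate (e.g. a deforestation index).
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
true_beta = np.array([1.0, 0.5])
y = rng.poisson(np.exp(X @ true_beta))
beta_hat = fit_poisson(X, y)
```

In a Bayesian version, the same log-likelihood would be combined with priors on `beta` and on spatial and temporal random effects, and sampled rather than maximized.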

Relevance: 30.00%

Abstract:

INTRODUCTION: The purpose of this ecological study was to evaluate the urban spatial and temporal distribution of tuberculosis (TB) in Ribeirão Preto, State of São Paulo, southeast Brazil, between 2006 and 2009 and to evaluate its relationship with factors of social vulnerability such as income and education level. METHODS: We evaluated data from TBWeb, an electronic notification system for TB cases. Measures of social vulnerability were obtained from the SEADE Foundation, and information about the number of inhabitants, education and income of the households was obtained from the Brazilian Institute of Geography and Statistics. Statistical analyses were conducted using a Bayesian regression model assuming a Poisson distribution for the observed new cases of TB in each area. A conditional autoregressive structure was used for the spatial covariance structure. RESULTS: The Bayesian model confirmed the spatial heterogeneity of TB distribution in Ribeirão Preto, identifying areas with elevated risk and the effects of social vulnerability on the disease. We demonstrated that the rate of TB was correlated with the measures of income, education and social vulnerability. However, we observed areas with low vulnerability and high education and income, but with high estimated TB rates. CONCLUSIONS: The study identified areas with different risks for TB, given that the public health system deals with the characteristics of each region individually and prioritizes those that present a higher propensity to risk of TB. Complex relationships may exist between TB incidence and a wide range of environmental and intrinsic factors, which need to be studied in future research.

Relevance: 30.00%

Abstract:

There are both theoretical and empirical reasons for believing that the parameters of macroeconomic models may vary over time. However, work with time-varying parameter models has largely involved vector autoregressions (VARs), ignoring cointegration. This is despite the fact that cointegration plays an important role in informing macroeconomists on a range of issues. In this paper we develop time-varying parameter models which permit cointegration. Time-varying parameter VARs (TVP-VARs) typically use state space representations to model the evolution of parameters. In this paper, we show that it is not sensible to use straightforward extensions of TVP-VARs when allowing for cointegration. Instead we develop a specification which allows for the cointegrating space to evolve over time in a manner comparable to the random walk variation used with TVP-VARs. The properties of our approach are investigated before developing a method of posterior simulation. We use our methods in an empirical investigation involving a permanent/transitory variance decomposition for inflation.

Relevance: 30.00%

Abstract:

When actuaries are faced with the problem of pricing an insurance contract that contains different types of coverage, such as a motor insurance or homeowner's insurance policy, they usually assume that types of claim are independent. However, this assumption may not be realistic: several studies have shown that there is a positive correlation between types of claim. Here we introduce different regression models in order to relax the independence assumption, including zero-inflated models to account for an excess of zeros and overdispersion. These models have been largely ignored for multivariate Poisson data, mainly because of their computational difficulties. Bayesian inference based on MCMC helps to solve this problem (and also lets us derive, for several quantities of interest, posterior summaries to account for uncertainty). Finally, these models are applied to an automobile insurance claims database with three different types of claims. We analyse the consequences for pure and loaded premiums when the independence assumption is relaxed by using different multivariate Poisson regression models and their zero-inflated versions.

Relevance: 30.00%

Abstract:

Uncertainty quantification of petroleum reservoir models is one of the present challenges, which is usually approached with a wide range of geostatistical tools linked with statistical optimisation and/or inference algorithms. Recent advances in machine learning offer a novel approach, alternative to geostatistics, to modelling the spatial distribution of petrophysical properties in complex reservoirs. The approach is based on semi-supervised learning, which handles both 'labelled' observed data and 'unlabelled' data, which have no measured value but describe prior knowledge and other relevant data in the form of manifolds in the input space where the modelled property is continuous. The proposed semi-supervised Support Vector Regression (SVR) model has demonstrated its capability to represent realistic geological features and describe stochastic variability and non-uniqueness of spatial properties. It is also able to capture and preserve key spatial dependencies such as connectivity of high-permeability geo-bodies, which is often difficult in contemporary petroleum reservoir studies. Semi-supervised SVR, as a data-driven algorithm, is designed to integrate various kinds of conditioning information and learn dependences from it. The semi-supervised SVR model is able to balance signal/noise levels and control the prior belief in available data. In this work, a stochastic semi-supervised SVR geomodel is integrated into a Bayesian framework to quantify uncertainty of reservoir production with multiple models fitted to past dynamic observations (production history). Multiple history-matched models are obtained using stochastic sampling and/or MCMC-based inference algorithms, which evaluate the posterior probability distribution. Uncertainty of the model is described by the posterior probability of the model parameters that represent key geological properties: spatial correlation size, continuity strength, smoothness/variability of spatial property distribution.
The developed approach is illustrated with a fluvial reservoir case. The resulting probabilistic production forecasts are described by uncertainty envelopes. The paper compares the performance of the models with different combinations of unknown parameters and discusses sensitivity issues.

Relevance: 30.00%

Abstract:

Uncertainty quantification of petroleum reservoir models is one of the present challenges, which is usually approached with a wide range of geostatistical tools linked with statistical optimisation and/or inference algorithms. The paper considers a data-driven approach to modelling uncertainty in spatial predictions. The proposed semi-supervised Support Vector Regression (SVR) model has demonstrated its capability to represent realistic features and describe stochastic variability and non-uniqueness of spatial properties. It is able to capture and preserve key spatial dependencies such as connectivity, which is often difficult to achieve with two-point geostatistical models. Semi-supervised SVR is designed to integrate various kinds of conditioning data and learn dependences from them. A stochastic semi-supervised SVR model is integrated into a Bayesian framework to quantify uncertainty with multiple models fitted to dynamic observations. The developed approach is illustrated with a reservoir case study. The resulting probabilistic production forecasts are described by uncertainty envelopes.

Relevance: 30.00%

Abstract:

The aim of this paper is twofold. First, we study the determinants of economic growth among a wide set of potential variables for the Spanish provinces (NUTS3). Among others, we include various types of private, public and human capital in the group of growth factors. Also, we analyse whether Spanish provinces have converged in economic terms in recent decades. The second objective is to obtain cross-section and panel data parameter estimates that are robust to model specification. For this purpose, we use a Bayesian Model Averaging (BMA) approach. Bayesian methodology constructs parameter estimates as a weighted average of linear regression estimates for every possible combination of included variables. The weight of each regression estimate is given by the posterior probability of each model.
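The weighted average described in this abstract can be sketched as follows, assuming NumPy. The sketch enumerates every subset of regressors, fits each by ordinary least squares, and weights the estimates by approximate posterior model probabilities. Two assumptions are not from the abstract: models receive equal prior probability, and the marginal likelihood is approximated through BIC. The function name `bma_linear` and the synthetic data are hypothetical.

```python
import itertools
import numpy as np


def bma_linear(X, y):
    """Bayesian model averaging over all subsets of the columns of X.

    Weights use exp(-BIC / 2) with equal prior model probabilities
    (an approximation assumed for this sketch).
    """
    n, k = X.shape
    models, bics, coefs = [], [], []
    for r in range(k + 1):
        for subset in itertools.combinations(range(k), r):
            cols = np.array(subset, dtype=int)
            Z = np.column_stack([np.ones(n), X[:, cols]])   # intercept + subset
            b, *_ = np.linalg.lstsq(Z, y, rcond=None)
            rss = np.sum((y - Z @ b) ** 2)
            bics.append(n * np.log(rss / n) + Z.shape[1] * np.log(n))
            full = np.zeros(k)                   # excluded variables get 0
            full[cols] = b[1:]
            coefs.append(full)
            models.append(subset)
    bics = np.array(bics)
    weights = np.exp(-0.5 * (bics - bics.min()))
    weights /= weights.sum()                     # posterior model probabilities
    avg_coef = weights @ np.array(coefs)         # model-averaged slopes
    included = np.array([[j in m for j in range(k)] for m in models], dtype=float)
    pip = weights @ included                     # posterior inclusion probabilities
    return models, weights, avg_coef, pip


rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(size=200)   # only the first variable matters
models, weights, avg_coef, pip = bma_linear(X, y)
```

Full enumeration visits 2^k models, so it is only practical for a small number of candidate variables; larger problems usually rely on sampling over the model space.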

Relevance: 30.00%

Abstract:

Many of the most interesting questions ecologists ask lead to analyses of spatial data. Yet, perhaps confused by the large number of statistical models and fitting methods available, many ecologists seem to believe this is best left to specialists. Here, we describe the issues that need consideration when analysing spatial data and illustrate these using simulation studies. Our comparative analysis involves using methods including generalized least squares, spatial filters, wavelet-revised models, conditional autoregressive models and generalized additive mixed models to estimate regression coefficients from synthetic but realistic data sets, including some which violate standard regression assumptions. We assess the performance of each method using two measures and using statistical error rates for model selection. Methods that performed well included the generalized least squares family of models and a Bayesian implementation of the conditional autoregressive model. Ordinary least squares also performed adequately in the absence of model selection, but had poorly controlled Type I error rates and so did not show the improvements in performance under model selection when using the above methods. Removing large-scale spatial trends in the response led to poor performance. These are empirical results; hence extrapolation of these findings to other situations should be performed cautiously. Nevertheless, our simulation-based approach provides much stronger evidence for comparative analysis than assessments based on single or small numbers of data sets, and should be considered a necessary foundation for statements of this type in future.

Relevance: 30.00%

Abstract:

The relationship between inflammation and cancer is well established in several tumor types, including bladder cancer. We performed an association study between 886 inflammatory-gene variants and bladder cancer risk in 1,047 cases and 988 controls from the Spanish Bladder Cancer (SBC)/EPICURO Study. A preliminary exploration with the widely used univariate logistic regression approach did not identify any significant SNP after correcting for multiple testing. We further applied two more comprehensive methods to capture the complexity of bladder cancer genetic susceptibility: Bayesian Threshold LASSO (BTL), a regularized regression method, and AUC-Random Forest (AUC-RF), a machine-learning algorithm. Both approaches explore the joint effect of markers. BTL analysis identified a signature of 37 SNPs in 34 genes showing an association with bladder cancer. AUC-RF detected an optimal predictive subset of 56 SNPs. Thirteen SNPs were identified by both methods in the total population. Using resources from the Texas Bladder Cancer study we were able to replicate 30% of the SNPs assessed. The associations between inflammatory SNPs and bladder cancer were reexamined among non-smokers to eliminate the effect of tobacco, one of the strongest and most prevalent environmental risk factors for this tumor. A 9-SNP signature was detected by BTL. Here we report, for the first time, a set of SNPs in inflammatory genes jointly associated with bladder cancer risk. These results highlight the importance of the complex structure of genetic susceptibility associated with cancer risk.

Relevance: 30.00%

Abstract:

Over the past few decades, age estimation of living persons has represented a challenging task for many forensic services worldwide. In general, the process for age estimation includes the observation of the degree of maturity reached by some physical attributes, such as dentition or several ossification centers. The estimated chronological age, or the probability that an individual belongs to a meaningful class of ages, is then obtained from the observed degree of maturity by means of various statistical methods. Among these methods, those developed in a Bayesian framework offer users the possibility of coherently dealing with the uncertainty associated with age estimation and of assessing in a transparent and logical way the probability that an examined individual is younger or older than a given age threshold. Recently, a Bayesian network for age estimation has been presented in the scientific literature; this kind of probabilistic graphical tool may facilitate the use of the probabilistic approach. Probabilities of interest in the network are assigned by means of transition analysis, a statistical parametric model which links the chronological age and the degree of maturity by means of specific regression models, such as logit or probit models. Since different regression models can be employed in transition analysis, the aim of this paper is to study the influence of the model on the classification of individuals. The analysis was performed using a dataset related to the ossification status of the medial clavicular epiphysis, and the results indicate that the classification of individuals does not depend on the choice of the regression model.
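The comparison between logit and probit links can be illustrated with a small sketch using only the Python standard library. It assumes a simplified two-stage setting (immature versus mature), a uniform prior over ages 10 to 30, and hypothetical regression parameters; none of these values come from the paper. The probit parameters are obtained by dividing the logit ones by 1.702, a common rescaling that makes the two curves nearly coincide.

```python
import math


def logistic_cdf(z):
    """Logit link: logistic cumulative distribution function."""
    return 1.0 / (1.0 + math.exp(-z))


def normal_cdf(z):
    """Probit link: standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_older_than(threshold, mature, link, a, b, ages):
    """P(age >= threshold | observed stage) under a uniform prior on `ages`."""
    num = den = 0.0
    for age in ages:
        p_mature = link(a + b * age)             # P(mature stage | age)
        lik = p_mature if mature else 1.0 - p_mature
        den += lik
        if age >= threshold:
            num += lik
    return num / den


ages = [tenths / 10 for tenths in range(100, 301)]   # 10.0, 10.1, ..., 30.0

a_logit, b_logit = -9.0, 0.5                     # hypothetical parameters
scale = 1.702                                    # logit-to-probit rescaling
a_probit, b_probit = a_logit / scale, b_logit / scale

p_logit = prob_older_than(18.0, True, logistic_cdf, a_logit, b_logit, ages)
p_probit = prob_older_than(18.0, True, normal_cdf, a_probit, b_probit, ages)
p_logit_immature = prob_older_than(18.0, False, logistic_cdf, a_logit, b_logit, ages)
```

Comparing `p_logit` with `p_probit` shows how much the posterior probability of exceeding the 18-year threshold moves when only the link function changes.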

Relevance: 30.00%

Abstract:

In this paper, Bayesian decision procedures are developed for dose-escalation studies based on bivariate observations of undesirable events and signs of therapeutic benefit. The methods generalize earlier approaches taking into account only the undesirable outcomes. Logistic regression models are used to model the two responses, which are both assumed to take a binary form. A prior distribution for the unknown model parameters is suggested and an optional safety constraint can be included. Gain functions to be maximized are formulated in terms of accurate estimation of the limits of a therapeutic window or optimal treatment of the next cohort of subjects, although the approach could be applied to achieve any of a wide variety of objectives. The designs introduced are illustrated through simulation and retrospective implementation to a completed dose-escalation study. Copyright © 2006 John Wiley & Sons, Ltd.