241 resultados para Naïve bayesian gaussian model
em Biblioteca Digital da Produção Intelectual da Universidade de São Paulo (BDPI/USP)
Resumo:
Item response theory (IRT) comprises a set of statistical models which are useful in many fields, especially when there is interest in studying latent variables. These latent variables are directly considered in the Item Response Models (IRM) and they are usually called latent traits. A usual assumption for parameter estimation of the IRM, considering one group of examinees, is to assume that the latent traits are random variables which follow a standard normal distribution. However, many works suggest that this assumption does not apply in many cases. Furthermore, when this assumption does not hold, the parameter estimates tend to be biased and misleading inference can be obtained. Therefore, it is important to model the distribution of the latent traits properly. In this paper we present an alternative latent traits modeling based on the so-called skew-normal distribution; see Genton (2004). We used the centred parameterization, which was proposed by Azzalini (1985). This approach ensures the model identifiability as pointed out by Azevedo et al. (2009b). Also, a Metropolis Hastings within Gibbs sampling (MHWGS) algorithm was built for parameter estimation by using an augmented data approach. A simulation study was performed in order to assess the parameter recovery in the proposed model and the estimation method, and the effect of the asymmetry level of the latent traits distribution on the parameter estimation. Also, a comparison of our approach with other estimation methods (which consider the assumption of symmetric normality for the latent traits distribution) was considered. The results indicated that our proposed algorithm recovers properly all parameters. Specifically, the greater the asymmetry level, the better the performance of our approach compared with other approaches, mainly in the presence of small sample sizes (number of examinees). Furthermore, we analyzed a real data set which presents indication of asymmetry concerning the latent traits distribution. The results obtained by using our approach confirmed the presence of strong negative asymmetry of the latent traits distribution. (C) 2010 Elsevier B.V. All rights reserved.
Resumo:
Over the years, crop insurance programs became the focus of agricultural policy in the USA, Spain, Mexico, and more recently in Brazil. Given the increasing interest in insurance, accurate calculation of the premium rate is of great importance. We address the crop-yield distribution issue and its implications in pricing an insurance contract considering the dynamic structure of the data and incorporating the spatial correlation in the Hierarchical Bayesian framework. Results show that empirical (insurers) rates are higher in low risk areas and lower in high risk areas. Such methodological improvement is primarily important in situations of limited data.
Resumo:
This paper addresses the investment decisions considering the presence of financial constraints of 373 large Brazilian firms from 1997 to 2004, using panel data. A Bayesian econometric model was used considering ridge regression for multicollinearity problems among the variables in the model. Prior distributions are assumed for the parameters, classifying the model into random or fixed effects. We used a Bayesian approach to estimate the parameters, considering normal and Student t distributions for the error and assumed that the initial values for the lagged dependent variable are not fixed, but generated by a random process. The recursive predictive density criterion was used for model comparisons. Twenty models were tested and the results indicated that multicollinearity does influence the value of the estimated parameters. Controlling for capital intensity, financial constraints are found to be more important for capital-intensive firms, probably due to their lower profitability indexes, higher fixed costs and higher degree of property diversification.
Resumo:
Objective. The purpose of this study was to estimate the Down syndrome detection and false-positive rates for second-trimester sonographic prenasal thickness (PT) measurement alone and in combination with other markers. Methods. Multivariate log Gaussian modeling was performed using numerical integration. Parameters for the PT distribution, in multiples of the normal gestation-specific median (MoM), were derived from 105 Down syndrome and 1385 unaffected pregnancies scanned at 14 to 27 weeks. The data included a new series of 25 cases and 535 controls combined with 4 previously published series. The means were estimated by the median and the SDs by the 10th to 90th range divided by 2.563. Parameters for other markers were obtained from the literature. Results. A log Gaussian model fitted the distribution of PT values well in Down syndrome and unaffected pregnancies. The distribution parameters were as follows: Down syndrome, mean, 1.334 MoM; log(10) SD, 0.0772; unaffected pregnancies, 0.995 and 0.0752, respectively. The model-predicted detection rates for 1%, 3%, and 5% false-positive rates for PT alone were 35%, 51%, and 60%, respectively. The addition of PT to a 4 serum marker protocol increased detection by 14% to 18% compared with serum alone. The simultaneous sonographic measurement of PT and nasal bone length increased detection by 19% to 26%, and with a third sonographic marker, nuchal skin fold, performance was comparable with first-trimester protocols. Conclusions. Second-trimester screening with sonographic PT and serum markers is predicted to have a high detection rate, and further sonographic markers could perform comparably with first-trimester screening protocols.
Resumo:
In this paper, we compare the performance of two statistical approaches for the analysis of data obtained from the social research area. In the first approach, we use normal models with joint regression modelling for the mean and for the variance heterogeneity. In the second approach, we use hierarchical models. In the first case, individual and social variables are included in the regression modelling for the mean and for the variance, as explanatory variables, while in the second case, the variance at level 1 of the hierarchical model depends on the individuals (age of the individuals), and in the level 2 of the hierarchical model, the variance is assumed to change according to socioeconomic stratum. Applying these methodologies, we analyze a Colombian tallness data set to find differences that can be explained by socioeconomic conditions. We also present some theoretical and empirical results concerning the two models. From this comparative study, we conclude that it is better to jointly modelling the mean and variance heterogeneity in all cases. We also observe that the convergence of the Gibbs sampling chain used in the Markov Chain Monte Carlo method for the jointly modeling the mean and variance heterogeneity is quickly achieved.
Resumo:
This paper analyses the presence of financial constraint in the investment decisions of 367 Brazilian firms from 1997 to 2004, using a Bayesian econometric model with group-varying parameters. The motivation for this paper is the use of clustering techniques to group firms in a totally endogenous form. In order to classify the firms we used a hybrid clustering method, that is, hierarchical and non-hierarchical clustering techniques jointly. To estimate the parameters a Bayesian approach was considered. Prior distributions were assumed for the parameters, classifying the model in random or fixed effects. Ordinate predictive density criterion was used to select the model providing a better prediction. We tested thirty models and the better prediction considers the presence of 2 groups in the sample, assuming the fixed effect model with a Student t distribution with 20 degrees of freedom for the error. The results indicate robustness in the identification of financial constraint when the firms are classified by the clustering techniques. (C) 2010 Elsevier B.V. All rights reserved.
Resumo:
This work aims to compare different nonlinear functions for describing the growth curves of Nelore females. The growth curve parameters, their (co) variance components, and environmental and genetic effects were estimated jointly through a Bayesian hierarchical model. In the first stage of the hierarchy, 4 nonlinear functions were compared: Brody, Von Bertalanffy, Gompertz, and logistic. The analyses were carried out using 3 different data sets to check goodness of fit while having animals with few records. Three different assumptions about SD of fitting errors were considered: constancy throughout the trajectory, linear increasing until 3 yr of age and constancy thereafter, and variation following the nonlinear function applied in the first stage of the hierarchy. Comparisons of the overall goodness of fit were based on Akaike information criterion, the Bayesian information criterion, and the deviance information criterion. Goodness of fit at different points of the growth curve was compared applying the Gelfand`s check function. The posterior means of adult BW ranged from 531.78 to 586.89 kg. Greater estimates of adult BW were observed when the fitting error variance was considered constant along the trajectory. The models were not suitable to describe the SD of fitting errors at the beginning of the growth curve. All functions provided less accurate predictions at the beginning of growth, and predictions were more accurate after 48 mo of age. The prediction of adult BW using nonlinear functions can be accurate when growth curve parameters and their (co) variance components are estimated jointly. The hierarchical model used in the present study can be applied to the prediction of mature BW in herds in which a portion of the animals are culled before adult age. Gompertz, Von Bertalanffy, and Brody functions were adequate to establish mean growth patterns and to predict the adult BW of Nelore females. The Brody model was more accurate in predicting the birth weight of these animals and presented the best overall goodness of fit.
Resumo:
Joint generalized linear models and double generalized linear models (DGLMs) were designed to model outcomes for which the variability can be explained using factors and/or covariates. When such factors operate, the usual normal regression models, which inherently exhibit constant variance, will under-represent variation in the data and hence may lead to erroneous inferences. For count and proportion data, such noise factors can generate a so-called overdispersion effect, and the use of binomial and Poisson models underestimates the variability and, consequently, incorrectly indicate significant effects. In this manuscript, we propose a DGLM from a Bayesian perspective, focusing on the case of proportion data, where the overdispersion can be modeled using a random effect that depends on some noise factors. The posterior joint density function was sampled using Monte Carlo Markov Chain algorithms, allowing inferences over the model parameters. An application to a data set on apple tissue culture is presented, for which it is shown that the Bayesian approach is quite feasible, even when limited prior information is available, thereby generating valuable insight for the researcher about its experimental results.
Resumo:
Phylogenetic analyses of chloroplast DNA sequences, morphology, and combined data have provided consistent support for many of the major branches within the angiosperm, clade Dipsacales. Here we use sequences from three mitochondrial loci to test the existing broad scale phylogeny and in an attempt to resolve several relationships that have remained uncertain. Parsimony, maximum likelihood, and Bayesian analyses of a combined mitochondrial data set recover trees broadly consistent with previous studies, although resolution and support are lower than in the largest chloroplast analyses. Combining chloroplast and mitochondrial data results in a generally well-resolved and very strongly supported topology but the previously recognized problem areas remain. To investigate why these relationships have been difficult to resolve we conducted a series of experiments using different data partitions and heterogeneous substitution models. Usually more complex modeling schemes are favored regardless of the partitions recognized but model choice had little effect on topology or support values. In contrast there are consistent but weakly supported differences in the topologies recovered from coding and non-coding matrices. These conflicts directly correspond to relationships that were poorly resolved in analyses of the full combined chloroplast-mitochondrial data set. We suggest incongruent signal has contributed to our inability to confidently resolve these problem areas. (c) 2007 Elsevier Inc. All rights reserved.
Resumo:
It is known that patients may cease participating in a longitudinal study and become lost to follow-up. The objective of this article is to present a Bayesian model to estimate the malaria transition probabilities considering individuals lost to follow-up. We consider a homogeneous population, and it is assumed that the considered period of time is small enough to avoid two or more transitions from one state of health to another. The proposed model is based on a Gibbs sampling algorithm that uses information of lost to follow-up at the end of the longitudinal study. To simulate the unknown number of individuals with positive and negative states of malaria at the end of the study and lost to follow-up, two latent variables were introduced in the model. We used a real data set and a simulated data to illustrate the application of the methodology. The proposed model showed a good fit to these data sets, and the algorithm did not show problems of convergence or lack of identifiability. We conclude that the proposed model is a good alternative to estimate probabilities of transitions from one state of health to the other in studies with low adherence to follow-up.
Resumo:
In this paper we deal with a Bayesian analysis for right-censored survival data suitable for populations with a cure rate. We consider a cure rate model based on the negative binomial distribution, encompassing as a special case the promotion time cure model. Bayesian analysis is based on Markov chain Monte Carlo (MCMC) methods. We also present some discussion on model selection and an illustration with a real dataset.
Resumo:
The multivariate skew-t distribution (J Multivar Anal 79:93-113, 2001; J R Stat Soc, Ser B 65:367-389, 2003; Statistics 37:359-363, 2003) includes the Student t, skew-Cauchy and Cauchy distributions as special cases and the normal and skew-normal ones as limiting cases. In this paper, we explore the use of Markov Chain Monte Carlo (MCMC) methods to develop a Bayesian analysis of repeated measures, pretest/post-test data, under multivariate null intercept measurement error model (J Biopharm Stat 13(4):763-771, 2003) where the random errors and the unobserved value of the covariate (latent variable) follows a Student t and skew-t distribution, respectively. The results and methods are numerically illustrated with an example in the field of dentistry.
Resumo:
The main goal of this paper is to investigate a cure rate model that comprehends some well-known proposals found in the literature. In our work the number of competing causes of the event of interest follows the negative binomial distribution. The model is conveniently reparametrized through the cured fraction, which is then linked to covariates by means of the logistic link. We explore the use of Markov chain Monte Carlo methods to develop a Bayesian analysis in the proposed model. The procedure is illustrated with a numerical example.
Resumo:
Skew-normal distribution is a class of distributions that includes the normal distributions as a special case. In this paper, we explore the use of Markov Chain Monte Carlo (MCMC) methods to develop a Bayesian analysis in a multivariate, null intercept, measurement error model [R. Aoki, H. Bolfarine, J.A. Achcar, and D. Leao Pinto Jr, Bayesian analysis of a multivariate null intercept error-in -variables regression model, J. Biopharm. Stat. 13(4) (2003b), pp. 763-771] where the unobserved value of the covariate (latent variable) follows a skew-normal distribution. The results and methods are applied to a real dental clinical trial presented in [A. Hadgu and G. Koch, Application of generalized estimating equations to a dental randomized clinical trial, J. Biopharm. Stat. 9 (1999), pp. 161-178].
Resumo:
A Bayesian inference approach using Markov Chain Monte Carlo (MCMC) is developed for the logistic positive exponent (LPE) model proposed by Samejima and for a new skewed Logistic Item Response Theory (IRT) model, named Reflection LPE model. Both models lead to asymmetric item characteristic curves (ICC) and can be appropriate because a symmetric ICC treats both correct and incorrect answers symmetrically, which results in a logical contradiction in ordering examinees on the ability scale. A data set corresponding to a mathematical test applied in Peruvian public schools is analyzed, where comparisons with other parametric IRT models also are conducted. Several model comparison criteria are discussed and implemented. The main conclusion is that the LPE and RLPE IRT models are easy to implement and seem to provide the best fit to the data set considered.