29 resultados para Linear mixed models
em Biblioteca Digital da Produção Intelectual da Universidade de São Paulo (BDPI/USP)
Resumo:
Linear mixed models were developed to handle clustered data and have been a topic of increasing interest in statistics for the past 50 years. Generally. the normality (or symmetry) of the random effects is a common assumption in linear mixed models but it may, sometimes, be unrealistic, obscuring important features of among-subjects variation. In this article, we utilize skew-normal/independent distributions as a tool for robust modeling of linear mixed models under a Bayesian paradigm. The skew-normal/independent distributions is an attractive class of asymmetric heavy-tailed distributions that includes the skew-normal distribution, skew-t, skew-slash and the skew-contaminated normal distributions as special cases, providing an appealing robust alternative to the routine use of symmetric distributions in this type of models. The methods developed are illustrated using a real data set from Framingham cholesterol study. (C) 2009 Elsevier B.V. All rights reserved.
Resumo:
We consider a generalized leverage matrix useful for the identification of influential units and observations in linear mixed models and show how a decomposition of this matrix may be employed to identify high leverage points for both the marginal fitted values and the random effect component of the conditional fitted values. We illustrate the different uses of the two components of the decomposition with a simulated example as well as with a real data set.
Resumo:
Although the asymptotic distributions of the likelihood ratio for testing hypotheses of null variance components in linear mixed models derived by Stram and Lee [1994. Variance components testing in longitudinal mixed effects model. Biometrics 50, 1171-1177] are valid, their proof is based on the work of Self and Liang [1987. Asymptotic properties of maximum likelihood estimators and likelihood tests under nonstandard conditions. J. Amer. Statist. Assoc. 82, 605-610] which requires identically distributed random variables, an assumption not always valid in longitudinal data problems. We use the less restrictive results of Vu and Zhou [1997. Generalization of likelihood ratio tests under nonstandard conditions. Ann. Statist. 25, 897-916] to prove that the proposed mixture of chi-squared distributions is the actual asymptotic distribution of such likelihood ratios used as test statistics for null variance components in models with one or two random effects. We also consider a limited simulation study to evaluate the appropriateness of the asymptotic distribution of such likelihood ratios in moderately sized samples. (C) 2008 Elsevier B.V. All rights reserved.
Resumo:
In this article, we consider local influence analysis for the skew-normal linear mixed model (SN-LMM). As the observed data log-likelihood associated with the SN-LMM is intractable, Cook`s well-known approach cannot be applied to obtain measures of local influence. Instead, we develop local influence measures following the approach of Zhu and Lee (2001). This approach is based on the use of an EM-type algorithm and is measurement invariant under reparametrizations. Four specific perturbation schemes are discussed. Results obtained for a simulated data set and a real data set are reported, illustrating the usefulness of the proposed methodology.
Resumo:
Mixed models may be defined with or without reference to sampling, and can be used to predict realized random effects, as when estimating the latent values of study subjects measured with response error. When the model is specified without reference to sampling, a simple mixed model includes two random variables, one stemming from an exchangeable distribution of latent values of study subjects and the other, from the study subjects` response error distributions. Positive probabilities are assigned to both potentially realizable responses and artificial responses that are not potentially realizable, resulting in artificial latent values. In contrast, finite population mixed models represent the two-stage process of sampling subjects and measuring their responses, where positive probabilities are only assigned to potentially realizable responses. A comparison of the estimators over the same potentially realizable responses indicates that the optimal linear mixed model estimator (the usual best linear unbiased predictor, BLUP) is often (but not always) more accurate than the comparable finite population mixed model estimator (the FPMM BLUP). We examine a simple example and provide the basis for a broader discussion of the role of conditioning, sampling, and model assumptions in developing inference.
Resumo:
Phylogenetic analyses of chloroplast DNA sequences, morphology, and combined data have provided consistent support for many of the major branches within the angiosperm, clade Dipsacales. Here we use sequences from three mitochondrial loci to test the existing broad scale phylogeny and in an attempt to resolve several relationships that have remained uncertain. Parsimony, maximum likelihood, and Bayesian analyses of a combined mitochondrial data set recover trees broadly consistent with previous studies, although resolution and support are lower than in the largest chloroplast analyses. Combining chloroplast and mitochondrial data results in a generally well-resolved and very strongly supported topology but the previously recognized problem areas remain. To investigate why these relationships have been difficult to resolve we conducted a series of experiments using different data partitions and heterogeneous substitution models. Usually more complex modeling schemes are favored regardless of the partitions recognized but model choice had little effect on topology or support values. In contrast there are consistent but weakly supported differences in the topologies recovered from coding and non-coding matrices. These conflicts directly correspond to relationships that were poorly resolved in analyses of the full combined chloroplast-mitochondrial data set. We suggest incongruent signal has contributed to our inability to confidently resolve these problem areas. (c) 2007 Elsevier Inc. All rights reserved.
Resumo:
The purpose of this paper is to develop a Bayesian analysis for nonlinear regression models under scale mixtures of skew-normal distributions. This novel class of models provides a useful generalization of the symmetrical nonlinear regression models since the error distributions cover both skewness and heavy-tailed distributions such as the skew-t, skew-slash and the skew-contaminated normal distributions. The main advantage of these class of distributions is that they have a nice hierarchical representation that allows the implementation of Markov chain Monte Carlo (MCMC) methods to simulate samples from the joint posterior distribution. In order to examine the robust aspects of this flexible class, against outlying and influential observations, we present a Bayesian case deletion influence diagnostics based on the Kullback-Leibler divergence. Further, some discussions on the model selection criteria are given. The newly developed procedures are illustrated considering two simulations study, and a real data previously analyzed under normal and skew-normal nonlinear regression models. (C) 2010 Elsevier B.V. All rights reserved.
Resumo:
The class of symmetric linear regression models has the normal linear regression model as a special case and includes several models that assume that the errors follow a symmetric distribution with longer-than-normal tails. An important member of this class is the t linear regression model, which is commonly used as an alternative to the usual normal regression model when the data contain extreme or outlying observations. In this article, we develop second-order asymptotic theory for score tests in this class of models. We obtain Bartlett-corrected score statistics for testing hypotheses on the regression and the dispersion parameters. The corrected statistics have chi-squared distributions with errors of order O(n(-3/2)), n being the sample size. The corrections represent an improvement over the corresponding original Rao`s score statistics, which are chi-squared distributed up to errors of order O(n(-1)). Simulation results show that the corrected score tests perform much better than their uncorrected counterparts in samples of small or moderate size.
Resumo:
The purpose of this article is to present a new method to predict the response variable of an observation in a new cluster for a multilevel logistic regression. The central idea is based on the empirical best estimator for the random effect. Two estimation methods for multilevel model are compared: penalized quasi-likelihood and Gauss-Hermite quadrature. The performance measures for the prediction of the probability for a new cluster observation of the multilevel logistic model in comparison with the usual logistic model are examined through simulations and an application.
Resumo:
In this paper we extend partial linear models with normal errors to Student-t errors Penalized likelihood equations are applied to derive the maximum likelihood estimates which appear to be robust against outlying observations in the sense of the Mahalanobis distance In order to study the sensitivity of the penalized estimates under some usual perturbation schemes in the model or data the local influence curvatures are derived and some diagnostic graphics are proposed A motivating example preliminary analyzed under normal errors is reanalyzed under Student-t errors The local influence approach is used to compare the sensitivity of the model estimates (C) 2010 Elsevier B V All rights reserved
Resumo:
Prediction of random effects is an important problem with expanding applications. In the simplest context, the problem corresponds to prediction of the latent value (the mean) of a realized cluster selected via two-stage sampling. Recently, Stanek and Singer [Predicting random effects from finite population clustered samples with response error. J. Amer. Statist. Assoc. 99, 119-130] developed best linear unbiased predictors (BLUP) under a finite population mixed model that outperform BLUPs from mixed models and superpopulation models. Their setup, however, does not allow for unequally sized clusters. To overcome this drawback, we consider an expanded finite population mixed model based on a larger set of random variables that span a higher dimensional space than those typically applied to such problems. We show that BLUPs for linear combinations of the realized cluster means derived under such a model have considerably smaller mean squared error (MSE) than those obtained from mixed models, superpopulation models, and finite population mixed models. We motivate our general approach by an example developed for two-stage cluster sampling and show that it faithfully captures the stochastic aspects of sampling in the problem. We also consider simulation studies to illustrate the increased accuracy of the BLUP obtained under the expanded finite population mixed model. (C) 2007 Elsevier B.V. All rights reserved.
Resumo:
In interval-censored survival data, the event of interest is not observed exactly but is only known to occur within some time interval. Such data appear very frequently. In this paper, we are concerned only with parametric forms, and so a location-scale regression model based on the exponentiated Weibull distribution is proposed for modeling interval-censored data. We show that the proposed log-exponentiated Weibull regression model for interval-censored data represents a parametric family of models that include other regression models that are broadly used in lifetime data analysis. Assuming the use of interval-censored data, we employ a frequentist analysis, a jackknife estimator, a parametric bootstrap and a Bayesian analysis for the parameters of the proposed model. We derive the appropriate matrices for assessing local influences on the parameter estimates under different perturbation schemes and present some ways to assess global influences. Furthermore, for different parameter settings, sample sizes and censoring percentages, various simulations are performed; in addition, the empirical distribution of some modified residuals are displayed and compared with the standard normal distribution. These studies suggest that the residual analysis usually performed in normal linear regression models can be straightforwardly extended to a modified deviance residual in log-exponentiated Weibull regression models for interval-censored data. (C) 2009 Elsevier B.V. All rights reserved.
Resumo:
In this article, we compare three residuals based on the deviance component in generalised log-gamma regression models with censored observations. For different parameter settings, sample sizes and censoring percentages, various simulation studies are performed and the empirical distribution of each residual is displayed and compared with the standard normal distribution. For all cases studied, the empirical distributions of the proposed residuals are in general symmetric around zero, but only a martingale-type residual presented negligible kurtosis for the majority of the cases studied. These studies suggest that the residual analysis usually performed in normal linear regression models can be straightforwardly extended for the martingale-type residual in generalised log-gamma regression models with censored data. A lifetime data set is analysed under log-gamma regression models and a model checking based on the martingale-type residual is performed.
Resumo:
Calculations of local influence curvatures and leverage have been well developed when the parameters are unrestricted. In this article, we discuss the assessment of local influence and leverage under linear equality parameter constraints with extensions to inequality constraints. Using a penalized quadratic function we express the normal curvature of local influence for arbitrary perturbation schemes and the generalized leverage matrix in interpretable forms, which depend on restricted and unrestricted components. The results are quite general and can be applied in various statistical models. In particular, we derive the normal curvature under three useful perturbation schemes for generalized linear models. Four illustrative examples are analyzed by the methodology developed in the article.
Resumo:
We consider the issue of assessing influence of observations in the class of Birnbaum-Saunders nonlinear regression models, which is useful in lifetime data analysis. Our results generalize those in Galea et al. [8] which are confined to Birnbaum-Saunders linear regression models. Some influence methods, such as the local influence, total local influence of an individual and generalized leverage are discussed. Additionally, the normal curvatures for studying local influence are derived under some perturbation schemes. We also give an application to a real fatigue data set.