887 resultados para SEMIPARAMETRIC REGRESSION-MODELS
Resumo:
Background: In addition to the oncogenic human papillomavirus (HPV), several cofactors are needed in cervical carcinogenesis, but whether the HPV covariates associated with incident i) CIN1 are different from those of incident ii) CIN2 and iii) CIN3 needs further assessment. Objectives: To gain further insights into the true biological differences between CIN1, CIN2 and CIN3, we assessed HPV covariates associated with incident CIN1, CIN2, and CIN3. Study Design and Methods: HPV covariates associated with progression to CIN1, CIN2 and CIN3 were analysed in the combined cohort of the NIS (n = 3,187) and LAMS study (n = 12,114), using competing-risks regression models (in panel data) for baseline HR-HPV-positive women (n = 1,105), who represent a sub-cohort of all 1,865 women prospectively followed-up in these two studies. Results: Altogether, 90 (4.8%), 39 (2.1%) and 14 (1.4%) cases progressed to CIN1, CIN2, and CIN3, respectively. Among these baseline HR-HPV-positive women, the risk profiles of incident GIN I, CIN2 and CIN3 were unique in that completely different HPV covariates were associated with progression to CIN1, CIN2 and CIN3, irrespective which categories (non-progression, CIN1, CIN2, CIN3 or all) were used as competing-risks events in univariate and multivariate models. Conclusions: These data confirm our previous analysis based on multinomial regression models implicating that distinct covariates of HR-HPV are associated with progression to CIN1, CIN2 and CIN3. This emphasises true biological differences between the three grades of GIN, which revisits the concept of combining CIN2 with CIN3 or with CIN1 in histological classification or used as a common end-point, e.g., in HPV vaccine trials.
Resumo:
The choice of an appropriate family of linear models for the analysis of longitudinal data is often a matter of concern for practitioners. To attenuate such difficulties, we discuss some issues that emerge when analyzing this type of data via a practical example involving pretestposttest longitudinal data. In particular, we consider log-normal linear mixed models (LNLMM), generalized linear mixed models (GLMM), and models based on generalized estimating equations (GEE). We show how some special features of the data, like a nonconstant coefficient of variation, may be handled in the three approaches and evaluate their performance with respect to the magnitude of standard errors of interpretable and comparable parameters. We also show how different diagnostic tools may be employed to identify outliers and comment on available software. We conclude by noting that the results are similar, but that GEE-based models may be preferable when the goal is to compare the marginal expected responses.
Resumo:
This paper proposes a general class of regression models for continuous proportions when the data contain zeros or ones. The proposed class of models assumes that the response variable has a mixed continuous-discrete distribution with probability mass at zero or one. The beta distribution is used to describe the continuous component of the model, since its density has a wide range of different shapes depending on the values of the two parameters that index the distribution. We use a suitable parameterization of the beta law in terms of its mean and a precision parameter. The parameters of the mixture distribution are modeled as functions of regression parameters. We provide inference, diagnostic, and model selection tools for this class of models. A practical application that employs real data is presented. (C) 2011 Elsevier B.V. All rights reserved.
Resumo:
The objective of this paper is to model variations in test-day milk yields of first lactations of Holstein cows by RR using B-spline functions and Bayesian inference in order to fit adequate and parsimonious models for the estimation of genetic parameters. They used 152,145 test day milk yield records from 7317 first lactations of Holstein cows. The model established in this study was additive, permanent environmental and residual random effects. In addition, contemporary group and linear and quadratic effects of the age of cow at calving were included as fixed effects. Authors modeled the average lactation curve of the population with a fourth-order orthogonal Legendre polynomial. They concluded that a cubic B-spline with seven random regression coefficients for both the additive genetic and permanent environment effects was to be the best according to residual mean square and residual variance estimates. Moreover they urged a lower order model (quadratic B-spline with seven random regression coefficients for both random effects) could be adopted because it yielded practically the same genetic parameter estimates with parsimony. (C) 2012 Elsevier B.V. All rights reserved.
Resumo:
Suppose that we are interested in establishing simple, but reliable rules for predicting future t-year survivors via censored regression models. In this article, we present inference procedures for evaluating such binary classification rules based on various prediction precision measures quantified by the overall misclassification rate, sensitivity and specificity, and positive and negative predictive values. Specifically, under various working models we derive consistent estimators for the above measures via substitution and cross validation estimation procedures. Furthermore, we provide large sample approximations to the distributions of these nonsmooth estimators without assuming that the working model is correctly specified. Confidence intervals, for example, for the difference of the precision measures between two competing rules can then be constructed. All the proposals are illustrated with two real examples and their finite sample properties are evaluated via a simulation study.
Resumo:
In many clinical trials to evaluate treatment efficacy, it is believed that there may exist latent treatment effectiveness lag times after which medical procedure or chemical compound would be in full effect. In this article, semiparametric regression models are proposed and studied to estimate the treatment effect accounting for such latent lag times. The new models take advantage of the invariance property of the additive hazards model in marginalizing over random effects, so parameters in the models are easy to be estimated and interpreted, while the flexibility without specifying baseline hazard function is kept. Monte Carlo simulation studies demonstrate the appropriateness of the proposed semiparametric estimation procedure. Data collected in the actual randomized clinical trial, which evaluates the effectiveness of biodegradable carmustine polymers for treatment of recurrent brain tumors, are analyzed.
Resumo:
Latent class regression models are useful tools for assessing associations between covariates and latent variables. However, evaluation of key model assumptions cannot be performed using methods from standard regression models due to the unobserved nature of latent outcome variables. This paper presents graphical diagnostic tools to evaluate whether or not latent class regression models adhere to standard assumptions of the model: conditional independence and non-differential measurement. An integral part of these methods is the use of a Markov Chain Monte Carlo estimation procedure. Unlike standard maximum likelihood implementations for latent class regression model estimation, the MCMC approach allows us to calculate posterior distributions and point estimates of any functions of parameters. It is this convenience that allows us to provide the diagnostic methods that we introduce. As a motivating example we present an analysis focusing on the association between depression and socioeconomic status, using data from the Epidemiologic Catchment Area study. We consider a latent class regression analysis investigating the association between depression and socioeconomic status measures, where the latent variable depression is regressed on education and income indicators, in addition to age, gender, and marital status variables. While the fitted latent class regression model yields interesting results, the model parameters are found to be invalid due to the violation of model assumptions. The violation of these assumptions is clearly identified by the presented diagnostic plots. These methods can be applied to standard latent class and latent class regression models, and the general principle can be extended to evaluate model assumptions in other types of models.
Resumo:
OBJECTIVES: This paper is concerned with checking goodness-of-fit of binary logistic regression models. For the practitioners of data analysis, the broad classes of procedures for checking goodness-of-fit available in the literature are described. The challenges of model checking in the context of binary logistic regression are reviewed. As a viable solution, a simple graphical procedure for checking goodness-of-fit is proposed. METHODS: The graphical procedure proposed relies on pieces of information available from any logistic analysis; the focus is on combining and presenting these in an informative way. RESULTS: The information gained using this approach is presented with three examples. In the discussion, the proposed method is put into context and compared with other graphical procedures for checking goodness-of-fit of binary logistic models available in the literature. CONCLUSION: A simple graphical method can significantly improve the understanding of any logistic regression analysis and help to prevent faulty conclusions.
Resumo:
The counterfactual decomposition technique popularized by Blinder (1973, Journal of Human Resources, 436–455) and Oaxaca (1973, International Economic Review, 693–709) is widely used to study mean outcome differences between groups. For example, the technique is often used to analyze wage gaps by sex or race. This article summarizes the technique and addresses several complications, such as the identification of effects of categorical predictors in the detailed decomposition or the estimation of standard errors. A new command called oaxaca is introduced, and examples illustrating its usage are given.
Resumo:
When considering data from many trials, it is likely that some of them present a markedly different intervention effect or exert an undue influence on the summary results. We develop a forward search algorithm for identifying outlying and influential studies in meta-analysis models. The forward search algorithm starts by fitting the hypothesized model to a small subset of likely outlier-free studies and proceeds by adding studies into the set one-by-one that are determined to be closest to the fitted model of the existing set. As each study is added to the set, plots of estimated parameters and measures of fit are monitored to identify outliers by sharp changes in the forward plots. We apply the proposed outlier detection method to two real data sets; a meta-analysis of 26 studies that examines the effect of writing-to-learn interventions on academic achievement adjusting for three possible effect modifiers, and a meta-analysis of 70 studies that compares a fluoride toothpaste treatment to placebo for preventing dental caries in children. A simple simulated example is used to illustrate the steps of the proposed methodology, and a small-scale simulation study is conducted to evaluate the performance of the proposed method. Copyright © 2016 John Wiley & Sons, Ltd.
Resumo:
Consider a nonparametric regression model Y=mu*(X) + e, where the explanatory variables X are endogenous and e satisfies the conditional moment restriction E[e|W]=0 w.p.1 for instrumental variables W. It is well known that in these models the structural parameter mu* is 'ill-posed' in the sense that the function mapping the data to mu* is not continuous. In this paper, we derive the efficiency bounds for estimating linear functionals E[p(X)mu*(X)] and int_{supp(X)}p(x)mu*(x)dx, where p is a known weight function and supp(X) the support of X, without assuming mu* to be well-posed or even identified.
Resumo:
The ordinal logistic regression models are used to analyze the dependant variable with multiple outcomes that can be ranked, but have been underutilized. In this study, we describe four logistic regression models for analyzing the ordinal response variable. ^ In this methodological study, the four regression models are proposed. The first model uses the multinomial logistic model. The second is adjacent-category logit model. The third is the proportional odds model and the fourth model is the continuation-ratio model. We illustrate and compare the fit of these models using data from the survey designed by the University of Texas, School of Public Health research project PCCaSO (Promoting Colon Cancer Screening in people 50 and Over), to study the patient’s confidence in the completion colorectal cancer screening (CRCS). ^ The purpose of this study is two fold: first, to provide a synthesized review of models for analyzing data with ordinal response, and second, to evaluate their usefulness in epidemiological research, with particular emphasis on model formulation, interpretation of model coefficients, and their implications. Four ordinal logistic models that are used in this study include (1) Multinomial logistic model, (2) Adjacent-category logistic model [9], (3) Continuation-ratio logistic model [10], (4) Proportional logistic model [11]. We recommend that the analyst performs (1) goodness-of-fit tests, (2) sensitivity analysis by fitting and comparing different models.^