2 resultados para Deterministic imputation
em DigitalCommons@The Texas Medical Center
Resumo:
The purpose of this study is to investigate the effects of predictor variable correlations and patterns of missingness with dichotomous and/or continuous data in small samples when missing data is multiply imputed. Missing data of predictor variables is multiply imputed under three different multivariate models: the multivariate normal model for continuous data, the multinomial model for dichotomous data and the general location model for mixed dichotomous and continuous data. Subsequent to the multiple imputation process, Type I error rates of the regression coefficients obtained with logistic regression analysis are estimated under various conditions of correlation structure, sample size, type of data and patterns of missing data. The distributional properties of average mean, variance and correlations among the predictor variables are assessed after the multiple imputation process. ^ For continuous predictor data under the multivariate normal model, Type I error rates are generally within the nominal values with samples of size n = 100. Smaller samples of size n = 50 resulted in more conservative estimates (i.e., lower than the nominal value). Correlation and variance estimates of the original data are retained after multiple imputation with less than 50% missing continuous predictor data. For dichotomous predictor data under the multinomial model, Type I error rates are generally conservative, which in part is due to the sparseness of the data. The correlation structure for the predictor variables is not well retained on multiply-imputed data from small samples with more than 50% missing data with this model. For mixed continuous and dichotomous predictor data, the results are similar to those found under the multivariate normal model for continuous data and under the multinomial model for dichotomous data. With all data types, a fully-observed variable included with variables subject to missingness in the multiple imputation process and subsequent statistical analysis provided liberal (larger than nominal values) Type I error rates under a specific pattern of missing data. It is suggested that future studies focus on the effects of multiple imputation in multivariate settings with more realistic data characteristics and a variety of multivariate analyses, assessing both Type I error and power. ^
Resumo:
Objective: In this secondary data analysis, three statistical methodologies were implemented to handle cases with missing data in a motivational interviewing and feedback study. The aim was to evaluate the impact that these methodologies have on the data analysis. ^ Methods: We first evaluated whether the assumption of missing completely at random held for this study. We then proceeded to conduct a secondary data analysis using a mixed linear model to handle missing data with three methodologies (a) complete case analysis, (b) multiple imputation with explicit model containing outcome variables, time, and the interaction of time and treatment, and (c) multiple imputation with explicit model containing outcome variables, time, the interaction of time and treatment, and additional covariates (e.g., age, gender, smoke, years in school, marital status, housing, race/ethnicity, and if participants play on athletic team). Several comparisons were conducted including the following ones: 1) the motivation interviewing with feedback group (MIF) vs. the assessment only group (AO), the motivation interviewing group (MIO) vs. AO, and the intervention of the feedback only group (FBO) vs. AO, 2) MIF vs. FBO, and 3) MIF vs. MIO.^ Results: We first evaluated the patterns of missingness in this study, which indicated that about 13% of participants showed monotone missing patterns, and about 3.5% showed non-monotone missing patterns. Then we evaluated the assumption of missing completely at random by Little's missing completely at random (MCAR) test, in which the Chi-Square test statistic was 167.8 with 125 degrees of freedom, and its associated p-value was p=0.006, which indicated that the data could not be assumed to be missing completely at random. After that, we compared if the three different strategies reached the same results. For the comparison between MIF and AO as well as the comparison between MIF and FBO, only the multiple imputation with additional covariates by uncongenial and congenial models reached different results. For the comparison between MIF and MIO, all the methodologies for handling missing values obtained different results. ^ Discussions: The study indicated that, first, missingness was crucial in this study. Second, to understand the assumptions of the model was important since we could not identify if the data were missing at random or missing not at random. Therefore, future researches should focus on exploring more sensitivity analyses under missing not at random assumption.^