6 resultados para Missing values
em DigitalCommons@The Texas Medical Center
Resumo:
Objective: In this secondary data analysis, three statistical methodologies were implemented to handle cases with missing data in a motivational interviewing and feedback study. The aim was to evaluate the impact that these methodologies have on the data analysis. ^ Methods: We first evaluated whether the assumption of missing completely at random held for this study. We then proceeded to conduct a secondary data analysis using a mixed linear model to handle missing data with three methodologies (a) complete case analysis, (b) multiple imputation with explicit model containing outcome variables, time, and the interaction of time and treatment, and (c) multiple imputation with explicit model containing outcome variables, time, the interaction of time and treatment, and additional covariates (e.g., age, gender, smoke, years in school, marital status, housing, race/ethnicity, and if participants play on athletic team). Several comparisons were conducted including the following ones: 1) the motivation interviewing with feedback group (MIF) vs. the assessment only group (AO), the motivation interviewing group (MIO) vs. AO, and the intervention of the feedback only group (FBO) vs. AO, 2) MIF vs. FBO, and 3) MIF vs. MIO.^ Results: We first evaluated the patterns of missingness in this study, which indicated that about 13% of participants showed monotone missing patterns, and about 3.5% showed non-monotone missing patterns. Then we evaluated the assumption of missing completely at random by Little's missing completely at random (MCAR) test, in which the Chi-Square test statistic was 167.8 with 125 degrees of freedom, and its associated p-value was p=0.006, which indicated that the data could not be assumed to be missing completely at random. After that, we compared if the three different strategies reached the same results. For the comparison between MIF and AO as well as the comparison between MIF and FBO, only the multiple imputation with additional covariates by uncongenial and congenial models reached different results. For the comparison between MIF and MIO, all the methodologies for handling missing values obtained different results. ^ Discussions: The study indicated that, first, missingness was crucial in this study. Second, to understand the assumptions of the model was important since we could not identify if the data were missing at random or missing not at random. Therefore, future researches should focus on exploring more sensitivity analyses under missing not at random assumption.^
Resumo:
The purposes of this study were to examine (1) the relationship between selected components of the content of prenatal care and spontaneous preterm birth; and (2) the degree of comparability between maternal and caregivers' responses regarding the number of prenatal care visits, selected components of the content of prenatal care, and gestational age, based on analyses of the 1988 National Maternal and Infant Health Survey conducted by the National Centers for Health Statistics. Spontaneous preterm birth was subcategorized into very preterm and moderately preterm births, with term birth as the controls. The study population was limited to non-Hispanic Anglo- and African-American mothers. The racial differences in terms of birth outcomes were also compared.^ This study concluded that: (1) there was not a high degree of comparability (less than 80%) between maternal and prenatal care provider's responses regarding the number of prenatal care visits and the content of prenatal care; (2) there was a low degree of comparability (less than 50%) between maternal and infant's hospital of delivery responses regarding gestational age at birth; (3) there were differences in selected components of the content of prenatal care between the cases and controls, overall and stratified by ethnicity (i.e., hemoglobin/hematocrit test, weight measurement, and breast-feeding counseling), but they were confounded with missing values and associated preterm delivery bias; (4) there were differences in selected components of the content of prenatal care between Anglo- and African-American cases (i.e., vitamin/mineral supplement advice, weight measurement, smoking cessation and drug abuse counseling), but they, too, were difficult to interpret definitively due to item nonresponse and preterm delivery biases; (5) no significant predictive association between selected components of the content of prenatal care and spontaneous preterm birth was found; and (6) inadequate/intermediate prenatal care and birth out of wedlock were found to be associated with moderately preterm birth.^ Future research is needed to examine the validity of maternal and prenatal care providers' responses and identify the sources of disagreement between their responses. In addition, further studies are needed to examine the relationship between the quality of prenatal care and preterm birth. Finally, the completeness and quality of patient and provider data on the utilization and content of prenatal care needs to be strengthened in subsequent studies. ^
Resumo:
Ethnic violence appears to be the major source of violence in the world. Ethnic hostilities are potentially all-pervasive because most countries in the world are multi-ethnic. Public health's focus on violence documents its increasing role in this issue.^ The present study is based on a secondary analysis of a dataset of responses by 272 individuals from four ethnic groups (Anglo, African, Mexican, and Vietnamese Americans) who answered questions regarding variables related to ethnic violence from a general questionnaire which was distributed to ethnically diverse purposive, nonprobability, self-selected groups of individuals in Houston, Texas, in 1993.^ One goal was psychometric: learning about issues in analysis of datasets with modest numbers, comparison of two approaches to dealing with missing observations not missing at random (conducting analysis on two datasets), transformation analysis of continuous variables for logistic regression, and logistic regression diagnostics.^ Regarding the psychometric goal, it was concluded that measurement model analysis was not possible with a relatively small dataset with nonnormal variables, such as Likert-scaled variables; therefore, exploratory factor analysis was used. The two approaches to dealing with missing values resulted in comparable findings. Transformation analysis suggested that the continuous variables were in the correct scale, and diagnostics that the model fit was adequate.^ The substantive portion of the analysis included the testing of four hypotheses. Hypothesis One proposed that attitudes/efficacy regarding alternative approaches to resolving grievances from the general questionnaire represented underlying factors: nonpunitive social norms and strategies for addressing grievances--using the political system, organizing protests, using the system to punish offenders, and personal mediation. Evidence was found to support all but one factor, nonpunitive social norms.^ Hypothesis Two proposed that the factor variables and the other independent variables--jail, grievance, male, young, and membership in a particular ethnic group--were associated with (non)violence. Jail, grievance, and not using the political system to address grievances were associated with a greater likelihood of intergroup violence.^ No evidence was found to support Hypotheses Three and Four, which proposed that grievance and ethnic group membership would interact with other variables (i.e., age, gender, etc.) to produce variant levels of subgroup (non)violence.^ The generalizability of the results of this study are constrained by the purposive self-selected nature of the sample and small sample size (n = 272).^ Suggestions for future research include incorporating other possible variables or factors predictive of intergroup violence in models of the kind tested here, and the development and evaluation of interventions that promote electoral and nonelectoral political participation as means of reducing interethnic conflict. ^
Resumo:
Logistic regression is one of the most important tools in the analysis of epidemiological and clinical data. Such data often contain missing values for one or more variables. Common practice is to eliminate all individuals for whom any information is missing. This deletion approach does not make efficient use of available information and often introduces bias.^ Two methods were developed to estimate logistic regression coefficients for mixed dichotomous and continuous covariates including partially observed binary covariates. The data were assumed missing at random (MAR). One method (PD) used predictive distribution as weight to calculate the average of the logistic regressions performing on all possible values of missing observations, and the second method (RS) used a variant of resampling technique. Additional seven methods were compared with these two approaches in a simulation study. They are: (1) Analysis based on only the complete cases, (2) Substituting the mean of the observed values for the missing value, (3) An imputation technique based on the proportions of observed data, (4) Regressing the partially observed covariates on the remaining continuous covariates, (5) Regressing the partially observed covariates on the remaining continuous covariates conditional on response variable, (6) Regressing the partially observed covariates on the remaining continuous covariates and response variable, and (7) EM algorithm. Both proposed methods showed smaller standard errors (s.e.) for the coefficient involving the partially observed covariate and for the other coefficients as well. However, both methods, especially PD, are computationally demanding; thus for analysis of large data sets with partially observed covariates, further refinement of these approaches is needed. ^
Resumo:
The purpose of this study is to investigate the effects of predictor variable correlations and patterns of missingness with dichotomous and/or continuous data in small samples when missing data is multiply imputed. Missing data of predictor variables is multiply imputed under three different multivariate models: the multivariate normal model for continuous data, the multinomial model for dichotomous data and the general location model for mixed dichotomous and continuous data. Subsequent to the multiple imputation process, Type I error rates of the regression coefficients obtained with logistic regression analysis are estimated under various conditions of correlation structure, sample size, type of data and patterns of missing data. The distributional properties of average mean, variance and correlations among the predictor variables are assessed after the multiple imputation process. ^ For continuous predictor data under the multivariate normal model, Type I error rates are generally within the nominal values with samples of size n = 100. Smaller samples of size n = 50 resulted in more conservative estimates (i.e., lower than the nominal value). Correlation and variance estimates of the original data are retained after multiple imputation with less than 50% missing continuous predictor data. For dichotomous predictor data under the multinomial model, Type I error rates are generally conservative, which in part is due to the sparseness of the data. The correlation structure for the predictor variables is not well retained on multiply-imputed data from small samples with more than 50% missing data with this model. For mixed continuous and dichotomous predictor data, the results are similar to those found under the multivariate normal model for continuous data and under the multinomial model for dichotomous data. With all data types, a fully-observed variable included with variables subject to missingness in the multiple imputation process and subsequent statistical analysis provided liberal (larger than nominal values) Type I error rates under a specific pattern of missing data. It is suggested that future studies focus on the effects of multiple imputation in multivariate settings with more realistic data characteristics and a variety of multivariate analyses, assessing both Type I error and power. ^
Resumo:
With most clinical trials, missing data presents a statistical problem in evaluating a treatment's efficacy. There are many methods commonly used to assess missing data; however, these methods leave room for bias to enter the study. This thesis was a secondary analysis on data taken from TIME, a phase 2 randomized clinical trial conducted to evaluate the safety and effect of the administration timing of bone marrow mononuclear cells (BMMNC) for subjects with acute myocardial infarction (AMI).^ We evaluated the effect of missing data by comparing the variance inflation factor (VIF) of the effect of therapy between all subjects and only subjects with complete data. Through the general linear model, an unbiased solution was made for the VIF of the treatment's efficacy using the weighted least squares method to incorporate missing data. Two groups were identified from the TIME data: 1) all subjects and 2) subjects with complete data (baseline and follow-up measurements). After the general solution was found for the VIF, it was migrated Excel 2010 to evaluate data from TIME. The resulting numerical value from the two groups was compared to assess the effect of missing data.^ The VIF values from the TIME study were considerably less in the group with missing data. By design, we varied the correlation factor in order to evaluate the VIFs of both groups. As the correlation factor increased, the VIF values increased at a faster rate in the group with only complete data. Furthermore, while varying the correlation factor, the number of subjects with missing data was also varied to see how missing data affects the VIF. When subjects with only baseline data was increased, we saw a significant rate increase in VIF values in the group with only complete data while the group with missing data saw a steady and consistent increase in the VIF. The same was seen when we varied the group with follow-up only data. This essentially showed that the VIFs steadily increased when missing data is not ignored. When missing data is ignored as with our comparison group, the VIF values sharply increase as correlation increases.^