14 results for monotone missing data
in DigitalCommons@The Texas Medical Center
Abstract:
The discrete-time Markov chain is commonly used to describe changes in health states for chronic diseases in longitudinal studies. Statistical inference for comparing treatment effects or finding determinants of disease progression usually requires estimation of transition probabilities. When the outcome data have missing observations, or when the variable of interest (called a latent variable) cannot be measured directly, the estimation of transition probabilities becomes more complicated. In the latter case, a surrogate variable that is easier to access and can gauge the characteristics of the latent one is usually used for data analysis.
This dissertation research proposes methods to analyze longitudinal data (1) that have a categorical outcome with missing observations or (2) that use complete or incomplete surrogate observations to analyze the categorical latent outcome. For (1), different missing-data mechanisms were considered in empirical studies using methods that include the EM algorithm, Monte Carlo EM, and a procedure that is not a data augmentation method. For (2), the hidden Markov model with the forward-backward procedure was applied for parameter estimation. This method was also extended to cover the computation of standard errors. The proposed methods were demonstrated using a schizophrenia example. The relevance to public health, the strengths and limitations, and possible future research were also discussed.
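As a rough illustration of the forward pass that the forward-backward procedure builds on, the sketch below computes the likelihood of a surrogate observation sequence under a discrete hidden Markov model. It is not the dissertation's implementation: the function name, the argument layout, and the use of `None` for a missing observation are assumptions made here, and treating a missing observation as contributing a factor of 1 is only appropriate when the missingness is ignorable.

```python
def forward_likelihood(init, trans, emit, obs):
    """Likelihood of an observed surrogate sequence under a discrete HMM.

    init[i]     : P(latent state at time 0 = i)
    trans[i][j] : P(next latent state = j | current latent state = i)
    emit[i][k]  : P(surrogate observation = k | latent state = i)
    obs         : list of observed symbols; None marks a missing observation
                  (hypothetical convention, assumes ignorable missingness)
    """
    n = len(init)

    def emission(state, symbol):
        # A missing observation carries no information, so it contributes 1.
        return 1.0 if symbol is None else emit[state][symbol]

    # alpha[i] = P(observations so far, current latent state = i)
    alpha = [init[i] * emission(i, obs[0]) for i in range(n)]
    for symbol in obs[1:]:
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(n)) * emission(j, symbol)
            for j in range(n)
        ]
    return sum(alpha)
```

For long sequences the unscaled recursion can underflow, so a practical version would rescale `alpha` at each step or work in log space.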
Abstract:
This study proposed a novel statistical method that models multiple outcomes and the missing data process jointly using item response theory. The method follows the "intent-to-treat" principle in clinical trials and accounts for the correlation between the outcomes and the missing data process. It may provide a good solution for studies of chronic mental disorders.
The simulation study demonstrated that if the true model is the proposed model with moderate or strong correlation, ignoring the within correlation may lead to overestimation of the treatment effect and a type I error rate higher than the specified level. Even if the within correlation is small, the proposed model performs as well as the naïve response model. Thus, the proposed model is robust across different correlation settings if the data are generated by the proposed model.
Abstract:
In most clinical trials, missing data present a statistical problem in evaluating a treatment's efficacy. There are many methods commonly used to assess missing data; however, these methods leave room for bias to enter the study. This thesis was a secondary analysis of data taken from TIME, a phase 2 randomized clinical trial conducted to evaluate the safety and effect of the administration timing of bone marrow mononuclear cells (BMMNC) for subjects with acute myocardial infarction (AMI).
We evaluated the effect of missing data by comparing the variance inflation factor (VIF) of the effect of therapy between all subjects and only subjects with complete data. Through the general linear model, an unbiased solution was derived for the VIF of the treatment's efficacy using the weighted least squares method to incorporate missing data. Two groups were identified from the TIME data: 1) all subjects and 2) subjects with complete data (baseline and follow-up measurements). After the general solution was found for the VIF, it was migrated to Excel 2010 to evaluate data from TIME. The resulting numerical values from the two groups were compared to assess the effect of missing data.
The VIF values from the TIME study were considerably lower in the group with missing data. By design, we varied the correlation factor in order to evaluate the VIFs of both groups. As the correlation factor increased, the VIF values increased at a faster rate in the group with only complete data. Furthermore, while varying the correlation factor, the number of subjects with missing data was also varied to see how missing data affect the VIF. When the number of subjects with only baseline data was increased, the VIF values increased at a significantly faster rate in the group with only complete data, while the group with missing data saw a steady and consistent increase in the VIF. The same was seen when we varied the group with follow-up only data.
This essentially showed that the VIFs increased steadily when missing data were not ignored. When missing data were ignored, as in our comparison group, the VIF values increased sharply as correlation increased.
Abstract:
Maximizing data quality may be especially difficult in trauma-related clinical research. Strategies are needed to improve data quality and assess the impact of data quality on clinical predictive models. This study had two objectives. The first was to compare missing data between two multi-center trauma transfusion studies: a retrospective study (RS) using medical chart data with minimal data quality review and the PRospective Observational Multi-center Major Trauma Transfusion (PROMMTT) study with standardized quality assurance. The second objective was to assess the impact of missing data on clinical prediction algorithms by evaluating blood transfusion prediction models using PROMMTT data. RS (2005-06) and PROMMTT (2009-10) investigated trauma patients receiving ≥ 1 unit of red blood cells (RBC) from ten Level I trauma centers. Missing data were compared for 33 variables collected in both studies using mixed effects logistic regression (including random intercepts for study site). Massive transfusion (MT) patients received ≥ 10 RBC units within 24h of admission. Correct classification percentages for three MT prediction models were evaluated using complete case analysis and multiple imputation based on the multivariate normal distribution. A sensitivity analysis for missing data was conducted to estimate the upper and lower bounds of correct classification using assumptions about missing data under best and worst case scenarios. Most variables (17/33=52%) had <1% missing data in RS and PROMMTT. Of the remaining variables, 50% demonstrated less missingness in PROMMTT, 25% had less missingness in RS, and 25% were similar between studies. Missing percentages for MT prediction variables in PROMMTT ranged from 2.2% (heart rate) to 45% (respiratory rate). For variables missing >1%, study site was associated with missingness (all p≤0.021). Survival time predicted missingness for 50% of RS and 60% of PROMMTT variables. 
Complete case proportions for the MT models ranged from 41% to 88%. Complete case analysis and multiple imputation demonstrated similar correct classification results. Sensitivity analysis upper-lower bound ranges for the three MT models were 59-63%, 36-46%, and 46-58%. Prospective collection of ten-fold more variables with data quality assurance reduced overall missing data. Study site and patient survival were associated with missingness, suggesting that data were not missing completely at random and that complete case analysis may lead to biased results. Evaluating clinical prediction model accuracy may be misleading in the presence of missing data, especially with many predictor variables. The proposed sensitivity analysis estimating correct classification under upper (best case scenario)/lower (worst case scenario) bounds may be more informative than multiple imputation, which provided results similar to complete case analysis.
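The abstract does not state how the best and worst case bounds were computed. One common reading, sketched below purely as an assumption, is that the worst case counts every patient with missing predictors as misclassified and the best case counts every such patient as correctly classified. The function name and arguments are hypothetical.

```python
def classification_bounds(correct_complete, n_complete, n_missing):
    """Lower and upper bounds on the overall correct classification proportion.

    correct_complete : complete cases the model classified correctly
    n_complete       : number of complete cases
    n_missing        : number of cases the model could not score (missing predictors)

    Assumed scenarios (not taken from the abstract):
      worst case: every case with missing data is misclassified
      best case : every case with missing data is classified correctly
    """
    if not 0 <= correct_complete <= n_complete:
        raise ValueError("correct_complete must lie between 0 and n_complete")
    n_total = n_complete + n_missing
    if n_total == 0:
        raise ValueError("at least one case is required")
    lower = correct_complete / n_total
    upper = (correct_complete + n_missing) / n_total
    return lower, upper
```

With 80 complete cases of which 60 are classified correctly, and 20 cases with missing predictors, this gives bounds of 60% and 80%; the width of the interval equals the proportion of cases with missing data.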
Abstract:
The purpose of this study is to investigate the effects of predictor variable correlations and patterns of missingness with dichotomous and/or continuous data in small samples when missing data are multiply imputed. Missing data in predictor variables are multiply imputed under three different multivariate models: the multivariate normal model for continuous data, the multinomial model for dichotomous data, and the general location model for mixed dichotomous and continuous data. Subsequent to the multiple imputation process, Type I error rates of the regression coefficients obtained with logistic regression analysis are estimated under various conditions of correlation structure, sample size, type of data, and patterns of missing data. The distributional properties of the average mean, variance, and correlations among the predictor variables are assessed after the multiple imputation process.
For continuous predictor data under the multivariate normal model, Type I error rates are generally within the nominal values with samples of size n = 100. Smaller samples of size n = 50 resulted in more conservative estimates (i.e., lower than the nominal value). Correlation and variance estimates of the original data are retained after multiple imputation with less than 50% missing continuous predictor data. For dichotomous predictor data under the multinomial model, Type I error rates are generally conservative, which in part is due to the sparseness of the data. The correlation structure for the predictor variables is not well retained in multiply imputed data from small samples with more than 50% missing data under this model. For mixed continuous and dichotomous predictor data, the results are similar to those found under the multivariate normal model for continuous data and under the multinomial model for dichotomous data.
With all data types, including a fully observed variable alongside variables subject to missingness in the multiple imputation process and subsequent statistical analysis produced liberal (larger than nominal) Type I error rates under a specific pattern of missing data. It is suggested that future studies focus on the effects of multiple imputation in multivariate settings with more realistic data characteristics and a variety of multivariate analyses, assessing both Type I error and power.
Abstract:
Objective: In this secondary data analysis, three statistical methodologies were implemented to handle cases with missing data in a motivational interviewing and feedback study. The aim was to evaluate the impact that these methodologies have on the data analysis.
Methods: We first evaluated whether the assumption of missing completely at random held for this study. We then conducted a secondary data analysis using a mixed linear model, handling missing data with three methodologies: (a) complete case analysis, (b) multiple imputation with an explicit model containing outcome variables, time, and the interaction of time and treatment, and (c) multiple imputation with an explicit model containing outcome variables, time, the interaction of time and treatment, and additional covariates (e.g., age, gender, smoking, years in school, marital status, housing, race/ethnicity, and whether participants played on an athletic team). Several comparisons were conducted, including the following: 1) the motivational interviewing with feedback group (MIF) vs. the assessment only group (AO), the motivational interviewing only group (MIO) vs. AO, and the feedback only group (FBO) vs. AO, 2) MIF vs. FBO, and 3) MIF vs. MIO.
Results: We first evaluated the patterns of missingness in this study, which indicated that about 13% of participants showed monotone missing patterns and about 3.5% showed non-monotone missing patterns. We then evaluated the assumption of missing completely at random using Little's missing completely at random (MCAR) test, in which the chi-square test statistic was 167.8 with 125 degrees of freedom and the associated p-value was p=0.006, indicating that the data could not be assumed to be missing completely at random. After that, we compared whether the three strategies reached the same results.
For the comparison between MIF and AO as well as the comparison between MIF and FBO, only the multiple imputation with additional covariates by uncongenial and congenial models reached different results. For the comparison between MIF and MIO, all the methodologies for handling missing values obtained different results.
Discussion: The study indicated that, first, missingness was crucial in this study. Second, understanding the assumptions of the model was important, since we could not identify whether the data were missing at random or missing not at random. Therefore, future research should focus on exploring more sensitivity analyses under the missing-not-at-random assumption.
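The distinction between monotone and non-monotone missing patterns reported above can be made concrete with a short sketch. A participant's pattern is monotone when, once a visit is missing, every later visit is missing as well (dropout); any return after a gap makes it non-monotone. The function names and the boolean encoding are illustrative assumptions, not the study's code.

```python
def is_monotone(observed):
    """Return True if no observed visit follows a missing visit.

    observed : list of booleans in visit order, True = outcome measured.
    """
    seen_missing = False
    for measured in observed:
        if not measured:
            seen_missing = True
        elif seen_missing:
            # A measurement after a gap: intermittent (non-monotone) missingness.
            return False
    return True


def classify_pattern(observed):
    """Label one participant's pattern as complete, monotone, or non-monotone."""
    if all(observed):
        return "complete"
    return "monotone" if is_monotone(observed) else "non-monotone"
```

For example, `classify_pattern([True, True, False, False])` returns `"monotone"`, while `classify_pattern([True, False, True])` returns `"non-monotone"`. Tallying these labels across participants yields the kind of pattern percentages quoted in the results.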
Abstract:
Most statistical analysis, in theory and practice, is concerned with static models: models with a proposed set of parameters whose values are fixed across observational units. Static models implicitly assume that the quantified relationships remain the same across the design space of the data. While this is reasonable under many circumstances, it can be a dangerous assumption when dealing with sequentially ordered data. The mere passage of time always brings fresh considerations, and the interrelationships among parameters, or subsets of parameters, may need to be continually revised.
When data are gathered sequentially, dynamic interim monitoring may be useful as new subject-specific parameters are introduced with each new observational unit. Sequential imputation via dynamic hierarchical models is an efficient strategy for handling missing data and analyzing longitudinal studies. Dynamic conditional independence models offer a flexible framework that exploits the Bayesian updating scheme for capturing the evolution of both the population and individual effects over time. While static models often describe aggregate information well, they often do not reflect conflicts in the information at the individual level. Dynamic models prove advantageous over static models in capturing both individual and aggregate trends. Computations for such models can be carried out via the Gibbs sampler. An application using small-sample, repeated-measures, normally distributed growth curve data is presented.
Abstract:
Next-generation DNA sequencing platforms can effectively detect the entire spectrum of genomic variation and are emerging as a major tool for systematic exploration of the universe of variants and interactions in the entire genome. However, the data produced by next-generation sequencing technologies will suffer from three basic problems: sequence errors, assembly errors, and missing data. Current statistical methods for genetic analysis are well suited for detecting the association of common variants, but are less suitable for rare variants. This raises a great challenge for sequence-based genetic studies of complex diseases.
This dissertation research used the genome continuum model as a general principle, and stochastic calculus and functional data analysis as tools, to develop novel and powerful statistical methods for the next generation of association studies of both qualitative and quantitative traits in the context of sequencing data, ultimately shifting the paradigm of association analysis from the current locus-by-locus analysis to the collective analysis of genome regions.
In this project, functional principal component (FPC) methods coupled with high-dimensional data reduction techniques will be used to develop novel and powerful methods for testing the associations of the entire spectrum of genetic variation within a segment of the genome or a gene, regardless of whether the variants are common or rare.
Classical quantitative genetics methods suffer from high type I error rates and low power for rare variants. To overcome these limitations for resequencing data, this project used functional linear models with scalar response to develop statistics for identifying quantitative trait loci (QTLs) for both common and rare variants. To illustrate their applications, the functional linear models were applied to five quantitative traits in the Framingham Heart Study.
This project proposed a novel concept of gene-gene co-association, in which a gene or a genomic region is taken as the unit of association analysis, and used stochastic calculus to develop a unified framework for testing the association of multiple genes or genomic regions for both common and rare alleles. The proposed methods were applied to gene-gene co-association analysis of psoriasis in two independent GWAS datasets, which led to the discovery of networks significantly associated with psoriasis.
Abstract:
There has been a great deal of interest and debate recently concerning the linkages between inequality and health cross-nationally. Exposures to social and health inequalities likely vary as a consequence of different cultural contexts. It is important to guide research by a theoretical perspective that includes cultural and social contexts cross-nationally. If inequality affects health only under specific cultural conditions, this could explain why some of the literature that compares different societies finds no evidence of a relationship between inequality and health in certain countries. A theoretical framework is presented that combines sociological theory with constructs from cultural psychology in order to identify pathways that might lead from cultural dimensions to health inequalities. Three analyses are carried out. The first analysis explores whether there is a relationship between cultural dimensions at the societal level and self-rated health at the individual level. The findings suggest that different cultural norms at the societal level can produce both social and health inequalities, but the effects on health may differ depending on the socio-cultural context. The second analysis tests the hypothesis that health is affected by the density of social networks in a society, levels of societal trust, and inequality. The results suggest that commonly used measures of social cohesion and inequality may have both contextual and compositional effects on health in a large number of countries, and that societal measures of social cohesion and inequality interact with individual measures of social participation, trust, and income, moderating their effects on health. The third analysis explores whether value systems associated with vertical individualist societies may lead to health disparities because of their stigmatizing effects. 
I test the hypothesis that, within vertical individualist societies, subjective well-being will be affected by a social context where competition and the Protestant work ethic are valued, mediated by inequality. The hypothesis was not supported by the available cross-national data, most likely because of inadequate measures, missing data, and the small sample of vertical individualist countries. The overall findings demonstrate that cultural differences are important contextual factors that should not be overlooked when examining the causes of health inequalities.
Abstract:
Coronary artery disease (CAD) is the most common cause of morbidity and mortality in the United States. While Coronary Angiography (CA) is the gold standard test to investigate coronary artery disease, Prospective gated-64 Slice Computed Tomography (Prosp-64CT) is a new non-invasive technology that uses 64-slice computed tomography (64CT) with electrocardiographic gating to investigate coronary artery disease. The aim of the current study was to investigate the role of Body Mass Index (BMI) as a factor affecting the occurrence of CA after a Prosp-64CT, as well as the quality of the Prosp-64CT. Demographic and clinical characteristics of the study population were described. A secondary analysis of data on patients who underwent a Prosp-64CT for evaluation of coronary artery disease was performed. Seventy-seven patients who underwent Prosp-64CT for evaluation of coronary artery disease were identified. Fifteen patients were excluded because they had missing data regarding BMI, quality of the Prosp-64CT, or CA. Thus, a total of 62 patients were included in the final analysis. The mean age was 56.2 years. The mean BMI was 31.3 kg/m². Eight (13%) patients underwent a CA within one month of Prosp-64CT. Eight (13%) patients had a poor quality Prosp-64CT. Higher BMI was significantly associated with the occurrence of CA after Prosp-64CT (P<0.05). There was a trend, but no statistical significance, for the association between being obese and the occurrence of CA (P=0.06). Neither BMI nor obesity was found to be significantly associated with poor quality of Prosp-64CT (P=0.19 and P=0.76, respectively). In conclusion, BMI was significantly associated with the occurrence of CA within one month of Prosp-64CT. Thus, in patients with a higher BMI, diagnostic investigation with both tests could be avoided; rather, only a CA could be performed.
However, the relationship of BMI to the quality of Prosp-64CT needs to be further investigated, since the sample size of the current study was small.
Abstract:
There are two practical challenges in the conduct of phase I clinical trials: lack of transparency to physicians, and late-onset toxicity. In my dissertation, Bayesian approaches are used to address these two problems in clinical trial designs. The proposed simple optimal designs cast the dose-finding problem as a decision-making process for dose escalation and de-escalation. The proposed designs minimize the incorrect decision error rate to find the maximum tolerated dose (MTD). For the late-onset toxicity problem, a Bayesian adaptive dose-finding design for drug combinations is proposed. The dose-toxicity relationship is modeled using the Finney model. The unobserved delayed toxicity outcomes are treated as missing data, and Bayesian data augmentation is employed to handle the resulting missing data. Extensive simulation studies were conducted to examine the operating characteristics of the proposed designs and demonstrated the designs' good performance in various practical scenarios.
Abstract:
Data from the 2009–2011 School Physical Activity and Nutrition (SPAN) project were analyzed to examine the association between bullied status at school during the past six months and engaging in five or more days of physical activity during the past seven days in a population of 8th and 11th grade Texas youths after stratifying by gender. As a secondary aim, this study also examined the association between weight status and the prevalence of bullied status at school. The final sample size for this study, after excluding missing data, consisted of 6,246 8th and 11th grade youths (girls, n= 3,237; boys, n=3,009) representing a total of 518,838 youths from 8th and 11th grade. Results from the multiple logistic regression adjusting for weight status, grade, and ethnicity, indicate that girls with a bullied status of at least two or three times per month had significantly lower odds of engaging in five or more days of physical activity during the past seven days than girls who were never bullied at school (ORadj=0.62; 95% CI, 0.40, 0.96). Conversely, girls who reported a bullied status of at least once per week were significantly more likely to engage in five or more days of physical activity during the past seven days compared to girls who were never bullied at school (ORadj=3.44; 95% CI, 1.56, 7.63). No significant associations between bullied status and engaging in five or more days of physical activity during the past seven days were found for boys. Bullied status differed significantly across weight status for 8th grade girls (χ2(6)=63.7, p<.05) and 11th grade boys (χ2(6) =94.93, p<.05), with overweight and obese youths reporting a higher prevalence of being bullied once or twice, at least two or three times per month, and at least once per week than their normal weight peers. 
Our finding that girls with bullied status of at least once per week were more likely to engage in five or more days of physical activity than girls who were never bullied warrants future qualitative research to identify potential explanations for such results. Future research on relational and weight-based bullying is also needed and may help explain the inconsistent findings between bullied status and engaging in physical activity in girls.
Abstract:
Background: Once thought to be eradicated, pertussis is now making a steady comeback throughout Texas and the United States. Pertussis can affect all demographics, but infants are of greatest concern as they suffer the highest case-fatality rate. The objective of this study was to create and report a comprehensive summary of confirmed or probable pertussis cases in a Texas county during the 2008 through 2012 time period.
Methods: A cross-sectional study design was used to identify at-risk populations in a Texas county using descriptive statistics of data from probable and confirmed pertussis cases in this county from 2008-2012. Data were collected during routine pertussis investigations conducted by the local health department of this county.
Results: There was a sharp increase in pertussis cases seen in this county in 2012. Hispanics made up the majority of cases (74.9%) as compared to 12.8% of cases among Whites, 3.1% of cases among Blacks, and 9.2% of cases among unknown/other. The population of Hispanics within this county was 58.9%. Almost a quarter of cases (24.2%) in this study were hospitalized. There was no difference identified in the proportion of male sources of exposure (48.9%) as compared to female (51.1%). Household contacts were the main sources of exposure: siblings (29.2%), fathers (14.5%), children (14.6%), and mothers (12.5%).
Conclusion: Prevention interventions need to be designed to target vulnerable populations and reduce the effect of this sometimes fatal disease. These results show that pertussis has a proportionally greater effect on Hispanics. Additional research needs to be conducted on risk factors such as household crowding and immunization status among Hispanics to determine whether ethnicity plays a role in the risk of transmission of pertussis. The results were limited by the large amount of missing data on vaccination history and identification of the source of exposure.
Commercial Sexual Exploitation and Missing Children in the Coastal Region of Sao Paulo State, Brazil
Abstract:
The commercial sexual exploitation of children (CSEC) has emerged as one of the world’s most heinous crimes. The problem affects millions of children worldwide, and no country or community is fully immune from its effects. This paper reports first-generation research on the relationship between CSEC and the phenomenon of missing children living in and around the coastal regions of the state of Sao Paulo, Brazil, the country’s richest state. Data are reported from interviews and case records of 64 children and adolescents who were receiving care through a major youth-serving non-governmental organization (NGO) located in the coastal city of Sao Vicente. In addition, data about missing children and adolescents were collected from a total of 858 police reports. In Brazil, prostitution is not itself a crime; however, the exploitation of prostitution is. Therefore, the police have no information about children or adolescents in this situation; they only have information about the clients and exploiters. Thus, this investigation sought to accomplish two objectives: 1) to establish the relationship between missing and sexually exploited children; and 2) to sensitize police and child-serving authorities in both the governmental and nongovernmental sectors to the nature, extent, and seriousness of many unrecognized cases of CSEC and missing children that come to their attention. The results indicated that the numbers in police reports on missing children are significantly underestimated. They do not represent the number of children who run away and/or are involved in commercial sexual exploitation.