20 resultados para Latent variable models
em DigitalCommons@The Texas Medical Center
Resumo:
Studies on the relationship between psychosocial determinants and HIV risk behaviors have produced little evidence to support hypotheses based on theoretical relationships. One limitation inherent in many articles in the literature is the method of measurement of the determinants and the analytic approach selected. ^ To reduce the misclassification associated with unit scaling of measures specific to internalized homonegativity, I evaluated the psychometric properties of the Reactions to Homosexuality scale in a confirmatory factor analytic framework. In addition, I assessed the measurement invariance of the scale across racial/ethnic classifications in a sample of men who have sex with men. The resulting measure contained eight items loading on three first-order factors. Invariance assessment identified metric and partial strong invariance between racial/ethnic groups in the sample. ^ Application of the updated measure to a structural model allowed for the exploration of direct and indirect effects of internalized homonegativity on unprotected anal intercourse. Pathways identified in the model show that drug and alcohol use at last sexual encounter, the number of sexual partners in the previous three months and sexual compulsivity all contribute directly to risk behavior. Internalized homonegativity reduced the likelihood of exposure to drugs, alcohol or higher numbers of partners. For men who developed compulsive sexual behavior as a coping strategy for internalized homonegativity, there was an increase in the prevalence odds of risk behavior. ^ In the final stage of the analysis, I conducted a latent profile analysis of the items in the updated Reactions to Homosexuality scale. This analysis identified five distinct profiles, which suggested that the construct was not homogeneous in samples of men who have sex with men. Lack of prior consideration of these distinct manifestations of internalized homonegativity may have contributed to the analytic difficulty in identifying a relationship between the trait and high-risk sexual practices. ^
Resumo:
The factorial validity of the SF-36 was evaluated using confirmatory factor analysis (CFA) methods, structural equation modeling (SEM), and multigroup structural equation modeling (MSEM). First, the measurement and structural model of the hypothesized SF-36 was explicated. Second, the model was tested for the validity of a second-order factorial structure, upon evidence of model misfit, determined the best-fitting model, and tested the validity of the best-fitting model on a second random sample from the same population. Third, the best-fitting model was tested for invariance of the factorial structure across race, age, and educational subgroups using MSEM.^ The findings support the second-order factorial structure of the SF-36 as proposed by Ware and Sherbourne (1992). However, the results suggest that: (a) Mental Health and Physical Health covary; (b) general mental health cross-loads onto Physical Health; (c) general health perception loads onto Mental Health instead of Physical Health; (d) many of the error terms are correlated; and (e) the physical function scale is not reliable across these two samples. This hierarchical factor pattern was replicated across both samples of health care workers, suggesting that the post hoc model fitting was not data specific. Subgroup analysis suggests that the physical function scale is not reliable across the "age" or "education" subgroups and that the general mental health scale path from Mental Health is not reliable across the "white/nonwhite" or "education" subgroups.^ The importance of this study is in the use of SEM and MSEM in evaluating sample data from the use of the SF-36. These methods are uniquely suited to the analysis of latent variable structures and are widely used in other fields. The use of latent variable models for self reported outcome measures has become widespread, and should now be applied to medical outcomes research. Invariance testing is superior to mean scores or summary scores when evaluating differences between groups. From a practical, as well as, psychometric perspective, it seems imperative that construct validity research related to the SF-36 establish whether this same hierarchical structure and invariance holds for other populations.^ This project is presented as three articles to be submitted for publication. ^
Resumo:
The discrete-time Markov chain is commonly used in describing changes of health states for chronic diseases in a longitudinal study. Statistical inferences on comparing treatment effects or on finding determinants of disease progression usually require estimation of transition probabilities. In many situations when the outcome data have some missing observations or the variable of interest (called a latent variable) can not be measured directly, the estimation of transition probabilities becomes more complicated. In the latter case, a surrogate variable that is easier to access and can gauge the characteristics of the latent one is usually used for data analysis. ^ This dissertation research proposes methods to analyze longitudinal data (1) that have categorical outcome with missing observations or (2) that use complete or incomplete surrogate observations to analyze the categorical latent outcome. For (1), different missing mechanisms were considered for empirical studies using methods that include EM algorithm, Monte Carlo EM and a procedure that is not a data augmentation method. For (2), the hidden Markov model with the forward-backward procedure was applied for parameter estimation. This method was also extended to cover the computation of standard errors. The proposed methods were demonstrated by the Schizophrenia example. The relevance of public health, the strength and limitations, and possible future research were also discussed. ^
Resumo:
Brain tumor is one of the most aggressive types of cancer in humans, with an estimated median survival time of 12 months and only 4% of the patients surviving more than 5 years after disease diagnosis. Until recently, brain tumor prognosis has been based only on clinical information such as tumor grade and patient age, but there are reports indicating that molecular profiling of gliomas can reveal subgroups of patients with distinct survival rates. We hypothesize that coupling molecular profiling of brain tumors with clinical information might improve predictions of patient survival time and, consequently, better guide future treatment decisions. In order to evaluate this hypothesis, the general goal of this research is to build models for survival prediction of glioma patients using DNA molecular profiles (U133 Affymetrix gene expression microarrays) along with clinical information. First, a predictive Random Forest model is built for binary outcomes (i.e. short vs. long-term survival) and a small subset of genes whose expression values can be used to predict survival time is selected. Following, a new statistical methodology is developed for predicting time-to-death outcomes using Bayesian ensemble trees. Due to a large heterogeneity observed within prognostic classes obtained by the Random Forest model, prediction can be improved by relating time-to-death with gene expression profile directly. We propose a Bayesian ensemble model for survival prediction which is appropriate for high-dimensional data such as gene expression data. Our approach is based on the ensemble "sum-of-trees" model which is flexible to incorporate additive and interaction effects between genes. We specify a fully Bayesian hierarchical approach and illustrate our methodology for the CPH, Weibull, and AFT survival models. We overcome the lack of conjugacy using a latent variable formulation to model the covariate effects which decreases computation time for model fitting. Also, our proposed models provides a model-free way to select important predictive prognostic markers based on controlling false discovery rates. We compare the performance of our methods with baseline reference survival methods and apply our methodology to an unpublished data set of brain tumor survival times and gene expression data, selecting genes potentially related to the development of the disease under study. A closing discussion compares results obtained by Random Forest and Bayesian ensemble methods under the biological/clinical perspectives and highlights the statistical advantages and disadvantages of the new methodology in the context of DNA microarray data analysis.
Resumo:
Research on lifestyle physical activity interventions suggests that they help individuals meet the new recommendations for physical activity made by the Centers for Disease Control and Prevention (CDC) and the American College of Sports Medicine (ACSM). The purpose of this research was to describe the rates of adherence to two lifestyle physical activity intervention arms and to examine the association between adherence and outcome variables, using data from Project PRIME, a lifestyle physical activity intervention based on the transtheoretical model and conducted by the Cooper Institute of Aerobics Research, Dallas, Texas. Participants were 250 sedentary healthy adults, aged 35 to 70 years, primarily non-Hispanic White, and in the contemplation and preparation stages of readiness to change. They were randomized to a group (PRIME G) or a mail- and telephone-delivered condition (PRIME C). Adherence measures included attending class (PRIME G), completing a monthly telephone call with a health educator (PRIME C), and completing homework assignments and self-monitoring minutes of moderate- to vigorous physical activity (both groups). In the first results paper, adherence over time and between conditions was examined: Attendance in group, completing the monthly telephone call, and homework completion decreased over time, and participants in PRIME G were more likely to complete homework than those in PRIME C. Paper 2 aimed to determine whether the adherence measures predicted achievement of the CDC/ACSM physical activity guideline. In separate models for the two conditions, a latent variable measuring adherence was found to predict achievement of the guideline. Paper 3 examined the association between adherence measures and the transtheoretical model's processes of change within each condition. For both, participants who completed at least two thirds of the homework assignments improved their use of the processes of change more than those who completed less than that amount. These results suggest that encouraging adherence to a lifestyle physical activity intervention, at least among already motivated volunteers, may increase the likelihood of beneficial changes in the outcomes. ^
Resumo:
It is widely acknowledged in theoretical and empirical literature that social relationships, comprising of structural measures (social networks) and functional measures (perceived social support) have an undeniable effect on health outcomes. However, the actual mechanism of this effect has yet to be clearly understood or explicated. In addition, comorbidity is found to adversely affect social relationships and health related quality of life (a valued outcome measure in cancer patients and survivors). ^ This cross sectional study uses selected baseline data (N=3088) from the Women's Healthy Eating and Living (WHEL) study. Lisrel 8.72 was used for the latent variable structural equation modeling. Due to the ordinal nature of the data, Weighted Least Squares (WLS) method of estimation using Asymptotic Distribution Free covariance matrices was chosen for this analysis. The primary exogenous predictor variables are Social Networks and Comorbidity; Perceived Social Support is the endogenous predictor variable. Three dimensions of HRQoL, physical, mental and satisfaction with current quality of life were the outcome variables. ^ This study hypothesizes and tests the mechanism and pathways between comorbidity, social relationships and HRQoL using latent variable structural equation modeling. After testing the measurement models of social networks and perceived social support, a structural model hypothesizing associations between the latent exogenous and endogenous variables was tested. The results of the study after listwise deletion (N=2131) mostly confirmed the hypothesized relationships (TLI, CFI >0.95, RMSEA = 0.05, p=0.15). Comorbidity was adversely associated with all three HRQoL outcomes. Strong ties were negatively associated with perceived social support; social network had a strong positive association with perceived social support, which served as a mediator between social networks and HRQoL. Mental health quality of life was the most adversely affected by the predictor variables. ^ This study is a preliminary look at the integration of structural and functional measures of social relationships, comorbidity and three HRQoL indicators using LVSEM. Developing stronger social networks and forming supportive relationships is beneficial for health outcomes such as HRQoL of cancer survivors. Thus, the medical community treating cancer survivors as well as the survivor's social networks need to be informed and cognizant of these possible relationships. ^
Resumo:
Few studies have investigated causal pathways linking psychosocial factors to each other and to screening mammography. Conflicting hypotheses exist in the theoretic literature regarding the role and importance of subjective norms, a person's perceived social pressure to perform the behavior and his/her motivation to comply. The Theory of Reasoned Action (TRA) hypothesizes that subjective norms directly affect intention; while the Transtheoretical Model (TTM) hypothesizes that attitudes mediate the influence of subjective norms on stage of change. No one has examined which hypothesis best predicts the effect of subjective norms on mammography intention and stage of change. Two statistical methods are available for testing mediation, sequential regression analysis (SRA) and latent variable structural equation modeling (LVSEM); however, software to apply LVSEM to dichotomous variables like intention has only recently become available. No one has compared the methods to determine whether or not they yield similar results for dichotomous variables. ^ Study objectives were to: (1) determine whether the effect of subjective norms on mammography intention and stage of change are mediated by pros and cons; and (2) compare mediation results from the SRA and LVSEM approaches when the outcome is dichotomous. We conducted a secondary analysis of data from a national sample of women veterans enrolled in Project H.O.M.E. (H&barbelow;ealthy O&barbelow;utlook on the M&barbelow;ammography E&barbelow;xperience), a behavioral intervention trial. ^ Results showed that the TTM model described the causal pathways better than the TRA one; however, we found support for only one of the TTM causal mechanisms. Cons was the sole mediator. The mediated effect of subjective norms on intention and stage of change by cons was very small. These findings suggest that interventionists focus their efforts on reducing negative attitudes toward mammography when resources are limited. ^ Both the SRA and LVSEM methods provided evidence for complete mediation, and the direction, magnitude, and standard errors of the parameter estimates were very similar. Because SRA parameter estimates were not biased toward the null, we can probably assume negligible measurement error in the independent and mediator variables. Simulation studies are needed to further our understanding of how these two methods perform under different data conditions. ^
Resumo:
Background. Literature worldwide has documented associations between gender-based relationship inequity, sexual communication self-efficacy, and actual use of condoms and contraceptives among young women. However studies that have rigorously tested these associations in southern Vietnam are extremely rare. This study aimed to examine these associations and other current sexual practices among undergraduate female students in the Mekong Delta. Method. A qualitative study was conducted to examine the operationalization of the Theory of Gender and Power and to obtain salient and culture-relevant dimensions of perceived gender relations in the Mekong Delta of Vietnam. Sixty-four undergraduate female students from two universities participated in eight group discussions focusing on their viewpoints regarding national and local gender equity issues. A subsequent cross-sectional survey consisting of 1181 third-year female students from Can Tho University and An Giang University was conducted. Latent variable modeling and logistic regression were employed to examine the hypothesized associations. Results. Dimensions of perceived gender relations were attributable to theoretical structures of labor, power, and cathexis. Perceptions about gender inequities were comparable to findings from several reports, in which women were still viewed as inferior and subordinate to men. Among students who had ever had a boyfriend(s) (72.4%), 44.8% indicated that their boyfriend had ever asked for sex, 13% had ever had penile-vaginal sex, and 10.3% had ever had oral sex. For those who had ever had penile-vaginal sex, 33% did not use any contraceptive method at first sex. The greater a student’s perception that women were subordinate to men, the lower her self-efficacy for sexual communication and the lower her actual frequencies of asking for contraceptive or condom use. Sexual communication self-efficacy was marginally associated with actual contraceptive use (p=.039) and condom use (p=.092) at first sex. Conclusion. Sexual health promotion strategies should address the influence of perceived unequal gender relations on young women’s sexual communication self-efficacy and the subsequent impact on actual contraceptive and condom use.^
Resumo:
The purpose of this research was to better understand the impact of the terrorist attacks in 2001 on public health, particularly for Texas public health. This study employed mixed methods to examine changes to public health culture within Texas local public health agencies, important attitudes of public health workers toward responding to a disaster, and the funding policies that might ensure our investment in public health emergency preparedness is protected. ^ A qualitative analysis of interviews conducted with a large sample of public health officials in Texas found that all the constituent parts of a peculiar culture for public health preparedness existed that spanned the state's local health departments regardless of size, or funding level. The new preparedness culture in Texas had the hallmarks necessary for a robust public health preparedness and emergency response system. ^ The willingness of public health workers, necessary to make these kinds of changes and mount a disaster response was examined in one of Texas' most experienced disaster response teams—the public health workers for the City of Houston. A hypothesized latent variable model showed that willingness mediated all other factors in the model (self-efficacy, knowledge, barriers, and risk perception) for self-reported likelihood of reporting to work for a disaster. The RMSEA for the final model was 0.042 with a confidence interval of 0.036—0.049 and the chi-squared difference test was P=0.08, indicating a well-fitted model that suggests willingness is an important factor for consideration by preparedness planners and researchers alike. ^ Finally, with disasters on the rise and federal funding for preparedness dwindling, a review of states' policies for the distribution of these funds and their advantages and disadvantages were examined through a review of current literature and public documents, and a survey of state-level public health officials, emergency management professionals and researchers. Although the base plus per-capita method is the most common, it is not necessarily perceived to be the most effective. No clear "optimal" method emerged from the study, but recommendations for a strategic combination of three methods were made that has the potential to maximize the benefits of each method, while minimizing the weaknesses.^
Resumo:
Random Forests™ is reported to be one of the most accurate classification algorithms in complex data analysis. It shows excellent performance even when most predictors are noisy and the number of variables is much larger than the number of observations. In this thesis Random Forests was applied to a large-scale lung cancer case-control study. A novel way of automatically selecting prognostic factors was proposed. Also, synthetic positive control was used to validate Random Forests method. Throughout this study we showed that Random Forests can deal with large number of weak input variables without overfitting. It can account for non-additive interactions between these input variables. Random Forests can also be used for variable selection without being adversely affected by collinearities. ^ Random Forests can deal with the large-scale data sets without rigorous data preprocessing. It has robust variable importance ranking measure. Proposed is a novel variable selection method in context of Random Forests that uses the data noise level as the cut-off value to determine the subset of the important predictors. This new approach enhanced the ability of the Random Forests algorithm to automatically identify important predictors for complex data. The cut-off value can also be adjusted based on the results of the synthetic positive control experiments. ^ When the data set had high variables to observations ratio, Random Forests complemented the established logistic regression. This study suggested that Random Forests is recommended for such high dimensionality data. One can use Random Forests to select the important variables and then use logistic regression or Random Forests itself to estimate the effect size of the predictors and to classify new observations. ^ We also found that the mean decrease of accuracy is a more reliable variable ranking measurement than mean decrease of Gini. ^
Resumo:
The performance of the Hosmer-Lemeshow global goodness-of-fit statistic for logistic regression models was explored in a wide variety of conditions not previously fully investigated. Computer simulations, each consisting of 500 regression models, were run to assess the statistic in 23 different situations. The items which varied among the situations included the number of observations used in each regression, the number of covariates, the degree of dependence among the covariates, the combinations of continuous and discrete variables, and the generation of the values of the dependent variable for model fit or lack of fit.^ The study found that the $\rm\ C$g* statistic was adequate in tests of significance for most situations. However, when testing data which deviate from a logistic model, the statistic has low power to detect such deviation. Although grouping of the estimated probabilities into quantiles from 8 to 30 was studied, the deciles of risk approach was generally sufficient. Subdividing the estimated probabilities into more than 10 quantiles when there are many covariates in the model is not necessary, despite theoretical reasons which suggest otherwise. Because it does not follow a X$\sp2$ distribution, the statistic is not recommended for use in models containing only categorical variables with a limited number of covariate patterns.^ The statistic performed adequately when there were at least 10 observations per quantile. Large numbers of observations per quantile did not lead to incorrect conclusions that the model did not fit the data when it actually did. However, the statistic failed to detect lack of fit when it existed and should be supplemented with further tests for the influence of individual observations. Careful examination of the parameter estimates is also essential since the statistic did not perform as desired when there was moderate to severe collinearity among covariates.^ Two methods studied for handling tied values of the estimated probabilities made only a slight difference in conclusions about model fit. Neither method split observations with identical probabilities into different quantiles. Approaches which create equal size groups by separating ties should be avoided. ^
Resumo:
The purpose of this study is to investigate the effects of predictor variable correlations and patterns of missingness with dichotomous and/or continuous data in small samples when missing data is multiply imputed. Missing data of predictor variables is multiply imputed under three different multivariate models: the multivariate normal model for continuous data, the multinomial model for dichotomous data and the general location model for mixed dichotomous and continuous data. Subsequent to the multiple imputation process, Type I error rates of the regression coefficients obtained with logistic regression analysis are estimated under various conditions of correlation structure, sample size, type of data and patterns of missing data. The distributional properties of average mean, variance and correlations among the predictor variables are assessed after the multiple imputation process. ^ For continuous predictor data under the multivariate normal model, Type I error rates are generally within the nominal values with samples of size n = 100. Smaller samples of size n = 50 resulted in more conservative estimates (i.e., lower than the nominal value). Correlation and variance estimates of the original data are retained after multiple imputation with less than 50% missing continuous predictor data. For dichotomous predictor data under the multinomial model, Type I error rates are generally conservative, which in part is due to the sparseness of the data. The correlation structure for the predictor variables is not well retained on multiply-imputed data from small samples with more than 50% missing data with this model. For mixed continuous and dichotomous predictor data, the results are similar to those found under the multivariate normal model for continuous data and under the multinomial model for dichotomous data. With all data types, a fully-observed variable included with variables subject to missingness in the multiple imputation process and subsequent statistical analysis provided liberal (larger than nominal values) Type I error rates under a specific pattern of missing data. It is suggested that future studies focus on the effects of multiple imputation in multivariate settings with more realistic data characteristics and a variety of multivariate analyses, assessing both Type I error and power. ^
Resumo:
Lung cancer is a devastating disease with very poor prognosis. The design of better treatments for patients would be greatly aided by mouse models that closely resemble the human disease. The most common type of human lung cancer is adenocarcinoma with frequent metastasis. Unfortunately, current models for this tumor are inadequate due to the absence of metastasis. Based on the molecular findings in human lung cancer and metastatic potential of osteosarcomas in mutant p53 mouse models, I hypothesized that mice with both K-ras and p53 missense mutations might develop metastatic lung adenocarcinomas. Therefore, I incorporated both K-rasLA1 and p53RI72HΔg alleles into mouse lung cells to establish a more faithful model for human lung adenocarcinoma and for translational and mechanistic studies. Mice with both mutations ( K-rasLA1/+ p53R172HΔg/+) developed advanced lung adenocarcinomas with similar histopathology to human tumors. These lung adenocarcinomas were highly aggressive and metastasized to multiple intrathoracic and extrathoracic sites in a pattern similar to that seen in lung cancer patients. This mouse model also showed gender differences in cancer related death and developed pleural mesotheliomas in 23.2% of them. In a preclinical study, the new drug Erlotinib (Tarceva) decreased the number and size of lung lesions in this model. These data demonstrate that this mouse model most closely mimics human metastatic lung adenocarcinoma and provides an invaluable system for translational studies. ^ To screen for important genes for metastasis, gene expression profiles of primary lung adenocarcinomas and metastases were analyzed. Microarray data showed that these two groups were segregated in gene expression and had 79 highly differentially expressed genes (more than 2.5 fold changes and p<0.001). Microarray data of Bub1b, Vimentin and CCAM1 were validated in tumors by quantitative real-time PCR (QPCR). Bub1b , a mitotic checkpoint gene, was overexpressed in metastases and this correlated with more chromosomal abnormalities in metastatic cells. Vimentin, a marker of epithelial-mesenchymal transition (EMT), was also highly expressed in metastases. Interestingly, Twist, a key EMT inducer, was also highly upregulated in metastases by QPCR, and this significantly correlated with the overexpression of Vimentin in the same tumors. These data suggest EMT occurs in lung adenocarcinomas and is a key mechanism for the development of metastasis in K-ras LA1/+ p53R172HΔg/+ mice. Thus, this mouse model provides a unique system to further probe the molecular basis of metastatic lung cancer.^
Resumo:
The ordinal logistic regression models are used to analyze the dependant variable with multiple outcomes that can be ranked, but have been underutilized. In this study, we describe four logistic regression models for analyzing the ordinal response variable. ^ In this methodological study, the four regression models are proposed. The first model uses the multinomial logistic model. The second is adjacent-category logit model. The third is the proportional odds model and the fourth model is the continuation-ratio model. We illustrate and compare the fit of these models using data from the survey designed by the University of Texas, School of Public Health research project PCCaSO (Promoting Colon Cancer Screening in people 50 and Over), to study the patient’s confidence in the completion colorectal cancer screening (CRCS). ^ The purpose of this study is two fold: first, to provide a synthesized review of models for analyzing data with ordinal response, and second, to evaluate their usefulness in epidemiological research, with particular emphasis on model formulation, interpretation of model coefficients, and their implications. Four ordinal logistic models that are used in this study include (1) Multinomial logistic model, (2) Adjacent-category logistic model [9], (3) Continuation-ratio logistic model [10], (4) Proportional logistic model [11]. We recommend that the analyst performs (1) goodness-of-fit tests, (2) sensitivity analysis by fitting and comparing different models.^
Resumo:
In regression analysis, covariate measurement error occurs in many applications. The error-prone covariates are often referred to as latent variables. In this proposed study, we extended the study of Chan et al. (2008) on recovering latent slope in a simple regression model to that in a multiple regression model. We presented an approach that applied the Monte Carlo method in the Bayesian framework to the parametric regression model with the measurement error in an explanatory variable. The proposed estimator applied the conditional expectation of latent slope given the observed outcome and surrogate variables in the multiple regression models. A simulation study was presented showing that the method produces estimator that is efficient in the multiple regression model, especially when the measurement error variance of surrogate variable is large.^