Digital Library

4 results for VARIABLE NEIGHBORHOOD RANDOM FIELDS

in DigitalCommons@The Texas Medical Center

An empirical evaluation of the Random Forests classifier models for variable selection in a large-scale lung cancer case-control study

Relevance:

40.00%

Abstract:

Random Forests™ is reported to be one of the most accurate classification algorithms for complex data analysis. It performs well even when most predictors are noisy and the number of variables is much larger than the number of observations. In this thesis, Random Forests was applied to a large-scale lung cancer case-control study. A novel way of automatically selecting prognostic factors was proposed, and a synthetic positive control was used to validate the Random Forests method. Throughout this study we showed that Random Forests can handle a large number of weak input variables without overfitting, can account for non-additive interactions between these input variables, and can be used for variable selection without being adversely affected by collinearity.

Random Forests can handle large-scale data sets without rigorous data preprocessing, and it provides a robust variable importance ranking measure. We propose a novel variable selection method in the context of Random Forests that uses the data noise level as the cut-off value for determining the subset of important predictors. This approach enhanced the ability of the Random Forests algorithm to automatically identify important predictors in complex data. The cut-off value can also be adjusted based on the results of the synthetic positive control experiments.

When the data set had a high ratio of variables to observations, Random Forests complemented established logistic regression. This study suggests that Random Forests is recommended for such high-dimensional data. One can use Random Forests to select the important variables and then use logistic regression or Random Forests itself to estimate the effect sizes of the predictors and to classify new observations.

We also found that the mean decrease in accuracy is a more reliable variable ranking measure than the mean decrease in Gini.
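The abstract does not spell out how the noise level is turned into a cut-off. As a minimal sketch, assuming the noise level is estimated by appending synthetic noise columns and taking the largest importance among them as the cut-off, the Python below computes permutation importance (mean decrease in accuracy) for each column of an already fitted classifier and keeps the real predictors whose importance exceeds that cut-off. The function names, the toy data, and the stand-in `predict` function (used in place of a trained Random Forests model) are all hypothetical.

```python
import random
from statistics import mean


def accuracy(predict, X, y):
    """Fraction of rows whose predicted label equals the true label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, col, n_repeats=10, seed=0):
    """Mean decrease in accuracy when column `col` is randomly shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    drops = []
    for _ in range(n_repeats):
        values = [row[col] for row in X]
        rng.shuffle(values)
        # Rebuild each row with only column `col` replaced by a shuffled value.
        X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, values)]
        drops.append(baseline - accuracy(predict, X_perm, y))
    return mean(drops)


def select_above_noise(importances, noise_cols):
    """Keep real predictors whose importance exceeds the largest noise importance."""
    cutoff = max(importances[c] for c in noise_cols)
    selected = [c for c, imp in importances.items()
                if c not in noise_cols and imp > cutoff]
    return selected, cutoff


# Toy data (hypothetical): column 0 is informative, column 1 is an
# uninformative real predictor, column 2 is an appended synthetic noise column.
data_rng = random.Random(1)
X = [[data_rng.random() for _ in range(3)] for _ in range(200)]
y = [int(row[0] > 0.5) for row in X]


def predict(row):
    """Hypothetical stand-in for a fitted Random Forests model's predictions."""
    return int(row[0] > 0.5)


importances = {c: permutation_importance(predict, X, y, c) for c in range(3)}
selected, cutoff = select_above_noise(importances, noise_cols={2})
print(importances, cutoff, selected)
```

Using the maximum noise importance with a strict inequality is a conservative choice: a real predictor is kept only if it beats every noise column. Per the abstract, the cut-off could instead be tuned using synthetic positive control experiments.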

Structural equation modeling of the medical outcome study: Short Form 36 (SF-36)

Relevance:

30.00%

Abstract:

The factorial validity of the SF-36 was evaluated using confirmatory factor analysis (CFA), structural equation modeling (SEM), and multigroup structural equation modeling (MSEM). First, the measurement and structural models of the hypothesized SF-36 were specified. Second, the model was tested for the validity of a second-order factorial structure; upon evidence of model misfit, the best-fitting model was determined, and its validity was tested on a second random sample from the same population. Third, the best-fitting model was tested for invariance of the factorial structure across race, age, and educational subgroups using MSEM.

The findings support the second-order factorial structure of the SF-36 proposed by Ware and Sherbourne (1992). However, the results suggest that: (a) Mental Health and Physical Health covary; (b) general mental health cross-loads onto Physical Health; (c) general health perception loads onto Mental Health instead of Physical Health; (d) many of the error terms are correlated; and (e) the physical function scale is not reliable across the two samples. This hierarchical factor pattern was replicated in both samples of health care workers, suggesting that the post hoc model fitting was not data-specific. Subgroup analysis suggests that the physical function scale is not reliable across the "age" or "education" subgroups and that the path from Mental Health to the general mental health scale is not reliable across the "white/nonwhite" or "education" subgroups.

The importance of this study lies in its use of SEM and MSEM to evaluate sample data from the SF-36. These methods are uniquely suited to the analysis of latent variable structures and are widely used in other fields. The use of latent variable models for self-reported outcome measures has become widespread, and these models should now be applied to medical outcomes research. Invariance testing is superior to comparing mean scores or summary scores when evaluating differences between groups. From a practical as well as a psychometric perspective, it seems imperative that construct validity research on the SF-36 establish whether the same hierarchical structure and invariance hold for other populations.

This project is presented as three articles to be submitted for publication.
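In generic LISREL-style notation (the symbols are illustrative and are not taken from the thesis), a second-order factor model of the kind tested here can be sketched as follows: item responses $\mathbf{y}$ load on first-order scale factors $\boldsymbol{\eta}$, which in turn load on the second-order factors $\boldsymbol{\xi} = (\xi_{\text{PH}}, \xi_{\text{MH}})^\top$ for Physical Health and Mental Health.

$$
\begin{aligned}
\mathbf{y} &= \Lambda_y \boldsymbol{\eta} + \boldsymbol{\varepsilon}, & \operatorname{Cov}(\boldsymbol{\varepsilon}) &= \Theta_\varepsilon,\\
\boldsymbol{\eta} &= \Gamma \boldsymbol{\xi} + \boldsymbol{\zeta}, & \operatorname{Cov}(\boldsymbol{\xi}) &= \Phi = \begin{pmatrix} \phi_{11} & \phi_{21} \\ \phi_{21} & \phi_{22} \end{pmatrix}.
\end{aligned}
$$

Under this sketch, finding (a) corresponds to a nonzero $\phi_{21}$; findings (b) and (c) are nonzero entries of $\Gamma$ linking the general mental health and general health perception factors to $\xi_{\text{PH}}$ and $\xi_{\text{MH}}$, respectively; correlated error terms (d) are nonzero off-diagonal entries of an error covariance matrix such as $\Theta_\varepsilon$; and MSEM invariance testing across subgroups $g$ constrains parameters such as $\Lambda_y^{(g)} = \Lambda_y$ and $\Gamma^{(g)} = \Gamma$ to be equal across groups.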

An examination of the relationship between maternal and neighborhood-level characteristics and intrauterine growth retardation in Harris County, Texas, 1999-2001

Relevance:

30.00%

Abstract:

The persistence of low birth weight and intrauterine growth retardation (IUGR) in the United States has puzzled researchers for decades. Much of the work on adverse birth outcomes has focused on low birth weight in general rather than on IUGR. Studies that have examined IUGR specifically have so far focused primarily on individual-level maternal risk factors, and these risk factors explain only a small portion of the variance in IUGR. Recent work has therefore begun to examine community-level risk factors in addition to individual-level maternal characteristics. This study uses a Social Ecology framework to examine the relationship between individual- and community-level risk factors and IUGR. Logistic regression was used to establish an individual-level model based on 155,856 births recorded in Harris County, TX, during 1999-2001. IUGR was characterized using a fetal growth ratio method with race/ethnicity- and sex-specific mean birth weights calculated from national vital records. The spatial distribution of 114,460 birth records located within the City of Houston was examined using choropleth, probability, and density maps, and census tracts with higher-than-expected rates of IUGR and high levels of neighborhood disadvantage were highlighted. Neighborhood disadvantage was constructed from socioeconomic variables in the 2000 U.S. Census, and factor analysis was used to combine them into a single measure. Lastly, a random coefficients model was used to examine the relationship between varying levels of community disadvantage and IUGR, adjusting for the set of individual-level risk factors, for 152,997 birth records located within Harris County, TX. Neighborhood disadvantage was measured using three different indices adapted from previous work. The findings show that pregnancy-induced hypertension, a previous preterm infant, tobacco use, and insufficient weight gain have the strongest associations with IUGR. Neighborhood disadvantage increases the risk of IUGR only slightly further (OR 1.12 to 1.23). Although community-level disadvantage explained only a small proportion of the variance in IUGR, it did have a significant effect. This finding suggests that community-level risk factors should be included in future work on IUGR and that more work is needed.
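The fetal growth ratio (FGR) compares each infant's birth weight with a reference mean. The Python sketch below assumes the conventional definition (observed birth weight divided by the reference mean birth weight for the same gestational age, here also matched on race/ethnicity and sex as the abstract describes) and flags IUGR below an assumed cut-off of 0.85; the abstract does not state which threshold was used. The reference table, its group labels, and the function names are hypothetical placeholders, not values from national vital records.

```python
from typing import Dict, Tuple

# Hypothetical reference means in grams, keyed by
# (completed gestational week, race/ethnicity group, infant sex).
# The study derived its means from national vital records; these are placeholders.
REFERENCE_MEAN_BW: Dict[Tuple[int, str, str], float] = {
    (39, "group_a", "F"): 3300.0,
    (39, "group_a", "M"): 3400.0,
}

# Assumed cut-off: the abstract does not give the FGR threshold for IUGR.
IUGR_CUTOFF = 0.85


def fetal_growth_ratio(birth_weight_g: float, week: int, race: str, sex: str) -> float:
    """Observed birth weight divided by the matching reference mean birth weight."""
    return birth_weight_g / REFERENCE_MEAN_BW[(week, race, sex)]


def is_iugr(birth_weight_g: float, week: int, race: str, sex: str,
            cutoff: float = IUGR_CUTOFF) -> bool:
    """Flag a birth as IUGR when its fetal growth ratio falls below `cutoff`."""
    return fetal_growth_ratio(birth_weight_g, week, race, sex) < cutoff


print(fetal_growth_ratio(2640, 39, "group_a", "F"))  # 2640 / 3300 = 0.8
print(is_iugr(2640, 39, "group_a", "F"))
print(is_iugr(3400, 39, "group_a", "M"))
```

Because the ratio is relative to a group-specific mean, the same birth weight can be flagged for one reference group and not for another, which is the point of using race/ethnicity- and sex-specific means.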

Follow-up care after a health fair screening in three Houston neighborhoods in 2008

Relevance:

30.00%

Abstract:

Background. Each year thousands of people participate in mass health screenings for diabetes and hypertension, but little is known about whether those who receive higher-than-normal screening results obtain the recommended follow-up medical care, or what barriers they perceive to doing so.

Methods. Study participants were recruited from attendees at three health fairs in low-income neighborhoods in Houston, Texas. Potential participants had higher-than-normal blood pressure (> 140/90 mmHg) or blood glucose readings (100 mg/dL fasting or 140 mg/dL random). Study participants were called at one, two, and three months and asked whether they had obtained follow-up medical care; those who had not yet done so were asked to identify barriers. Using a modified Aday-Andersen model of health service access, the independent variables were individual and community characteristics and self-perceived need. The dependent variable was obtaining follow-up care, with barriers to care as a secondary outcome.

Results. Eighty-two study participants completed the initial questionnaire and 59 completed the study protocol. Forty-eight participants (59% under an intent-to-treat analysis, 81% of those completing the study protocol) obtained follow-up care. Among those who completed the initial questionnaire, those who reported a regular source of care were significantly more likely to obtain follow-up care. Among those who completed the study protocol, the relationship between having a regular source of care and obtaining follow-up care approached but did not reach significance. Among those who completed the initial questionnaire, self-described health status, examined as a binary variable (good/very good/excellent vs. poor/fair/not sure), was associated with obtaining follow-up care for those who rated their health as poor, fair, or not sure. Among those who completed the study protocol, the same relationship was present but did not reach statistical significance. Participants who completed the study protocol and described their blood pressure as OK or a little high were significantly more likely to obtain follow-up care than those who described it as high or very high. All participants taking oral medications for hypertension (12/12) or diabetes (4/4) who were told to obtain follow-up care did so; however, given the small sample size, this association was statistically significant only for those taking hypertension medication.

The variables significantly associated with obtaining follow-up care were having a regular source of care, a self-described health status of poor, fair, or not sure, a self-described blood pressure of OK or a little high, and taking medication for blood pressure.

At the follow-up telephone calls, 34 participants identified barriers to care: cost was a major barrier, reported by 16 participants, and 10 reported that they didn’t have time because they were working long hours after Hurricane Ike.

The study included an offer of access assistance: information about nearby safety-net providers, a visit to or information from the Health Information Center at their Neighborhood Center location, or information from Project Safety Net (a searchable website for safety-net providers). Access assistance was offered at the health fairs and again during the follow-up telephone calls to those who had not yet obtained follow-up care. Of the 48 participants who reported obtaining follow-up care, 26 said they had used the access assistance to do so. Use of access assistance was associated with being Hispanic, not having health insurance or a regular source of care, and speaking Spanish. It was also associated with being worried about blood glucose.

Conclusion. Access assistance, as a community enabling characteristic, may be useful in helping low-income people obtain medical care.
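The two percentages reported for the 48 participants who obtained follow-up care differ only in their denominators. The short Python check below reproduces them from the counts stated in the abstract, assuming the intent-to-treat denominator is all 82 participants who completed the initial questionnaire; the `percent` helper is illustrative.

```python
# Counts reported in the abstract.
completed_questionnaire = 82  # assumed intent-to-treat denominator
completed_protocol = 59       # per-protocol denominator
obtained_follow_up = 48


def percent(numerator: int, denominator: int) -> int:
    """Percentage rounded to the nearest whole number."""
    return round(100 * numerator / denominator)


itt_pct = percent(obtained_follow_up, completed_questionnaire)      # 48/82 is about 58.5
per_protocol_pct = percent(obtained_follow_up, completed_protocol)  # 48/59 is about 81.4
print(itt_pct, per_protocol_pct)
```

The intent-to-treat figure counts participants lost to follow-up as not having obtained care, so it is the more conservative of the two.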
