2 results for Dimensionality
in DigitalCommons@The Texas Medical Center
Abstract:
Random Forests™ is reported to be one of the most accurate classification algorithms in complex data analysis. It performs well even when most predictors are noisy and the number of variables is much larger than the number of observations. In this thesis, Random Forests was applied to a large-scale lung cancer case-control study. A novel way of automatically selecting prognostic factors was proposed, and a synthetic positive control was used to validate the Random Forests method. Throughout this study we showed that Random Forests can handle a large number of weak input variables without overfitting and can account for non-additive interactions between these input variables. Random Forests can also be used for variable selection without being adversely affected by collinearities.

Random Forests can handle large-scale data sets without rigorous data preprocessing, and it has a robust variable importance ranking measure. We propose a novel variable selection method in the context of Random Forests that uses the data noise level as the cut-off value to determine the subset of important predictors. This new approach enhanced the ability of the Random Forests algorithm to automatically identify important predictors in complex data. The cut-off value can also be adjusted based on the results of the synthetic positive control experiments.

When the data set had a high variables-to-observations ratio, Random Forests complemented the established logistic regression. This study suggests that Random Forests is a good choice for such high-dimensionality data. One can use Random Forests to select the important variables and then use logistic regression or Random Forests itself to estimate the effect size of the predictors and to classify new observations.

We also found that the mean decrease in accuracy is a more reliable variable ranking measure than the mean decrease in Gini.
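The noise-level cut-off described in this abstract can be illustrated with a small sketch. The abstract does not say how the noise level is estimated, so this assumes one possible reading: importance scores (for example, mean decrease in accuracy) are already available from some Random Forests run, a few predictors are known to be pure noise, and the largest importance among them serves as the cut-off. The function name `select_above_noise`, the predictor names, and the scores are all hypothetical and are not taken from the thesis.

```python
def select_above_noise(importances, noise_names):
    """Return (cutoff, selected) for a noise-based importance cut-off.

    importances: dict mapping predictor name -> importance score
    noise_names: names of predictors assumed to be pure noise
    """
    noise = set(noise_names)
    # Assumption: the cut-off is the largest importance reached by a noise predictor.
    cutoff = max(importances[name] for name in noise)
    # Keep real predictors whose importance is strictly above the cut-off.
    selected = [
        name
        for name, score in importances.items()
        if name not in noise and score > cutoff
    ]
    # Report the most important predictors first.
    selected.sort(key=lambda name: importances[name], reverse=True)
    return cutoff, selected


# Made-up importance scores, for illustration only.
scores = {"x1": 0.041, "x2": 0.012, "x3": 0.003, "noise_1": 0.004, "noise_2": -0.001}
cutoff, kept = select_above_noise(scores, ["noise_1", "noise_2"])
print(cutoff, kept)
```

Under this reading, a predictor such as the hypothetical `x3`, which ranks below a known-noise variable, is dropped. Raising or lowering the cut-off, as the abstract suggests doing from the synthetic positive control experiments, would amount to replacing `max` with a different rule.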
Abstract:
Loneliness is a pervasive and rather common experience in American culture, particularly notable among adolescents. However, the phenomenon is not well documented in the cross-cultural psychiatric literature. For psychiatric epidemiology to encompass a wide array of psychopathologic phenomena, it is important to develop useful measures to characterize and classify both non-clinical and clinical dysfunction in diverse subgroups and cultures.

The goal of this research was to examine the cross-cultural reliability and construct validity of a scale designed to measure loneliness. The Roberts Loneliness Scale (RLS-8) was administered to 4,060 adolescents ages 10-19 years enrolled in high schools on either side of the Texas-Tamaulipas border region between the U.S. and Mexico. Data collected in 1988 from a study focusing on substance use and psychological distress among adolescents in these regions were used to examine the operating characteristics of the RLS-8. A sample stratified by nationality and language, age, gender, and grade was used for analysis.

Results indicated that in general the RLS-8 has moderate reliability in the U.S. sample, but not in the Mexican sample. Validity analyses demonstrated that there was evidence for convergent validity of the RLS-8 in the U.S. sample, but none in the Mexican sample. Discriminant validity of the measure could not be established in either sample. Based on the factor structure of the RLS-8, two subscales were created and analyzed for construct validity. Evidence for convergent validity was established for both subscales in both national samples. However, the discriminant validity of the measure remains unsubstantiated in both national samples. Also, the dimensionality of the scale is unresolved.

One primary goal for future cross-cultural research would be to develop and test better-defined culture-specific models of loneliness within the two cultures. From such scientific endeavor, measures of loneliness can be developed or reconstructed to classify the phenomenon in the same manner across cultures. Since estimates of prevalence and incidence are contingent upon reliable and valid screening or diagnostic measures, this objective would serve as an important foundation for future psychiatric epidemiologic inquiry into loneliness.
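The abstract does not name the reliability statistic used for the RLS-8. As an illustration only, the sketch below computes Cronbach's alpha, a common internal-consistency measure for multi-item scales; treating it as the statistic behind "moderate reliability" is an assumption. The function name `cronbach_alpha` and the toy responses are hypothetical.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])  # number of items in the scale
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Toy data: two items that always agree, so internal consistency is perfect.
responses = [[1, 1], [2, 2], [3, 3]]
alpha = cronbach_alpha(responses)
print(alpha)  # 1.0
```

In a cross-cultural comparison like the one described, such a statistic would be computed separately for each national sample, since a scale can be internally consistent in one group and not in another.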