7 results for Optimum-path forests

in DigitalCommons@The Texas Medical Center


Relevance:

20.00%

Publisher:

Abstract:

Random Forests™ is reported to be one of the most accurate classification algorithms in complex data analysis. It shows excellent performance even when most predictors are noisy and the number of variables is much larger than the number of observations. In this thesis, Random Forests was applied to a large-scale lung cancer case-control study. A novel way of automatically selecting prognostic factors was proposed, and a synthetic positive control was used to validate the Random Forests method. Throughout this study we showed that Random Forests can deal with a large number of weak input variables without overfitting. It can account for non-additive interactions between these input variables. Random Forests can also be used for variable selection without being adversely affected by collinearities.

Random Forests can deal with large-scale data sets without rigorous data preprocessing, and it has a robust variable importance ranking measure. We propose a novel variable selection method in the context of Random Forests that uses the data noise level as the cut-off value to determine the subset of important predictors. This new approach enhanced the ability of the Random Forests algorithm to automatically identify important predictors in complex data. The cut-off value can also be adjusted based on the results of the synthetic positive control experiments.

When the data set had a high variables-to-observations ratio, Random Forests complemented the established logistic regression. This study suggests that Random Forests is well suited to such high-dimensional data. One can use Random Forests to select the important variables and then use logistic regression or Random Forests itself to estimate the effect size of the predictors and to classify new observations.

We also found that the mean decrease in accuracy is a more reliable variable ranking measure than the mean decrease in Gini.
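The abstract does not spell out how the noise level is turned into a cut-off, so the following is only a sketch of one plausible reading: pure-noise probe variables are scored alongside the real predictors, and a predictor is kept only if its importance exceeds the largest importance reached by any probe. The function name, the variable names, and the scores are all hypothetical and are not taken from the thesis.

```python
from typing import Dict, List


def select_above_noise(importances: Dict[str, float],
                       noise_names: List[str]) -> List[str]:
    """Keep real predictors whose importance exceeds every noise probe."""
    noise = set(noise_names)
    # Assumed cut-off: the largest importance reached by a pure-noise variable.
    cutoff = max(importances[name] for name in noise)
    selected = [name for name, score in importances.items()
                if name not in noise and score > cutoff]
    # Rank the surviving predictors from most to least important.
    return sorted(selected, key=lambda name: importances[name], reverse=True)


# Hypothetical mean-decrease-in-accuracy scores, not values from the thesis.
scores = {
    "smoking_years": 0.041,
    "age": 0.018,
    "marker_17": 0.002,
    "noise_1": 0.003,
    "noise_2": -0.001,
}
print(select_above_noise(scores, ["noise_1", "noise_2"]))
```

With these made-up scores the call prints `['smoking_years', 'age']`; `marker_17` is dropped because it scores below the strongest noise probe. Raising or lowering the cut-off, for example after a synthetic positive control run, would change only the `cutoff` line.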

Relevance:

20.00%

Publisher:

Abstract:

As schools are pressured to perform on academics and standardized examinations, they are reluctant to dedicate more time to physical activity. After-school exercise and health programs may provide an opportunity to engage in more physical activity without taking time away from coursework during the day. The current study is a secondary analysis of data from a randomized trial of a 10-week after-school program (six schools, n = 903) that implemented an exercise component based on the CATCH physical activity component and health modules based on the culturally tailored Bienestar health education program. Outcome variables, assessed through path analysis, included BMI, aerobic capacity, health knowledge, and healthy food intentions. Both the baseline model (χ² (df = 8) = 16.90, p = .031; RMSEA = .035 (90% CI of .010–.058); NNFI = 0.983; CFI = 0.995) and the model incorporating intervention participation (χ² (df = 10) = 11.59, p = .314; RMSEA = .013 (90% CI of .010–.039); NNFI = 0.996; CFI = 0.999) proved to be a good fit to the data. Experimental group participation was not predictive of changes in health knowledge, intentions to eat healthy foods, or Body Mass Index, but it was associated with increased aerobic capacity, β = .067, p < .05. School characteristics, including SES and language proficiency, were significantly associated with changes in knowledge and physical indicators. Further study of the effects of school-level variables on intervention outcomes is recommended so that tailored interventions can be developed for the specific characteristics of each participating school.
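As a rough check on the fit statistics quoted above, the RMSEA point estimate can be recomputed from the chi-square value, its degrees of freedom, and the sample size. This sketch assumes the common formula sqrt(max(χ² − df, 0) / (df · (N − 1))) and assumes that N = 903 was the sample used to fit the models; neither assumption is confirmed by the abstract.

```python
import math


def rmsea(chi2: float, df: int, n: int) -> float:
    """Point estimate of RMSEA using the common (N - 1) convention."""
    # Negative (chi2 - df) is truncated to zero, giving RMSEA = 0.
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


# Fit statistics as reported in the abstract; n = 903 is an assumption.
print(round(rmsea(16.90, 8, 903), 3))   # baseline model
print(round(rmsea(11.59, 10, 903), 3))  # model with intervention participation
```

Under these assumptions the two calls print 0.035 and 0.013, which agree with the RMSEA values reported for the baseline and intervention models.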

Relevance:

20.00%

Publisher:

Abstract:

Path analysis has been applied to components of the iron metabolic system with the intent of suggesting an integrated procedure for better evaluating iron nutritional status at the community level. The primary variables of interest in this study were (1) iron stores, (2) total iron-binding capacity, (3) serum ferritin, (4) serum iron, (5) transferrin saturation, and (6) hemoglobin concentration. Correlation coefficients for relationships among these variables were obtained from published literature and postulated in a series of models using measures of those variables that are feasible to include in a community nutritional survey. Models were built upon known information about the metabolism of iron and were limited by what had been reported in the literature in terms of correlation coefficients or quantitative relationships. Data were pooled from various studies, and correlations of the same bivariate relationships were averaged after z-transformation. Correlation matrices were then constructed by transforming the average values back into correlation coefficients. The results of path analysis in this study indicate that hemoglobin is not a good indicator of early iron deficiency: it does not account for variance in iron stores. On the other hand, 91% of the variance in iron stores is explained by serum ferritin and total iron-binding capacity. In addition, the magnitude of the path coefficient (.78) of the serum ferritin-iron stores relationship signifies that serum ferritin is the most important predictor of iron stores in the proposed model. Finally, drawing upon known relations among variables and the amount of variance explained in path models, it is suggested that the following blood measures should be made in assessing community iron deficiency: (1) serum ferritin, (2) total iron-binding capacity, (3) serum iron, (4) transferrin saturation, and (5) hemoglobin concentration. These measures (with acceptable ranges and cut-off points) could make possible the complete evaluation of all three stages of iron deficiency in those persons surveyed at the community level.
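The pooling step described above, averaging correlations after z-transformation and then transforming back, can be sketched with Fisher's z (the inverse hyperbolic tangent). The abstract does not say whether studies were weighted by sample size, so this sketch assumes a simple unweighted mean; the input correlations are hypothetical and not drawn from the study.

```python
import math
from typing import Sequence


def average_correlation(rs: Sequence[float]) -> float:
    """Average correlations on the Fisher z scale, then back-transform."""
    zs = [math.atanh(r) for r in rs]      # r -> z
    return math.tanh(sum(zs) / len(zs))   # mean z -> r


# Hypothetical correlations for one bivariate relationship from three studies.
pooled = average_correlation([0.60, 0.70, 0.80])
print(round(pooled, 3))
```

Averaging on the z scale rather than averaging the raw coefficients reduces the bias that comes from the skewed sampling distribution of r, which is why the pooled value here comes out slightly above the plain mean of 0.70.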

Relevance:

20.00%

Publisher:

Abstract:

This dissertation develops and tests, through path analysis, a theoretical model to explain how socioeconomic, socioenvironmental, and biologic risk factors simultaneously influence each other to produce short-term, depressed growth in preschoolers. Three areas of risk factors were identified: the child's proximal environment, maturational stage, and biological vulnerability. The theoretical model represented both the conceptual framework and the nature and direction of the hypotheses. Original research completed in 1978-80 and in 1982 provided the background data, which were analyzed first by nested analysis of variance and then by path analysis. The study provided evidence of mild iron deficiency and gastrointestinal symptomatology in the etiology of depressed, short-term weight gain. There was also evidence suggesting that family resources for material and social survival contribute significantly to the variability of short-term, age-adjusted growth velocity. These results challenge current views of unifocal intervention, whether for prevention or control. For policy formulation, though, the mechanisms underlying any set of interlaced relationships must be decoded. The theoretical formulations proposed here should be reassessed under a more extensive research design. It is suggested that studies be undertaken where social changes are actually in progress; otherwise, nutritional epidemiology in developing countries operates somewhere between social reality and research concepts, with little grasp of its real potential. The study stresses that there is a connection between substantive theory, empirical observation, and policy issues.

Relevance:

20.00%

Publisher:

Abstract:

In the complex landscape of public education, participants at all levels are searching for policy and practice levers that can raise overall performance and close achievement gaps. The collection of articles in this edition of the Journal of Applied Research on Children takes a big step toward providing the tools and tactics needed for an evidence-based approach to educational policy and practice.