3 results for Most Productive Scale Size
in DigitalCommons@The Texas Medical Center
Abstract:
A population-based cross-sectional survey of socio-environmental factors associated with the prevalence of Dracunculus medinensis (guinea worm disease) was conducted in Idere, a rural agricultural community in Ibarapa, Oyo State, Nigeria, during 1982. Epidemiologic data were collected by household interview of all 501 households. Environmental data were obtained from rainfall records and from analysis of water samples collected from all domestic water sources.

The specific objectives of this research were to:
(a) Describe the prevalence of guinea worm disease in Idere during 1982 by age, sex, area of residence, drinking water source, religion, and weekly amount of money spent by the household to collect potable drinking water.
(b) Compare the characteristics of cases and non-cases of guinea worm in order to identify factors associated with a high risk of infection.
(c) Investigate the distribution of Cyclops in domestic water sources.
(d) Determine the extent of the potable water shortage, with a view to identifying the factors responsible for it in the community.
(e) Describe the effects of guinea worm on school attendance during the 1980/1982 school years by class and by the location of the school relative to the piped water supply.

The findings of this research indicate that during 1982, 31.8 percent of Idere's 6,527 residents experienced guinea worm infection, with higher prevalence recorded among males in their most productive years and females in their teenage years. The contribution of sex and age to the risk of infection was explained in terms of water-related exposure and of increased water intake due to dehydration from the physical occupational activities of these subgroups.

Potable water available to residents was considerably below the minimum recommended by WHO for tropical climates; sixty-eight percent of the residents' water needs were met from unprotected surface water, which harbours Cyclops, the obligatory intermediate host of Dracunculus medinensis. An association was found between rainfall and periods of relatively high Cyclops density in domestic water. The impact of guinea worm infection on educational activities was considerable; its implications were discussed, together with the implications of the research findings for the control of guinea worm disease in Ibarapa.
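Objective (a) amounts to stratified prevalence: the number of cases divided by the number of residents in each level of a grouping variable, times 100. As a worked check of the headline figure, 31.8 percent of 6,527 residents is roughly 0.318 × 6,527 ≈ 2,076 infected people (approximate, since the percentage is rounded). The Python sketch below computes prevalence per subgroup; the record layout, the field names (`sex`, `water_source`, `infected`), and the sample residents are hypothetical illustrations, not the survey's data or coding scheme.

```python
from collections import defaultdict


def prevalence_by(records, key):
    """Return percent infected for each level of `key` (e.g. "sex").

    Each record is a dict holding the grouping attributes and a
    hypothetical "infected" flag (1 = case, 0 = non-case).
    """
    cases = defaultdict(int)
    totals = defaultdict(int)
    for rec in records:
        level = rec[key]
        totals[level] += 1
        cases[level] += rec["infected"]
    return {level: 100.0 * cases[level] / totals[level] for level in totals}


# Hypothetical residents, NOT data from the Idere survey.
residents = [
    {"sex": "M", "water_source": "pond", "infected": 1},
    {"sex": "M", "water_source": "pond", "infected": 1},
    {"sex": "M", "water_source": "tap", "infected": 0},
    {"sex": "F", "water_source": "pond", "infected": 1},
    {"sex": "F", "water_source": "tap", "infected": 0},
    {"sex": "F", "water_source": "tap", "infected": 0},
]

print(prevalence_by(residents, "sex"))           # M ≈ 66.7, F ≈ 33.3
print(prevalence_by(residents, "water_source"))  # {'pond': 100.0, 'tap': 0.0}
```

The same function applied to each objective (a) variable in turn gives a one-way prevalence table; comparing cases with non-cases, as in objective (b), would need further measures of association.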
Abstract:
Random Forests™ is reported to be one of the most accurate classification algorithms for complex data analysis. It performs well even when most predictors are noisy and the number of variables is much larger than the number of observations. In this thesis, Random Forests was applied to a large-scale lung cancer case-control study. A novel way of automatically selecting prognostic factors was proposed, and a synthetic positive control was used to validate the Random Forests method. Throughout this study we showed that Random Forests can handle a large number of weak input variables without overfitting and can account for non-additive interactions between these input variables. Random Forests can also be used for variable selection without being adversely affected by collinearity.

Random Forests can handle large-scale data sets without rigorous data preprocessing, and it provides a robust variable importance ranking measure. We propose a novel variable selection method in the context of Random Forests that uses the data noise level as the cut-off value to determine the subset of important predictors. This new approach enhanced the ability of the Random Forests algorithm to identify important predictors in complex data automatically. The cut-off value can also be adjusted based on the results of the synthetic positive control experiments.

When the data set had a high variables-to-observations ratio, Random Forests complemented established logistic regression. This study suggests that Random Forests is recommended for such high-dimensional data: one can use Random Forests to select the important variables and then use logistic regression, or Random Forests itself, to estimate the effect sizes of the predictors and to classify new observations.

We also found that the mean decrease in accuracy is a more reliable variable ranking measure than the mean decrease in Gini.
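The two ideas in this abstract, the mean decrease in accuracy and a noise-level cut-off, can be sketched without building a full forest. The Python code below is a minimal illustration under stated assumptions: `predict` is a stand-in for any fitted classifier (a real Random Forests implementation computes the accuracy drop per tree on out-of-bag samples), and the noise level is assumed to be the largest importance among synthetic noise variables added to the data. The function names and the `margin` parameter are hypothetical; the thesis does not specify how its cut-off is computed.

```python
import random


def accuracy(predict, X, y):
    """Fraction of rows whose prediction matches the label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def mean_decrease_accuracy(predict, X, y, n_repeats=10, seed=0):
    """Permutation importance: mean accuracy drop when one column is shuffled.

    Simplification: uses the (X, y) passed in rather than out-of-bag samples.
    """
    rng = random.Random(seed)
    base = accuracy(predict, X, y)
    scores = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(base - accuracy(predict, X_perm, y))
        scores.append(sum(drops) / n_repeats)
    return scores


def select_above_noise(importances, noise_names, margin=0.0):
    """Keep real predictors whose importance exceeds the noise cut-off.

    Assumption: cut-off = max importance among synthetic noise variables,
    plus a hypothetical `margin` (e.g. tuned on positive-control runs).
    """
    cutoff = max(importances[name] for name in noise_names) + margin
    kept = [name for name, score in importances.items()
            if name not in noise_names and score > cutoff]
    return sorted(kept, key=importances.get, reverse=True)


# Toy example: the stand-in "model" uses only x0, so the noise column scores 0.
data_rng = random.Random(1)
X = [[i / 19, data_rng.random()] for i in range(20)]


def predict(row):
    return int(row[0] > 0.5)


y = [predict(row) for row in X]
scores = mean_decrease_accuracy(predict, X, y)
print(select_above_noise({"x0": scores[0], "noise": scores[1]}, {"noise"}))  # ['x0']
```

Raising `margin` makes the selection stricter, which mirrors the abstract's point that the cut-off can be adjusted after positive-control experiments.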
Abstract:
Sizes and powers of selected two-sample tests of the equality of survival distributions are compared by simulation for small samples from unequally, randomly censored exponential distributions. The tests investigated include parametric tests (F, Score, Likelihood, Asymptotic), logrank tests (Mantel, Peto-Peto), and Wilcoxon-type tests (Gehan, Prentice). Equal-sized samples, n = 8, 16, 32, with 1000 (size) and 500 (power) simulation trials, are compared for 16 combinations of the censoring proportions 0%, 20%, 40%, and 60%. For n = 8 and 16, the Asymptotic, Peto-Peto, and Wilcoxon tests perform at the nominal 5% size, but the F, Score, and Mantel tests exceeded the 5% size confidence limits for 1/3 of the censoring combinations. For n = 32, all tests showed proper size, with the Peto-Peto test the most conservative in the presence of unequal censoring. Powers of all tests are compared for exponential hazard ratios of 1.4 and 2.0. There is little difference in power among the tests within each class considered. The Mantel test showed 90% to 95% power efficiency relative to the parametric tests. Wilcoxon-type tests have the lowest relative power but are robust to differential censoring patterns. A modified Peto-Peto test shows power comparable to the Mantel test. For n = 32, a specific Weibull-exponential comparison of crossing survival curves suggests that the relative powers of logrank and Wilcoxon-type tests depend on the scale parameter of the Weibull distribution. Wilcoxon-type tests appear more powerful than logrank tests for late-crossing survival curves and less powerful for early-crossing ones. Guidelines for the appropriate selection of two-sample tests are given.
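Logrank and Wilcoxon-type tests can both be written as a weighted sum of observed minus expected events in one group over the distinct event times; they differ only in the weight. The Python code below is a minimal sketch, assuming the Mantel logrank test with unit weights and the Gehan test in its Gehan-Breslow weighted form (weight = number at risk); the Prentice and Peto-Peto variants use survival-estimate weights and are not implemented here. The function name and its arguments are hypothetical, and the data in the usage lines are made up.

```python
import math


def weighted_logrank(times1, events1, times2, events2, weight="logrank"):
    """Two-sample weighted log-rank test (sketch).

    events are 1 for an observed failure and 0 for a censored time.
    weight="logrank": w_j = 1 at every event time (Mantel logrank).
    weight="gehan":   w_j = n_j, the number at risk (Gehan-Breslow form).
    Returns (chi-square statistic on 1 df, two-sided p-value).
    """
    if weight not in ("logrank", "gehan"):
        raise ValueError("weight must be 'logrank' or 'gehan'")
    data = [(t, e, 1) for t, e in zip(times1, events1)]
    data += [(t, e, 2) for t, e in zip(times2, events2)]
    event_times = sorted({t for t, e, _ in data if e})
    num = 0.0  # sum of w_j * (observed - expected) failures in group 1
    var = 0.0  # sum of w_j**2 * hypergeometric variance
    for t in event_times:
        n = sum(1 for s, _, _ in data if s >= t)  # at risk just before t
        n1 = sum(1 for s, _, g in data if s >= t and g == 1)
        d = sum(1 for s, e, _ in data if e and s == t)
        d1 = sum(1 for s, e, g in data if e and s == t and g == 1)
        w = 1.0 if weight == "logrank" else float(n)
        num += w * (d1 - d * n1 / n)
        if n > 1:
            var += w * w * d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = num * num / var
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square, 1 df
    return chi2, p_value


# Hypothetical data (event = 1, censored = 0).
t1, e1 = [1, 2], [1, 1]
t2, e2 = [3, 4], [1, 1]
print(weighted_logrank(t1, e1, t2, e2))           # chi2 = 49/17 ≈ 2.88
print(weighted_logrank(t1, e1, t2, e2, "gehan"))  # chi2 = 8/3 ≈ 2.67
```

Because the Gehan weight shrinks as subjects leave the risk set, it emphasises early differences between the curves; that is one way to read the finding that Wilcoxon-type tests gain power when survival curves cross late.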