3 results for random control
in DigitalCommons@The Texas Medical Center
Abstract:
Random Forests™ is reported to be one of the most accurate classification algorithms for complex data analysis. It performs well even when most predictors are noisy and the number of variables is much larger than the number of observations. In this thesis, Random Forests was applied to a large-scale lung cancer case-control study. A novel way of automatically selecting prognostic factors was proposed, and a synthetic positive control was used to validate the Random Forests method. Throughout this study we showed that Random Forests can deal with a large number of weak input variables without overfitting and can account for non-additive interactions between these input variables. Random Forests can also be used for variable selection without being adversely affected by collinearities. Random Forests can handle large-scale data sets without rigorous data preprocessing, and it has a robust variable importance ranking measure. We propose a novel variable selection method in the context of Random Forests that uses the data noise level as the cut-off value to determine the subset of important predictors. This new approach enhanced the ability of the Random Forests algorithm to automatically identify important predictors in complex data. The cut-off value can also be adjusted based on the results of the synthetic positive control experiments. When the data set had a high ratio of variables to observations, Random Forests complemented the established logistic regression. This study suggests that Random Forests is recommended for such high-dimensional data. One can use Random Forests to select the important variables and then use logistic regression or Random Forests itself to estimate the effect size of the predictors and to classify new observations. We also found that the mean decrease in accuracy is a more reliable variable ranking measure than the mean decrease in Gini.
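The abstract does not spell out how the noise level is estimated, so the following is only a minimal sketch of one possible noise-based cut-off, not the thesis's own procedure. It assumes permutation-style importance scores are already available and uses a common heuristic: the magnitude of the most negative importance is taken as the noise level, and only variables scoring above it are kept. The function name `select_above_noise` and the variable names and scores are hypothetical.

```python
def select_above_noise(importances):
    """Keep variables whose importance exceeds an estimated noise level.

    Assumption: the noise level is the absolute value of the most negative
    importance score (0.0 if no score is negative). This is a heuristic
    stand-in, not the cut-off rule defined in the thesis.
    """
    negatives = [score for score in importances.values() if score < 0]
    noise_level = abs(min(negatives)) if negatives else 0.0
    selected = [name for name, score in importances.items() if score > noise_level]
    return noise_level, selected


# Hypothetical importance scores (e.g. mean decrease in accuracy).
scores = {
    "var_a": 0.042,
    "var_b": 0.015,
    "var_c": 0.003,
    "var_d": -0.004,
    "var_e": -0.001,
}

noise_level, selected = select_above_noise(scores)
print(noise_level, selected)
```

With these made-up scores, `var_c` is dropped because its small positive importance lies within the range spanned by the negative scores. Raising or lowering the threshold, for instance after a positive control experiment, would only require replacing the `noise_level` estimate.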
Abstract:
Epidemiologic case-control studies of small groups of childhood nervous system tumor patients have suggested that parental employment in occupations with exposure to hydrocarbons is a risk factor for disease. The main focus of this case-control study was to assess the paternal occupation at the time of birth of offspring who later developed childhood intracranial and spinal tumors. All children under 15 years of age dying of such tumors in Texas during the period 1964-1980 were selected as cases. Disease and demographic data were abstracted from death certificates. The birth certificate for each child in the final group of 499 cases was located, and parental occupation information, as well as demographic and obstetric data, was collected. The comparison group consisted of a random sample from all Texas live births with the same birth year, race and sex distribution as the cases. The paternal occupations were categorized into broad classifications of those involving hydrocarbon exposure versus those that did not, based on the occupation criteria used in the previous studies. Odds ratios did not indicate any increased risk associated with general paternal hydrocarbon exposure in the workplace. In prior studies, increased risk estimates were detected with narrower groups of occupations involving exposure to hydrocarbon materials. The data from this study were classified according to these groups, and again, no increased risks were indicated except for an elevated but not statistically significant odds ratio for fathers who were paper and pulp mill workers. Odds ratios were calculated for specific occupations and industries previously implicated as risk factors.
Significantly elevated odds ratios (OR) were detected for electricians (OR = 3.5), especially those working for construction companies (OR = 10.0), for employment in the printing occupations (OR = 4.5), particularly graphic arts workers (OR = 21.9), and in the electronics and electronic machinery industries (OR = 3.5). Analysis of the petroleum refining and chemical industries, which were not found in previous study populations, revealed significantly elevated odds ratios of 3.0 for occupations with probable heavy exposure to chemicals and petroleum compounds and 10.0 for salesmen of chemical products.
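For readers unfamiliar with how an odds ratio is obtained from case-control counts, here is a minimal sketch. It computes the OR from a 2x2 table together with a 95% confidence interval using the Woolf (log) method. The abstract does not report the underlying counts or say which interval method was used, so both the counts and the choice of method are assumptions made purely for illustration.

```python
import math


def odds_ratio_ci(exposed_cases, unexposed_cases,
                  exposed_controls, unexposed_controls, z=1.96):
    """Return (odds ratio, lower bound, upper bound) for a 2x2 table.

    The interval uses the Woolf method: exp(ln(OR) +/- z * SE), where
    SE = sqrt(1/a + 1/b + 1/c + 1/d). All four cells must be non-zero.
    """
    odds_ratio = (exposed_cases * unexposed_controls) / (
        unexposed_cases * exposed_controls
    )
    standard_error = math.sqrt(
        1 / exposed_cases
        + 1 / unexposed_cases
        + 1 / exposed_controls
        + 1 / unexposed_controls
    )
    lower = math.exp(math.log(odds_ratio) - z * standard_error)
    upper = math.exp(math.log(odds_ratio) + z * standard_error)
    return odds_ratio, lower, upper


# Hypothetical counts, NOT taken from the study:
# 20 exposed and 80 unexposed cases; 10 exposed and 90 unexposed controls.
odds_ratio, lower, upper = odds_ratio_ci(20, 80, 10, 90)
print(f"OR = {odds_ratio:.2f}, 95% CI = {lower:.2f}-{upper:.2f}")
```

Small cell counts, which are likely for narrow occupational groups such as graphic arts workers, make the standard error large and the interval wide, so a large point estimate alone says little about precision.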
Abstract:
Introduction. Several studies have reported a positive association of body mass index (BMI) with multiple myeloma; however, the period of adulthood in which BMI is most important remains unclear. In addition, it is well known that body fat is associated with both sex-steroid hormone storage and increasing insulin levels; therefore, it was hypothesized that the association between obesity and multiple myeloma may be attributed to increased aromatization of androgen in adipose tissue. Objective. The overall objective of this case-control study was to determine whether multiple myeloma cases had higher BMI and greater adult weight gain relative to healthy controls. In addition, we tested the hypothesis that hormone replacement therapy (HRT) use among women further increases the association between BMI and risk of multiple myeloma. This study used data from a pilot case-control study at M.D. Anderson Cancer Center (MDACC), entitled Etiology of multiple myeloma, directed by Dr. Sara Strom and Dr. Sergio Giralt. Methods. The pilot study recruited a total of 122 cases of histopathologically confirmed multiple myeloma from MDACC. Controls (n=183) were selected from a database of random digit dialing controls accrued in the Department of Epidemiology at MDACC and were frequency matched to the cases on age (±5 years), gender, and race/ethnicity. Demographic and risk factor information was obtained from all participants who completed a self-administered questionnaire. The questionnaire covered demographic information, height and weight at ages 25 and 40 and at present/diagnosis, medical history, family history of cancer, smoking and alcohol use. Statistical analysis. Initial descriptive analysis included Student's t-test and Pearson's chi-squared tests. Odds ratios and 95% confidence intervals were calculated to quantify the association between the variables of interest and multiple myeloma.
A multivariable model was developed using unconditional logistic regression. Results. MM cases were 1.79 times (95% CI=0.99-3.32) more likely to have been overweight or obese (BMI > 25 kg/m²) at age 25 relative to healthy controls after controlling for age, gender, race/ethnicity, education and family history of cancer. Being overweight or obese at age 40 was not significantly associated with multiple myeloma risk (OR=1.42, 95% CI=0.86-2.34), nor was being overweight or obese at diagnosis (OR=1.43, 95% CI=0.78-2.63). We observed a statistically significant 2-fold increased odds of multiple myeloma in individuals who gained more than 4.7 kg between ages 25 and 40 (OR=1.97, 95% CI=1.15-3.39). When assessing HRT as a modifier of the BMI and multiple myeloma association among women (N=123), no association between obesity and MM status was observed among women who had never used HRT (OR=0.60, 95% CI=0.23-1.61; n=73). Among women who had ever used HRT (n=50), being overweight or obese was associated with an increase in MM risk (OR=2.93, 95% CI=0.81-10.6) after adjusting for age; however, the association was not statistically significant. Significance. This study provides further evidence that increased BMI increases the risk of multiple myeloma. Furthermore, among women, HRT use may modify risk of disease.
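The significance statements in the Results can be checked directly from the reported intervals: an odds ratio differs significantly from the null at the 5% level when its 95% confidence interval excludes 1. The short sketch below applies that rule to the estimates quoted in the abstract. The helper name `excludes_null` and the dictionary labels are illustrative; the numbers are copied from the text.

```python
def excludes_null(ci_low, ci_high, null_value=1.0):
    """True when the confidence interval lies entirely on one side of the null."""
    return ci_low > null_value or ci_high < null_value


# (OR, 95% CI lower, 95% CI upper) as reported in the abstract.
reported = {
    "overweight/obese at age 25": (1.79, 0.99, 3.32),
    "overweight/obese at age 40": (1.42, 0.86, 2.34),
    "overweight/obese at diagnosis": (1.43, 0.78, 2.63),
    "weight gain > 4.7 kg, ages 25-40": (1.97, 1.15, 3.39),
    "overweight/obese, never used HRT": (0.60, 0.23, 1.61),
    "overweight/obese, ever used HRT": (2.93, 0.81, 10.6),
}

significant = {
    label: excludes_null(low, high)
    for label, (_, low, high) in reported.items()
}

for label, flag in significant.items():
    print(f"{label}: {'significant' if flag else 'not significant'}")
```

Only the weight-gain estimate has an interval that excludes 1, which matches the abstract's wording. The age-25 estimate is borderline, with a lower bound of 0.99.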