11 results for Logistic models
in DigitalCommons@The Texas Medical Center
Abstract:
Ordinal logistic regression models are used to analyze a dependent variable with multiple outcomes that can be ranked, but they have been underutilized. In this study, we describe four logistic regression models for analyzing an ordinal response variable. In this methodological study, four regression models are proposed. The first is the multinomial logistic model. The second is the adjacent-category logit model. The third is the proportional odds model, and the fourth is the continuation-ratio model. We illustrate and compare the fit of these models using data from the survey designed by the University of Texas School of Public Health research project PCCaSO (Promoting Colon Cancer Screening in people 50 and Over) to study patients' confidence in completing colorectal cancer screening (CRCS). The purpose of this study is twofold: first, to provide a synthesized review of models for analyzing data with ordinal response, and second, to evaluate their usefulness in epidemiological research, with particular emphasis on model formulation, interpretation of model coefficients, and their implications. The four ordinal logistic models used in this study are (1) the multinomial logistic model, (2) the adjacent-category logistic model [9], (3) the continuation-ratio logistic model [10], and (4) the proportional logistic model [11]. We recommend that the analyst perform (1) goodness-of-fit tests and (2) sensitivity analysis by fitting and comparing different models.
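As a rough illustration of the proportional odds model named above, the following Python sketch computes category probabilities for one covariate. It assumes the common parameterization logit P(Y <= j | x) = alpha_j - beta * x, which the abstract does not spell out; the cut-points and slope are made-up values, not estimates from the PCCaSO data.

```python
import math


def expit(z):
    """Inverse logit: maps a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def proportional_odds_probs(cutpoints, beta, x):
    """Category probabilities under logit P(Y <= j | x) = alpha_j - beta * x.

    `cutpoints` (the alpha_j) must be strictly increasing.
    Returns len(cutpoints) + 1 probabilities, one per ordered category.
    """
    cumulative = [expit(a - beta * x) for a in cutpoints] + [1.0]
    probs = []
    previous = 0.0
    for c in cumulative:
        probs.append(c - previous)  # P(Y = j) = P(Y <= j) - P(Y <= j - 1)
        previous = c
    return probs


# Hypothetical parameters: four ordered confidence levels, one binary covariate.
cutpoints = [-1.0, 0.5, 2.0]
beta = 0.8
for x in (0, 1):
    print(x, [round(p, 3) for p in proportional_odds_probs(cutpoints, beta, x)])
```

Under this parameterization the cumulative odds ratio between x = 0 and x = 1 equals exp(beta) at every cut-point, which is the "proportional odds" assumption that a goodness-of-fit check would probe.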
Abstract:
This study investigates the degree to which gender, ethnicity, relationship to perpetrator, and geomapped socio-economic factors significantly predict the incidence of childhood sexual abuse, physical abuse and non-abuse. These variables are then linked to geographic identifiers using geographic information system (GIS) technology to develop a geo-mapping framework for child sexual and physical abuse prevention.
Abstract:
BACKGROUND: Variants in the complement cascade genes and in LOC387715/HTRA1 have been widely reported to be associated with age-related macular degeneration (AMD), the most common cause of visual impairment in industrialized countries. METHODS/PRINCIPAL FINDINGS: We investigated the association between the LOC387715 A69S and complement component C3 R102G risk alleles in the Finnish case-control material and found a significant association with both variants (OR 2.98, p = 3.75 x 10^-9; non-AMD controls and OR 2.79, p = 2.78 x 10^-19, blood donor controls and OR 1.83, p = 0.008; non-AMD controls and OR 1.39, p = 0.039; blood donor controls), respectively. Previously, we have shown a strong association between complement factor H (CFH) Y402H and AMD in the Finnish population. A carrier of at least one risk allele in each of the three susceptibility loci (LOC387715, C3, CFH) had an 18-fold risk of AMD when compared to a non-carrier homozygote in all three loci. A tentative gene-gene interaction between the two major AMD-associated loci, LOC387715 and CFH, was found in this study using a multiplicative (logistic regression) model, a synergy index (departure-from-additivity model) and the mutual information method (MI), suggesting that a common causative pathway may exist for these genes. Smoking (ever vs. never) conferred an extra risk for AMD, but somewhat surprisingly, only in connection with other factors such as sex and the C3 genotype. Population attributable risks (PAR) for the CFH, LOC387715 and C3 variants were 58.2%, 51.4% and 5.8%, respectively, the summary PAR for the three variants being 65.4%. CONCLUSIONS/SIGNIFICANCE: Evidence for gene-gene interaction between the two major AMD-associated loci, CFH and LOC387715, was obtained using three methods: logistic regression, a synergy index and the mutual information (MI) index.
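The two interaction scales mentioned above can be contrasted with a small Python sketch. It assumes Rothman's standard definition of the synergy index, with odds ratios standing in for relative risks; the abstract does not give the exact formulas used, and the odds ratios below are invented for illustration, not taken from the Finnish data.

```python
def synergy_index(or_a, or_b, or_ab):
    """Rothman's synergy index S = (OR_AB - 1) / ((OR_A - 1) + (OR_B - 1)).

    S > 1 suggests more-than-additive joint effects; S = 1 means exact additivity.
    Odds ratios are used here as approximations of relative risks (an assumption).
    """
    denominator = (or_a - 1.0) + (or_b - 1.0)
    if denominator == 0:
        raise ValueError("synergy index is undefined when the excess risks sum to zero")
    return (or_ab - 1.0) / denominator


def multiplicative_ratio(or_a, or_b, or_ab):
    """Ratio OR_AB / (OR_A * OR_B); values above 1 suggest more-than-multiplicative effects.

    In a logistic model with an interaction term this ratio is exp(interaction coefficient).
    """
    return or_ab / (or_a * or_b)


# Hypothetical odds ratios: factor A alone, factor B alone, both together,
# each relative to carriers of neither risk factor.
or_a, or_b, or_ab = 2.0, 3.0, 9.0
s = synergy_index(or_a, or_b, or_ab)
m = multiplicative_ratio(or_a, or_b, or_ab)
print(f"synergy index = {s:.3f}, multiplicative ratio = {m:.3f}")
```

The same joint odds ratio can show departure from additivity without departing from multiplicativity (or the reverse), which is one reason to report more than one interaction measure.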
Abstract:
BACKGROUND: Renal failure after thoracoabdominal aortic repair is a significant clinical problem. Distal aortic perfusion for organ and spinal cord protection requires cannulation of the left femoral artery. In 2006, we reported the finding that direct cannulation led to leg ischemia in some patients and was associated with increased renal failure. After this finding, we modified our perfusion technique to eliminate leg ischemia from cannulation. In this article, we present the effects of this change on postoperative renal function. METHODS: Between February 1991 and July 2008, we repaired 1464 thoracoabdominal aortic aneurysms. Distal aortic perfusion was used in 1088, and these were studied. Median patient age was 68 years, and 378 (35%) were women. In September 2006, we began to adopt a sidearm femoral cannulation technique that provides distal aortic perfusion while maintaining downstream flow to the leg. This was used in 167 patients (15%). We measured the joint effects of preoperative glomerular filtration rate (GFR) and cannulation technique on the highest postoperative creatinine level, postoperative renal failure, and death. Analysis was by multiple linear or logistic regression with interaction. RESULTS: The preoperative GFR was the strongest predictor of postoperative renal dysfunction and death. No significant main effects of sidearm cannulation were noted. For peak creatinine level and postoperative renal failure, however, strong interactions between preoperative GFR and sidearm cannulation were present, resulting in reductions of postoperative renal complications of 15% to 20% when GFR was <60 mL/min/1.73 m^2. For normal GFR, the effect was negated or even reversed at very high levels of GFR. Mortality, although not significantly affected by sidearm cannulation, showed a similar trend to the renal outcomes.
CONCLUSION: Use of sidearm cannulation is associated with a clinically important and highly statistically significant reduction in postoperative renal complications in patients with a low GFR. Reduced renal effect of skeletal muscle ischemia is the proposed mechanism. Effects among patients with good preoperative renal function are less clear. A randomized trial is needed.
Abstract:
BACKGROUND: Decisions regarding whether to administer intensive care to extremely premature infants are often based on gestational age alone. However, other factors also affect the prognosis for these patients. METHODS: We prospectively studied a cohort of 4446 infants born at 22 to 25 weeks' gestation (determined on the basis of the best obstetrical estimate) in the Neonatal Research Network of the National Institute of Child Health and Human Development to relate risk factors assessable at or before birth to the likelihood of survival, survival without profound neurodevelopmental impairment, and survival without neurodevelopmental impairment at a corrected age of 18 to 22 months. RESULTS: Among study infants, 3702 (83%) received intensive care in the form of mechanical ventilation. Among the 4192 study infants (94%) for whom outcomes were determined at 18 to 22 months, 49% died, 61% died or had profound impairment, and 73% died or had impairment. In multivariable analyses of infants who received intensive care, exposure to antenatal corticosteroids, female sex, singleton birth, and higher birth weight (per each 100-g increment) were each associated with reductions in the risk of death and the risk of death or profound or any neurodevelopmental impairment; these reductions were similar to those associated with a 1-week increase in gestational age. At the same estimated likelihood of a favorable outcome, girls were less likely than boys to receive intensive care. The outcomes for infants who underwent ventilation were better predicted with the use of the above factors than with use of gestational age alone. CONCLUSIONS: The likelihood of a favorable outcome with intensive care can be better estimated by consideration of four factors in addition to gestational age: sex, exposure or nonexposure to antenatal corticosteroids, whether single or multiple birth, and birth weight. 
(ClinicalTrials.gov numbers, NCT00063063 and NCT00009633.)
Abstract:
This cross-sectional analysis of the data from the Third National Health and Nutrition Examination Survey was conducted to determine the prevalence and determinants of asthma and wheezing among US adults, and to identify the occupations and industries at high risk of developing work-related asthma and work-related wheezing. Separate logistic models were developed for physician-diagnosed asthma (MD asthma), wheezing in the previous 12 months (wheezing), work-related asthma and work-related wheezing. Major risk factors including demographic, socioeconomic, indoor air quality, allergy, and other characteristics were analyzed. The prevalence of lifetime MD asthma was 7.7% and the prevalence of wheezing was 17.2%. Mexican-Americans exhibited the lowest prevalence of MD asthma (4.8%; 95% confidence interval (CI): 4.2, 5.4) when compared to other race-ethnic groups. The prevalence of MD asthma or wheezing did not vary by gender. Multiple logistic regression analysis showed that Mexican-Americans were less likely to develop MD asthma (adjusted odds ratio (ORa) = 0.64, 95%CI: 0.45, 0.90) and wheezing (ORa = 0.55, 95%CI: 0.44, 0.69) when compared to non-Hispanic whites. Low education level, current and past smoking status, pet ownership, lifetime diagnosis of physician-diagnosed hay fever and obesity were all significantly associated with MD asthma and wheezing. No significant effect of indoor air pollutants on asthma and wheezing was observed in this study. The prevalence of work-related asthma was 3.70% (95%CI: 2.88, 4.52) and the prevalence of work-related wheezing was 11.46% (95%CI: 9.87, 13.05). 
The major occupations identified at risk of developing work-related asthma and wheezing were cleaners; farm and agriculture related occupations; entertainment related occupations; protective service occupations; construction; mechanics and repairers; textile; fabricators and assemblers; other transportation and material moving occupations; freight, stock and material movers; motor vehicle operators; and equipment cleaners. The population attributable risks for work-related asthma and wheeze were 26% and 27%, respectively. The major industries identified at risk of work-related asthma and wheeze include entertainment related industry; agriculture, forestry and fishing; construction; electrical machinery; repair services; and lodging places. For industries, the population attributable risk was 36.5% for work-related asthma and 28.5% for work-related wheezing. Asthma remains an important public health issue in the US and in other regions of the world.
Abstract:
The performance of the Hosmer-Lemeshow global goodness-of-fit statistic for logistic regression models was explored in a wide variety of conditions not previously fully investigated. Computer simulations, each consisting of 500 regression models, were run to assess the statistic in 23 different situations. The items which varied among the situations included the number of observations used in each regression, the number of covariates, the degree of dependence among the covariates, the combinations of continuous and discrete variables, and the generation of the values of the dependent variable for model fit or lack of fit. The study found that the $C_g^*$ statistic was adequate in tests of significance for most situations. However, when testing data which deviate from a logistic model, the statistic has low power to detect such deviation. Although grouping of the estimated probabilities into quantiles from 8 to 30 was studied, the deciles of risk approach was generally sufficient. Subdividing the estimated probabilities into more than 10 quantiles when there are many covariates in the model is not necessary, despite theoretical reasons which suggest otherwise. Because it does not follow a $\chi^2$ distribution, the statistic is not recommended for use in models containing only categorical variables with a limited number of covariate patterns. The statistic performed adequately when there were at least 10 observations per quantile. Large numbers of observations per quantile did not lead to incorrect conclusions that the model did not fit the data when it actually did. However, the statistic failed to detect lack of fit when it existed and should be supplemented with further tests for the influence of individual observations.
Careful examination of the parameter estimates is also essential since the statistic did not perform as desired when there was moderate to severe collinearity among covariates. Two methods studied for handling tied values of the estimated probabilities made only a slight difference in conclusions about model fit. Neither method split observations with identical probabilities into different quantiles. Approaches which create equal size groups by separating ties should be avoided.
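The grouping idea behind the statistic can be sketched in Python as follows. This is a minimal illustration of the usual textbook form of the Hosmer-Lemeshow statistic, not the simulation code from the study; the tie-handling rule (extend a group until the tied run ends) is one assumed way of keeping identical probabilities together, and the data are simulated.

```python
import random


def hosmer_lemeshow(y, p, groups=10):
    """Return (statistic, degrees_of_freedom) for 0/1 outcomes y and fitted probabilities p.

    Observations are sorted by fitted probability and cut into roughly equal groups.
    The statistic sums (observed - expected)^2 / (n_g * mean_p * (1 - mean_p)) over groups.
    """
    pairs = sorted(zip(p, y))
    n = len(pairs)
    stat = 0.0
    used_groups = 0
    start = 0
    for g in range(1, groups + 1):
        end = max(start, round(g * n / groups))
        # Extend the boundary so tied probabilities are never split across groups.
        while 0 < end < n and pairs[end][0] == pairs[end - 1][0]:
            end += 1
        if end <= start:
            continue  # this group was absorbed by an earlier run of ties
        chunk = pairs[start:end]
        n_g = len(chunk)
        expected = sum(prob for prob, _ in chunk)
        observed = sum(outcome for _, outcome in chunk)
        mean_p = expected / n_g
        variance = n_g * mean_p * (1.0 - mean_p)
        if variance > 0:
            stat += (observed - expected) ** 2 / variance
        used_groups += 1
        start = end
    return stat, max(used_groups - 2, 0)


# Simulated data in which the fitted probabilities are the true probabilities.
rng = random.Random(0)
probs = [rng.random() for _ in range(200)]
outcomes = [1 if rng.random() < q else 0 for q in probs]
statistic, df = hosmer_lemeshow(outcomes, probs, groups=10)
print(f"HL statistic = {statistic:.3f} on {df} degrees of freedom")
```

The statistic is conventionally referred to a chi-square distribution with (number of groups - 2) degrees of freedom; computing the p-value needs a chi-square CDF, which is left out here to keep the sketch dependency-free.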
Abstract:
In 2011, there will be an estimated 1,596,670 new cancer cases and 571,950 cancer-related deaths in the US. With the ever-increasing applications of cancer genetics in epidemiology, there is great potential to identify genetic risk factors that would help identify individuals with increased genetic susceptibility to cancer, which could be used to develop interventions or targeted therapies that could hopefully reduce cancer risk and mortality. In this dissertation, I propose to develop a new statistical method to evaluate the role of haplotypes in cancer susceptibility and development. This model will be flexible enough to handle not only haplotypes of any size, but also a variety of covariates. I will then apply this method to three cancer-related data sets (Hodgkin Disease, Glioma, and Lung Cancer). I hypothesize that there is substantial improvement in the estimation of association between haplotypes and disease, with the use of a Bayesian mathematical method to infer haplotypes that uses prior information from known genetics sources. Analyses based on haplotypes using information from publicly available genetic sources generally show increased odds ratios and smaller p-values in the Hodgkin, Glioma, and Lung data sets. For instance, the Bayesian Joint Logistic Model (BJLM) inferred haplotype TC had a substantially higher estimated effect size (OR = 12.16, 95% CI = 2.47-90.1 vs. 9.24, 95% CI = 1.81-47.2) and more significant p-value (0.00044 vs. 0.008) for Hodgkin Disease compared to a traditional logistic regression approach. Also, the effect sizes of haplotypes modeled with recessive genetic effects were higher (and had more significant p-values) when analyzed with the BJLM. Full genetic models with haplotype information developed with the BJLM resulted in significantly higher discriminatory power and a significantly higher Net Reclassification Index compared to those developed with haplo.stats for lung cancer.
Future work could incorporate the 1000 Genomes Project, which offers a larger selection of SNPs, into the information from known genetic sources. Other future analyses include testing non-binary outcomes, such as the levels of biomarkers that are present in lung cancer (NNK), and extending this analysis to full GWAS.
Abstract:
This paper reports a comparison of three modeling strategies for the analysis of hospital mortality in a sample of general medicine inpatients in a Department of Veterans Affairs medical center. Logistic regression, a Markov chain model, and longitudinal logistic regression were evaluated on predictive performance as measured by the c-index and on accuracy of expected numbers of deaths compared to observed. The logistic regression used patient information collected at admission; the Markov model comprised two absorbing states for discharge and death and three transient states reflecting increasing severity of illness as measured by laboratory data collected during the hospital stay; longitudinal regression employed Generalized Estimating Equations (GEE) to model covariance structure for the repeated binary outcome. Results showed that the logistic regression predicted hospital mortality as well as the alternative methods but was limited in scope of application. The Markov chain provides insights into how day-to-day changes of illness severity lead to discharge or death. The longitudinal logistic regression showed that increasing illness trajectory is associated with hospital mortality. The conclusion is reached that for standard applications in modeling hospital mortality, logistic regression is adequate, but for new challenges facing health services research today, alternative methods are equally predictive, practical, and can provide new insights.
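The two evaluation criteria named above, the c-index and expected versus observed deaths, can be sketched in a few lines of Python. This assumes the usual definition of the c-index for a binary outcome (the share of death/survivor pairs in which the death received the higher predicted risk, with ties counted as one half); the predicted risks are invented and are not from the Veterans Affairs sample.

```python
def c_index(y, p):
    """Concordance index for 0/1 outcomes y and predicted risks p.

    Compares every (event, non-event) pair; ties in predicted risk count as 0.5.
    """
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    nonevents = [pi for yi, pi in zip(y, p) if yi == 0]
    if not events or not nonevents:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for pe in events:
        for pn in nonevents:
            if pe > pn:
                concordant += 1.0
            elif pe == pn:
                concordant += 0.5
    return concordant / (len(events) * len(nonevents))


def expected_deaths(p):
    """Expected number of deaths is the sum of the predicted risks."""
    return sum(p)


# Hypothetical patients: 1 = died in hospital, 0 = discharged alive.
died = [1, 0, 1, 0]
risk = [0.9, 0.2, 0.4, 0.4]
print("c-index:", c_index(died, risk))
print("expected vs observed deaths:", expected_deaths(risk), "vs", sum(died))
```

The c-index measures discrimination only; the expected-versus-observed comparison is a separate check on calibration, which is why the abstract reports both.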
Abstract:
Random Forests™ is reported to be one of the most accurate classification algorithms in complex data analysis. It shows excellent performance even when most predictors are noisy and the number of variables is much larger than the number of observations. In this thesis, Random Forests was applied to a large-scale lung cancer case-control study. A novel way of automatically selecting prognostic factors was proposed. Also, a synthetic positive control was used to validate the Random Forests method. Throughout this study we showed that Random Forests can deal with a large number of weak input variables without overfitting. It can account for non-additive interactions between these input variables. Random Forests can also be used for variable selection without being adversely affected by collinearities. Random Forests can deal with large-scale data sets without rigorous data preprocessing. It has a robust variable importance ranking measure. We propose a novel variable selection method in the context of Random Forests that uses the data noise level as the cut-off value to determine the subset of important predictors. This new approach enhanced the ability of the Random Forests algorithm to automatically identify important predictors for complex data. The cut-off value can also be adjusted based on the results of the synthetic positive control experiments. When the data set had a high variables-to-observations ratio, Random Forests complemented the established logistic regression. This study suggests that Random Forests is recommended for such high-dimensional data. One can use Random Forests to select the important variables and then use logistic regression or Random Forests itself to estimate the effect size of the predictors and to classify new observations. We also found that the mean decrease in accuracy is a more reliable variable ranking measure than the mean decrease in Gini.
Abstract:
My dissertation focuses on developing methods for detecting gene-gene/environment interactions and imprinting effects for human complex diseases and quantitative traits. It includes three sections: (1) generalizing the Natural and Orthogonal interaction (NOIA) model for the coding technique originally developed for gene-gene (GxG) interaction and also to reduced models; (2) developing a novel statistical approach that allows for modeling gene-environment (GxE) interactions influencing disease risk; and (3) developing a statistical approach for modeling genetic variants displaying parent-of-origin effects (POEs), such as imprinting. In the past decade, genetic researchers have identified a large number of causal variants for human genetic diseases and traits by single-locus analysis, and interaction has now become a hot topic in the effort to search for the complex network between multiple genes or environmental exposures contributing to the outcome. Epistasis, also known as gene-gene interaction, is the departure from additive genetic effects from several genes to a trait, which means that the same alleles of one gene could display different genetic effects under different genetic backgrounds. In this study, we propose to implement the NOIA model for association studies along with interaction for human complex traits and diseases. We compare the performance of the new statistical models we developed and the usual functional model by both simulation study and real data analysis. Both simulation and real data analysis revealed higher power of the NOIA GxG interaction model for detecting both main genetic effects and interaction effects. Through application to a melanoma dataset, we confirmed the previously identified significant regions for melanoma risk at 15q13.1, 16q24.3 and 9p21.3. We also identified potential interactions with these significant regions that contribute to melanoma risk.
Based on the NOIA model, we developed a novel statistical approach that allows us to model effects from a genetic factor and a binary environmental exposure that jointly influence disease risk. Both simulation and real data analyses revealed higher power of the NOIA model for detecting both main genetic effects and interaction effects for both quantitative and binary traits. We also found that estimates of the parameters from logistic regression for binary traits are no longer statistically uncorrelated under the alternative model when there is an association. Applying our novel approach to a lung cancer dataset, we confirmed four SNPs in the 5p15 and 15q25 regions to be significantly associated with lung cancer risk in the Caucasian population: rs2736100, rs402710, rs16969968 and rs8034191. We also validated that rs16969968 and rs8034191 in the 15q25 region interact significantly with smoking in the Caucasian population. Our approach identified a potential interaction of SNP rs2256543 in 6p21 with smoking in contributing to lung cancer risk. Genetic imprinting is the best-known cause of parent-of-origin effects (POEs), whereby a gene is differentially expressed depending on the parental origin of the same alleles. Genetic imprinting affects several human disorders, including diabetes, breast cancer, alcoholism, and obesity. This phenomenon has been shown to be important for normal embryonic development in mammals. Traditional association approaches ignore this important genetic phenomenon. In this study, we propose a NOIA framework for a single-locus association study that estimates both main allelic effects and POEs. We develop statistical (Stat-POE) and functional (Func-POE) models, and demonstrate conditions for orthogonality of the Stat-POE model. We conducted simulations for both quantitative and qualitative traits to evaluate the performance of the statistical and functional models with different levels of POEs.
Our results showed that the newly proposed Stat-POE model, which ensures orthogonality of variance components if Hardy-Weinberg Equilibrium (HWE) holds or the minor and major allele frequencies are equal, had greater power for detecting the main allelic additive effect than the Func-POE model, which codes according to allelic substitutions, for both quantitative and qualitative traits. The power for detecting the POE was the same for the Stat-POE and Func-POE models under HWE for quantitative traits.