12 results for Multinomial Logistic Regression

in DigitalCommons@The Texas Medical Center


Relevance: 100.00%

Abstract:

Ordinal logistic regression models are used to analyze dependent variables with multiple outcomes that can be ranked, but they have been underutilized. In this study, we describe four logistic regression models for analyzing an ordinal response variable.

In this methodological study, four regression models are considered. The first is the multinomial logistic model, the second is the adjacent-category logit model, the third is the proportional odds model, and the fourth is the continuation-ratio model. We illustrate and compare the fit of these models using data from a survey designed by the University of Texas School of Public Health research project PCCaSO (Promoting Colon Cancer Screening in people 50 and Over) to study patients' confidence in completing colorectal cancer screening (CRCS).

The purpose of this study is twofold: first, to provide a synthesized review of models for analyzing data with an ordinal response, and second, to evaluate their usefulness in epidemiological research, with particular emphasis on model formulation, interpretation of model coefficients, and their implications. The four ordinal logistic models used in this study are (1) the multinomial logistic model, (2) the adjacent-category logistic model [9], (3) the continuation-ratio logistic model [10], and (4) the proportional odds logistic model [11]. We recommend that the analyst perform (1) goodness-of-fit tests and (2) sensitivity analysis by fitting and comparing different models.
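To make the differences among the four models concrete, here is a minimal Python sketch (not the authors' code) that computes the category probabilities for a single covariate x under each model. Parameterizations differ across textbooks and software; the sign convention in the cumulative logit and the forward form of the continuation ratio used here are assumptions, and any parameter values passed in are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]

def multinomial_probs(alphas, betas, x):
    # Baseline-category logit: log(P(Y=j)/P(Y=0)) = alpha_j + beta_j * x (ignores the ordering)
    return normalize([1.0] + [math.exp(a + b * x) for a, b in zip(alphas, betas)])

def adjacent_category_probs(alphas, beta, x):
    # Adjacent-category logit: log(P(Y=j)/P(Y=j-1)) = alpha_j + beta * x, one common slope
    log_w = [0.0]
    for a in alphas:
        log_w.append(log_w[-1] + a + beta * x)
    return normalize([math.exp(v) for v in log_w])

def proportional_odds_probs(cutpoints, beta, x):
    # Cumulative logit: logit P(Y <= j) = theta_j - beta * x, with increasing cutpoints theta_j
    cum = [sigmoid(t - beta * x) for t in cutpoints] + [1.0]
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]

def continuation_ratio_probs(alphas, beta, x):
    # Forward continuation ratio: logit P(Y = j | Y >= j) = alpha_j + beta * x
    probs, still_at_risk = [], 1.0
    for a in alphas:
        h = sigmoid(a + beta * x)
        probs.append(still_at_risk * h)
        still_at_risk *= 1.0 - h
    probs.append(still_at_risk)
    return probs
```

Each function returns one probability per ordered category. Under the proportional odds form, a one-unit increase in x shifts every cumulative log-odds by the same amount (minus beta), which is exactly the assumption that the recommended goodness-of-fit and sensitivity checks should probe.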

Relevance: 100.00%

Abstract:

This study investigates the degree to which gender, ethnicity, relationship to the perpetrator, and geomapped socio-economic factors significantly predict the incidence of childhood sexual abuse, physical abuse, and non-abuse. These variables are then linked to geographic identifiers using geographic information system (GIS) technology to develop a geo-mapping framework for the prevention of child sexual and physical abuse.

Relevance: 100.00%

Abstract:

Ordinal outcomes are frequently used in diagnosis and clinical trials. Clinical trials of Alzheimer's disease (AD) treatments are a case in point, using mild, moderate, or severe disease status as the outcome measure. As in many other outcome-oriented studies, disease status may be misclassified. This study estimates the extent of misclassification in an ordinal outcome such as disease status, as well as the extent of misclassification in a predictor variable such as genotype status. An ordinal logistic regression model is commonly used to model the relationship between disease status, the effect of treatment, and other predictive factors. A simulation study was conducted. First, data were generated from a set of hypothetical parameters and hypothetical misclassification rates. Next, maximum likelihood was used to derive likelihood equations that account for misclassification, and the Nelder-Mead simplex method was used to solve for the misclassification and model parameters. Finally, the method was applied to an AD dataset to estimate the amount of misclassification present. The estimates of the ordinal regression model parameters were close to the hypothesized values: β1 was set at 0.50 and its mean estimate was 0.488; β2 was set at 0.04 and its mean estimate was 0.04. Although the estimated misclassification rates for X1 were not as close to their hypothesized values as the estimates of β1 and β2, they support the validity of the method. The 0-to-1 misclassification rate of X1 was hypothesized as 2.98%, and the mean of the simulated estimates was 1.54%; in the best case, the rate of misclassifying k from high to medium was hypothesized at 4.87% and had a sample mean of 3.62%. In the AD dataset, the estimated odds ratio for X1 (having both copies of the APOE ε4 allele) changed from 1.377 to 1.418, demonstrating that the odds ratio estimates change when the analysis adjusts for misclassification.
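The core idea of a misclassification-adjusted likelihood can be sketched in Python under assumptions the abstract does not state: a proportional odds model for the true ordinal status, a misclassification matrix whose entry [k][j] is P(observed = j | true = k), and misclassification of the outcome only (the predictor misclassification the study also estimates is omitted). The probability of each observed category mixes the true-category probabilities through that matrix; the negative of the resulting log-likelihood could then be minimized numerically, for example with SciPy's `scipy.optimize.minimize(..., method="Nelder-Mead")`, over both the regression and misclassification parameters.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def true_category_probs(cutpoints, beta, x):
    # Assumed model for the TRUE status: logit P(Y <= j) = theta_j - beta * x
    cum = [sigmoid(t - beta * x) for t in cutpoints] + [1.0]
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]

def observed_category_probs(true_probs, misclass):
    # P(observed = j) = sum_k P(true = k) * P(observed = j | true = k)
    n_observed = len(misclass[0])
    return [sum(p_k * misclass[k][j] for k, p_k in enumerate(true_probs))
            for j in range(n_observed)]

def log_likelihood(data, cutpoints, beta, misclass):
    # data: iterable of (x, observed_category) pairs; each row of misclass sums to 1
    total = 0.0
    for x, y_obs in data:
        probs = observed_category_probs(true_category_probs(cutpoints, beta, x), misclass)
        total += math.log(probs[y_obs])
    return total
```

With an identity misclassification matrix the function reduces to the ordinary proportional odds log-likelihood, which is a convenient sanity check before adding misclassification parameters.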

Relevance: 100.00%

Abstract:

Logistic regression is one of the most important tools in the analysis of epidemiological and clinical data. Such data often contain missing values for one or more variables. Common practice is to exclude every individual for whom any information is missing. This deletion approach does not make efficient use of the available information and often introduces bias.

Two methods were developed to estimate logistic regression coefficients for mixed dichotomous and continuous covariates, including partially observed binary covariates. The data were assumed to be missing at random (MAR). One method (PD) used the predictive distribution as weights to average the logistic regressions fitted over all possible values of the missing observations; the second method (RS) used a variant of the resampling technique. Seven additional methods were compared with these two approaches in a simulation study: (1) analysis based only on the complete cases, (2) substituting the mean of the observed values for each missing value, (3) an imputation technique based on the proportions of observed data, (4) regressing the partially observed covariates on the remaining continuous covariates, (5) regressing the partially observed covariates on the remaining continuous covariates conditional on the response variable, (6) regressing the partially observed covariates on the remaining continuous covariates and the response variable, and (7) the EM algorithm. Both proposed methods yielded smaller standard errors (s.e.) for the coefficient of the partially observed covariate and for the other coefficients as well. However, both methods, especially PD, are computationally demanding; for the analysis of large data sets with partially observed covariates, these approaches need further refinement.
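A rough Python sketch of the weighting idea behind PD, under assumptions the abstract does not spell out: a single partially observed binary covariate, missing entries treated as independent given the observed data, and predictive probabilities P(x_i = 1 | observed data) already available. Every completion of the missing entries is enumerated, a model is fitted to each completed data set, and the estimates are averaged using each completion's predictive probability as its weight. The `fit` argument is a hypothetical stand-in for a logistic regression fit, and the 2^m completions for m missing values illustrate why the method is computationally demanding.

```python
from itertools import product

def pd_weighted_average(x_partial, p_one, fit):
    """Average fit(completed_x) over all completions of the missing (None) entries.

    x_partial: list of 0, 1, or None for one partially observed binary covariate.
    p_one:     predictive probabilities P(x_i = 1 | observed data), one per None, in order.
    fit:       callable returning a sequence of estimates for a completed covariate
               (hypothetical stand-in for a logistic regression fit).
    """
    missing = [i for i, v in enumerate(x_partial) if v is None]
    if len(missing) != len(p_one):
        raise ValueError("need one predictive probability per missing entry")
    pooled = None
    for values in product((0, 1), repeat=len(missing)):  # 2**m completions
        completed = list(x_partial)
        weight = 1.0
        for i, v, p in zip(missing, values, p_one):
            completed[i] = v
            weight *= p if v == 1 else 1.0 - p            # independence assumption
        estimates = fit(completed)
        if pooled is None:
            pooled = [weight * e for e in estimates]
        else:
            pooled = [acc + weight * e for acc, e in zip(pooled, estimates)]
    return pooled
```

For instance, with a toy `fit` that returns the covariate mean, `pd_weighted_average([1, None, 0, None], [0.5, 0.2], lambda c: [sum(c) / len(c)])` averages four completions whose weights sum to 1.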

Relevance: 100.00%

Abstract:

The history of the logistic function since its introduction in 1838 is reviewed, and the logistic model for a polychotomous response variable is presented with a discussion of the assumptions involved in its derivation and use. Next, the maximum likelihood estimators of the model parameters are derived, along with a Newton-Raphson iterative procedure for computing them. A rigorous mathematical derivation of the limiting distribution of the maximum likelihood estimators is then presented using a characteristic function approach. An appendix with theorems, and their proofs, on the asymptotic normality of sample sums when the observations are not identically distributed supports the presentation of the asymptotic properties of the maximum likelihood estimators. Finally, two applications of the model are presented using data from the Hypertension Detection and Follow-up Program, a prospective, population-based, randomized trial of treatment for hypertension. The first application compares the risk of five-year mortality from cardiovascular causes with that from noncardiovascular causes; the second compares risk factors for fatal or nonfatal coronary heart disease with those for fatal or nonfatal stroke.
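The estimation step can be illustrated with a minimal, dependency-free Python sketch of Newton-Raphson for a baseline-category (polychotomous) logistic model with one covariate and category 0 as the reference. It is not the dissertation's derivation or code; the single-covariate design, the iteration cap, and the small Gaussian-elimination solver are simplifying choices made here. Each iteration adds the solution of (information matrix) × step = score to the current estimates.

```python
import math

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def category_probs(theta, x, n_cat):
    """log(P(Y=j)/P(Y=0)) = a_j + b_j * x, with theta = [a_1, b_1, a_2, b_2, ...]."""
    eta = [0.0] + [theta[2 * (j - 1)] + theta[2 * (j - 1) + 1] * x for j in range(1, n_cat)]
    top = max(eta)                                   # subtract max for numerical stability
    w = [math.exp(e - top) for e in eta]
    s = sum(w)
    return [v / s for v in w]

def fit_polychotomous_logistic(xs, ys, n_cat, max_iter=50, tol=1e-10):
    """Newton-Raphson maximum likelihood; ys are integer categories 0..n_cat-1."""
    k = 2 * (n_cat - 1)
    theta = [0.0] * k
    for _ in range(max_iter):
        score = [0.0] * k                            # gradient of the log-likelihood
        info = [[0.0] * k for _ in range(k)]         # information matrix = minus the Hessian
        for x, y in zip(xs, ys):
            p = category_probs(theta, x, n_cat)
            z = (1.0, x)
            for j in range(1, n_cat):
                resid = (1.0 if y == j else 0.0) - p[j]
                for d in range(2):
                    score[2 * (j - 1) + d] += z[d] * resid
                for q in range(1, n_cat):
                    w = p[j] * ((1.0 if j == q else 0.0) - p[q])
                    for d in range(2):
                        for e in range(2):
                            info[2 * (j - 1) + d][2 * (q - 1) + e] += z[d] * z[e] * w
        step = solve(info, score)                    # theta_new = theta + I^{-1} score
        theta = [t + s for t, s in zip(theta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return theta
```

With a binary covariate this model is saturated, so the fitted probabilities should reproduce the observed category proportions within each covariate level, which makes a convenient correctness check.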

Relevance: 100.00%

Abstract:

The tobacco-specific nitrosamine 4-(methylnitrosamino)-1-(3-pyridyl)-1-butanone (NNK) is a well-established lung carcinogen. Because the cytokinesis-block micronucleus (CBMN) assay has been found to be extremely sensitive to NNK-induced genetic damage, it is a potentially important predictor of lung cancer risk. However, the association between lung cancer and NNK-induced genetic damage measured by the CBMN assay has not been rigorously examined.

This research develops a methodology to model chromosomal changes under NNK-induced genetic damage in a logistic regression framework in order to predict the occurrence of lung cancer. Because these chromosomal changes usually cannot be observed for long owing to laboratory cost and time, a resampling technique was applied to generate a Markov chain of normal and damaged cells for each individual. A joint likelihood was established between the resampled Markov chains and a logistic regression model that includes the transition probabilities of this chain as covariates. Maximum likelihood estimation was used to carry out the statistical tests for comparison. The ability of this approach to improve discriminating power for predicting lung cancer was compared with that of a baseline "non-genetic" model.

Our method offers a way to understand the association between dynamic cell information and lung cancer. Our study indicated that the extent of DNA damage/non-damage measured by the CBMN assay provides critical information for public health studies of lung cancer risk. This novel statistical method can simultaneously estimate the process of DNA damage/non-damage and its relationship with lung cancer for each individual.
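As a hedged illustration of the Markov-chain ingredient only: the maximum likelihood estimate of a two-state transition matrix (state 0 = normal cell, state 1 = damaged cell, a coding assumed here) is the row-normalized matrix of observed transition counts, and entries such as P(normal → damaged) could then enter a logistic regression as covariates. The resampling scheme and the joint likelihood described in the abstract are not reproduced, and the covariate names below are hypothetical.

```python
def transition_matrix(states, n_states=2):
    """Row-normalized transition counts: entry [a][b] estimates P(next = b | current = a)."""
    counts = [[0] * n_states for _ in range(n_states)]
    for current, nxt in zip(states, states[1:]):
        counts[current][nxt] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        if total == 0:
            raise ValueError("a state is never left, so its transition row cannot be estimated")
        matrix.append([c / total for c in row])
    return matrix

def markov_covariates(states):
    """Hypothetical per-individual covariates for a logistic model of lung cancer."""
    p = transition_matrix(states)
    return {"p_normal_to_damaged": p[0][1], "p_damaged_to_damaged": p[1][1]}
```

For a made-up sequence such as `[0, 0, 1, 1, 0, 1]`, three transitions leave state 0 (one to 0, two to 1) and two leave state 1 (one each), giving rows of 1/3, 2/3 and 1/2, 1/2.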

Relevance: 100.00%

Abstract:

The performance of the Hosmer-Lemeshow global goodness-of-fit statistic for logistic regression models was explored under a wide variety of conditions not previously fully investigated. Computer simulations, each consisting of 500 regression models, were run to assess the statistic in 23 different situations. The factors varied across situations included the number of observations in each regression, the number of covariates, the degree of dependence among the covariates, the combination of continuous and discrete variables, and whether the values of the dependent variable were generated to fit the model or to depart from it.

The study found that the $C_g^*$ statistic was adequate for significance testing in most situations. However, when the data deviate from a logistic model, the statistic has low power to detect the deviation. Although groupings of the estimated probabilities into 8 to 30 quantiles were studied, the deciles-of-risk approach was generally sufficient. Subdividing the estimated probabilities into more than 10 quantiles when there are many covariates in the model is not necessary, despite theoretical reasons that suggest otherwise. Because it does not follow a $\chi^2$ distribution, the statistic is not recommended for models containing only categorical variables with a limited number of covariate patterns.

The statistic performed adequately when there were at least 10 observations per quantile. Large numbers of observations per quantile did not lead to incorrect conclusions that the model did not fit the data when it actually did. However, the statistic failed to detect lack of fit when it existed and should be supplemented with further tests for the influence of individual observations. Careful examination of the parameter estimates is also essential, since the statistic did not perform as desired when there was moderate to severe collinearity among the covariates.

Two methods studied for handling tied values of the estimated probabilities made only a slight difference in conclusions about model fit. Neither method split observations with identical probabilities into different quantiles. Approaches that create equal-size groups by separating ties should be avoided.
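The grouping rule discussed above can be made concrete with a small Python sketch of the Hosmer-Lemeshow statistic computed over quantile groups of the estimated probabilities, in which tied probabilities always stay in the same group. The tie rule used here (a block of ties goes to the group in which its first member falls) is a hypothetical choice, not necessarily either of the two methods the study compared; predicted probabilities are assumed to lie strictly between 0 and 1, and the degrees of freedom follow the usual "number of groups minus 2" convention.

```python
def hosmer_lemeshow(probs, outcomes, n_groups=10):
    """Hosmer-Lemeshow statistic over quantile groups; returns (statistic, degrees_of_freedom)."""
    pairs = sorted(zip(probs, outcomes))
    n = len(pairs)
    groups = [[] for _ in range(n_groups)]
    i = 0
    while i < n:
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:  # extend over the block of tied probabilities
            j += 1
        k = min(n_groups - 1, (i * n_groups) // n)    # group of the block's first member
        groups[k].extend(pairs[i:j])
        i = j
    stat, used = 0.0, 0
    for grp in groups:
        if not grp:
            continue                                  # keeping ties together can leave groups empty
        n_k = len(grp)
        observed = sum(y for _, y in grp)
        expected = sum(p for p, _ in grp)
        p_bar = expected / n_k
        stat += (observed - expected) ** 2 / (n_k * p_bar * (1.0 - p_bar))
        used += 1
    return stat, used - 2
```

Because ties are never split, asking for more groups than there are distinct probability blocks simply leaves some groups empty rather than separating identical predictions.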

Relevance: 100.00%

Abstract:

Objective. To examine associations between parental monitoring and adolescent alcohol/drug use.

Methods. A total of 981 seventh-grade students from 10 inner-city middle schools were surveyed at the 3-month follow-up of an HIV, STD, and pregnancy prevention program. Data from the 549 control subjects were used for the analyses. Multinomial logistic regression was used to examine associations between five parental monitoring variables and substance use, coded as low risk [never drank alcohol or used drugs (0)], moderate risk [drank alcohol, no drug use (1)], and high risk [both drank alcohol and used drugs, or used drugs only (2)].

Results. Participants were 58.3% female, 39.6% African American, and 43.8% Hispanic, with a mean age of 13.3 years. Lifetime alcohol use was 47.9%, and lifetime drug use was 14.9%. After adjustment for gender, age, race, and family structure, each individual parental monitoring variable (perceived parental monitoring, less permissive parental monitoring, greater supervision (public places), greater supervision (teen clubs), and less time spent with older teens) was significant and protective for the moderate- and high-risk groups. When all five variables were entered into a single model, only perceived parental monitoring remained significantly associated with the moderate-risk group (OR=0.40, 95% CI 0.29-0.55). For the high-risk group, three variables were significantly protective: perceived parental monitoring (OR=0.28, CI 0.18-0.42), less time spent with older teens (OR=0.75, CI 0.60-0.93), and greater supervision (public places) (OR=0.79, CI 0.64-0.99).

Conclusion. The association between parental monitoring and substance abuse is complex and varies across risk levels. Implications for intervention development are addressed.
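The three-level outcome defined in the Methods paragraph, and the way odds ratios with 95% confidence intervals are usually obtained from multinomial logistic regression output, can be sketched in Python. The coding function follows the definitions in the abstract; the Wald interval exp(b ± 1.96·SE) is the standard construction but is an assumption here, since the abstract does not state how its intervals were computed, and any coefficient values passed in are illustrative.

```python
import math

LOW, MODERATE, HIGH = 0, 1, 2

def risk_level(ever_drank_alcohol, ever_used_drugs):
    """Outcome coding from the abstract: any drug use -> high, alcohol only -> moderate, neither -> low."""
    if ever_used_drugs:
        return HIGH          # both drank alcohol and used drugs, or used drugs only
    if ever_drank_alcohol:
        return MODERATE      # drank alcohol, no drug use
    return LOW               # never drank alcohol or used drugs

def odds_ratio_ci(coef, se, z=1.96):
    """Odds ratio and Wald confidence interval from a log-odds coefficient and its standard error."""
    return math.exp(coef), math.exp(coef - z * se), math.exp(coef + z * se)
```

In a multinomial model each coefficient compares one risk level with the low-risk reference, so every covariate yields one odds ratio for the moderate-risk group and another for the high-risk group, as reported above.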

Relevance: 100.00%

Abstract:

Objectives: The purpose of this study is to understand the perceived effects of patient-dental staff communication and cultural diversity on the utilization of dental services in the U.S. by Saudi Arabian students who live in the U.S. and are enrolled in the King Abdullah Scholarship program. Methods: The study design was an analytical cross-sectional study. Data for this study were obtained from the Saudi Dental Services Utilization Survey, a voluntary internet survey available online for one month through Facebook. Ordered logistic regression and multinomial logistic regression analyses were used to measure the relationships of patient-dental staff communication and cultural diversity with the utilization of dental services. Results: Eight hundred forty-seven responses were analyzed. Overall, the majority of Saudi students reported an excellent communication experience with dental providers in the U.S. More than 58% of respondents reported at least one regular dental visit in the past year. Factors that influenced the use of regular dental care were the dentist's explanation of the treatment plan, the dental staff's response to the patient's needs, respectful and polite dental staff, dental staff kindness, availability of up-to-date equipment, and overall communication with the dentist. However, the utilization of emergency dental care was not associated with any measure of patient-dental provider communication. Anticipated future utilization of dental care was associated with all aspects of patient-dental staff communication measured in this survey. Furthermore, greater utilization of regular dental care was related to respondents' perceived importance of trustworthy dental staff, whereas the perceived importance of a dentist's reputation was only marginally associated. Respondents' perception of the dentist's reputation was associated with greater use of emergency dental services.
Respondents were more likely to anticipate using dental care in the future if they perceived trustworthy dental staff and the dentist's reputation as factors influencing their use of dental services. Conclusions: Patient-dental staff communication was partially associated with utilization of regular dental care, not associated with utilization of emergency dental care, and broadly associated with anticipated future utilization of dental care. In addition, trustworthy dental staff and a dentist's reputation were considered strong influencing factors in the utilization of dental services.

Relevance: 100.00%

Abstract:

Introduction: The average age at onset of breast cancer among Hispanic women is 50 years, more than a decade earlier than among non-Hispanic white women. Age at diagnosis is an important prognostic factor for breast cancer; younger age at onset is more likely to be associated with advanced disease, poorer prognosis, hormone receptor-negative breast tumors, and a greater likelihood of hereditary breast cancer. Studies of breast cancer risk factors, including reproductive risk factors, family history of breast cancer, and breast cancer subtype, have been conducted predominantly in non-Hispanic whites. Breast cancer is a heterogeneous disease comprising clinically, biologically, and epidemiologically distinct subtypes that also differ in their risk factors. The associations of reproductive risk factors and family history with breast cancer have been well documented in the literature. However, only a few studies have assessed these associations with breast cancer subtype in Hispanic populations. Methods: To assess these associations, we conducted three separate studies. First, we conducted a case-control study of 172 Mexican-American breast cancer cases and 344 age-matched controls residing in Harris County, TX, to assess reproductive and other risk factors. We used logistic regression to assess differences between cases and controls, adjusted for age at diagnosis and birthplace, and then used multinomial logistic regression to compare reproductive risk factors among breast tumor subtypes. In a second study, we identified 139 breast cancer patients with a first- or second-degree family history of breast cancer and 298 without a family history from the ELLA Bi-National Breast Cancer Study.
In this analysis, we again used multinomial logistic regression to evaluate associations between family history of breast cancer and breast cancer subtypes, and logistic regression to estimate associations between breast cancer screening practices and family history of breast cancer. In the final study, we used a cross-sectional design among 7,279 Mexican-American women in the Mano a Mano Cohort Study. We evaluated associations of family history of breast cancer with breast cancer risk factors, including body mass index (BMI), lifestyle factors, migration history, and adherence to American Cancer Society (ACS) guidelines. Results: In our first study, reproductive risk factors differed in the magnitude and direction of their associations when stratified by age and birthplace among cases and controls. In our second study, family history of breast cancer, and particularly having at least one relative diagnosed at an early age (<50 years), was associated with triple-negative breast cancer (TNBC). Mammography before a breast cancer diagnosis was associated with family history of breast cancer. In our third study, which assessed lifestyle factors, migration history, and family history of breast cancer, we found that women with a first-degree family history of breast cancer were more likely to be overweight or obese than their counterparts without a family history. There was no indication that having a family history led women to practice healthier lifestyle behaviors or to adhere to the ACS guidelines for cancer prevention. Conclusions: We observed that among Mexican-American women, the associations between reproductive risk factors and breast cancer differed by the woman's birthplace (US or Mexico). Having a family history of breast cancer, especially a first- or second-degree relative diagnosed at a younger age, was strongly associated with the TNBC subtype. These results are consistent with other published studies in this area.
Further, our results indicate that women with strong family histories of breast cancer are more likely to undergo mammography but not to engage in healthier lifestyle behaviors.

Relevance: 90.00%

Abstract:

In 2011, there will be an estimated 1,596,670 new cancer cases and 571,950 cancer-related deaths in the US. With the ever-increasing applications of cancer genetics in epidemiology, there is great potential to identify genetic risk factors that would help identify individuals with increased genetic susceptibility to cancer, which could be used to develop interventions or targeted therapies that may reduce cancer risk and mortality. In this dissertation, I propose to develop a new statistical method to evaluate the role of haplotypes in cancer susceptibility and development. This model will be flexible enough to handle not only haplotypes of any size but also a variety of covariates. I will then apply this method to three cancer-related data sets (Hodgkin disease, glioma, and lung cancer). I hypothesize that using a Bayesian method to infer haplotypes, with prior information from known genetic sources, substantially improves the estimation of associations between haplotypes and disease. Haplotype-based analyses using information from publicly available genetic sources generally showed increased odds ratios and smaller p-values in the Hodgkin disease, glioma, and lung cancer data sets. For instance, the haplotype TC inferred by the Bayesian Joint Logistic Model (BJLM) had a substantially higher estimated effect size (OR=12.16, 95% CI = 2.47-90.1 vs. 9.24, 95% CI = 1.81-47.2) and a smaller p-value (0.00044 vs. 0.008) for Hodgkin disease than under a traditional logistic regression approach. Also, the effect sizes of haplotypes modeled with recessive genetic effects were higher (with smaller p-values) when analyzed with the BJLM. For lung cancer, full genetic models with haplotype information developed with the BJLM had significantly higher discriminatory power and a significantly higher Net Reclassification Index than those developed with haplo.stats.
Future work could incorporate the 1000 Genomes Project, which offers a larger selection of SNPs, into the information from known genetic sources. Other possible extensions include testing non-binary outcomes, such as the levels of biomarkers present in lung cancer (NNK), and extending this analysis to full genome-wide association studies (GWAS).
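For readers unfamiliar with the Net Reclassification Index mentioned above, a minimal Python sketch of the categorical two-model version: among people with the event, the proportion moved to a higher risk category by the new model minus the proportion moved lower, plus the reverse among people without the event. The abstract does not say whether a categorical or category-free NRI was used, so this illustrates only the general idea, with risk categories assumed to be ordered integers.

```python
def net_reclassification_index(old_category, new_category, had_event):
    """Categorical NRI comparing a new risk model with an old one."""
    up_ev = down_ev = n_ev = 0
    up_nonev = down_nonev = n_nonev = 0
    for old, new, event in zip(old_category, new_category, had_event):
        moved_up, moved_down = int(new > old), int(new < old)
        if event:
            n_ev += 1
            up_ev += moved_up
            down_ev += moved_down
        else:
            n_nonev += 1
            up_nonev += moved_up
            down_nonev += moved_down
    if n_ev == 0 or n_nonev == 0:
        raise ValueError("need both events and non-events")
    # Correct reclassification: events move up, non-events move down.
    return (up_ev - down_ev) / n_ev + (down_nonev - up_nonev) / n_nonev
```

A positive value means the new model, on balance, moves cases into higher risk categories and non-cases into lower ones; identical categorizations give an NRI of 0.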

Relevance: 90.00%

Abstract:

The purpose of this study is to investigate the effects of predictor variable correlations and patterns of missingness in dichotomous and/or continuous data from small samples when missing data are multiply imputed. Missing predictor data are multiply imputed under three different multivariate models: the multivariate normal model for continuous data, the multinomial model for dichotomous data, and the general location model for mixed dichotomous and continuous data. After multiple imputation, Type I error rates of the regression coefficients obtained from logistic regression analysis are estimated under various conditions of correlation structure, sample size, data type, and missing-data pattern. The distributional properties of the average means, variances, and correlations among the predictor variables are assessed after multiple imputation.

For continuous predictor data under the multivariate normal model, Type I error rates are generally within nominal values for samples of size n = 100. Smaller samples of size n = 50 resulted in more conservative estimates (i.e., lower than the nominal value). Correlation and variance estimates of the original data are retained after multiple imputation when less than 50% of the continuous predictor data are missing. For dichotomous predictor data under the multinomial model, Type I error rates are generally conservative, partly because of the sparseness of the data. With this model, the correlation structure of the predictor variables is not well retained in multiply imputed data from small samples with more than 50% missing data. For mixed continuous and dichotomous predictor data, the results are similar to those found under the multivariate normal model for continuous data and under the multinomial model for dichotomous data.
With all data types, including a fully observed variable alongside variables subject to missingness in the multiple imputation process and subsequent statistical analysis produced liberal (larger than nominal) Type I error rates under a specific pattern of missing data. Future studies should focus on the effects of multiple imputation in multivariate settings with more realistic data characteristics and a variety of multivariate analyses, assessing both Type I error and power.
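Analyses of multiply imputed data are typically pooled with Rubin's rules before Type I error rates are tallied. The abstract does not say which pooling rule was used, so the Python sketch below is an assumption: it combines one coefficient's estimates and squared standard errors from m imputed data sets into a pooled estimate, a total variance, and Rubin's original degrees of freedom, assuming positive within-imputation variances.

```python
import math

def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputations: returns (estimate, total_variance, df)."""
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations and one variance per estimate")
    q_bar = sum(estimates) / m                               # pooled point estimate
    w = sum(variances) / m                                   # within-imputation variance
    b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)   # between-imputation variance
    t = w + (1.0 + 1.0 / m) * b                              # total variance
    if b == 0:
        return q_bar, t, math.inf                            # no imputation uncertainty
    r = (1.0 + 1.0 / m) * b / w                              # relative increase in variance
    df = (m - 1) * (1.0 + 1.0 / r) ** 2
    return q_bar, t, df
```

A Wald test of a coefficient would then compare q_bar / sqrt(t) with a t distribution on df degrees of freedom; across simulated data sets generated under the null, the rejection proportion estimates the empirical Type I error rate.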