944 results for Transitive Inferences


Abstract:

The discrete-time Markov chain is commonly used to describe changes in health states for chronic diseases in longitudinal studies. Statistical inferences that compare treatment effects or identify determinants of disease progression usually require estimation of transition probabilities. When the outcome data have missing observations, or when the variable of interest (a latent variable) cannot be measured directly, estimating transition probabilities becomes more complicated. In the latter case, a surrogate variable that is easier to obtain and reflects the characteristics of the latent one is usually used for data analysis.

This dissertation research proposes methods to analyze longitudinal data (1) that have a categorical outcome with missing observations or (2) that use complete or incomplete surrogate observations to analyze a categorical latent outcome. For (1), different missing-data mechanisms were considered in empirical studies using the EM algorithm, Monte Carlo EM, and a procedure that is not a data augmentation method. For (2), the hidden Markov model with the forward-backward procedure was applied for parameter estimation, and this method was extended to compute standard errors. The proposed methods were demonstrated with a schizophrenia example. The public health relevance, strengths and limitations, and possible future research are also discussed.
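
As a minimal sketch of the two estimation problems above, the code below (a) computes the maximum likelihood estimate of a transition matrix from fully observed state sequences by counting transitions, and (b) evaluates the likelihood of a surrogate sequence under a hidden Markov model with the scaled forward recursion. The function names, state coding, and parameter values are hypothetical, and the sketch omits missing data, EM, and standard errors.

```python
import math

def estimate_transition_matrix(sequences, n_states):
    """MLE for a discrete-time Markov chain: row-normalized transition counts."""
    counts = [[0] * n_states for _ in range(n_states)]
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Rows with no observed transitions are left as zeros.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix

def hmm_log_likelihood(observations, initial, transition, emission):
    """Scaled forward algorithm: log P(observations) under a hidden Markov model.

    emission[s][o] is P(observed surrogate o | latent state s).
    """
    n = len(initial)
    alpha = [initial[s] * emission[s][observations[0]] for s in range(n)]
    scale = sum(alpha)
    log_lik = math.log(scale)
    alpha = [a / scale for a in alpha]
    for obs in observations[1:]:
        alpha = [
            sum(alpha[r] * transition[r][s] for r in range(n)) * emission[s][obs]
            for s in range(n)
        ]
        scale = sum(alpha)  # rescale each step to avoid numerical underflow
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik

# Hypothetical two-state example (0 = remission, 1 = relapse).
P_hat = estimate_transition_matrix([[0, 0, 1, 1], [0, 1, 1, 0]], n_states=2)
# Both rows are [1/3, 2/3].

# With a perfect surrogate (identity emission matrix) the HMM likelihood
# reduces to the ordinary Markov chain probability 0.5 * 0.9 * 0.1.
lik = math.exp(hmm_log_likelihood([0, 0, 1], [0.5, 0.5],
                                  [[0.9, 0.1], [0.2, 0.8]],
                                  [[1.0, 0.0], [0.0, 1.0]]))
```

When the surrogate is imperfect, the emission matrix spreads probability across observed categories, and the same recursion (with a backward pass) supplies the posterior state probabilities an EM algorithm would use.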

Abstract:

Introduction. Cancer is the second most common cause of death in the USA (2). Studies have shown a coexistence of cancer and hypogonadism (9,31,13). The majority of patients with cancer develop cachexia, which cannot be explained solely by the anorexia seen in these patients. Testosterone is a male sex hormone known to increase muscle mass and strength, maintain cancellous bone mass, and increase cortical bone mass, in addition to improving libido, sexual desire, and fantasy (14). If a high prevalence of hypogonadism is detected in male cancer patients, and testosterone levels differ significantly between cancer patients with and without cachexia, testosterone may be administered in future randomized trials to help alleviate cachexia. Study group and design. The study group consisted of male cancer patients and non-cancer controls aged between 40 and 70 years. The primary study design was cross-sectional with a sample size of 135. The present data analysis is done on a subset convenience sample of 72 patients recruited between November 2006 and January 2010.

Methods. Patients aged 40-70 years with or without a diagnosis of cancer were recruited into the study. Patients were excluded if they had a BMI over 35, significant edema, non-melanomatous skin cancer, or current alcohol or illicit drug abuse; if they were concomitantly using medications interfering with the gonadal axis or anabolic agents; or if they had received tube feeds or parenteral nutrition within 3 months prior to enrollment. The study was approved by the Institutional Review Board of Baylor College of Medicine and is being conducted at the Michael E. DeBakey Veterans Affairs Medical Center in Houston. My thesis is a pilot data analysis that uses the data available for 72 patients (of the intended sample of 135) recruited between November 2006 and January 2010.
The primary aim of this analysis is to compare the proportion of patients with hypogonadism in the male cancer and non-cancer control groups, and to evaluate whether testosterone levels differ significantly between male cancer patients with and without cachexia. The study procedures relevant to the current data analysis included blood collection to measure testosterone levels and measurement of body weight to categorize cancer patients into cancer cachexia and cancer non-cachexia sub-groups.

Results. After logarithmic transformation of the data of the cancer and control groups, an unpaired t test with unequal variances was performed. The proportion of patients with hypogonadism in the male cancer and non-cancer control groups was 47.5% and 22.7%, respectively, with a Pearson chi2 statistic of 1.6036 and a p value of 0.205. Comparing the mean calculated bioavailable testosterone in male cancer patients and non-cancer controls yielded a t statistic of 21.83 and a p value less than 0.001. Within the cancer group alone, the mean free testosterone, calculated bioavailable testosterone, and total testosterone levels were 3.93, 5.09, and 103.51, respectively, in the cancer non-cachexia sub-group and 3.58, 4.17, and 84.08, respectively, in the cancer cachexia sub-group. The unpaired t test with equal variances comparing the two sub-groups gave p values of 0.2015, 0.1842, and 0.4894 for calculated bioavailable testosterone, free testosterone, and total testosterone, respectively.

Conclusions. The small sample size of this exploratory study, and the resulting low power, does not allow us to draw definitive conclusions. For the given sub-sample, the proportion of patients with hypogonadism in the cancer group was not significantly different from that in the control group.
Inferences on the prevalence of hypogonadism in male cancer patients could not be made in this paper because the sub-sample is small and therefore not representative of the general population. However, there was a statistically significant difference in calculated bioavailable testosterone levels between male cancer patients and non-cancer controls. Analysis of cachectic and non-cachectic patients within the male cancer group showed no significant difference in testosterone levels (total, free, and calculated bioavailable testosterone) between the two sub-groups. To reiterate, this study is exploratory, and the results may change once the complete dataset is obtained and analyzed. It nevertheless serves as a good template to guide further research and analysis.
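
The two tests named in the Results can be sketched with the standard library: Welch's unequal-variance t test applied to log-transformed values, and the upper-tail p value of a chi-square statistic with 1 degree of freedom (as for a 2x2 table without continuity correction, which is an assumption here). The function names and toy inputs are hypothetical and are not the study data; the last line only checks that the reported chi2 of 1.6036 corresponds to p of about 0.205.

```python
import math
from statistics import mean, variance

def welch_t(x, y):
    """Welch's unequal-variance t statistic and Welch-Satterthwaite degrees of freedom."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)  # squared standard errors
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df

def welch_t_log(x, y):
    """Welch's t on log-transformed, strictly positive measurements."""
    return welch_t([math.log(v) for v in x], [math.log(v) for v in y])

def chi2_1df_p_value(chi2):
    """Upper-tail p value of a chi-square statistic with 1 degree of freedom."""
    # If Z ~ N(0, 1) then Z**2 ~ chi2(1), so P(chi2 > c) = P(|Z| > sqrt(c)) = erfc(sqrt(c / 2)).
    return math.erfc(math.sqrt(chi2 / 2))

t, df = welch_t([1, 2, 3], [4, 5, 6])  # hypothetical toy data: t ≈ -3.674, df = 4
p = chi2_1df_p_value(1.6036)           # ≈ 0.205, consistent with the reported p value
```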

Abstract:

In prospective studies, it is essential that the study sample accurately represents the target population so that meaningful inferences can be drawn. Understanding why some individuals do not participate, or fail to continue participating, in longitudinal studies can provide an empirical basis for developing effective recruitment and retention strategies to improve response rates. This study examined the influence of social connectedness and self-esteem on long-term participant retention, using secondary data from the “San Antonio Longitudinal Study of Aging” (SALSA), a population-based study of Mexican Americans (MAs) and European Americans (EAs) aged over 65 years residing in San Antonio, Texas. We tested the effect of social connectedness, self-esteem, and socioeconomic status on participant retention in both ethnic groups. In MAs only, we analyzed whether acculturation and assimilation moderated these associations and/or had a direct effect on participant retention.

Low income, low frequency of social contacts, and length of the recruitment interval were significant predictors of non-completer status. Participants with low levels of social contacts were almost twice as likely as those with high levels of social contacts to be non-completers, even after adjustment for age, sex, ethnic group, education, household income, and recruitment interval (OR = 1.95, 95% CI: 1.26–3.01, p = 0.003). The recruitment interval consistently and strongly predicted non-completer status in all the models tested: depending on the model, each year beyond baseline was associated with a 25–33% greater likelihood of non-completion. The only significant interaction (moderating) effect observed was between social contacts and cultural values among MAs.
Specifically, MAs with both low social contacts and low acculturation on cultural values (i.e., who placed high value on preserving Mexican cultural origins) were three and a half times more likely to be non-completers than MAs in the other subgroups formed by combinations of these variables, even after adjustment for covariates.

Long-term studies with older and minority participants face challenges in participant retention. Retention strategies can be designed to pay special attention to participants with low social contacts and, among MAs, to participants with both low social contacts and low acculturation on cultural values. Minimizing the time interval between baseline and follow-up recruitment, and maintaining frequent contact with participants during this interval, should also be integral to the study design.
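
A reported odds ratio and its 95% Wald interval determine the standard error of the log odds ratio, so the quoted p value can be checked directly. The sketch below does this for OR = 1.95 (1.26–3.01) and also shows how a per-year odds ratio compounds over several years, assuming (hypothetically) that the recruitment interval entered a logistic model linearly and that the 25–33% figures are per-year odds ratios.

```python
import math

def wald_from_or_ci(odds_ratio, lower, upper, z_crit=1.959964):
    """Back-calculate the log-OR standard error, z statistic, and two-sided p value
    from an odds ratio and its 95% Wald confidence interval."""
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p value
    return se, z, p

def compounded_or(or_per_year, years):
    """Odds ratio implied for `years` extra years under a log-linear per-year effect
    (hypothetical model form)."""
    return or_per_year ** years

se, z, p = wald_from_or_ci(1.95, 1.26, 3.01)  # p ≈ 0.003, consistent with the abstract
three_years = compounded_or(1.25, 3)          # hypothetical: 3 years at OR 1.25 per year
```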

Abstract:

SNP genotyping arrays have been developed to characterize single-nucleotide polymorphisms (SNPs) and DNA copy number variations (CNVs). The quality of inferences about copy number can be affected by many factors, including batch effects, DNA sample preparation, signal processing, and analytical approach. Nonparametric and model-based statistical algorithms have been developed to detect CNVs from SNP genotyping data. However, these algorithms lack specificity for small CNVs because of the high false positive rate when calling CNVs from intensity values. Association tests based on detected CNVs therefore lack power even when the CNVs affecting disease risk are common. In this research, a new genome-wide logistic regression algorithm was developed to detect CNV associations with diseases by combining an existing hidden Markov model (HMM) with the logistic regression model. We showed that the new algorithm is more sensitive, and can be more powerful in detecting CNV associations with diseases, than an existing popular algorithm, especially when the CNV association signal is weak and only a limited number of SNPs are located in the CNV.
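
To illustrate the HMM component only, the sketch below runs generic Viterbi decoding of copy-number states from a sequence of log R ratio (LRR) values. The three states, their Gaussian emission means and standard deviation, and the stay probability are hypothetical; real CNV HMMs use richer emission models, and this does not reproduce the dissertation's genome-wide logistic regression algorithm.

```python
import math

# Hypothetical copy-number states and Gaussian emission parameters for the LRR.
STATES = ["deletion", "normal", "duplication"]
MEANS = [-1.0, 0.0, 0.6]
SD = 0.2

def log_normal_pdf(x, mean, sd):
    return -0.5 * ((x - mean) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))

def viterbi(lrr, stay=0.9):
    """Most likely copy-number state path for a sequence of LRR values."""
    n = len(STATES)
    switch = (1 - stay) / (n - 1)
    log_trans = [[math.log(stay if i == j else switch) for j in range(n)] for i in range(n)]
    log_init = [math.log(1 / n)] * n
    score = [log_init[s] + log_normal_pdf(lrr[0], MEANS[s], SD) for s in range(n)]
    back = []
    for x in lrr[1:]:
        prev = score
        pointers, score = [], []
        for s in range(n):
            # Best predecessor state for state s at this marker.
            best = max(range(n), key=lambda r: prev[r] + log_trans[r][s])
            pointers.append(best)
            score.append(prev[best] + log_trans[best][s] + log_normal_pdf(x, MEANS[s], SD))
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return [STATES[s] for s in reversed(path)]

calls = viterbi([0.02, -0.05, -0.98, -1.03, 0.01])  # hypothetical LRR values
```

The switching penalty in the transition matrix is what suppresses isolated noisy calls; with only a few SNPs in a CNV, the penalty can outweigh the emission evidence, which is one reason small CNVs are hard to call.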

Abstract:

The standard analyses of survival data assume that survival and censoring are independent. When censoring and survival are related, the phenomenon is known as informative censoring. This paper examines the effects of informative censoring on the hazard function and on the hazard ratio estimated by the Cox model.

The limiting factor in all analyses of informative censoring is the problem of non-identifiability: from the observed data alone, it is impossible to distinguish a situation in which censoring and death are independent from one in which they are dependent. Nevertheless, informative censoring may occur. A review of the literature shows how others have approached the problem and covers the relevant theoretical background.

Three models are examined in detail. The first model uses conditionally independent marginal hazards to obtain the unconditional survival function and hazards. The second model is based on the Gumbel Type A method for combining independent marginal distributions into bivariate distributions using a dependency parameter. Finally, a formulation based on a compartmental model is presented and its results described. For the latter two approaches, the resulting hazard is used in the Cox model in a simulation study.

The unconditional survival distribution formed from the first model involves dependency, but the crude hazard resulting from this unconditional distribution is identical to the marginal hazard, so inferences based on the hazard are valid. The hazard ratios formed from two distributions following the Gumbel Type A model are biased by a factor that depends on the amount of censoring in the two populations and on the strength of the dependence between death and censoring in the two populations. The Cox model estimates this biased hazard ratio.
In general, the hazard resulting from the compartmental model is not constant, even if the individual marginal hazards are constant, unless censoring is non-informative. The hazard ratio tends to a specific limit.

Methods of evaluating situations in which informative censoring is present are described, and the relative utility of the three models is discussed.
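
A small simulation makes the issue concrete for the simplest case of a constant (exponential) death hazard. With independent censoring, the occurrence/exposure estimate (deaths divided by total follow-up time) recovers the true hazard; when some subjects withdraw just before death, the same estimator is biased downward. The withdrawal mechanism, rates, and function names are hypothetical, and this is not the Gumbel Type A or compartmental model examined above.

```python
import random

def crude_hazard(times, events):
    """Occurrence/exposure estimate of a constant hazard: deaths / total follow-up time."""
    return sum(events) / sum(times)

def simulate(n, death_rate, censor_rate, informative_fraction=0.0, seed=1):
    """Simulate exponential death times with censoring.

    A fraction `informative_fraction` of subjects (hypothetical mechanism) withdraws
    just before death, at 0.9 * T, so their censoring depends on their death time.
    """
    rng = random.Random(seed)
    times, events = [], []
    for _ in range(n):
        t = rng.expovariate(death_rate)
        if rng.random() < informative_fraction:
            c = 0.9 * t                       # informative: tied to the death time
        else:
            c = rng.expovariate(censor_rate)  # independent of the death time
        times.append(min(t, c))
        events.append(1 if t <= c else 0)
    return crude_hazard(times, events)

independent = simulate(200_000, death_rate=0.5, censor_rate=0.3)
informative = simulate(200_000, death_rate=0.5, censor_rate=0.3, informative_fraction=0.3)
```

Because the observed data (follow-up times and event indicators) look equally plausible under both mechanisms, nothing in the data alone flags the second estimate as biased, which is the non-identifiability problem in miniature.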

Abstract:

Statistical methods are developed that assess survival data for two attributes: (1) prolongation of life and (2) quality of life. Health state transition probabilities correspond to prolongation of life and are modeled as a discrete-time semi-Markov process. Embedded within the sojourn time of a particular health state are the quality of life transitions. They reflect events that differentiate perceptions of pain and suffering over a fixed time period. Quality of life transition probabilities are derived from the assumptions of a simple Markov process; these probabilities depend on the health state currently occupied and the next health state to which a transition is made. By using both attributes, the model can estimate the distribution of expected quality-adjusted life years (in addition to the distribution of expected survival times). The expected quality of life can also be estimated within the health state sojourn time, which makes the assessment of utility preferences more flexible. The methods are demonstrated on a subset of follow-up data from the Beta Blocker Heart Attack Trial (BHAT). The model provides the structure necessary to make inferences when assessing a general survival problem with a two-dimensional outcome.
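
The link between transition probabilities and quality-adjusted life years can be sketched with a plain discrete-time Markov cohort model: each cycle spent in a state adds one life year and that state's utility in QALYs. This is a simplification, assuming an ordinary Markov chain with hypothetical states, transition probabilities, and utilities; it has no semi-Markov sojourn-time distributions or within-state quality of life transitions.

```python
def expected_totals(transition, utilities, start, horizon=2000):
    """Expected life years and QALYs for a discrete-time Markov cohort model.

    `transition` covers transient (alive) states only; each row sums to < 1 and the
    remainder is the probability of death. Each cycle spent in state s contributes
    one life year and utilities[s] quality-adjusted life years.
    """
    n = len(utilities)
    occupancy = [1.0 if s == start else 0.0 for s in range(n)]
    life_years = qalys = 0.0
    for _ in range(horizon):
        life_years += sum(occupancy)
        qalys += sum(p * u for p, u in zip(occupancy, utilities))
        occupancy = [sum(occupancy[r] * transition[r][s] for r in range(n)) for s in range(n)]
    return life_years, qalys

# Hypothetical states: 0 = well (utility 1.0), 1 = symptomatic (utility 0.6).
Q = [[0.80, 0.15],   # well -> well, well -> symptomatic (death 0.05)
     [0.10, 0.70]]   # symptomatic -> well, symptomatic -> symptomatic (death 0.20)
life_years, qalys = expected_totals(Q, [1.0, 0.6], start=0)
```

The same totals follow in closed form from the fundamental matrix (I - Q)^-1: starting in state 0, the expected cycles are 20/3 in state 0 and 10/3 in state 1, giving 10 life years and 20/3 + 0.6 * 10/3 = 26/3 QALYs.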

Abstract:

A Bayesian approach to estimating the regression coefficients of a multinomial logit model with ordinal response categories is presented. A Monte Carlo method is used to construct the posterior distribution of the link function, which is treated as an arbitrary scalar function. The Gauss-Markov theorem is then used to determine a function of the link that produces a random vector of coefficients, and the posterior distribution of this random vector is used to estimate the regression coefficients. The method is referred to as a Bayesian generalized least squares (BGLS) analysis. Two cases involving multinomial logit models are described: Case I involves a cumulative logit model and Case II a proportional-odds model. All inferences about the coefficients in both cases are described in terms of the posterior distribution of the regression coefficients. The results from the BGLS method are compared with maximum likelihood estimates of the regression coefficients. The BGLS method avoids the nonlinear problems encountered when estimating the regression coefficients of a generalized linear model, is not complex or computationally intensive, and offers several advantages over other Bayesian approaches.
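
For readers unfamiliar with Case II, the sketch below computes category probabilities under a proportional-odds model, P(Y <= j | x) = logistic(theta_j - x'beta), and checks the defining property that every cumulative log-odds ratio for a covariate equals -beta. The cutpoints, coefficient, and function names are hypothetical; this shows the model being fitted, not the BGLS estimation procedure.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def logit(p):
    return math.log(p / (1 - p))

def category_probabilities(eta, cutpoints):
    """Category probabilities under a proportional-odds (cumulative logit) model.

    P(Y <= j | x) = logistic(cutpoints[j] - eta), with eta = x'beta and strictly
    increasing cutpoints; the last category takes the remaining probability.
    """
    cumulative = [logistic(c - eta) for c in cutpoints] + [1.0]
    probs, previous = [], 0.0
    for cp in cumulative:
        probs.append(cp - previous)
        previous = cp
    return probs

beta = 0.8                # hypothetical coefficient for a binary covariate
cuts = [-1.0, 0.5, 2.0]   # hypothetical cutpoints for 4 ordered categories
p0 = category_probabilities(0.0, cuts)
p1 = category_probabilities(beta, cuts)
# Proportional odds: each cumulative log-odds ratio equals -beta.
shifts = [logit(sum(p1[:j + 1])) - logit(sum(p0[:j + 1])) for j in range(len(cuts))]
```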

Abstract:

In geographical epidemiology, maps of disease rates and disease risk provide a spatial perspective for researching disease etiology. For rare diseases, or when the population base is small, the rate and risk estimates may be unstable. Empirical Bayes (EB) methods have been used to smooth the estimates spatially by allowing an area estimate to "borrow strength" from its neighbors. Such EB methods include the use of a Gamma model, a James-Stein estimator, and a conditional autoregressive (CAR) process. A fully Bayesian analysis of the CAR process is proposed. One advantage of this fully Bayesian analysis is that it can be implemented simply by repeated sampling from the posterior densities; a Markov chain Monte Carlo technique such as the Gibbs sampler is not necessary. Direct resampling from the posterior densities provides exact small-sample inferences instead of the approximate asymptotic analyses of maximum likelihood methods (Clayton & Kaldor, 1987). Further, the proposed CAR model allows covariates to be included. A simulation demonstrates the effect of sample size on the fully Bayesian analysis of the CAR process. The methods are applied to lip cancer data from Scotland, and the results are compared.
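
The idea of smoothing by direct posterior sampling, without MCMC, can be illustrated with the Gamma model mentioned above, where the posterior is available in closed form. Assuming O ~ Poisson(E * theta) and theta ~ Gamma(a, rate b), the posterior is Gamma(O + a, rate E + b). The hyperparameters and area counts below are hypothetical (an EB analysis would estimate a and b from all areas), and this is not the proposed CAR model.

```python
import random

def posterior_params(observed, expected, a, b):
    """Gamma(shape, rate) posterior for an area's relative risk under the
    Poisson-Gamma model: O ~ Poisson(E * theta), theta ~ Gamma(a, rate=b)."""
    return observed + a, expected + b

def posterior_mean(observed, expected, a, b):
    shape, rate = posterior_params(observed, expected, a, b)
    return shape / rate

def sample_posterior(observed, expected, a, b, n_draws, seed=0):
    """Direct (non-MCMC) draws from the conjugate posterior."""
    rng = random.Random(seed)
    shape, rate = posterior_params(observed, expected, a, b)
    # random.gammavariate takes (shape, scale), so pass scale = 1 / rate.
    return [rng.gammavariate(shape, 1.0 / rate) for _ in range(n_draws)]

# Hypothetical area: 2 observed vs 4 expected cases (raw SMR 0.5); prior mean a/b = 1.
smoothed = posterior_mean(2, 4, a=2.0, b=2.0)  # 4/6, pulled toward the prior mean
draws = sample_posterior(2, 4, a=2.0, b=2.0, n_draws=20_000)
ordered = sorted(draws)
lower, upper = ordered[499], ordered[19_499]   # approximate 95% credible interval
```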

Abstract:

Mixed longitudinal designs are important study designs in many areas of medical research. Mixed longitudinal studies have several advantages over cross-sectional or pure longitudinal studies, including shorter study completion time and the ability to separate time and age effects, and are thus an attractive choice. Statistical methodology for general longitudinal studies has developed rapidly over the last few decades. The common approach to statistical modeling in studies with mixed longitudinal designs has been the linear mixed-effects model incorporating an age or time effect. The general linear mixed-effects model is considered an appropriate choice for analyzing repeated measurements in longitudinal studies. However, applications of the linear mixed-effects model to mixed longitudinal studies often incorporate age as the only random effect and fail to take the cohort effect into account when making statistical inferences about age-related trajectories of outcome measurements. We believe special attention should be paid to cohort effects when analyzing data from mixed longitudinal designs with multiple overlapping cohorts, which makes this an important statistical issue to address.

This research aims to address statistical issues related to mixed longitudinal studies. The proposed study examined existing statistical analysis methods for mixed longitudinal designs and developed an alternative analytic method that incorporates effects from multiple overlapping cohorts as well as from subjects of different ages. Simulation was used to evaluate the performance of the proposed analytic method by comparing it with the commonly used model. Finally, the proposed analytic method was applied to data collected by an existing study, Project HeartBeat!, which had been evaluated using traditional analytic techniques. Project HeartBeat! is a longitudinal study of cardiovascular disease (CVD) risk factors in childhood and adolescence that uses a mixed longitudinal design. The proposed model was used to evaluate four blood lipids, adjusting for age, gender, race/ethnicity, and endocrine hormones. The results of this dissertation suggest that the proposed analytic model could be a more flexible and reliable choice than the traditional model, fitting the data to provide more accurate estimates in mixed longitudinal studies. Conceptually, the proposed model has useful features, including consideration of effects from multiple overlapping cohorts, and is an attractive approach for analyzing data in mixed longitudinal design studies.
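
Why cohort effects matter can be seen in a noise-free toy design: three overlapping cohorts entering at different ages, each measured at four annual visits, with cohort-specific level shifts. Pooling all observations confounds the age slope with the cohort differences, while centering within cohort (cohort fixed effects) recovers it. The entry ages, shifts, and slope are hypothetical, and the sketch has no random effects, so it only illustrates the motivation, not the proposed model.

```python
from collections import defaultdict

def pooled_slope(ages, ys):
    """Ordinary least-squares slope of y on age, ignoring cohort membership."""
    ma, my = sum(ages) / len(ages), sum(ys) / len(ys)
    sxy = sum((a - ma) * (y - my) for a, y in zip(ages, ys))
    sxx = sum((a - ma) ** 2 for a in ages)
    return sxy / sxx

def within_cohort_slope(ages, ys, cohorts):
    """Age slope after removing each cohort's mean (cohort fixed effects)."""
    groups = defaultdict(list)
    for a, y, c in zip(ages, ys, cohorts):
        groups[c].append((a, y))
    sxy = sxx = 0.0
    for points in groups.values():
        ma = sum(a for a, _ in points) / len(points)
        my = sum(y for _, y in points) / len(points)
        sxy += sum((a - ma) * (y - my) for a, y in points)
        sxx += sum((a - ma) ** 2 for a, _ in points)
    return sxy / sxx

# Hypothetical design: cohorts entering at ages 8, 11, 14 with different levels.
cohort_shift = {8: 0.0, 11: -1.0, 14: -2.0}
ages, ys, cohorts = [], [], []
for entry_age, shift in cohort_shift.items():
    for visit in range(4):
        age = entry_age + visit
        ages.append(age)
        ys.append(2.0 + 0.5 * age + shift)  # true age slope is 0.5
        cohorts.append(entry_age)

naive = pooled_slope(ages, ys)                     # about 0.224, distorted by cohort effects
adjusted = within_cohort_slope(ages, ys, cohorts)  # 0.5
```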

Abstract:

An interim analysis is usually applied in later phase II or phase III trials to find convincing evidence of a significant treatment difference that may lead to terminating the trial earlier than originally planned. This can save patient resources and shorten drug development and approval time. Ethical and economic considerations are additional reasons to stop a trial early. In clinical trials of eyes, ears, knees, arms, kidneys, lungs, and other clustered treatments, data may include distribution-free random variables with both matched and unmatched subjects in one study. It is important to properly include both types of subjects in the interim and final analyses so that the maximum efficiency of statistical and clinical inferences can be obtained at the different stages of the trial. So far, no publication has applied a statistical method for distribution-free data with matched and unmatched subjects in the interim analysis of clinical trials. In this simulation study, the hybrid statistic was used to estimate empirical powers and empirical type I errors among simulated datasets with different sample sizes, effect sizes, correlation coefficients for matched pairs, and data distributions, in the interim and final analyses with 4 different group sequential methods. Empirical powers and empirical type I errors were also compared with those estimated using the meta-analysis t-test on the same simulated datasets. Results from this simulation study show that, compared with the meta-analysis t-test commonly used for normally distributed observations, the hybrid statistic has greater power for data from normally, log-normally, and multinomially distributed random variables with matched and unmatched subjects and with outliers. Power increased with sample size, effect size, and the correlation coefficient for the matched pairs.
In addition, lower type I errors were observed for the hybrid statistic, indicating that this test is also conservative for data with outliers in the interim analysis of clinical trials.
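
Group sequential methods control the overall type I error by deciding how much alpha to spend at each look. As an example, the sketch below evaluates two well-known Lan-DeMets spending functions (O'Brien-Fleming-type and Pocock-type) at an interim look with half the planned information and at the final analysis. The abstract does not name its 4 methods, so this choice is illustrative only, and converting spent alpha into actual stopping boundaries (which requires recursive numerical integration) is not shown.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def obrien_fleming_spending(t, alpha=0.05):
    """Lan-DeMets O'Brien-Fleming-type cumulative two-sided alpha spent by information fraction t."""
    z = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return 2 * (1 - STD_NORMAL.cdf(z / math.sqrt(t)))

def pocock_spending(t, alpha=0.05):
    """Lan-DeMets Pocock-type cumulative alpha spent by information fraction t."""
    return alpha * math.log(1 + (math.e - 1) * t)

# Cumulative alpha spent at an interim look (t = 0.5) and at the final analysis (t = 1).
schedule = [(t, obrien_fleming_spending(t), pocock_spending(t)) for t in (0.5, 1.0)]
```

The O'Brien-Fleming-type function spends very little alpha early (about 0.0056 at t = 0.5), so early stopping needs overwhelming evidence, whereas the Pocock-type function spends alpha more evenly across looks.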

Abstract:

In biomedical studies, the common data structures have been the matched (paired) and unmatched designs. Recently, many researchers have become interested in meta-analysis as a way to gain a better understanding of a medical treatment from several clinical datasets. The hybrid design, which combines the two data structures, raises fundamental questions for statistical methods and challenges for statistical inference. The appropriate methods depend on the underlying distribution. If the outcomes are normally distributed, we would use the classic paired and two-independent-sample t-tests for the matched and unmatched cases, respectively. If not, we can apply the Wilcoxon signed-rank and rank-sum tests to the respective cases.

To assess an overall treatment effect in a hybrid design, we can apply the inverse variance weighting method used in meta-analysis. In the nonparametric case, we can use a test statistic that combines the two Wilcoxon test statistics; however, these two statistics are not on the same scale. We therefore propose the Hybrid Test Statistic based on the Hodges-Lehmann estimates of the treatment effects, which are medians on the same scale.

For comparison, we use the classic meta-analysis t-test statistic, which combines the treatment effect estimates from the two t-test statistics. Theoretically, the relative efficiency of two unbiased estimators of a parameter is the ratio of their variances. Using the concept of asymptotic relative efficiency (ARE) developed by Pitman, we show the ARE of the hybrid test statistic relative to the classic meta-analysis t-test statistic using the Hodges-Lehmann estimators associated with the two test statistics.

In several simulation studies, we calculate the empirical type I error rate and power of the test statistics. The proposed statistic should provide an effective tool for evaluating and understanding treatment effects in various public health studies as well as clinical trials.
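
The ingredients of such a hybrid estimate can be sketched as follows: the one-sample Hodges-Lehmann estimator for the matched part (median of Walsh averages of the paired differences), the two-sample Hodges-Lehmann estimator for the unmatched part (median of all pairwise differences), and an inverse-variance weighted combination of the two. The variances are assumed to be supplied externally (for example, by a bootstrap, which is an assumption here), and the data and function names are hypothetical; the dissertation's exact standardization of the Hybrid Test Statistic is not reproduced.

```python
from statistics import median

def hl_one_sample(differences):
    """Hodges-Lehmann estimate for paired data: median of the Walsh averages."""
    d = list(differences)
    walsh = [(d[i] + d[j]) / 2 for i in range(len(d)) for j in range(i, len(d))]
    return median(walsh)

def hl_two_sample(treated, control):
    """Hodges-Lehmann shift estimate: median of all pairwise differences."""
    return median([x - y for x in treated for y in control])

def inverse_variance_combine(estimates, variances):
    """Inverse-variance weighted average of estimates on a common scale, and its variance."""
    weights = [1.0 / v for v in variances]
    combined = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    return combined, 1.0 / sum(weights)

# Hypothetical hybrid data: paired differences plus separate treated/control groups.
est_matched = hl_one_sample([1.0, 2.0, 3.0])             # 2.0
est_unmatched = hl_two_sample([3.0, 4.0], [1.0, 2.0])    # 2.0
overall, overall_var = inverse_variance_combine([est_matched, est_unmatched], [0.5, 1.0])
```

Both estimators measure a location shift in the outcome's own units, which is why they can be pooled directly, unlike the raw signed-rank and rank-sum statistics.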

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Schizophrenia (SZ) is a complex disorder with high heritability and variable phenotypes, and the search for causal genes associated with its development has had limited success. Pathway-based analysis is an effective approach to investigating the molecular mechanisms of susceptibility genes associated with complex diseases. The etiology of complex diseases could involve a network of genetic factors, and genes may interact with one another. In this work, we argue that some genes may have small effects that by themselves are neither sufficient nor necessary to cause the disease; however, their effects may induce slight changes in gene expression or affect protein function. Analyzing gene-gene interaction mechanisms within the disease pathway would therefore play a crucial role in dissecting the genetic architecture of complex diseases, making pathway-based analysis a complementary approach to the GWAS technique.

In this study, we implemented three novel linkage disequilibrium-based statistics (the linear combination, quadratic, and decorrelation test statistics) to investigate interactions between linked and unlinked genes in two independent case-control GWAS datasets for SZ, including participants of European (EA) and African (AA) ancestry. The EA population included 1,173 cases and 1,378 controls with 729,454 genotyped SNPs, while the AA population included 219 cases and 288 controls with 845,814 genotyped SNPs. Using the gene-gene interaction method, we identified 17,186 significantly interacting gene sets in the EA dataset and 12,691 in the AA dataset. We also identified 18,846 genes in the EA dataset and 19,431 genes in the AA dataset that were in the disease pathways. However, few genes were significantly associated with SZ.

Our research characterized the pathways involved in schizophrenia through gene-gene interaction and gene-pathway-based approaches.
Our findings suggest that these methods can provide insight into the molecular mechanisms of common complex diseases.
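
LD-based interaction statistics build on pairwise linkage disequilibrium measures. As a minimal sketch of that building block only, the code below computes D, the signed normalized D', and r^2 from a two-locus haplotype frequency and the two allele frequencies. The function name and inputs are hypothetical, and the linear combination, quadratic, and decorrelation statistics themselves (which contrast LD patterns between cases and controls) are not reproduced.

```python
def ld_measures(p_ab, p_a, p_b):
    """Pairwise LD from haplotype frequency p_ab and allele frequencies p_a, p_b.

    Returns D, signed D' (D divided by its maximum possible magnitude) and r^2.
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2

# Hypothetical frequencies: complete LD, then linkage equilibrium.
complete = ld_measures(0.5, 0.5, 0.5)     # D = 0.25, D' = 1, r^2 = 1
equilibrium = ld_measures(0.25, 0.5, 0.5) # D = 0, D' = 0, r^2 = 0
```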

Abstract:

It is well accepted that tumorigenesis is a multi-step process involving aberrant functioning of genes regulating cell proliferation, differentiation, apoptosis, genome stability, angiogenesis, and motility. A full understanding of tumorigenesis requires information on all aspects of cell activity. Recent advances in high-throughput technologies allow biologists to generate massive amounts of data, more than might have been imagined decades ago. These advances have made it possible to launch comprehensive projects such as The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC), which systematically characterize the molecular fingerprints of cancer cells using gene expression, methylation, copy number, microRNA, and SNP microarrays, as well as next-generation sequencing assays interrogating somatic mutations, insertions, deletions, translocations, and structural rearrangements. Given the massive amount of data, a major challenge is to integrate information from multiple sources and formulate testable hypotheses. This thesis focuses on developing methodologies for integrative analyses of genomic assays profiled on the same set of samples. We have developed several novel methods for integrative biomarker identification and cancer classification. We introduce a regression-based approach that identifies biomarkers predictive of therapy response or survival by integrating multiple assays, including gene expression, methylation, and copy number data, through penalized regression. To identify key cancer-specific genes accounting for multiple mechanisms of regulation, we have developed the integIRTy software, which uses Item Response Theory to provide robust and reliable inferences about gene alteration while automatically adjusting for sample heterogeneity and technical artifacts.
To meet the increasing need for accurate cancer diagnosis and individualized therapy, we have developed a robust and powerful algorithm called SIBER to systematically identify bimodally expressed genes using next-generation RNA-seq data. We have shown that prediction models built from these bimodal genes have the same accuracy as models built from all genes. Further, prediction models using gene expression measurements dichotomized according to their bimodal shapes still perform well. The effectiveness of outcome prediction using discretized signals paves the way for more accurate and interpretable cancer classification by integrating signals from multiple sources.
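
One common way to score bimodal expression is a bimodality index computed from a fitted two-component mixture: BI = sqrt(pi * (1 - pi)) * |mu1 - mu2| / sigma. The sketch below fits a two-component normal mixture with a common variance by EM and then computes that index. It assumes (hypothetically) expression values already on a log-like continuous scale; SIBER works with RNA-seq count models, so this is an illustration of the bimodality concept, not the SIBER algorithm. The input data are synthetic.

```python
import math

def fit_two_component_normal(x, n_iter=200):
    """EM for a two-component normal mixture with a common variance."""
    pi, mu1, mu2 = 0.5, min(x), max(x)
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    for _ in range(n_iter):
        # E-step: probability that each point belongs to component 2.
        resp = []
        for v in x:
            # log[(1 - pi) f1(v)] - log[pi f2(v)], clamped to avoid overflow in exp.
            diff = math.log((1 - pi) / pi) - 0.5 * ((v - mu1) ** 2 - (v - mu2) ** 2) / var
            diff = max(min(diff, 700.0), -700.0)
            resp.append(1.0 / (1.0 + math.exp(diff)))
        # M-step: update mixing proportion, component means, and pooled variance.
        n2 = sum(resp)
        n1 = len(x) - n2
        pi = n2 / len(x)
        mu1 = sum((1 - r) * v for r, v in zip(resp, x)) / n1
        mu2 = sum(r * v for r, v in zip(resp, x)) / n2
        var = sum((1 - r) * (v - mu1) ** 2 + r * (v - mu2) ** 2
                  for r, v in zip(resp, x)) / len(x)
        var = max(var, 1e-12)  # guard against a degenerate zero variance
    return pi, mu1, mu2, math.sqrt(var)

def bimodality_index(pi, mu1, mu2, sigma):
    """Bimodality index: sqrt(pi * (1 - pi)) * |mu1 - mu2| / sigma."""
    return math.sqrt(pi * (1 - pi)) * abs(mu1 - mu2) / sigma

# Synthetic "gene": half the samples near 0, half near 10 (hypothetical values).
values = [0.01 * i for i in range(50)] + [10 + 0.01 * i for i in range(50)]
pi, mu1, mu2, sigma = fit_two_component_normal(values)
bi = bimodality_index(pi, mu1, mu2, sigma)
```

The fitted component memberships also give a natural dichotomization (low versus high expression) of the kind used for the discretized prediction models described above.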

Abstract:

This thesis project is motivated by the difficulty of using observational data to draw inferences about a causal relationship in observational epidemiology research when controlled randomization is not applicable. The instrumental variable (IV) method is one statistical tool for overcoming this problem. Mendelian randomization studies use genetic variants as IVs in genetic association studies. In this thesis, the IV method, as well as standard logistic and linear regression models, is used to investigate the causal association between the risk of pancreatic cancer and circulating levels of the soluble receptor for advanced glycation end-products (sRAGE). Higher levels of serum sRAGE were found to be associated with a lower risk of pancreatic cancer in a previous observational study (255 cases and 485 controls). However, such a novel association may be biased by unknown confounding factors. In a case-control study, we aimed to use the IV approach to confirm or refute this observation in a subset of study subjects for whom genotyping data were available (178 cases and 177 controls). A two-stage IV method using generalized method of moments-structural mean models (GMM-SMM) was conducted, and the relative risk (RR) was calculated. In the first-stage analysis, we found that the single nucleotide polymorphism (SNP) rs2070600 of the receptor for advanced glycation end-products (AGER) gene meets all three general assumptions for a genetic IV in examining the causal association between sRAGE and risk of pancreatic cancer. The variant allele of SNP rs2070600 of the AGER gene was associated with lower levels of sRAGE, and it was associated neither with the risk of pancreatic cancer nor with the confounding factors. It was a potentially strong IV (F statistic = 29.2). However, in the second-stage analysis, the GMM-SMM model failed to converge because of non-concavity, probably owing to the small sample size.
Therefore, the IV analysis could not support the causality of the association between serum sRAGE levels and risk of pancreatic cancer. Nevertheless, these analyses suggest that rs2070600 is a potentially good genetic IV for testing the causal relationship between sRAGE levels and the risk of pancreatic cancer. A larger sample size is required to conduct a credible IV analysis.
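
The logic of a genetic IV can be sketched with a linear simulation: an allele count Z affects the exposure X, an unmeasured confounder U affects both X and the outcome Y, and Z affects Y only through X. Ordinary regression of Y on X is then biased, while the ratio (Wald) estimator cov(Z, Y) / cov(Z, X) recovers the causal effect, and the first-stage F statistic measures instrument strength. Everything here is hypothetical (continuous outcome, effect sizes, allele frequency); the GMM-SMM estimator for a binary outcome used in the thesis is not reproduced.

```python
import random

def cov(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

def simulate_iv(n=100_000, beta=-0.3, seed=7):
    """Linear-model sketch: allele count Z -> exposure X -> outcome Y, with an
    unmeasured confounder U affecting both X and Y (all effect sizes hypothetical)."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi = sum(rng.random() < 0.3 for _ in range(2))  # allele count, frequency 0.3
        ui = rng.gauss(0, 1)                            # unmeasured confounder
        xi = 0.5 * zi + ui + rng.gauss(0, 1)
        yi = beta * xi + ui + rng.gauss(0, 1)
        z.append(zi)
        x.append(xi)
        y.append(yi)
    return z, x, y

z, x, y = simulate_iv()
ols = cov(x, y) / cov(x, x)          # confounded: roughly beta + 1 / var(X)
wald_ratio = cov(z, y) / cov(z, x)   # IV (ratio) estimate of beta
r2 = cov(z, x) ** 2 / (cov(z, z) * cov(x, x))
first_stage_f = r2 * (len(z) - 2) / (1 - r2)  # F statistic for a single instrument
```

A common rule of thumb treats a first-stage F below about 10 as a weak instrument; the reported F of 29.2 is above that threshold, although with few subjects the second-stage estimate can still be unstable, as the non-convergence above illustrates.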

Abstract:

Background: Surgical site infections (SSIs) after abdominal surgeries account for approximately 26% of all reported SSIs. The Centers for Disease Control and Prevention (CDC) defines 3 types of SSIs: superficial incisional, deep incisional, and organ/space. Preventing SSIs has become a national focus. This dissertation assesses several associations with the individual types of SSI in patients who have undergone colon surgery.

Methods: Data for this dissertation were obtained from the American College of Surgeons' National Surgical Quality Improvement Program (NSQIP); major colon surgeries performed between 2007 and 2009 were identified in the database. NSQIP data include more than 50 preoperative and 30 intraoperative factors, and 40 postoperative occurrences collected over a follow-up period of 30 days after surgery. Initially, four separate logistic regressions were fit to compare the associations between risk factors and each of the SSI groups: superficial, deep, organ/space, and a composite of any single SSI. A second analysis used polytomous regression to assess simultaneously the associations between risk factors and the different types of SSI, and to formally test the differences in effect estimates for 13 common SSI risk factors. The final analysis explored the association between venous thromboembolism (VTE) and the different types of SSI and risk factors.

Results: A total of 59,365 colon surgeries were included in the study. Overall, 13% of colon cases developed a single type of SSI: 8% developed superficial SSIs, 1.4% deep SSIs, and 3.8% organ/space SSIs. The first article identifies the unique set of risk factors associated with each of the 4 SSI models. Distinct risk factors for superficial SSIs included factors such as alcohol, chronic obstructive pulmonary disease, dyspnea, and diabetes.
Organ/space SSIs were uniquely associated with disseminated cancer, preoperative dialysis, preoperative radiation treatment, bleeding disorder, and prior surgery. Risk factors that were significant in all models had different effect estimates. The second article assesses 13 common SSI risk factors simultaneously across the 3 types of SSI using polytomous regression. Each risk factor was then formally tested for effect heterogeneity. If the test was significant, the final model allowed the effect estimate for that risk factor to vary across the types of SSI; if it was not, the effect estimate was held constant across the types of SSI using the aggregate SSI value. The third article explored the relationship of VTE with the individual types of SSI and risk factors. The overall incidence of VTE after the 59,365 colon cases was 2.4%. All 3 types of SSI and several risk factors were independently associated with the development of VTE.

Conclusions: Risk factors associated with each type of SSI differed in patients who had undergone colon surgery; each model had a unique cluster of risk factors. Several risk factors, including increased BMI, duration of surgery, wound class, and laparoscopic approach, were significant across all 4 models, but no statistical inferences can be made about their different effect estimates from the separate models. These results suggest that aggregating SSIs may misattribute and hide true associations with risk factors. By using polytomous regression to assess multiple risk factors across the multiple types of SSI, this study identified several risk factors with significant effect heterogeneity across the 3 types of SSI, challenging the use of aggregate SSI outcomes. The third article recognizes the strong association between VTE and the 3 types of SSI. Clinicians understand the difference between superficial, deep, and organ/space SSIs.
Our results indicate that they should be considered individually in future studies.
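
Polytomous (multinomial logit) regression gives each SSI type its own coefficient vector relative to a "no SSI" reference, and effect heterogeneity for one risk factor can be checked with a Wald test of equal coefficients across two types. The sketch below shows both pieces; the linear predictors, coefficients, variances, and covariance are hypothetical stand-ins for values that would come from a fitted model and its variance-covariance matrix.

```python
import math

def polytomous_probabilities(linear_predictors):
    """Multinomial (polytomous) logit probabilities with 'none' (no SSI) as the reference.

    linear_predictors maps each SSI type k to eta_k = x'beta_k; the reference has eta = 0.
    """
    exp_terms = {k: math.exp(eta) for k, eta in linear_predictors.items()}
    denom = 1.0 + sum(exp_terms.values())
    probs = {"none": 1.0 / denom}
    probs.update({k: v / denom for k, v in exp_terms.items()})
    return probs

def heterogeneity_z(b1, b2, var1, var2, cov12=0.0):
    """Wald z statistic for H0: beta_1 == beta_2 (same risk-factor effect on two SSI types)."""
    return (b1 - b2) / math.sqrt(var1 + var2 - 2 * cov12)

# Hypothetical log-odds (versus no SSI) for one patient profile.
probs = polytomous_probabilities({"superficial": -2.5, "deep": -4.2, "organ/space": -3.3})
# Hypothetical coefficients for one risk factor on superficial vs organ/space SSI.
z = heterogeneity_z(0.8, 0.2, 0.04, 0.05)  # 0.6 / 0.3 = 2.0
```

Fitting all SSI types in one model is what makes the covariance term available; with separate logistic regressions there is no joint covariance, which is why differences between their effect estimates cannot be tested formally.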