14 results for model selection in binary regression

in DigitalCommons@The Texas Medical Center


Relevance: 100.00%

Abstract:

Random Forests™ is reported to be one of the most accurate classification algorithms for complex data analysis. It shows excellent performance even when most predictors are noisy and the number of variables is much larger than the number of observations. In this thesis, Random Forests was applied to a large-scale lung cancer case-control study, a novel way of automatically selecting prognostic factors was proposed, and a synthetic positive control was used to validate the Random Forests method. Throughout this study we showed that Random Forests can deal with a large number of weak input variables without overfitting, can account for non-additive interactions between these input variables, and can be used for variable selection without being adversely affected by collinearities.

Random Forests can handle large-scale data sets without rigorous data preprocessing, and it has a robust variable-importance ranking measure. We propose a novel variable selection method, in the context of Random Forests, that uses the data noise level as the cut-off value for determining the subset of important predictors. This new approach enhanced the ability of the Random Forests algorithm to automatically identify important predictors in complex data. The cut-off value can also be adjusted based on the results of the synthetic positive control experiments.

When the data set had a high variables-to-observations ratio, Random Forests complemented the established logistic regression. This study suggests Random Forests for such high-dimensional data: one can use Random Forests to select the important variables and then use logistic regression, or Random Forests itself, to estimate the effect sizes of the predictors and to classify new observations.

We also found that mean decrease in accuracy is a more reliable variable-ranking measure than mean decrease in Gini.
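
As a rough illustration of the noise-level cut-off idea, the sketch below appends known pure-noise predictors and keeps only the real predictors whose permutation importance (scikit-learn's analogue of mean decrease in accuracy) exceeds the largest importance among the noise columns. The data set, column counts, and threshold rule are illustrative assumptions, not the thesis's actual procedure.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=300, n_features=50, n_informative=5,
                           random_state=0)

# Append pure-noise columns; their importances estimate the data noise level.
noise = rng.normal(size=(X.shape[0], 20))
X_aug = np.hstack([X, noise])

rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_aug, y)
imp = permutation_importance(rf, X_aug, y, n_repeats=10,
                             random_state=0).importances_mean

cutoff = imp[X.shape[1]:].max()        # largest importance among noise columns
selected = np.where(imp[:X.shape[1]] > cutoff)[0]
print("selected predictors:", selected)
```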

Relevance: 100.00%

Abstract:

One of the difficulties in the practical application of ridge regression is that, for a given data set, it is unknown whether a selected ridge estimator has smaller mean squared error than the least squares estimator. The concept of the improvement region is defined, and a technique is developed that obtains approximate confidence intervals for the value of the ridge parameter k which produces the maximum reduction in mean squared error. Two simulation experiments were conducted to investigate how accurate these approximate confidence intervals might be.
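
A Monte Carlo sketch of the improvement-region concept: the set of ridge parameters k for which the ridge estimator has smaller mean squared error than least squares (k = 0). The design matrix, true coefficients, and k grid are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, sigma = 50, 5, 1.0
X = rng.normal(size=(n, p))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=n)   # induce collinearity
beta = np.ones(p)
XtX = X.T @ X

ks = np.linspace(0, 5, 51)
mse = np.zeros_like(ks)
for _ in range(2000):                          # repeated samples of y
    y = X @ beta + sigma * rng.normal(size=n)
    for i, k in enumerate(ks):
        b = np.linalg.solve(XtX + k * np.eye(p), X.T @ y)
        mse[i] += np.sum((b - beta) ** 2)
mse /= 2000

better = ks[mse < mse[0]]                      # ks that beat OLS (k = 0)
print("k minimizing MSE:", ks[mse.argmin()])
print("improvement region ~ (%.2f, %.2f]" % (better.min(), better.max()))
```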

Relevance: 100.00%

Abstract:

Strategies are compared for the development of a linear regression model with stochastic (multivariate normal) regressor variables and the subsequent assessment of its predictive ability. Bias and mean squared error of four estimators of predictive performance are evaluated in samples simulated from 32 population correlation matrices. Models including all of the available predictors are compared with those obtained using selected subsets. The subset selection procedures investigated include two stopping rules, $C_p$ and $S_p$, each combined with an 'all possible subsets' or 'forward selection' search of variables. The estimators of performance utilized include parametric ($MSEP_m$) and non-parametric (PRESS) assessments in the entire sample, and two data-splitting estimates restricted to a random or balanced (Snee's DUPLEX) 'validation' half sample. The simulations were performed as a designed experiment, with population correlation matrices representing a broad range of data structures.

The techniques examined for subset selection do not generally result in improved predictions relative to the full model. Approaches using 'forward selection' result in slightly smaller prediction errors and less biased estimators of predictive accuracy than 'all possible subsets' approaches, but no differences are detected between the performances of $C_p$ and $S_p$. In every case, prediction errors of models obtained by subset selection in either of the half splits exceed those obtained using all predictors and the entire sample.

Only the random split estimator is conditionally (on $\beta$) unbiased; however, $MSEP_m$ is unbiased on average, and PRESS is nearly so in unselected (fixed form) models. When subset selection techniques are used, $MSEP_m$ and PRESS always underestimate prediction errors, by as much as 27 percent (on average) in small samples. Despite their bias, the mean squared errors (MSE) of these estimators are at least 30 percent less than that of the unbiased random split estimator. The DUPLEX split estimator suffers from large MSE as well as bias, and seems of little value within the context of stochastic regressor variables.

To maximize predictive accuracy while retaining a reliable estimate of that accuracy, it is recommended that the entire sample be used for model development, and a leave-one-out statistic (e.g. PRESS) be used for assessment.
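
A minimal sketch of the recommended leave-one-out assessment: PRESS computed in closed form from the hat matrix, so the model never has to be refit n times. The toy data are illustrative.

```python
import numpy as np

def press(X, y):
    """Leave-one-out PRESS statistic for OLS, via the hat-matrix shortcut."""
    X1 = np.column_stack([np.ones(len(y)), X])   # add intercept
    H = X1 @ np.linalg.solve(X1.T @ X1, X1.T)    # hat matrix
    resid = y - H @ y
    return np.sum((resid / (1.0 - np.diag(H))) ** 2)

rng = np.random.default_rng(2)
X = rng.normal(size=(40, 4))
y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=40)
print(press(X, y) / len(y))   # per-observation prediction-error estimate
```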

Relevance: 100.00%

Abstract:

The problem of analyzing data with updated measurements in the time-dependent proportional hazards model arises frequently in practice. One available option is to reduce the number of intervals (or updated measurements) included in the Cox regression model. We empirically investigated the bias of the estimator of the time-dependent covariate effect while varying the failure rate, sample size, true values of the parameters, and number of intervals. We also evaluated how often a time-dependent covariate needs to be collected, and assessed the effect of sample size and failure rate on the power of testing a time-dependent effect.

A time-dependent proportional hazards model with two binary covariates was considered. The time axis was partitioned into k intervals. The baseline hazard was assumed to be 1, so that the failure times were exponentially distributed within each interval. A type II censoring model was adopted to characterize the failure rate. The factors of interest were sample size (500, 1000), type II censoring with failure rates of 0.05, 0.10, and 0.20, and three values for each of the non-time-dependent and time-dependent covariates (1/4, 1/2, 3/4).

The mean bias of the estimator of the coefficient of the time-dependent covariate decreased as sample size and number of intervals increased, whereas it increased as the failure rate and the true values of the covariates increased. The mean bias was smallest when all of the updated measurements were used in the model, compared with two models that used only selected measurements of the time-dependent covariate. For the model that included all the measurements, the coverage rates of the estimator of the coefficient of the time-dependent covariate were in most cases 90% or more, except when the failure rate was high (0.20). The power associated with testing a time-dependent effect was highest when all of the measurements of the time-dependent covariate were used. An example from the Systolic Hypertension in the Elderly Program Cooperative Research Group is presented.
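
A minimal sketch, assuming the lifelines library, of fitting a time-dependent Cox model on long-format (start, stop] data with every updated measurement included; the simulated subjects, covariate values, and absence of censoring are purely illustrative.

```python
import numpy as np
import pandas as pd
from lifelines import CoxTimeVaryingFitter

rng = np.random.default_rng(3)
rows = []
for i in range(200):
    z_fixed = int(rng.integers(0, 2))        # non-time-dependent binary covariate
    t_event = rng.exponential(3.0)           # exponential failure time
    for start in range(int(np.ceil(t_event))):
        stop = min(start + 1.0, t_event)     # partition time into unit intervals
        rows.append({"id": i, "start": float(start), "stop": stop,
                     "z_fixed": z_fixed,
                     "z_td": int(rng.random() < 0.3),  # updated measurement
                     "event": int(stop == t_event)})   # fails in last interval
df = pd.DataFrame(rows)

ctv = CoxTimeVaryingFitter()
ctv.fit(df, id_col="id", start_col="start", stop_col="stop", event_col="event")
ctv.print_summary()                          # coefficients for z_fixed and z_td
```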

Relevance: 100.00%

Abstract:

Objectives. This paper seeks to assess the effect of regression model misspecification on statistical power in a variety of situations.

Methods and results. The effect of misspecification in regression can be approximated by evaluating the correlation between the correct specification and the misspecification of the outcome variable (Harris 2010). In this paper, three misspecified models (linear, categorical, and fractional polynomial) were considered. In the first section, the mathematical method of calculating the correlation between correct and misspecified models with simple mathematical forms is derived and demonstrated. In the second section, data from the National Health and Nutrition Examination Survey (NHANES 2007-2008) were used to examine such correlations. Our study shows that, compared with linear or categorical models, the fractional polynomial models, with their higher correlations, provided a better approximation of the true relationship, as illustrated by LOESS regression. In the third section, we present the results of simulation studies demonstrating that misspecification in regression can produce marked decreases in power with small sample sizes. However, the categorical model had the greatest power, ranging from 0.877 to 0.936 depending on the sample size and outcome variable used. The power of the fractional polynomial model was close to that of the linear model, which ranged from 0.69 to 0.83, and appeared to be affected by the increased degrees of freedom of this model.

Conclusion. Correlations between alternative model specifications can be used to provide a good approximation of the effect of misspecification on statistical power when the sample size is large. When model specifications have known simple mathematical forms, such correlations can be calculated mathematically. Actual public health data from NHANES 2007-2008 were used as examples to demonstrate situations with an unknown or complex correct model specification. Simulation of power for misspecified models confirmed the results based on correlation methods and also illustrated the effect of model degrees of freedom on power.
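
A small sketch of the correlation idea: fit each misspecification to the same (here, known) correct mean function and compare how strongly the fitted values correlate with it. The log-shaped truth, quartile cut points, and fractional polynomial powers are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5000
x = rng.uniform(0.1, 10, n)
true_mean = np.log(x)                        # assumed "correct" specification

def fitted(design):
    b, *_ = np.linalg.lstsq(design, true_mean, rcond=None)
    return design @ b

one = np.ones(n)
linear = fitted(np.column_stack([one, x]))
quart = np.digitize(x, np.quantile(x, [.25, .5, .75]))     # categorical model
categorical = fitted(np.column_stack(
    [one] + [(quart == k).astype(float) for k in (1, 2, 3)]))
fracpoly = fitted(np.column_stack([one, np.sqrt(x), np.log(x)]))  # FP(0.5, 0)

for name, f in [("linear", linear), ("categorical", categorical),
                ("fractional poly", fracpoly)]:
    print(name, round(np.corrcoef(f, true_mean)[0, 1], 4))
```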

Relevance: 100.00%

Abstract:

UPTAKE AND METABOLISM OF 5’-AMP IN THE ERYTHROCYTE PLAY KEY ROLES IN THE 5’-AMP INDUCED MODEL OF DEEP HYPOMETABOLISM. Isadora Susan Daniels, B.A. Supervisory Professor: Cheng Chi Lee, Ph.D.

Mechanisms that initiate and control the natural hypometabolic states of mammals are poorly understood. The laboratory developed a model of deep hypometabolism (DH) initiated by uptake of 5’-adenosine monophosphate (5’-AMP) into erythrocytes. Mice enter DH when given a high dose of 5’-AMP, and the body cools readily; influx of 5’-AMP appears to inhibit thermoregulatory control. In a 15°C environment, mice injected with 5’-AMP (0.5 mg/g body weight) enter a Phase I response in which oxygen consumption (VO2) drops rapidly to one-third of euthermic levels. The Phase I response appears independent of body temperature (Tb). This is followed by a gradual body-temperature decline that correlates with the VO2 decline, called the Phase II response. Within 90 minutes, mouse Tb approaches 15°C and VO2 is one-tenth of normal. Mice can remain in this state for several hours before gradually and safely recovering. The DH state translates to other mammalian species.

Our studies show that uptake and metabolism of 5’-AMP in erythrocytes cause biochemical changes that initiate DH. Increased AMP shifts the adenylate equilibrium toward ADP formation, consequently decreasing intracellular ATP. In turn, glycolysis slows, indicated by increased glucose and decreased lactate. 2,3-bisphosphoglycerate levels rise, allosterically reducing hemoglobin's affinity for oxygen, and deoxyhemoglobin rises. Reduced oxygen transport to tissues likely triggers the DH model.

The major intracellular pathway for AMP catabolism is catalyzed by AMP deaminase (AMPD). Multiple AMPD isozymes are expressed in various tissues, but erythrocytes have only AMPD3. Mice lacking AMPD3 were created to study control of the DH model, specifically in erythrocytes. Telemetric measurements demonstrate lower Tb and difficulty maintaining Tb under moderate metabolic stress. A more dramatic response to a lower dose of 5’-AMP suggests that AMPD activity in the erythrocyte plays an important role in control of the DH model. Analysis of adenylates in erythrocyte lysate shows 3-fold higher levels of ATP and ADP, but AMP levels similar to wild type. Taken together, these results indicate that alterations in the energy status of erythrocytes can induce a hypometabolic state. AMPD3 control of AMP catabolism is important in controlling the DH model: genetically reducing AMP catabolism in erythrocytes produces a phenotype of lower Tb and a compromised ability to maintain temperature homeostasis.

Relevance: 100.00%

Abstract:

This paper reports a comparison of three modeling strategies for the analysis of hospital mortality in a sample of general medicine inpatients at a Department of Veterans Affairs medical center. Logistic regression, a Markov chain model, and longitudinal logistic regression were evaluated on predictive performance, as measured by the c-index, and on the accuracy of expected numbers of deaths compared with those observed. The logistic regression used patient information collected at admission; the Markov model comprised two absorbing states, for discharge and death, and three transient states reflecting increasing severity of illness as measured by laboratory data collected during the hospital stay; the longitudinal regression employed Generalized Estimating Equations (GEE) to model the covariance structure of the repeated binary outcome. Results showed that the logistic regression predicted hospital mortality as well as the alternative methods but was limited in scope of application. The Markov chain provides insights into how day-to-day changes of illness severity lead to discharge or death. The longitudinal logistic regression showed that an increasing illness trajectory is associated with hospital mortality. The conclusion is reached that for standard applications in modeling hospital mortality, logistic regression is adequate, but for the new challenges facing health services research today, the alternative methods are equally predictive, practical, and able to provide new insights.
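
A minimal sketch of the longitudinal strategy, assuming statsmodels: a GEE logistic model with an exchangeable working covariance for the repeated binary outcome. The patient-day data and effect sizes are illustrative.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Toy longitudinal data: one row per patient-day, repeated binary outcome.
rng = np.random.default_rng(5)
n_pat, n_day = 200, 5
df = pd.DataFrame({
    "id": np.repeat(np.arange(n_pat), n_day),
    "day": np.tile(np.arange(n_day), n_pat),
    "severity": rng.normal(size=n_pat * n_day),   # daily illness severity
})
logit = -3 + 0.8 * df["severity"] + 0.1 * df["day"]
df["death"] = rng.binomial(1, 1 / (1 + np.exp(-logit)))

model = smf.gee("death ~ severity + day", groups="id", data=df,
                family=sm.families.Binomial(),
                cov_struct=sm.cov_struct.Exchangeable())
print(model.fit().summary())
```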

Relevance: 100.00%

Abstract:

Public preferences for policy are formed in a little-understood process that is not adequately described by the traditional economic theory of choice. In this paper I suggest that U.S. aggregate support for health reform can be modeled as tradeoffs among a small number of behavioral values and the stage of policy development. The theory underlying the model is based on Samuelson et al.'s (1986) work and Wilke's (1991) elaboration of it as the Greed/Efficiency/Fairness (GEF) hypothesis of motivation in the management of resource dilemmas, and on behavioral economics informed by Kahneman and Thaler's prospect theory.

The model developed in this paper employs ordered probit econometric techniques applied to data derived from U.S. polls taken from 1990 to mid-2003 that measured support for health reform proposals. Outcome data are four-tiered Likert counts; independent variables are dummies representing the presence or absence of operationalizations of each behavioral variable, along with an integer representing the stage of the policy process. Marginal effects of each independent variable predict how support levels change when that variable is triggered. Model estimation results indicate a vanishingly small likelihood that all coefficients are zero, and all variables have the signs expected from model theory.

Three hypotheses were tested: support will drain from health reform policy as it becomes increasingly well-articulated and approaches enactment; reforms appealing to fairness through universal health coverage will enjoy a higher degree of support than those targeted more narrowly; and health reforms calling for government operation of the health finance system will achieve lower support than those that do not. Model results support the first and last hypotheses. Contrary to expectations, universal health care proposals did not provide incremental support beyond proposals targeted to "deserving" populations (children, the elderly, working families). In addition, loss of autonomy (e.g. restrictions on choice of caregiver) is found to be the "third rail" of health reform, with significantly reduced support. When applied to a hypothetical health reform in which an employer-mandated Medical Savings Account policy is the centerpiece, the model predicts support that may be insufficient for enactment. These results indicate that the method developed in this paper may prove valuable to health policy designers.
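
A minimal sketch of the econometric setup, assuming statsmodels' OrderedModel: an ordered probit for four-tier Likert support, with behavioral-value dummies and an integer policy-stage variable. All variable names and simulated effects are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(6)
n = 1000
df = pd.DataFrame({
    "fairness": rng.integers(0, 2, n),   # dummy: fairness appeal present
    "gov_run":  rng.integers(0, 2, n),   # dummy: government-run finance
    "stage":    rng.integers(1, 5, n),   # integer policy-process stage
})
latent = (0.6 * df["fairness"] - 0.5 * df["gov_run"]
          - 0.3 * df["stage"] + rng.normal(size=n))
df["support"] = pd.cut(latent, [-np.inf, -1.5, -0.5, 0.5, np.inf],
                       labels=False)     # four-tier Likert outcome

mod = OrderedModel(df["support"], df[["fairness", "gov_run", "stage"]],
                   distr="probit")
res = mod.fit(method="bfgs", disp=False)
print(res.summary())
```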

Relevance: 100.00%

Abstract:

Background. The mTOR pathway is commonly altered in human tumors and promotes cell survival and proliferation. Preliminary evidence suggests this pathway's involvement in chemoresistance to platinum and taxanes, the first-line therapy for epithelial ovarian cancer. A pathway-based approach was used to identify individual germline single nucleotide polymorphisms (SNPs) and cumulative effects of multiple genetic variants in mTOR pathway genes, and their association with clinical outcome in women with ovarian cancer.

Methods. The case series was restricted to 319 non-Hispanic white women with high-grade ovarian cancer treated with surgery and platinum-based chemotherapy. 135 SNPs in 20 representative genes in the mTOR pathway were genotyped. Hazard ratios (HRs) for death and odds ratios (ORs) for failure to respond to primary therapy were estimated for each SNP using a multivariate Cox proportional hazards model and a multivariate logistic regression model, respectively, adjusting for age, stage, histology, and treatment sequence. A survival tree analysis of SNPs with a statistically significant association (p<0.05) was performed to identify higher-order gene-gene interactions and their association with overall survival.

Results. There was no statistically significant difference in survival by tumor histology or treatment regimen. The median survival for the cohort was 48.3 months. Seven SNPs were significantly associated with decreased survival. Compared with those with no unfavorable genotypes, the HR for death increased significantly with an increasing number of unfavorable genotypes, and women in the highest risk category had an HR of 4.06 (95% CI 2.29–7.21). The survival tree analysis also identified patients with different survival patterns based on their genetic profiles. 13 SNPs in five different genes were found to be significantly associated with treatment response, defined as no evidence of disease after completion of primary therapy. The rare homozygous genotype of SNP rs6973428 showed a 5.5-fold increased risk compared with the genotypes carrying the wild-type allele. In the cumulative effect analysis, the highest-risk group (individuals with ≥8 unfavorable genotypes) was significantly less likely to respond to chemotherapy (OR=8.40, 95% CI 3.10–22.75) than the low-risk group (≤4 unfavorable genotypes).

Conclusions. A pathway-based approach can demonstrate the cumulative effects of multiple genetic variants on clinical response to chemotherapy and survival. Therapy targeting the mTOR pathway may modify outcomes in select patients.
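
A minimal sketch of the per-SNP screen, assuming the lifelines library: one adjusted Cox model per SNP, collecting hazard ratios and p-values. The additive 0/1/2 genotype coding, column names, and all simulated values are illustrative.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(7)
n, n_snp = 319, 10
df = pd.DataFrame({"time": rng.exponential(48, n),     # months of follow-up
                   "death": rng.integers(0, 2, n),
                   "age": rng.normal(60, 10, n),
                   "stage": rng.integers(3, 5, n)})
snps = pd.DataFrame(rng.integers(0, 3, (n, n_snp)),    # additive genotype code
                    columns=[f"snp{i}" for i in range(n_snp)])

results = {}
for s in snps.columns:
    d = pd.concat([df, snps[s]], axis=1)               # adjusted model per SNP
    cph = CoxPHFitter().fit(d, duration_col="time", event_col="death")
    results[s] = (cph.hazard_ratios_[s], cph.summary.loc[s, "p"])
print(pd.DataFrame(results, index=["HR", "p"]).T)
```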

Relevance: 100.00%

Abstract:

Interruption is a known human factor that contributes to errors and catastrophic events in healthcare as well as other high-risk industries. The landmark Institute of Medicine (IOM) report, To Err is Human, brought attention to the significance of preventable errors in medicine and suggested that interruptions could be a contributing factor. Previous studies of interruptions in healthcare did not offer a conceptual model by which to study interruptions. Given the serious consequences of interruptions investigated in other high-risk industries, there is a need to develop a model to describe, understand, explain, and predict interruptions and their consequences in healthcare. Therefore, the purpose of this study was to develop a model grounded in the literature and to use the model to describe and explain interruptions in healthcare, specifically those occurring in a Level One Trauma Center. A trauma center was chosen because this environment is characterized as intense, unpredictable, and interrupt-driven.

The first step in developing the model was a review of the literature, which revealed that the concept of interruption did not have a consistent definition in either the healthcare or non-healthcare literature. Walker and Avant's method of concept analysis was used to clarify and define the concept. The analysis led to the identification of five defining attributes: (1) a human experience, (2) an intrusion of a secondary, unplanned, and unexpected task, (3) discontinuity, (4) external or internal initiation, and (5) situation within a context. Before an interruption can commence, five conditions known as antecedents must occur: (1) an intent to interrupt is formed by the initiator, (2) a physical signal passes a threshold test of detection by the recipient, (3) the sensory system of the recipient is stimulated to respond to the initiator, (4) an interruption task is presented to the recipient, and (5) the interruption task is either accepted or rejected by the recipient. An interruption was determined to be quantifiable by (1) the frequency of occurrence of interruptions, (2) the number of times the primary task has been suspended to perform an interrupting task, (3) the length of time the primary task has been suspended, and (4) the frequency of returning or not returning to the primary task.

As a result of the concept analysis, a definition of an interruption was derived from the literature: an interruption is a break in the performance of a human activity, initiated internal or external to the recipient and occurring within the context of a setting or location; this break results in the suspension of the initial task by initiating the performance of an unplanned task, with the assumption that the initial task will be resumed. The definition is inclusive of all the defining attributes of an interruption and offers a standard definition that can be used by the healthcare industry. From the definition, a visual model of an interruption was developed.

The model was used to describe and explain the interruptions recorded in an instrumental case study of physicians and registered nurses (RNs) working in a Level One Trauma Center. Five physicians were observed for a total of 29 hours, 31 minutes; eight registered nurses were observed for a total of 40 hours, 9 minutes. Observations were made on either the 0700-1500 or the 1500-2300 shift using the shadowing technique and were recorded as field notes. The field notes were analyzed by a hybrid method of categorizing activities and interruptions, developed by using both a deductive a priori classification framework and an inductive process utilizing line-by-line coding and constant comparison as described in Grounded Theory. The following categories were identified as relevant to this study:

Intended Recipient - the person to be interrupted
Unintended Recipient - not the intended recipient of an interruption, e.g., receiving a phone call that was incorrectly dialed
Indirect Recipient - the incidental recipient of an interruption, e.g., talking with another person, thereby suspending the original activity
Recipient Blocked - the intended recipient does not accept the interruption
Recipient Delayed - the intended recipient postpones an interruption
Self-interruption - a person, independent of another person, suspends one activity to perform another, e.g., while walking, stops abruptly and talks to another person
Distraction - briefly disengaging from a task
Organizational Design - the physical layout of the workspace that causes a disruption in workflow
Artifacts Not Available - supplies and equipment that are not available in the workspace, causing a disruption in workflow
Initiator - a person who initiates an interruption

Interruption by Organizational Design and by Artifacts Not Available were identified as two new categories of interruption; they had not previously been cited in the literature. Analysis of the observations indicated that physicians performed slightly fewer activities per hour than RNs, a variance that may be attributed to differing roles and responsibilities. Physicians had more of their activities interrupted than RNs, but RNs experienced more interruptions per hour. Other people were the most common medium through which an interruption was delivered; additional mediums included the telephone, the pager, and one's self. Both physicians and RNs were observed to resume an original interrupted activity more often than not, and in most cases they performed only one or two interrupting activities before returning to the original interrupted activity. In conclusion, the model was found to explain all interruptions observed during the study; however, a more comprehensive study will be required to establish its predictive value.

Relevance: 100.00%

Abstract:

Complex diseases such as cancer result from multiple genetic changes and environmental exposures. Due to the rapid development of genotyping and sequencing technologies, we are now able to more accurately assess the causal effects of many genetic and environmental factors. Genome-wide association studies have been able to localize many causal genetic variants predisposing to certain diseases. However, these studies explain only a small portion of the variation in the heritability of diseases. More advanced statistical models are urgently needed to identify and characterize additional genetic and environmental factors and their interactions, which will enable us to better understand the causes of complex diseases. In the past decade, thanks to increasing computational capabilities and novel statistical developments, Bayesian methods have been widely applied in genetics/genomics research and have demonstrated superiority over some standard approaches in certain research areas. Gene-environment and gene-gene interaction studies are among the areas where Bayesian methods can fully exert their advantages. This dissertation focuses on developing new Bayesian statistical methods for data analysis with complex gene-environment and gene-gene interactions, as well as extending some existing methods for gene-environment interactions to other related areas. It includes three sections: (1) deriving a Bayesian variable selection framework for hierarchical gene-environment and gene-gene interactions; (2) developing Bayesian Natural and Orthogonal Interaction (NOIA) models for gene-environment interactions; and (3) extending the applications of two Bayesian statistical methods, originally developed for gene-environment interaction studies, to related problems such as adaptively borrowing historical data.

We propose a Bayesian hierarchical mixture model framework that allows us to investigate genetic and environmental effects, gene-gene interactions (epistasis), and gene-environment interactions in the same model. It is well known that, in many practical situations, there exists a natural hierarchical structure between the main effects and interactions in the linear model. Here we propose a model that incorporates this hierarchical structure into the Bayesian mixture model, such that irrelevant interaction effects can be removed more efficiently, resulting in more robust, parsimonious, and powerful models. We evaluate both the 'strong hierarchical' and 'weak hierarchical' models, which specify that both or one, respectively, of the main effects of interacting factors must be present for the interaction to be included in the model. Extensive simulation results show that the proposed strong and weak hierarchical mixture models control the proportion of false positive discoveries and yield a powerful approach to identifying the predisposing main effects and interactions in studies with complex gene-environment and gene-gene interactions. We also compare these two models with an 'independent' model that does not impose the hierarchical constraint, and we observe their superior performance in most of the considered situations. The proposed models are applied in real data analyses of gene-environment interactions in lung cancer and cutaneous melanoma case-control studies. Bayesian statistical models have the advantage of being able to incorporate useful prior information into the modeling process. Moreover, the Bayesian mixture model outperforms the multivariate logistic model in terms of parameter estimation and variable selection in most cases. Our proposed models impose the hierarchical constraints, which further improve the Bayesian mixture model by reducing the proportion of false positive findings among the identified interactions and by successfully identifying the reported associations. This is practically appealing for investigating causal factors among a moderate number of candidate genetic and environmental factors together with a relatively large number of interactions.

The natural and orthogonal interaction (NOIA) models of genetic effects were previously developed to provide an analysis framework in which the estimates of effects for a quantitative trait are statistically orthogonal regardless of the existence of Hardy-Weinberg Equilibrium (HWE) within loci. Ma et al. (2012) recently developed a NOIA model for gene-environment interaction studies and showed the advantages of using the model for detecting the true main effects and interactions, compared with the usual functional model. In this project, we propose a novel Bayesian statistical model that combines the Bayesian hierarchical mixture model with the NOIA statistical model and the usual functional model. The proposed Bayesian NOIA model demonstrates more power at detecting the non-null effects, with higher marginal posterior probabilities.

We also review two Bayesian statistical models (a Bayesian empirical shrinkage-type estimator and Bayesian model averaging) that were developed for gene-environment interaction studies. Inspired by these models, we develop two novel statistical methods to handle related problems such as borrowing data from historical studies. The proposed methods are analogous to the methods for gene-environment interactions in the way they balance statistical efficiency and bias within a unified model. Through extensive simulation studies, we compare the operating characteristics of the proposed models with those of existing models, including the hierarchical meta-analysis model. The results show that the proposed approaches adaptively borrow the historical data in a data-driven way. These novel models may have a broad range of statistical applications in both genetic/genomic and clinical studies.
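
A minimal sketch of the strong-hierarchy constraint, assuming PyMC and a simple spike-and-slab-style formulation (not the dissertation's exact mixture model): the interaction's inclusion indicator is forced to zero unless both main-effect indicators are switched on.

```python
import numpy as np
import pymc as pm

# Simulated case-control data with one gene (g), one exposure (e), and
# their interaction; all names, priors, and effect sizes are illustrative.
rng = np.random.default_rng(8)
n = 500
g = rng.integers(0, 2, n).astype(float)
e = rng.integers(0, 2, n).astype(float)
eta_true = -1 + 0.8 * g + 0.6 * e + 1.0 * g * e
y = rng.binomial(1, 1 / (1 + np.exp(-eta_true)))

with pm.Model():
    in_g = pm.Bernoulli("in_g", 0.5)       # main-effect inclusion indicators
    in_e = pm.Bernoulli("in_e", 0.5)
    raw_ge = pm.Bernoulli("raw_ge", 0.5)
    # Strong hierarchy: the interaction can enter only if BOTH mains are in.
    in_ge = pm.Deterministic("in_ge", raw_ge * in_g * in_e)

    b0 = pm.Normal("b0", 0, 2)
    b_g = pm.Normal("b_g", 0, 2)
    b_e = pm.Normal("b_e", 0, 2)
    b_ge = pm.Normal("b_ge", 0, 2)

    eta = b0 + in_g * b_g * g + in_e * b_e * e + in_ge * b_ge * g * e
    pm.Bernoulli("y", logit_p=eta, observed=y)
    idata = pm.sample(1000, tune=1000, chains=2, random_seed=8)

# Posterior inclusion probability of the interaction effect.
print(float(idata.posterior["in_ge"].mean()))
```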

Relevance: 100.00%

Abstract:

It is well known that an identification problem exists in the analysis of age-period-cohort data because of the linear relationship among the three factors (date of birth + age at death = date of death). There are numerous suggestions for how to analyze such data, but no single solution has been satisfactory. The purpose of this study is to provide another analytic method by extending Cox's life-table regression model with time-dependent covariates. The new approach has the following features: (1) it is based on the conditional maximum likelihood procedure using the proportional hazard function described by Cox (1972), treating the age factor as the underlying hazard in order to estimate the parameters for the cohort and period factors; (2) the model is flexible, so that both the cohort and period factors can be treated as dummy or continuous variables, and parameter estimates can be obtained for numerous combinations of variables, as in a regression analysis; and (3) the model is applicable even when the time periods are unequally spaced.

Two specific models are considered to illustrate the new approach and are applied to U.S. prostate cancer data. We find that there are significant differences between all cohorts and a significant period effect for both whites and nonwhites. The underlying hazard increases exponentially with age, indicating that old people have a much higher risk than young people. A log transformation of relative risk shows that prostate cancer risk declined in recent cohorts under both models. However, under the period factor model (0 0 0 1 1 1 1), prostate cancer risk declined 5 cohorts (25 years) earlier for whites than for nonwhites. These results are similar to the previous study by Holford (1983).

The new approach offers a general method to analyze age-period-cohort data without imposing any arbitrary constraint in the model.
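
A minimal sketch of the approach, assuming the lifelines library: age serves as the time axis of a Cox model (so it plays the role of the underlying hazard), while cohort dummies and a period dummy enter as covariates. All simulated values and codings are illustrative.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(9)
n = 2000
cohort = rng.integers(0, 7, n)            # seven 5-year birth cohorts
age = 20 + 60 * rng.weibull(5, n)         # age at death
late_period = rng.integers(0, 2, n)       # period dummy, cf. (0 0 0 1 1 1 1)

df = pd.get_dummies(pd.Series(cohort, name="cohort"), prefix="c",
                    drop_first=True, dtype=float)
df["late_period"] = late_period
df["age"] = age
df["dead"] = 1                            # treat all subjects as observed deaths

cph = CoxPHFitter()
cph.fit(df, duration_col="age", event_col="dead")
cph.print_summary()                       # cohort and period log-hazard ratios
```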

Relevance: 100.00%

Abstract:

The performance of the Hosmer-Lemeshow global goodness-of-fit statistic for logistic regression models was explored in a wide variety of conditions not previously fully investigated. Computer simulations, each consisting of 500 regression models, were run to assess the statistic in 23 different situations. The items that varied among the situations included the number of observations used in each regression, the number of covariates, the degree of dependence among the covariates, the combinations of continuous and discrete variables, and the generation of the values of the dependent variable for model fit or lack of fit.

The study found that the $C_g^*$ statistic was adequate in tests of significance for most situations. However, when testing data that deviate from a logistic model, the statistic has low power to detect such deviation. Although grouping the estimated probabilities into from 8 to 30 quantiles was studied, the deciles-of-risk approach was generally sufficient. Subdividing the estimated probabilities into more than 10 quantiles when there are many covariates in the model is not necessary, despite theoretical reasons that suggest otherwise. Because it does not follow a $\chi^2$ distribution, the statistic is not recommended for use in models containing only categorical variables with a limited number of covariate patterns.

The statistic performed adequately when there were at least 10 observations per quantile. Large numbers of observations per quantile did not lead to incorrect conclusions that the model did not fit the data when it actually did. However, the statistic failed to detect lack of fit when it existed, and it should be supplemented with further tests for the influence of individual observations. Careful examination of the parameter estimates is also essential, since the statistic did not perform as desired when there was moderate to severe collinearity among covariates.

Two methods studied for handling tied values of the estimated probabilities made only a slight difference in conclusions about model fit. Neither method split observations with identical probabilities into different quantiles. Approaches that create equal-size groups by separating ties should be avoided.
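
A minimal sketch of the grouped statistic with the usual deciles-of-risk grouping; the formula and degrees of freedom follow the standard Hosmer-Lemeshow construction, and the simulated data are illustrative.

```python
import numpy as np
from scipy import stats

def hosmer_lemeshow(y, p, g=10):
    """Hosmer-Lemeshow goodness-of-fit test with g groups (deciles of risk)."""
    order = np.argsort(p)
    groups = np.array_split(order, g)    # near-equal-size groups; ties are
                                         # NOT split into separate quantiles
    chi2 = 0.0
    for idx in groups:
        obs, exp = y[idx].sum(), p[idx].sum()
        n_g, pbar = len(idx), p[idx].mean()
        chi2 += (obs - exp) ** 2 / (n_g * pbar * (1 - pbar))
    return chi2, stats.chi2.sf(chi2, df=g - 2)

rng = np.random.default_rng(10)
x = rng.normal(size=500)
p = 1 / (1 + np.exp(-(0.5 + x)))         # true logistic model
y = rng.binomial(1, p)
print(hosmer_lemeshow(y, p))             # should not reject the fit
```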

Relevance: 100.00%

Abstract:

The use of smokeless tobacco products is undergoing an alarming resurgence in the United States. Several national surveys have reported a higher prevalence of use among those employed in blue-collar occupations. National objectives now target this group for health promotion programs that reduce the health risks associated with tobacco use.

Drawn from a larger data set measuring health behaviors, this cross-sectional study tested the applicability of two related theories, the Theory of Reasoned Action (TRA) and the Theory of Planned Behavior (TPB), to smokeless tobacco (SLT) cessation in a blue-collar population of gas pipeline workers. In order to understand the determinants of SLT cessation, measures were obtained of the demographic and normative characteristics of the population and of specific constructs. Attitude toward the act of quitting (AACT) and subjective norm (SN) are constructs common to both models; perceived behavioral control (PBC) is unique to the TPB; and the number of past quit attempts is not contained in either model. In addition, a self-reported measure of SLT use was taken at two-month follow-up.

The study population comprised all male SLT users who were field employees of a large gas pipeline company with gas compressor stations extending from Texas to the Canadian border. At baseline, 199 employees responded to the SLT portion of the survey, 118 completed some portion of the two-month follow-up, and 101 could be matched across time.

As hypothesized, significant correlations were found between constructs antecedent to AACT and SN, although crossover effects occurred. Significant differences were found between SLT cessation intenders and non-intenders with regard to their personal and normative beliefs about quitting, as well as their outcome expectancies and motivation to comply with others' beliefs. These differences occurred in the expected direction, with the mean intender score consistently higher than that of the non-intenders.

Contrary to hypothesis, AACT predicted intention to quit but SN did not. However, confirming the TPB, PBC, operationalized as self-efficacy, independently contributed to the prediction of intention. Statistically significant relationships were not found between intention, perceived behavioral control, their interactive effects, and use behavior at two-month follow-up. Introducing the number of quit attempts into the logistic regression model resulted in nonsignificant findings for both independent and interactive effects.

The findings from this study are discussed in relation to their implications for program development and practice, especially within the worksite. Recommendations for future research to confirm and extend the findings of this investigation are also discussed.
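
A minimal sketch of the kind of logistic model described, assuming statsmodels: intention to quit regressed on AACT, SN, and PBC. The variable names and simulated effects are illustrative, not the study's data.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(11)
n = 199
df = pd.DataFrame({"aact": rng.normal(size=n),   # attitude toward quitting
                   "sn":   rng.normal(size=n),   # subjective norm
                   "pbc":  rng.normal(size=n)})  # perceived behavioral control
logit = -0.5 + 1.0 * df["aact"] + 0.1 * df["sn"] + 0.7 * df["pbc"]
df["intend"] = rng.binomial(1, 1 / (1 + np.exp(-logit)))

res = smf.logit("intend ~ aact + sn + pbc", data=df).fit(disp=False)
print(res.summary())
```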