19 results for predictive regression model
in DigitalCommons@The Texas Medical Center
Abstract:
Ordinal outcomes are frequently employed in diagnosis and clinical trials. Clinical trials of Alzheimer's disease (AD) treatments are a case in point, using mild, moderate, or severe disease status as the outcome measure. As in many other outcome-oriented studies, the disease status may be misclassified. This study estimates the extent of misclassification in an ordinal outcome such as disease status, as well as the extent of misclassification in a predictor variable such as genotype status. An ordinal logistic regression model is commonly used to model the relationship between disease status, the effect of treatment, and other predictive factors. A simulation study was done: first, data were generated from a set of hypothetical parameters and hypothetical rates of misclassification; next, the maximum likelihood method was employed to derive likelihood equations accounting for misclassification; the Nelder-Mead simplex method was then used to solve for the misclassification and model parameters. Finally, this method was applied to an AD dataset to detect the amount of misclassification present. The estimates of the ordinal regression model parameters were close to the hypothetical parameters: β1 was hypothesized at 0.50 and the mean estimate was 0.488; β2 was hypothesized at 0.04 and the mean estimate was 0.04. Although the estimates of the rates of misclassification of X1 were not as close as those of β1 and β2, they validate the method. The X1 0-to-1 misclassification rate was hypothesized as 2.98% and the mean of the simulated estimates was 1.54%; in the best case, the misclassification of k from high to medium was hypothesized at 4.87% and had a sample mean of 3.62%. In the AD dataset, the estimated odds ratio for X1 (having both copies of the APOE 4 allele) changed from 1.377 to 1.418, demonstrating that odds ratio estimates change when the analysis adjusts for misclassification.
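As an illustration of the estimation machinery described above, here is a minimal sketch (not the dissertation's code) of fitting a three-category proportional-odds model by direct likelihood maximization with the Nelder-Mead simplex, the optimizer named in the abstract. The simulated design, cutpoints, and variable names are assumptions, and the misclassification adjustment (mixing the likelihood over possible true categories) is omitted for brevity.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit  # logistic CDF

rng = np.random.default_rng(0)
n = 2000
x1 = rng.binomial(1, 0.3, n)           # e.g., a genotype indicator (assumed)
x2 = rng.normal(70.0, 8.0, n)          # e.g., age (assumed)
eta = 0.50 * x1 + 0.04 * x2            # hypothetical beta1 = 0.50, beta2 = 0.04

# Draw the ordinal outcome (0 = mild, 1 = moderate, 2 = severe)
a1, a2 = 3.0, 4.5                      # true cutpoints (assumed)
u = rng.uniform(size=n)
y = np.where(u < expit(a1 - eta), 0, np.where(u < expit(a2 - eta), 1, 2))

def negloglik(theta):
    c1, log_gap, b1, b2 = theta
    c2 = c1 + np.exp(log_gap)          # enforce c2 > c1
    lin = b1 * x1 + b2 * x2
    p0 = expit(c1 - lin)               # P(Y = 0)
    p1 = expit(c2 - lin) - p0          # P(Y = 1)
    p2 = 1.0 - expit(c2 - lin)         # P(Y = 2)
    probs = np.choose(y, [p0, p1, p2])
    return -np.sum(np.log(np.clip(probs, 1e-12, None)))

fit = minimize(negloglik, x0=[2.0, 0.0, 0.0, 0.0], method="Nelder-Mead",
               options={"maxiter": 20000, "fatol": 1e-9, "xatol": 1e-9})
c1_hat, log_gap_hat, b1_hat, b2_hat = fit.x
print(f"beta1 ~ {b1_hat:.3f}, beta2 ~ {b2_hat:.3f}")
```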
Abstract:
Objectives. This paper seeks to assess the effect of regression model misspecification on statistical power in a variety of situations. Methods and results. The effect of misspecification in regression can be approximated by evaluating the correlation between the correct specification and the misspecification of the outcome variable (Harris 2010). In this paper, three misspecified models (linear, categorical, and fractional polynomial) were considered. In the first section, the mathematical method of calculating the correlation between correct and misspecified models with simple mathematical forms was derived and demonstrated. In the second section, data from the National Health and Nutrition Examination Survey (NHANES 2007-2008) were used to examine such correlations. Our study shows that, compared with linear or categorical models, the fractional polynomial models, with their higher correlations, provided a better approximation of the true relationship, as illustrated by LOESS regression. In the third section, we present the results of simulation studies demonstrating that misspecification in regression can produce marked decreases in power with small sample sizes. However, the categorical model had the greatest power, ranging from 0.877 to 0.936 depending on sample size and outcome variable used. The power of the fractional polynomial model was close to that of the linear model, ranging from 0.69 to 0.83, and appeared to be affected by the increased degrees of freedom of this model. Conclusion. Correlations between alternative model specifications can be used to provide a good approximation of the effect of misspecification on statistical power when the sample size is large. When model specifications have known simple mathematical forms, such correlations can be calculated mathematically. Actual public health data from NHANES 2007-2008 were used as examples to demonstrate situations with an unknown or complex correct model specification. Simulation of power for misspecified models confirmed the results based on correlation methods but also illustrated the effect of model degrees of freedom on power.
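A brief sketch of the correlation idea under stated assumptions: take a quadratic truth as the "correct" specification and ask how well the fitted values of a linear and a quartile-categorized specification correlate with it. Everything here (the quadratic form, the quartile cutoffs) is illustrative rather than drawn from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
x = rng.uniform(0, 10, n)
true_mean = 0.3 * x**2                 # assumed "correct" specification

# Misspecification 1: linear in x (least-squares fitted values)
X_lin = np.column_stack([np.ones(n), x])
fit_lin = X_lin @ np.linalg.lstsq(X_lin, true_mean, rcond=None)[0]

# Misspecification 2: quartile categories (within-group means)
q = np.quantile(x, [0.25, 0.5, 0.75])
g = np.digitize(x, q)
fit_cat = np.array([true_mean[g == k].mean() for k in range(4)])[g]

for name, f in [("linear", fit_lin), ("categorical", fit_cat)]:
    r = np.corrcoef(true_mean, f)[0, 1]
    print(f"{name}: corr with correct specification = {r:.3f}")
```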
Abstract:
The standard analyses of survival data involve the assumption that survival and censoring are independent. When censoring and survival are related, the phenomenon is known as informative censoring. This paper examines the effects of an informative censoring assumption on the hazard function and the estimated hazard ratio provided by the Cox model. The limiting factor in all analyses of informative censoring is the problem of non-identifiability: non-identifiability implies that it is impossible to distinguish a situation in which censoring and death are independent from one in which there is dependence. However, it is possible that informative censoring occurs. Examination of the literature indicates how others have approached the problem and covers the relevant theoretical background. Three models are examined in detail. The first model uses conditionally independent marginal hazards to obtain the unconditional survival function and hazards. The second model is based on the Gumbel Type A method for combining independent marginal distributions into bivariate distributions using a dependency parameter. Finally, a formulation based on a compartmental model is presented and its results described. For the latter two approaches, the resulting hazard is used in the Cox model in a simulation study. The unconditional survival distribution formed from the first model involves dependency, but the crude hazard resulting from this unconditional distribution is identical to the marginal hazard, and inferences based on the hazard are valid. The hazard ratios formed from two distributions following the Gumbel Type A model are biased by a factor that depends on the amount of censoring in the two populations and on the strength of the dependency between death and censoring in the two populations. The Cox model estimates this biased hazard ratio. In general, the hazard resulting from the compartmental model is not constant, even if the individual marginal hazards are constant, unless censoring is non-informative. The hazard ratio tends to a specific limit. Methods of evaluating situations in which informative censoring is present are described, and the relative utility of the three models examined is discussed.
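The bias mechanism can be made concrete with a small simulation. The sketch below uses a Gaussian copula to induce dependence between exponential death and censoring times, as a simpler stand-in for the Gumbel Type A construction (an assumption, not the paper's model), and compares the crude hazard ratio under independent versus dependent censoring. All rates and correlations are illustrative.

```python
import numpy as np
from scipy.special import ndtr  # standard normal CDF

rng = np.random.default_rng(2)

def crude_hazard(lam_death, lam_cens, rho, n=200_000):
    """Events per unit person-time when death and censoring times are linked
    by a Gaussian copula with correlation rho (rho = 0: independent)."""
    z = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], n)
    t_death = -np.log(1.0 - ndtr(z[:, 0])) / lam_death  # exponential marginal
    t_cens = -np.log(1.0 - ndtr(z[:, 1])) / lam_cens
    t_obs = np.minimum(t_death, t_cens)
    n_events = np.sum(t_death <= t_cens)
    return n_events / t_obs.sum()

for rho in (0.0, 0.6):
    hr = crude_hazard(0.20, 0.15, rho) / crude_hazard(0.10, 0.15, rho)
    print(f"rho = {rho}: crude hazard ratio = {hr:.3f} (marginal HR = 2.0)")
```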
Abstract:
The problem of analyzing data with updated measurements in the time-dependent proportional hazards model arises frequently in practice. One available option is to reduce the number of intervals (or updated measurements) included in the Cox regression model. We empirically investigated the bias of the estimator of the time-dependent covariate effect while varying the failure rate, sample size, true values of the parameters, and the number of intervals. We also evaluated how often a time-dependent covariate needs to be collected and assessed the effect of sample size and failure rate on the power of testing a time-dependent effect. A time-dependent proportional hazards model with two binary covariates was considered. The time axis was partitioned into k intervals. The baseline hazard was assumed to be 1, so that the failure times were exponentially distributed within each interval. A type II censoring model was adopted to characterize the failure rate. The factors of interest were sample size (500, 1000), type II censoring with failure rates of 0.05, 0.10, and 0.20, and three values for each of the non-time-dependent and time-dependent covariates (1/4, 1/2, 3/4). The mean bias of the estimator of the coefficient of the time-dependent covariate decreased as sample size and number of intervals increased, whereas it increased as the failure rate and the true values of the covariates increased. The mean bias of the estimator was smallest when all of the updated measurements were used in the model, compared with two models that used only selected measurements of the time-dependent covariate. For the model that included all the measurements, the coverage rates of the estimator of the coefficient of the time-dependent covariate were in most cases 90% or more, except when the failure rate was high (0.20). The power associated with testing a time-dependent effect was highest when all of the measurements of the time-dependent covariate were used. An example from the Systolic Hypertension in the Elderly Program Cooperative Research Group is presented.
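A minimal sketch of the data layout and model described above, assuming the Python lifelines package for the fit (the dissertation's own software is not named). Each subject contributes one row per unit-length interval; z1 is fixed, z2 is the updated time-dependent covariate, the baseline hazard is 1 as in the abstract, and the coefficients of 0.5 are illustrative.

```python
import numpy as np
import pandas as pd
from lifelines import CoxTimeVaryingFitter

rng = np.random.default_rng(3)
rows = []
for i in range(300):
    z1 = rng.binomial(1, 0.5)               # fixed binary covariate
    t0 = 0.0
    for _ in range(3):                      # at most 3 one-unit intervals
        z2 = rng.binomial(1, 0.3)           # updated measurement this interval
        haz = np.exp(0.5 * z1 + 0.5 * z2)   # baseline hazard 1 (as in abstract)
        t_event = rng.exponential(1.0 / haz)
        if t_event < 1.0:                   # event inside this interval
            rows.append((i, t0, t0 + t_event, z1, z2, 1))
            break
        rows.append((i, t0, t0 + 1.0, z1, z2, 0))
        t0 += 1.0

df = pd.DataFrame(rows, columns=["id", "start", "stop", "z1", "z2", "event"])
ctv = CoxTimeVaryingFitter()
ctv.fit(df, id_col="id", event_col="event", start_col="start", stop_col="stop")
print(ctv.summary[["coef", "se(coef)"]])    # estimates of beta1, beta2
```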
Abstract:
A historical prospective study was designed to assess the mean weight status of subjects who participated in a behavioral weight reduction program in 1983 and to determine whether there was an association between the dependent variable, weight change, and any of 31 independent variables after a 2-year follow-up period. Data were obtained by abstracting the subjects' records and from a follow-up questionnaire administered 2 years after program participation. Five hundred nine subjects (386 females and 123 males) of the 1460 subjects who participated in the program completed and returned the questionnaire. Results showed that mean weight was significantly different (p < 0.001) between the baseline measurement and the measurement after the 2-year follow-up period. The mean weight loss of the group was 5.8 pounds: 10.7 pounds for males and 4.2 pounds for females. A total of 63.9% of the group, 69.9% of males and 61.9% of females, were still below their initial weight after the 2-year follow-up period. Sixteen of the 31 variables assessed using bivariate analyses were found to be significantly (p ≤ 0.05) associated with weight change after the 2-year follow-up period. These variables were then entered into a multivariate linear regression model. A total of 37.9% of the variance of the dependent variable, weight change, was accounted for by all 16 variables. Eight of these variables were found to be significantly (p ≤ 0.05) predictive of weight change in the stepwise multivariate process, accounting for 37.1% of the variance. These variables included two baseline variables (percent over ideal body weight at enrollment and occupation) and six follow-up variables (feeling in control of eating habits, percent of body weight lost during treatment, frequency of weight measurement, physical activity, eating in response to emotions, and number of pounds of weight gain needed to resume a diet). It was concluded that a greater amount of emphasis should be placed on the six follow-up variables by clinicians involved in the treatment of obesity, and by the subjects themselves, to enhance their chances of success at long-term weight loss.
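A compact sketch of the two-stage analysis strategy described above: a bivariate screen at p ≤ 0.05 followed by forward stepwise entry into a multivariate linear model. The data, variable names, and the p-value entry rule are hypothetical stand-ins for the study's 31 candidate predictors.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 509
X = pd.DataFrame(rng.normal(size=(n, 6)), columns=[f"v{j}" for j in range(6)])
y = 2.0 * X["v0"] + 1.0 * X["v1"] + rng.normal(scale=3, size=n)  # weight change

# Stage 1: bivariate screen at p <= 0.05
screened = [c for c in X
            if sm.OLS(y, sm.add_constant(X[c])).fit().pvalues[c] <= 0.05]

# Stage 2: forward stepwise entry at p <= 0.05
selected = []
while True:
    pvals = {}
    for c in set(screened) - set(selected):
        m = sm.OLS(y, sm.add_constant(X[selected + [c]])).fit()
        pvals[c] = m.pvalues[c]
    if not pvals or min(pvals.values()) > 0.05:
        break
    selected.append(min(pvals, key=pvals.get))

final = sm.OLS(y, sm.add_constant(X[selected])).fit()
print(selected, f"R^2 = {final.rsquared:.3f}")
```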
Abstract:
Strategies are compared for the development of a linear regression model with stochastic (multivariate normal) regressor variables and the subsequent assessment of its predictive ability. Bias and mean squared error of four estimators of predictive performance are evaluated in samples simulated from 32 population correlation matrices. Models including all of the available predictors are compared with those obtained using selected subsets. The subset selection procedures investigated include two stopping rules, C_p and S_p, each combined with an 'all possible subsets' or 'forward selection' search over variables. The estimators of performance utilized include parametric (MSEP_m) and non-parametric (PRESS) assessments in the entire sample, and two data-splitting estimates restricted to a random or balanced (Snee's DUPLEX) 'validation' half sample. The simulations were performed as a designed experiment, with population correlation matrices representing a broad range of data structures. The techniques examined for subset selection do not generally result in improved predictions relative to the full model. Approaches using 'forward selection' result in slightly smaller prediction errors and less biased estimators of predictive accuracy than 'all possible subsets' approaches, but no differences are detected between the performances of C_p and S_p. In every case, prediction errors of models obtained by subset selection in either of the half splits exceed those obtained using all predictors and the entire sample. Only the random-split estimator is conditionally (on β) unbiased; however, MSEP_m is unbiased on average, and PRESS is nearly so in unselected (fixed-form) models. When subset selection techniques are used, MSEP_m and PRESS always underestimate prediction errors, by as much as 27 percent (on average) in small samples. Despite their bias, the mean squared errors (MSE) of these estimators are at least 30 percent less than that of the unbiased random-split estimator. The DUPLEX split estimator suffers from large MSE as well as bias, and seems of little value within the context of stochastic regressor variables. To maximize predictive accuracy while retaining a reliable estimate of that accuracy, it is recommended that the entire sample be used for model development and that a leave-one-out statistic (e.g., PRESS) be used for assessment.
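The recommended leave-one-out statistic has a closed form that avoids n refits. Below is a small sketch computing PRESS from the hat matrix of a simulated design; the dimensions and coefficients are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 64, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
beta = np.array([1.0, 0.5, -0.3, 0.0, 0.2])
y = X @ beta + rng.normal(size=n)

H = X @ np.linalg.solve(X.T @ X, X.T)        # hat matrix
e = y - H @ y                                # ordinary residuals
press = np.sum((e / (1 - np.diag(H))) ** 2)  # leave-one-out in closed form
print(f"PRESS = {press:.2f}, PRESS/n = {press / n:.3f}")
```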
Abstract:
Recent studies using diffusion tensor imaging (DTI) have advanced our knowledge of the organization of white matter subserving language function. It remains unclear, however, how DTI may be used to predict accurately a key feature of language organization: its asymmetric representation in one cerebral hemisphere. In this study of epilepsy patients with unambiguous lateralization on Wada testing (19 left and 4 right lateralized subjects; no bilateral subjects), the predictive value of DTI for classifying the dominant hemisphere for language was assessed relative to the existing standard, the intra-carotid Amytal (Wada) procedure. Our specific hypothesis is that language laterality in both unilateral left- and right-hemisphere language dominant subjects may be predicted by hemispheric asymmetry in the relative density of three white matter pathways terminating in the temporal lobe implicated in different aspects of language function: the arcuate (AF), uncinate (UF), and inferior longitudinal fasciculi (ILF). Laterality indices computed from asymmetry of high anisotropy AF pathways, but not the other pathways, classified the majority (19 of 23) of patients using the Wada results as the standard. A logistic regression model incorporating information from DTI of the AF, fMRI activity in Broca's area, and handedness was able to classify 22 of 23 (95.6%) patients correctly according to their Wada score. We conclude that evaluation of highly anisotropic components of the AF alone has significant predictive power for determining language laterality, and that this markedly asymmetric distribution in the dominant hemisphere may reflect enhanced connectivity between frontal and temporal sites to support fluent language processes. Given the small sample reported in this preliminary study, future research should assess this method on a larger group of patients, including subjects with bi-hemispheric dominance.
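As a sketch of how such a classifier might be assembled (entirely hypothetical data; the study's actual feature definitions are not reproduced here), a common laterality-index convention is LI = (L - R)/(L + R) computed from per-hemisphere counts of high-anisotropy streamlines, with LI > 0 read as left-dominant; the index is then combined with fMRI laterality and handedness in a logistic model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(6)
n = 23                                       # matching the study's sample size
left_af = rng.poisson(320, n)                # high-anisotropy AF streamlines, left
right_af = rng.poisson(300, n)               # same, right
li_af = (left_af - right_af) / (left_af + right_af)
fmri_li = 0.5 * li_af + rng.normal(0, 0.2, n)    # fMRI laterality, Broca's area
handed = rng.binomial(1, 0.9, n)             # 1 = right-handed
# Stand-in "Wada" label for illustration only (threshold is arbitrary):
wada_left = (li_af + 0.3 * fmri_li > 0.03).astype(int)

X = np.column_stack([li_af, fmri_li, handed])
clf = LogisticRegression().fit(X, wada_left)
print(f"in-sample classification accuracy: {clf.score(X, wada_left):.2f}")
```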
Abstract:
This research aimed to explore the extent to which police use of force was related to attitudes towards violence, agency type, and racism. Previous studies have found a culture of honor in the psychology of violence in the Southern United States. Were similar attitudes measurable among Texas professional line officers? Are there predictors of use of force? A self-reported anonymous survey was administered to Texas patrol officers in the cities of Austin and Houston and the counties of Harris and Travis. A total of seventy-four questionnaires were used in the statistical analyses. Scales were developed measuring use of force, attitudes towards violence, and feelings on racism, and their relationships were examined. A regression model shows a strong and significant relationship between the officers' attitudes towards violence and self-reported use of force. Further, agency type, municipal versus sheriff, also predicts use of force. Attitudes regarding race or racism, as measured by this study, were not predictive of use of force.
Abstract:
Background. Cardiac tamponade can occur when a large amount of fluid or gas, singly or in combination, accumulates within the pericardium and compresses the heart, causing circulatory compromise. Although previous investigators have found the 12-lead ECG to have a poor predictive value in diagnosing cardiac tamponade, very few studies have evaluated it as a follow-up tool for ruling in or ruling out tamponade in patients with previously diagnosed malignant pericardial effusions. Methods. 127 patients with malignant pericardial effusions at the MD Anderson Cancer Center were included in this retrospective study. While 83 of these patients had cardiac tamponade diagnosed by echocardiographic criteria (the gold standard), 44 did not. We computed the sensitivity (Se), specificity (Sp), and positive (PPV) and negative (NPV) predictive values for individual ECG abnormalities and for combinations of them. Individual ECG abnormalities were also entered singly into a univariate logistic regression model to predict tamponade. Results. For patients with effusions of all sizes, electrical alternans had a Se, Sp, PPV, and NPV of 22.61%, 97.61%, 95%, and 39.25%, respectively. These parameters for low-voltage complexes were 55.95%, 74.44%, 81.03%, and 46.37%, respectively. The presence of all three ECG abnormalities had Se = 8.33%, Sp = 100%, PPV = 100%, and NPV = 35.83%, while the presence of at least one of the three had Se = 89.28%, Sp = 46.51%, PPV = 76.53%, and NPV = 68.96%. For patients with effusions of all sizes, electrical alternans had an OR of 12.28 (1.58-95.17, p = 0.016), while the presence of at least one ECG abnormality had an OR of 7.25 (2.9-18.1, p < 0.001) in predicting tamponade. Conclusions. Although individual ECG abnormalities had low sensitivities, specificities, NPVs, and PPVs, with the exception of electrical alternans, the presence of at least one of the three ECG abnormalities had a high sensitivity in diagnosing cardiac tamponade. This points to its potential use as a screening test, with a correspondingly high NPV to rule out a diagnosis of tamponade in patients with malignant pericardial effusions, which could save expensive echocardiographic assessments in patients with previously diagnosed pericardial effusions.
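The screening arithmetic reported above follows directly from the 2x2 table of test result against the echocardiographic gold standard. A small sketch, with counts that are illustrative rather than the study's data:

```python
def diagnostics(tp, fp, fn, tn):
    """Se, Sp, PPV, NPV from the cells of a 2x2 test-vs-truth table."""
    se = tp / (tp + fn)
    sp = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return se, sp, ppv, npv

# e.g., "at least one ECG abnormality" as the positive test (counts assumed)
se, sp, ppv, npv = diagnostics(tp=75, fp=23, fn=8, tn=21)
print(f"Se={se:.1%} Sp={sp:.1%} PPV={ppv:.1%} NPV={npv:.1%}")
```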
Abstract:
Mean corpuscular volume (MCV), an inexpensive and widely available measure, increases in HIV-infected individuals receiving zidovudine and stavudine, raising the hypothesis that it could be used as a surrogate for adherence. The aim of this study was to examine the association between mean corpuscular volume and adherence to antiretroviral therapy among HIV-infected children and adolescents aged 0-19 years in Uganda, as well as the extent to which changes in mean corpuscular volume predict adherence as determined by virologic suppression. The investigator retrospectively reviewed and analyzed secondary data on 158 HIV-infected children and adolescents aged 0-19 years who initiated antiretroviral therapy under an observational cohort at the Baylor College of Medicine Children's Foundation - Uganda. Viral suppression was used as the gold standard for monitoring adherence and was defined as a viral load of < 400 copies/ml at 24 and 48 weeks. Patients had been on therapy for at least 48 weeks; ages ranged from 0.2 to 18.4 years; 54.4% were female; 82.3% were on a zidovudine-based regimen; 92% were WHO stage III at initiation of therapy; median pre-therapy MCV was 80.6 fl (70.3-98.3 fl); median CD4% was 10.2% (0.3%-28.0%); and mean pre-therapy viral load was 407,712.9 ± 270,413.9 copies/ml. For both 24 and 48 weeks of antiretroviral therapy, patients with viral suppression had a greater mean percentage change in MCV than those without (15.1% ± 8.4 vs. 11.1% ± 7.8 and 2.3% ± 13.2 vs. -2.7% ± 10.5, respectively). The mean percentage change in MCV was greater in the first 24 weeks of therapy than thereafter for patients both with and without viral suppression (15.1% ± 8.4 vs. 2.3% ± 13.2 and 11.1% ± 7.8 vs. -2.7% ± 10.5, respectively). In the multivariate logistic regression model, a percentage change in MCV ≥ 20% was significantly associated with viral suppression (adjusted OR 4.0; CI 1.2-13.3; p = 0.02). The ability of percentage change in MCV to correctly identify children and adolescents with viral suppression was higher at a cutoff of ≥ 20% (90.7%; sensitivity, 31.7%) than at ≥ 9% (82.9%; sensitivity, 78.9%). The negative predictive value was lower at the ≥ 20% cutoff (25%; specificity, 84.8%) than at the ≥ 9% cutoff (33.3%; specificity, 39.4%). Mean corpuscular volume is a useful marker of adherence among children and adolescents with viral suppression.
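A brief sketch (simulated data; the covariates and their effects are hypothetical) of the kind of adjusted-odds-ratio calculation reported above: logistic regression of viral suppression on an indicator of ≥20% MCV change, with the adjusted OR read off as the exponentiated coefficient.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 158
df = pd.DataFrame({
    "mcv_change_ge20": rng.binomial(1, 0.3, n),   # indicator of >=20% MCV change
    "age": rng.uniform(0.2, 18.4, n),
    "female": rng.binomial(1, 0.544, n),
})
logit_p = -0.5 + np.log(4.0) * df["mcv_change_ge20"]   # assumed true OR = 4
df["suppressed"] = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))

exog = sm.add_constant(df[["mcv_change_ge20", "age", "female"]])
model = sm.Logit(df["suppressed"], exog).fit(disp=0)
or_est = np.exp(model.params["mcv_change_ge20"])
ci = np.exp(model.conf_int().loc["mcv_change_ge20"])
print(f"adjusted OR = {or_est:.2f} (95% CI {ci[0]:.2f}-{ci[1]:.2f})")
```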
Abstract:
Trauma and severe head injuries are important issues because they are prevalent, because they occur predominantly in the young, and because variations in clinical management may matter. Trauma is the leading cause of death for those under age 40. The focus of this head injury study is to determine whether variation in the time from the scene of the accident to a trauma center hospital makes a difference in patient outcomes. A trauma registry is maintained in the Houston-Galveston area and includes all patients admitted to any one of three trauma center hospitals with mild or severe head injuries. A study cohort derived from the registry includes 254 severe head injury cases from 1980 with a Glasgow Coma Score of 8 or less. Multiple influences relate to patient outcomes after severe head injury. Two primary variables and four confounding variables are identified: time to emergency room, time to intubation, patient age, severity of injury, type of injury, and mode of transport to the emergency room. Regression analysis, analysis of variance, and chi-square analysis were the principal statistical methods utilized. Analysis indicates that within an urban setting, over a four-hour time span, variations in time to emergency room do not provide any strong influence on or predictive value for patient outcome. However, the data suggest that longer time periods have a negative influence on outcomes. Age is influential only when the older group (55-64) is included. Mode of transport (helicopter or ambulance) did not indicate any significant difference in outcome. In a multivariate regression model, outcomes are influenced primarily by severity of injury and age, which explain 36% (R²) of the variance. Inclusion of time to emergency room, time to intubation, transport mode, and type of injury adds only 4% (R²) additional contribution to explaining variation in patient outcome. The research concludes that, since the group most at risk of head trauma is the young adult male involved in automobile and motorcycle accidents, more may be gained by modifying driving habits and other preventive measures. Continuous clinical and evaluative research is required to provide updated clinical wisdom in patient management and trauma treatment protocols. A National Institute of Trauma may be required to develop a national public policy and to evaluate the many medical, behavioral, and social changes required to cope with the country's number 3 killer and the primary killer of young adults.
Abstract:
Traditional comparison of standardized mortality ratios (SMRs) can be misleading if the age-specific mortality ratios are not homogeneous. For this reason, a regression model has been developed that incorporates the mortality ratio as a function of age. This model is then applied to mortality data from an occupational cohort study. The nature of the occupational data necessitates the investigation of mortality ratios that increase with age; these occupational data are used primarily to illustrate and develop the statistical methodology. The age-specific mortality ratio (MR) for the covariates of interest can be written as MR_{ij...m} = μ_{ij...m}/θ_{ij...m} = r · exp(Z′_{ij...m} β), where μ_{ij...m} and θ_{ij...m} denote the force of mortality in the study and chosen standard populations in the (ij...m)-th stratum, respectively, r is the intercept, Z_{ij...m} is the vector of covariables associated with the i-th age interval, and β is a vector of regression coefficients associated with these covariables. A Newton-Raphson iterative procedure has been used to determine the maximum likelihood estimates of the regression coefficients. This model provides a statistical method for a logical and easily interpretable explanation of an occupational cohort's mortality experience. Since it gives a reasonable fit to the mortality data, it can also be concluded that the model is fairly realistic. The traditional statistical method for the analysis of occupational cohort mortality data is to present a summary index such as the SMR under the assumption of constant (homogeneous) age-specific mortality ratios. Since the mortality ratios for occupational groups usually increase with age, the homogeneity assumption is often untenable. The traditional method of comparing SMRs under the homogeneity assumption is a special case of this model, without age as a covariate. This model also provides a statistical technique to evaluate the relative risk between two SMRs or a dose-response relationship among several SMRs. The model presented has applications in the medical, demographic, and epidemiologic areas. The methods developed in this thesis are suitable for future analyses of mortality or morbidity data when the age-specific mortality/morbidity experience is a function of age or when an interaction effect between confounding variables needs to be evaluated.
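Under the model above, observed stratum deaths are Poisson with mean θ · r · exp(Zβ), so the parameters can be recovered with a Poisson regression using log θ as an offset. The sketch below uses a generic IRLS GLM fit rather than the thesis's Newton-Raphson code, and all strata, rates, and parameter values are invented for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(8)
age_mid = np.array([25, 35, 45, 55, 65], dtype=float)
expected = np.array([4.0, 9.0, 20.0, 35.0, 50.0])   # theta: standard-pop deaths
true_log_r, true_beta = np.log(1.2), 0.015          # MR rises with age (assumed)
mr = np.exp(true_log_r + true_beta * (age_mid - 45))
observed = rng.poisson(expected * mr)

X = sm.add_constant(pd.Series(age_mid - 45, name="age_c"))
fit = sm.GLM(observed, X, family=sm.families.Poisson(),
             offset=np.log(expected)).fit()
print(np.exp(fit.params["const"]), fit.params["age_c"])  # r-hat, beta-hat
```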
Abstract:
In regression analysis, covariate measurement error occurs in many applications; the error-prone covariates are often referred to as latent variables. In this study, we extended the work of Chan et al. (2008) on recovering the latent slope in a simple regression model to the multiple regression setting. We presented an approach that applies the Monte Carlo method in the Bayesian framework to a parametric regression model with measurement error in an explanatory variable. The proposed estimator uses the conditional expectation of the latent slope given the observed outcome and surrogate variables in the multiple regression model. A simulation study was presented showing that the method produces an estimator that is efficient in the multiple regression model, especially when the measurement error variance of the surrogate variable is large.
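A sketch (simulated data) of the underlying measurement-error problem: the covariate X is observed only through a surrogate W = X + U, and naive least squares on W attenuates the slope. The correction below is a regression-calibration step substituting E[X | W], used as a simple stand-in for the abstract's Bayesian Monte Carlo estimator; the error variance is assumed known.

```python
import numpy as np

rng = np.random.default_rng(9)
n = 5000
x = rng.normal(0, 1, n)                 # latent covariate, var(X) = 1
z = rng.normal(0, 1, n)                 # error-free covariate
w = x + rng.normal(0, 0.8, n)           # surrogate, error sd 0.8 (assumed known)
y = 1.0 + 2.0 * x + 0.5 * z + rng.normal(0, 1, n)

def ols_slope(a, b):
    """Least-squares coefficient on `a` in a regression of y on (1, a, b)."""
    A = np.column_stack([np.ones(n), a, b])
    return np.linalg.lstsq(A, y, rcond=None)[0][1]

naive = ols_slope(w, z)
reliability = 1.0 / (1.0 + 0.8**2)      # var(X) / var(W) under the assumptions
x_hat = reliability * w                 # E[X | W] under joint normality
corrected = ols_slope(x_hat, z)
print(f"naive slope = {naive:.3f}, calibrated slope = {corrected:.3f} (true 2.0)")
```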
Abstract:
The tobacco-specific nitrosamine 4-(methylnitrosamino)-1-(3-pyridyl)-1-butanone (NNK) is a known carcinogen for lung cancer. Since the cytokinesis-blocked micronucleus (CBMN) assay has been found to be extremely sensitive to NNK-induced genetic damage, it is a potentially important factor for predicting lung cancer risk. However, the association between lung cancer and NNK-induced genetic damage measured by the CBMN assay has not been rigorously examined. This research develops a methodology to model the chromosomal changes under NNK-induced genetic damage in a logistic regression framework in order to predict the occurrence of lung cancer. Since these chromosomal changes were usually observed for only a short time, due to laboratory cost and time constraints, a resampling technique was applied to generate the Markov chain of the normal and damaged cell states for each individual. A joint likelihood was established between the resampled Markov chains and a logistic regression model that includes the transition probabilities of this chain as covariates. Maximum likelihood estimation was applied to carry out the statistical tests for comparison. The ability of this approach to increase discriminating power in predicting lung cancer was compared to a baseline "non-genetic" model. Our method offers an option for understanding the association between dynamic cell information and lung cancer. Our study indicates that the extent of DNA damage/non-damage measured by the CBMN assay provides critical information that impacts public health studies of lung cancer risk. This novel statistical method can simultaneously estimate the process of DNA damage/non-damage and its relationship with lung cancer for each individual.
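A sketch (simulated chains; all rates and effect sizes are assumptions) of the core idea: summarize each subject's normal/damaged cell sequence by its estimated two-state Markov transition probabilities, then use those probabilities as covariates in a logistic regression for lung cancer status. The abstract's joint likelihood over resampled chains is replaced here by a simple two-step plug-in fit for brevity.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(10)

def transition_probs(chain):
    """Estimate P(0->1) and P(1->0) from a 0/1 state sequence."""
    frm, to = chain[:-1], chain[1:]
    p01 = np.mean(to[frm == 0]) if (frm == 0).any() else 0.0
    p10 = np.mean(1 - to[frm == 1]) if (frm == 1).any() else 0.0
    return p01, p10

n = 300
cases = rng.binomial(1, 0.5, n)            # lung cancer status (simulated)
feats = []
for case in cases:
    p01_true = 0.25 if case else 0.15      # cases accrue damage faster (assumed)
    chain = [0]
    for _ in range(49):
        p = p01_true if chain[-1] == 0 else 0.70   # P(stay damaged) = 0.70
        chain.append(rng.binomial(1, p))
    feats.append(transition_probs(np.array(chain)))

X = pd.DataFrame(feats, columns=["p01", "p10"])
fit = sm.Logit(cases, sm.add_constant(X)).fit(disp=0)
print(fit.params[["p01", "p10"]])          # coefficients on the transition rates
```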