61 results for Pooled-regression model
in DigitalCommons@The Texas Medical Center
Abstract:
Ordinal outcomes are frequently employed in diagnosis and clinical trials. Clinical trials of Alzheimer's disease (AD) treatments are a case in point, using mild, moderate, or severe disease status as the outcome measure. As in many other outcome-oriented studies, the disease status may be misclassified. This study estimates the extent of misclassification in an ordinal outcome such as disease status, and also in a predictor variable such as genotype status. An ordinal logistic regression model is commonly used to model the relationship between disease status, the effect of treatment, and other predictive factors. A simulation study was done. First, data were created from a set of hypothetical parameters and hypothetical rates of misclassification. Next, the maximum likelihood method was employed to generate likelihood equations accounting for misclassification. The Nelder-Mead simplex method was used to solve for the misclassification and model parameters. Finally, this method was applied to an AD dataset to detect the amount of misclassification present. The estimates of the ordinal regression model parameters were close to the hypothetical parameters: β1 was hypothesized at 0.50 and the mean estimate was 0.488; β2 was hypothesized at 0.04 and the mean of the estimates was 0.04. Although the estimates for the rates of misclassification of X1 were not as close as those for β1 and β2, they validate this method. The 0-1 misclassification of X1 was hypothesized at 2.98% and the mean of the simulated estimates was 1.54%; in the best case, the misclassification of k from high to medium was hypothesized at 4.87% and had a sample mean of 3.62%. In the AD dataset, the estimated odds ratio for X1 (having both copies of the APOE 4 allele) changed from 1.377 to 1.418, demonstrating that the odds ratio estimate changes when the analysis adjusts for misclassification.
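To make the idea of a likelihood that accounts for misclassification concrete, here is a minimal sketch, not the thesis's actual code. It assumes a proportional-odds model and handles misclassification of the outcome only (not of the predictor X1). The function names, the cutpoints, and the misclassification matrix are hypothetical; only β1 = 0.50 and β2 = 0.04 come from the abstract. A general-purpose optimizer such as Nelder-Mead could then minimize `neg_log_likelihood` over the cutpoints, coefficients, and misclassification rates.

```python
import math

def category_probs(cutpoints, eta):
    """Proportional-odds probabilities P(Y = k) for k = 0..K-1.

    Assumes P(Y <= k) = logistic(cutpoints[k] - eta) with increasing cutpoints.
    """
    cdf = [1.0 / (1.0 + math.exp(-(c - eta))) for c in cutpoints] + [1.0]
    return [cdf[0]] + [cdf[k] - cdf[k - 1] for k in range(1, len(cdf))]

def observed_probs(true_probs, misclass):
    """Pass true-category probabilities through a misclassification matrix.

    misclass[k][j] = P(observed = j | true = k); each row should sum to 1.
    """
    n = len(true_probs)
    return [sum(true_probs[k] * misclass[k][j] for k in range(n)) for j in range(n)]

def neg_log_likelihood(data, cutpoints, beta, misclass):
    """Negative log-likelihood of observed categories.

    data is a list of (covariates, observed_category) pairs.
    """
    total = 0.0
    for x, y_obs in data:
        eta = sum(b * xi for b, xi in zip(beta, x))
        p_obs = observed_probs(category_probs(cutpoints, eta), misclass)
        total -= math.log(p_obs[y_obs])
    return total

# Hypothetical setup: three ordered categories, small off-diagonal error rates.
cutpoints = [-1.0, 1.0]
beta = [0.50, 0.04]  # hypothesized values quoted in the abstract
misclass = [[0.95, 0.05, 0.00],
            [0.03, 0.94, 0.03],
            [0.00, 0.05, 0.95]]
toy_data = [([1.0, 60.0], 2), ([0.0, 70.0], 1), ([0.0, 55.0], 0)]
nll = neg_log_likelihood(toy_data, cutpoints, beta, misclass)
```

With an identity misclassification matrix the function reduces to the ordinary ordinal-logistic likelihood, which gives a simple sanity check.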
Abstract:
Objectives. This paper seeks to assess the effect on statistical power of regression model misspecification in a variety of situations. Methods and results. The effect of misspecification in regression can be approximated by evaluating the correlation between the correct specification and the misspecification of the outcome variable (Harris 2010). In this paper, three misspecified models (linear, categorical and fractional polynomial) were considered. In the first section, the mathematical method of calculating the correlation between correct and misspecified models with simple mathematical forms was derived and demonstrated. In the second section, data from the National Health and Nutrition Examination Survey (NHANES 2007-2008) were used to examine such correlations. Our study shows that, compared with linear or categorical models, the fractional polynomial models, with their higher correlations, provided a better approximation of the true relationship, which was illustrated by LOESS regression. In the third section, we present the results of simulation studies demonstrating that, overall, misspecification in regression can produce marked decreases in power with small sample sizes. However, the categorical model had the greatest power, ranging from 0.877 to 0.936 depending on the sample size and outcome variable used. The power of the fractional polynomial model was close to that of the linear model, which ranged from 0.69 to 0.83, and appeared to be affected by the increased degrees of freedom of this model. Conclusion. Correlations between alternative model specifications can be used to provide a good approximation of the effect on statistical power of misspecification when the sample size is large. When model specifications have known simple mathematical forms, such correlations can be calculated mathematically. Actual public health data from NHANES 2007-2008 were used as examples to demonstrate the situations with unknown or complex correct model specification.
Simulation of power for misspecified models confirmed the results based on correlation methods but also illustrated the effect of model degrees of freedom on power.
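As a rough illustration of the correlation idea, the sketch below computes the Pearson correlation between an assumed "correct" functional form and two misspecified versions of it. The quadratic true form, the grid on [0, 1], and the median split are assumptions chosen for illustration; they are not the forms or data used in the paper.

```python
import math

def pearson(u, v):
    """Pearson correlation of two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    sd_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    sd_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (sd_u * sd_v)

# Evenly spaced predictor values on [0, 1] (assumed distribution).
xs = [i / 1000 for i in range(1001)]
true_form = [x ** 2 for x in xs]                 # assumed correct specification
linear = xs                                      # linear misspecification
binary = [1.0 if x > 0.5 else 0.0 for x in xs]   # two-category misspecification

r_linear = pearson(true_form, linear)
r_binary = pearson(true_form, binary)
```

For a uniform predictor the linear term correlates with the quadratic form at about sqrt(15)/4 ≈ 0.968, while the median split correlates noticeably less, so under this assumed true form the cruder specification discards more of the signal.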
Abstract:
The standard analyses of survival data involve the assumption that survival and censoring are independent. When censoring and survival are related, the phenomenon is known as informative censoring. This paper examines the effects of an informative censoring assumption on the hazard function and the estimated hazard ratio provided by the Cox model. The limiting factor in all analyses of informative censoring is the problem of non-identifiability. Non-identifiability implies that it is impossible to distinguish a situation in which censoring and death are independent from one in which there is dependence. However, it is possible that informative censoring occurs. Examination of the literature indicates how others have approached the problem and covers the relevant theoretical background. Three models are examined in detail. The first model uses conditionally independent marginal hazards to obtain the unconditional survival function and hazards. The second model is based on the Gumbel Type A method for combining independent marginal distributions into bivariate distributions using a dependency parameter. Finally, a formulation based on a compartmental model is presented and its results described. For the latter two approaches, the resulting hazard is used in the Cox model in a simulation study. The unconditional survival distribution formed from the first model involves dependency, but the crude hazard resulting from this unconditional distribution is identical to the marginal hazard, and inferences based on the hazard are valid. The hazard ratios formed from two distributions following the Gumbel Type A model are biased by a factor dependent on the amount of censoring in the two populations and the strength of the dependency of death and censoring in the two populations. The Cox model estimates this biased hazard ratio.
In general, the hazard resulting from the compartmental model is not constant, even if the individual marginal hazards are constant, unless censoring is non-informative. The hazard ratio tends to a specific limit. Methods of evaluating situations in which informative censoring is present are described, and the relative utility of the three models examined is discussed.
Abstract:
The problem of analyzing data with updated measurements in the time-dependent proportional hazards model arises frequently in practice. One available option is to reduce the number of intervals (or updated measurements) to be included in the Cox regression model. We empirically investigated the bias of the estimator of the time-dependent covariate while varying the effect of failure rate, sample size, true values of the parameters and the number of intervals. We also evaluated how often a time-dependent covariate needs to be collected and assessed the effect of sample size and failure rate on the power of testing a time-dependent effect. A time-dependent proportional hazards model with two binary covariates was considered. The time axis was partitioned into k intervals. The baseline hazard was assumed to be 1 so that the failure times were exponentially distributed in the ith interval. A type II censoring model was adopted to characterize the failure rate. The factors of interest were sample size (500, 1000), type II censoring with failure rates of 0.05, 0.10, and 0.20, and three values for each of the non-time-dependent and time-dependent covariates (1/4, 1/2, 3/4). The mean of the bias of the estimator of the coefficient of the time-dependent covariate decreased as sample size and number of intervals increased, whereas the mean of the bias increased as failure rate and true values of the covariates increased. The mean of the bias of the estimator of the coefficient was smallest when all of the updated measurements were used in the model, compared with two models that used selected measurements of the time-dependent covariate. For the model that included all the measurements, the coverage rates of the estimator of the coefficient of the time-dependent covariate were in most cases 90% or more, except when the failure rate was high (0.20).
The power associated with testing a time-dependent effect was highest when all of the measurements of the time-dependent covariate were used. An example from the Systolic Hypertension in the Elderly Program Cooperative Research Group is presented.
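The simulation setup described above (baseline hazard 1, exponential failure times within each interval) can be sketched as follows. This is one plausible reading, not the authors' code: the hazard in interval i is taken to be exp(β1·z1 + β2·z2[i]), and a failure time is drawn by inverting the piecewise-linear cumulative hazard. The function names and example values are hypothetical, and type II censoring is not included.

```python
import math

def interval_hazards(beta_fixed, z_fixed, beta_td, z_td):
    """Hazard in each interval, with baseline hazard 1 (as in the abstract).

    z_td holds the value of the time-dependent covariate in each interval.
    """
    return [math.exp(beta_fixed * z_fixed + beta_td * z) for z in z_td]

def failure_time(cuts, hazards, u):
    """Invert the cumulative hazard of a piecewise-constant hazard.

    cuts: interval start times (cuts[0] == 0, increasing); hazards[i] applies
    from cuts[i] up to the next cut, and the last hazard applies indefinitely.
    u: a uniform(0, 1) draw; the target cumulative hazard is -log(u).
    """
    target = -math.log(u)
    for i, h in enumerate(hazards):
        start = cuts[i]
        end = cuts[i + 1] if i + 1 < len(cuts) else math.inf
        available = h * (end - start)   # cumulative hazard this interval can absorb
        if target <= available:
            return start + target / h
        target -= available
    raise ValueError("hazards must be positive")

# Hypothetical subject: fixed covariate 1, time-dependent covariate switching on
# in the second of two intervals; coefficients 1/4 and 1/2.
haz = interval_hazards(0.25, 1, 0.5, [0, 1])
t = failure_time([0.0, 1.0], haz, u=0.3)
```

Feeding the function independent uniform draws (for example from `random.random()`) yields failure times whose hazard follows the assumed time-dependent proportional hazards form.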
Abstract:
Strategies are compared for the development of a linear regression model with stochastic (multivariate normal) regressor variables and the subsequent assessment of its predictive ability. Bias and mean squared error of four estimators of predictive performance are evaluated in simulated samples of 32 population correlation matrices. Models including all of the available predictors are compared with those obtained using selected subsets. The subset selection procedures investigated include two stopping rules, $C_p$ and $S_p$, each combined with an 'all possible subsets' or 'forward selection' of variables. The estimators of performance utilized include parametric ($\mathrm{MSEP}_m$) and non-parametric (PRESS) assessments in the entire sample, and two data splitting estimates restricted to a random or balanced (Snee's DUPLEX) 'validation' half sample. The simulations were performed as a designed experiment, with population correlation matrices representing a broad range of data structures. The techniques examined for subset selection do not generally result in improved predictions relative to the full model. Approaches using 'forward selection' result in slightly smaller prediction errors and less biased estimators of predictive accuracy than 'all possible subsets' approaches, but no differences are detected between the performances of $C_p$ and $S_p$. In every case, prediction errors of models obtained by subset selection in either of the half splits exceed those obtained using all predictors and the entire sample. Only the random split estimator is conditionally (on $\beta$) unbiased; however, $\mathrm{MSEP}_m$ is unbiased on average and PRESS is nearly so in unselected (fixed form) models. When subset selection techniques are used, $\mathrm{MSEP}_m$ and PRESS always underestimate prediction errors, by as much as 27 percent (on average) in small samples.
Despite their bias, the mean squared errors (MSE) of these estimators are at least 30 percent less than that of the unbiased random split estimator. The DUPLEX split estimator suffers from large MSE as well as bias, and seems of little value within the context of stochastic regressor variables. To maximize predictive accuracy while retaining a reliable estimate of that accuracy, it is recommended that the entire sample be used for model development, and a leave-one-out statistic (e.g. PRESS) be used for assessment.
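The leave-one-out statistic recommended above can be computed without refitting the model n times. The sketch below does this for the simplest case, a single-predictor least-squares line, using the standard identity that the leave-one-out residual equals the ordinary residual divided by (1 - h_ii), where h_ii is the leverage. The single-predictor restriction and the toy data are assumptions made to keep the example short; the thesis itself concerns multiple regression.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    slope = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    return ybar - slope * xbar, slope

def press(xs, ys):
    """PRESS via the leverage shortcut: sum of (e_i / (1 - h_ii))^2."""
    n = len(xs)
    xbar = sum(xs) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    a, b = fit_line(xs, ys)
    total = 0.0
    for x, y in zip(xs, ys):
        leverage = 1.0 / n + (x - xbar) ** 2 / sxx
        total += ((y - (a + b * x)) / (1.0 - leverage)) ** 2
    return total

def press_brute_force(xs, ys):
    """PRESS by literally refitting with each observation left out."""
    total = 0.0
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        total += (ys[i] - (a + b * xs[i])) ** 2
    return total

# Toy data (hypothetical).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.1, 1.9, 3.2, 3.8, 5.1, 6.3]
shortcut, refit = press(xs, ys), press_brute_force(xs, ys)
```

The two functions agree up to floating-point rounding, which is why PRESS uses the whole sample for fitting and still provides an out-of-sample style error estimate.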
Abstract:
Traditional comparison of standardized mortality ratios (SMRs) can be misleading if the age-specific mortality ratios are not homogeneous. For this reason, a regression model has been developed which incorporates the mortality ratio as a function of age. This model is then applied to mortality data from an occupational cohort study. The nature of the occupational data necessitates the investigation of mortality ratios which increase with age. These occupational data are used primarily to illustrate and develop the statistical methodology. The age-specific mortality ratio (MR) for the covariates of interest can be written as $MR_{ij \ldots m} = \mu_{ij \ldots m} / \theta_{ij \ldots m} = r \cdot \exp(Z'_{ij \ldots m} \beta)$, where $\mu_{ij \ldots m}$ and $\theta_{ij \ldots m}$ denote the force of mortality in the study and chosen standard populations in the $ij \ldots m$th stratum, respectively, $r$ is the intercept, $Z_{ij \ldots m}$ is the vector of covariables associated with the $i$th age interval, and $\beta$ is a vector of regression coefficients associated with these covariables. A Newton-Raphson iterative procedure has been used for determining the maximum likelihood estimates of the regression coefficients. This model provides a statistical method for a logical and easily interpretable explanation of an occupational cohort mortality experience. Since it gives a reasonable fit to the mortality data, it can also be concluded that the model is fairly realistic. The traditional statistical method for the analysis of occupational cohort mortality data is to present a summary index such as the SMR under the assumption of constant (homogeneous) age-specific mortality ratios. Since the mortality ratios for occupational groups usually increase with age, the homogeneity assumption of the age-specific mortality ratios is often untenable.
The traditional method of comparing SMRs under the homogeneity assumption is a special case of this model, without age as a covariate. This model also provides a statistical technique to evaluate the relative risk between two SMRs or a dose-response relationship among several SMRs. The model presented has application in the medical, demographic and epidemiologic areas. The methods developed in this thesis are suitable for future analyses of mortality or morbidity data when the age-specific mortality/morbidity experience is a function of age or when an interaction effect between confounding variables needs to be evaluated.
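A small numerical sketch shows why comparing SMRs can mislead when age-specific mortality ratios rise with age. All numbers below (standard rates, person-years, r, β) are hypothetical. Two cohorts share exactly the same age-specific ratios r·exp(β·z), yet their SMRs differ because their age structures differ; with β = 0 the ratios are homogeneous and both SMRs equal r.

```python
import math

def mortality_ratios(r, beta, age_codes):
    """Age-specific mortality ratios under MR = r * exp(beta * z)."""
    return [r * math.exp(beta * z) for z in age_codes]

def smr(person_years, ratios, standard_rates):
    """Standardized mortality ratio: expected-under-model deaths / standard-expected deaths."""
    observed = sum(n * mr * t for n, mr, t in zip(person_years, ratios, standard_rates))
    expected = sum(n * t for n, t in zip(person_years, standard_rates))
    return observed / expected

# Hypothetical standard-population death rates for three age intervals.
standard_rates = [0.001, 0.005, 0.020]
age_codes = [0, 1, 2]

# Same age-specific ratios for both cohorts, rising with age.
ratios = mortality_ratios(r=1.0, beta=0.35, age_codes=age_codes)

young_cohort = [8000, 1500, 500]   # person-years by age interval
old_cohort = [500, 1500, 8000]

smr_young = smr(young_cohort, ratios, standard_rates)
smr_old = smr(old_cohort, ratios, standard_rates)

# Homogeneous case: beta = 0 gives the same SMR regardless of age structure.
flat = mortality_ratios(r=1.2, beta=0.0, age_codes=age_codes)
smr_flat_young = smr(young_cohort, flat, standard_rates)
smr_flat_old = smr(old_cohort, flat, standard_rates)
```

Here the older cohort shows the larger SMR even though neither cohort has a worse age-specific experience, which is the situation the regression model on the mortality ratio is designed to handle.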
Abstract:
In regression analysis, covariate measurement error occurs in many applications. The error-prone covariates are often referred to as latent variables. In this study, we extended the work of Chan et al. (2008) on recovering the latent slope in a simple regression model to the multiple regression model. We presented an approach that applied the Monte Carlo method in the Bayesian framework to the parametric regression model with measurement error in an explanatory variable. The proposed estimator applied the conditional expectation of the latent slope given the observed outcome and surrogate variables in the multiple regression models. A simulation study was presented showing that the method produces an estimator that is efficient in the multiple regression model, especially when the measurement error variance of the surrogate variable is large.
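To show the problem that motivates latent-slope recovery, the sketch below simulates classical measurement error in a simple regression and applies the textbook reliability-ratio correction. This is deliberately not the Bayesian Monte Carlo estimator described in the abstract; it is a simpler, well-known method-of-moments illustration, and it assumes the error variance is known. All names and parameter values are hypothetical.

```python
import random

def ols_slope(xs, ys):
    """Least-squares slope of y on x."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    return sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx

def simulate(n, beta, sd_x, sd_u, sd_e, seed):
    """Return (surrogate w, outcome y), where w = x + u is the error-prone measurement."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, sd_x) for _ in range(n)]        # latent covariate
    w = [xi + rng.gauss(0.0, sd_u) for xi in x]         # observed surrogate
    y = [beta * xi + rng.gauss(0.0, sd_e) for xi in x]  # outcome depends on latent x
    return w, y

def reliability(sd_x, sd_u):
    """Reliability ratio var(x) / (var(x) + var(u)); the naive slope shrinks by this factor."""
    return sd_x ** 2 / (sd_x ** 2 + sd_u ** 2)

w, y = simulate(n=20000, beta=2.0, sd_x=1.0, sd_u=1.0, sd_e=0.5, seed=42)
naive = ols_slope(w, y)                    # attenuated toward zero
corrected = naive / reliability(1.0, 1.0)  # assumes the error variance is known
```

With equal latent and error variances the naive slope is roughly half the true slope, which illustrates why large measurement error variance is the case where a dedicated estimator matters most.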
Abstract:
The tobacco-specific nitrosamine 4-(methylnitrosamino)-1-(3-pyridyl)-1-butanone (NNK) is an established carcinogen for lung cancer. Since the cytokinesis-blocked micronucleus (CBMN) assay has been found to be extremely sensitive to NNK-induced genetic damage, it is a potentially important factor for predicting lung cancer risk. However, the association between lung cancer and NNK-induced genetic damage measured by the CBMN assay has not been rigorously examined. This research develops a methodology to model the chromosomal changes under NNK-induced genetic damage in a logistic regression framework in order to predict the occurrence of lung cancer. Since these chromosomal changes are usually not observed for very long because of laboratory cost and time, a resampling technique was applied to generate the Markov chain of the normal and the damaged cell for each individual. A joint likelihood was established between the resampled Markov chains and the logistic regression model, which includes the transition probabilities of this chain as covariates. Maximum likelihood estimation was applied to carry out the statistical test for comparison. The ability of this approach to increase discriminating power to predict lung cancer was compared to a baseline "non-genetic" model. Our method offers an option for understanding the association between dynamic cell information and lung cancer. Our study indicates that the extent of DNA damage/non-damage measured using the CBMN assay provides critical information that impacts public health studies of lung cancer risk. This novel statistical method can simultaneously estimate the process of DNA damage/non-damage and its relationship with lung cancer for each individual.
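The transition probabilities that enter the logistic model as covariates can be estimated from an observed two-state chain by simple counting. The sketch below is a generic illustration of that step only; it does not reproduce the resampling scheme or the joint likelihood from the thesis. The state coding (0 = normal, 1 = damaged) and the example chain are hypothetical.

```python
def transition_probs(chain, states=(0, 1)):
    """Estimate Markov transition probabilities by counting observed transitions.

    Returns probs[s][t] = estimated P(next = t | current = s).
    Rows with no observed transitions are returned as None.
    """
    counts = {s: {t: 0 for t in states} for s in states}
    for current, nxt in zip(chain, chain[1:]):
        counts[current][nxt] += 1
    probs = {}
    for s in states:
        row_total = sum(counts[s].values())
        if row_total == 0:
            probs[s] = None
        else:
            probs[s] = {t: counts[s][t] / row_total for t in states}
    return probs

# Hypothetical sequence of cell states for one individual (0 = normal, 1 = damaged).
chain = [0, 0, 1, 1, 0, 1, 0, 0]
p = transition_probs(chain)
damage_rate = p[0][1]   # estimated P(normal -> damaged)
repair_rate = p[1][0]   # estimated P(damaged -> normal)
```

Per-individual values such as `damage_rate` are the kind of quantity that could then be supplied as a covariate to a logistic regression for lung cancer status.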
Abstract:
It is well known that an identification problem exists in the analysis of age-period-cohort data because of the relationship among the three factors (date of birth + age at death = date of death). There are numerous suggestions about how to analyze the data, but no one solution has been satisfactory. The purpose of this study is to provide another analytic method by extending Cox's life-table regression model with time-dependent covariates. The new approach contains the following features: (1) It is based on the conditional maximum likelihood procedure using a proportional hazard function described by Cox (1972), treating the age factor as the underlying hazard to estimate the parameters for the cohort and period factors. (2) The model is flexible so that both the cohort and period factors can be treated as dummy or continuous variables, and the parameter estimates can be obtained for numerous combinations of variables as in a regression analysis. (3) The model is applicable even when the time period is unequally spaced. Two specific models are considered to illustrate the new approach and are applied to U.S. prostate cancer data. We find that there are significant differences between all cohorts and there is a significant period effect for both whites and nonwhites. The underlying hazard increases exponentially with age, indicating that old people have a much higher risk than young people. A log transformation of relative risk shows that the prostate cancer risk declined in recent cohorts for both models. However, prostate cancer risk declined 5 cohorts (25 years) earlier for whites than for nonwhites under the period factor model (0 0 0 1 1 1 1). These latter results are similar to those of the previous study by Holford (1983). The new approach offers a general method to analyze age-period-cohort data without using any arbitrary constraint in the model.
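The identification problem mentioned at the start of this abstract can be demonstrated in a few lines. Because cohort = period - age, a linear predictor with separate linear age, period, and cohort terms gives identical fitted values for infinitely many coefficient triples. The sketch below illustrates only the problem, not the Cox-based approach proposed in the thesis; the coefficient values are hypothetical.

```python
def linear_predictor(age, period, a, p, c):
    """Linear age-period-cohort predictor, with cohort defined as period - age."""
    cohort = period - age
    return a * age + p * period + c * cohort

def shifted(a, p, c, delta):
    """A different coefficient triple that yields exactly the same predictions."""
    return a + delta, p - delta, c + delta

# Hypothetical coefficients and an arbitrary shift.
a, p, c = 0.08, 0.01, -0.02
a2, p2, c2 = shifted(a, p, c, delta=0.5)

grid = [(age, period) for age in range(40, 90, 10) for period in range(1950, 2000, 10)]
max_gap = max(
    abs(linear_predictor(age, per, a, p, c) - linear_predictor(age, per, a2, p2, c2))
    for age, per in grid
)
```

`max_gap` is zero up to rounding error, so the data cannot distinguish the two coefficient sets; any method for age-period-cohort data has to resolve this either by a constraint or, as proposed here, by treating age as the underlying hazard.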
Abstract:
BACKGROUND: Variants in the complement cascade genes and in LOC387715/HTRA1 have been widely reported to associate with age-related macular degeneration (AMD), the most common cause of visual impairment in industrialized countries. METHODS/PRINCIPAL FINDINGS: We investigated the LOC387715 A69S and complement component C3 R102G risk alleles in the Finnish case-control material and found a significant association with AMD for both variants (LOC387715 A69S: OR 2.98, p = 3.75 x 10^-9 versus non-AMD controls and OR 2.79, p = 2.78 x 10^-19 versus blood donor controls; C3 R102G: OR 1.83, p = 0.008 versus non-AMD controls and OR 1.39, p = 0.039 versus blood donor controls). Previously, we have shown a strong association between complement factor H (CFH) Y402H and AMD in the Finnish population. A carrier of at least one risk allele in each of the three susceptibility loci (LOC387715, C3, CFH) had an 18-fold risk of AMD when compared to a non-carrier homozygote in all three loci. A tentative gene-gene interaction between the two major AMD-associated loci, LOC387715 and CFH, was found in this study using a multiplicative (logistic regression) model, a synergy index (departure-from-additivity model) and the mutual information (MI) method, suggesting that a common causative pathway may exist for these genes. Smoking (ever vs. never) exerted an extra risk for AMD, but somewhat surprisingly, only in connection with other factors such as sex and the C3 genotype. Population attributable risks (PAR) for the CFH, LOC387715 and C3 variants were 58.2%, 51.4% and 5.8%, respectively, the summary PAR for the three variants being 65.4%. CONCLUSIONS/SIGNIFICANCE: Evidence for gene-gene interaction between the two major AMD-associated loci, CFH and LOC387715, was obtained using three methods: logistic regression, a synergy index and the mutual information (MI) index.
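Two of the interaction measures named above have simple closed forms. The sketch below computes Rothman's synergy index (departure from additivity) and the ratio that measures departure from a multiplicative model, using odds ratios as stand-ins for relative risks. The three odds ratios in the example are hypothetical and are not taken from the study.

```python
def synergy_index(or_both, or_first_only, or_second_only):
    """Rothman's synergy index S = (OR11 - 1) / ((OR10 - 1) + (OR01 - 1)).

    S = 1 means no departure from additivity; S > 1 suggests synergy.
    """
    return (or_both - 1.0) / ((or_first_only - 1.0) + (or_second_only - 1.0))

def multiplicative_ratio(or_both, or_first_only, or_second_only):
    """OR11 / (OR10 * OR01); equals 1 when the joint effect is exactly multiplicative."""
    return or_both / (or_first_only * or_second_only)

# Hypothetical odds ratios relative to carriers of neither risk variant.
or_a_only, or_b_only, or_both = 2.0, 3.0, 9.0

s = synergy_index(or_both, or_a_only, or_b_only)
m = multiplicative_ratio(or_both, or_a_only, or_b_only)
```

In this made-up example both measures exceed 1, meaning the joint effect is larger than either an additive or a multiplicative combination of the single-locus effects would predict.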
Abstract:
BACKGROUND: Renal involvement is a serious manifestation of systemic lupus erythematosus (SLE); it may portend a poor prognosis as it may lead to end-stage renal disease (ESRD). The purpose of this study was to determine the factors predicting the development of renal involvement and its progression to ESRD in a multi-ethnic SLE cohort (PROFILE). METHODS AND FINDINGS: PROFILE includes SLE patients from five different United States institutions. We examined at baseline the socioeconomic-demographic, clinical, and genetic variables associated with the development of renal involvement and its progression to ESRD by univariable and multivariable Cox proportional hazards regression analyses. Analyses of onset of renal involvement included only patients with renal involvement after SLE diagnosis (n = 229). Analyses of ESRD included all patients, regardless of whether renal involvement occurred before, at, or after SLE diagnosis (34 of 438 patients). In addition, we performed a multivariable logistic regression analysis of the variables associated with the development of renal involvement at any time during the course of SLE. In the time-dependent multivariable analysis, patients developing renal involvement were more likely to have more American College of Rheumatology criteria for SLE, and to be younger, hypertensive, and of African-American or Hispanic (from Texas) ethnicity. Alternative regression models were consistent with these results. In addition to greater accrued disease damage (renal damage excluded), younger age, and Hispanic ethnicity (from Texas), homozygosity for the valine allele of FcgammaRIIIa (FCGR3A*GG) was a significant predictor of ESRD. Results from the multivariable logistic regression model that included all cases of renal involvement were consistent with those from the Cox model. CONCLUSIONS: Fcgamma receptor genotype is a risk factor for progression of renal disease to ESRD.
Since the frequency distribution of FCGR3A alleles does not vary significantly among the ethnic groups studied, the additional factors underlying the ethnic disparities in renal disease progression remain to be elucidated.
Abstract:
Recent studies using diffusion tensor imaging (DTI) have advanced our knowledge of the organization of white matter subserving language function. It remains unclear, however, how DTI may be used to predict accurately a key feature of language organization: its asymmetric representation in one cerebral hemisphere. In this study of epilepsy patients with unambiguous lateralization on Wada testing (19 left and 4 right lateralized subjects; no bilateral subjects), the predictive value of DTI for classifying the dominant hemisphere for language was assessed relative to the existing standard, the intra-carotid Amytal (Wada) procedure. Our specific hypothesis is that language laterality in both unilateral left- and right-hemisphere language dominant subjects may be predicted by hemispheric asymmetry in the relative density of three white matter pathways terminating in the temporal lobe and implicated in different aspects of language function: the arcuate (AF), uncinate (UF), and inferior longitudinal fasciculi (ILF). Laterality indices computed from asymmetry of high anisotropy AF pathways, but not the other pathways, classified the majority (19 of 23) of patients using the Wada results as the standard. A logistic regression model incorporating information from DTI of the AF, fMRI activity in Broca's area, and handedness was able to classify 22 of 23 (95.6%) patients correctly according to their Wada score. We conclude that evaluation of highly anisotropic components of the AF alone has significant predictive power for determining language laterality, and that this markedly asymmetric distribution in the dominant hemisphere may reflect enhanced connectivity between frontal and temporal sites to support fluent language processes. Given the small sample reported in this preliminary study, future research should assess this method on a larger group of patients, including subjects with bi-hemispheric dominance.
Abstract:
This dissertation explores phase I dose-finding designs in cancer trials from three perspectives: alternative Bayesian dose-escalation rules, a design based on a time-to-dose-limiting toxicity (DLT) model, and a design based on a discrete-time multi-state (DTMS) model. We list alternative Bayesian dose-escalation rules and perform a simulation study for the intra-rule and inter-rule comparisons based on two statistical models to identify the most appropriate rule under certain scenarios. We provide evidence that all the Bayesian rules outperform the traditional "3+3" design in the allocation of patients and selection of the maximum tolerated dose. The design based on a time-to-DLT model uses patients' DLT information over multiple treatment cycles in estimating the probability of DLT at the end of treatment cycle 1. Dose-escalation decisions are made whenever a cycle-1 DLT occurs, or two months after the previous check point. Compared to the design based on a logistic regression model, the new design shows more safety benefits for trials in which more late-onset toxicities are expected. As a trade-off, the new design requires more patients on average. The design based on the DTMS model has three important attributes: (1) toxicities are categorized over a distribution of severity levels, (2) early toxicity may inform dose escalation, and (3) no suspension is required between accrual cohorts. The proposed model accounts for the difference in the importance of the toxicity severity levels and for transitions between toxicity levels. We compare the operating characteristics of the proposed design with those from a similar design based on a fully-evaluated model that directly models the maximum observed toxicity level within the patients' entire assessment window. We describe settings in which, under comparable power, the proposed design shortens the trial.
The proposed design offers more benefit compared to the alternative design as patient accrual becomes slower.
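For readers unfamiliar with the traditional "3+3" comparator, the sketch below simulates one common variant of it. The exact rules differ between protocols, so the stopping and escalation rules coded here are an assumption, as are the function names and the toxicity probabilities. The Bayesian, time-to-DLT, and DTMS designs from the dissertation are not implemented.

```python
import random

def three_plus_three(tox_probs, rng):
    """Simulate one trial under a common 3+3 variant.

    tox_probs[d] is the assumed true DLT probability at dose level d.
    Returns the index of the selected maximum tolerated dose, or -1 if even
    the lowest dose is judged too toxic.
    """
    dose = 0
    while True:
        first = sum(rng.random() < tox_probs[dose] for _ in range(3))
        if first >= 2:
            return dose - 1            # 2+ DLTs in 3 patients: stop, step down
        if first == 1:
            second = sum(rng.random() < tox_probs[dose] for _ in range(3))
            if first + second >= 2:
                return dose - 1        # 2+ DLTs in 6 patients: stop, step down
        # 0/3 or 1/6 DLTs: escalate, or stop at the top dose
        if dose == len(tox_probs) - 1:
            return dose
        dose += 1

def selection_frequencies(tox_probs, n_trials=2000, seed=1):
    """Fraction of simulated trials that select each dose (-1 = none)."""
    rng = random.Random(seed)
    counts = {d: 0 for d in range(-1, len(tox_probs))}
    for _ in range(n_trials):
        counts[three_plus_three(tox_probs, rng)] += 1
    return {d: k / n_trials for d, k in counts.items()}

# Hypothetical true DLT probabilities for four dose levels.
freqs = selection_frequencies([0.05, 0.10, 0.25, 0.45])
```

Operating characteristics such as `freqs` are the quantities on which model-based designs are typically compared with 3+3, alongside how many patients are treated at each dose.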
Abstract:
Background: High grade serous carcinoma, whether ovarian, tubal or primary peritoneal, continues to be the most lethal gynecologic malignancy in the USA. Although combination chemotherapy and aggressive surgical resection have improved survival in the past decade, the majority of patients still succumb to chemo-resistant disease recurrence. It has recently been reported that amplification of 5q31-5q35.3 is associated with poor prognosis in patients with high grade serous ovarian carcinoma. Although the amplicon contains over 50 genes, it is notable for the presence of several members of the fibroblast growth factor signaling axis. In particular, acidic fibroblast growth factor (FGF1) has been demonstrated to be one of the driving genes in mediating the observed prognostic effect of the amplicon in ovarian cancer patients. This study seeks to further validate the prognostic value of fibroblast growth factor receptor 4 (FGFR4), another candidate gene of the FGF/FGFR axis located in the same amplicon. The emphasis will be on delineating the role the FGF1/FGFR4 signaling axis plays in high grade serous ovarian carcinoma and on testing the feasibility of targeting the FGF1/FGFR4 axis therapeutically. Materials and Methods: Spearman and Pearson correlation studies on data generated from array CGH and transcriptome profiling analyses on 51 microdissected tumor samples were used to identify genes located on chromosome 5q31-35.3 that showed significant correlation between DNA and mRNA copy numbers. Significant correlation between FGF1 and FGFR4 DNA copy numbers was further validated by qPCR analysis on DNA isolated from 51 microdissected tumor samples. Immunolocalization and quantification of FGFR4 expression were performed on paraffin embedded tissue samples from 183 cases of high-grade serous ovarian carcinoma. The expression was then correlated with clinical data to assess impact on survival.
The expression of FGF1 and FGFR4 in vitro was quantified by real-time PCR and western blotting in six high-grade serous ovarian carcinoma cell lines and compared to that in human ovarian surface epithelial cells to identify overexpression. The effect of FGF1 on these cell lines after serum starvation was quantified for in vitro cellular proliferation, migration/invasion, chemoresistance and survival utilizing a combination of commercially available colorimetric, fluorometric and electrical impedance assays. FGFR4 expression was then transiently silenced via siRNA transfection and the effects on response to FGF1, cellular proliferation, and migration were quantified. To identify relevant cellular pathways involved, responsive cell lines were transduced with different transcription response elements using the Cignal-Lenti reporter system and treated with FGF1 with and without transient FGFR4 knock down. This was followed by western blot confirmation for the relevant phosphoproteins. Anti-FGF1 antibodies and FGFR trap proteins were used to attempt inhibition of FGF mediated phenotypic changes and relevant signaling in vitro. Orthotopic intraperitoneal tumors were established in nude mice using serous cell lines that had previously been transfected with luciferase expressing constructs. The mice were then treated with FGFR trap protein. Tumor progression was then followed via bioluminescent imaging. The FGFR4 gene from 52 clinical samples was sequenced to screen for mutations. Results: FGFR4 DNA and mRNA copy numbers were significantly correlated and FGFR4 DNA copy number was significantly correlated with that of FGF1. Survival of patients with high FGFR4 expressing tumors was significantly shorter than that of patients with low expression (median survival 28 vs 55 months, p < 0.001). In a multivariate Cox regression model, FGFR expression significantly increased the risk of death (HR 2.1, p < 0.001).
FGFR4 expression was significantly higher in all cell lines tested compared to HOSE; the OVCA432 cell line in particular had very high expression, suggesting amplification. FGF1 was also particularly overexpressed in OVCA432. FGF1 significantly increased cell survival after serum deprivation in all cell lines. Transient knock down of FGFR4 caused significant reduction in cell migration and proliferation in vitro and significantly decreased the proliferative effects of FGF1 in vitro. FGFR1 and FGFR4 traps and anti-FGF1 antibodies did not show activity in vitro. OVCA432 transfected with the Cignal-Lenti reporter system revealed significant activation of the MAPK, NFkB and WNT pathways, and western blotting confirmed the results. Reverse phase protein array (RPPA) analysis also showed activation of the MAPK, AKT and WNT pathways and down regulation of E-cadherin. FGFR trap protein significantly reduced tumor growth in vivo in an orthotopic mouse model. Conclusions: Overexpression and amplification of several members of the FGF signaling axis present on the amplicon 5q31-35.3 is a negative prognostic indicator in high grade serous ovarian carcinoma and may drive the poor survival associated with that amplicon. Activation of the FGF signaling pathway leads to downstream activation of the MAPK, AKT, WNT and NFkB pathways, leading to a more aggressive cancer phenotype with increased tumor growth, evasion of apoptosis and increased migration and invasion. Inhibition of the FGF pathway in vivo via FGFR trap protein leads to significantly decreased tumor growth in an orthotopic mouse model.