8 results for Model validation
in DigitalCommons@The Texas Medical Center
Abstract:
Despite extensive research on development in education and psychology, the methodology is seldom tested with real data. A major barrier to testing a growth model is that the study design involves repeated observations and the growth is nonlinear. Repeated measurements on a nonlinear model require sophisticated statistical methods. In this study, we present a mixed-effects model with a negative exponential curve to describe the development of children's reading skills. This model can describe the nature of growth in children's reading skills and account for intra-individual and inter-individual variation. We also apply simple techniques, including cross-validation, regression, and graphical methods, to determine the most appropriate curve for the data, to find efficient initial values of the parameters, and to select potential covariates. We illustrate with the example that motivated this research: a longitudinal study of academic skills from grade 1 to grade 12 in Connecticut public schools.
Abstract:
Background/significance. The scarcity of reliable and valid Spanish-language instruments for health-related research has hindered research with the Hispanic population. Research suggests that fatalistic attitudes are related to poor cancer screening behaviors and may be one reason for low participation of Mexican-Americans in cancer screening. This problem is of major concern because Mexican-Americans constitute the largest Hispanic subgroup in the U.S.
Purpose. The purposes of this study were (1) to translate the Powe Fatalism Inventory (PFI) into Spanish and culturally adapt the instrument to the Mexican-American culture as found along the U.S.-Mexico border, and (2) to test the equivalence between the Spanish-translated, culturally adapted version of the PFI (SPFI) and the English version of the PFI, including clarity, content validity, reading level, and reliability.
Design. Descriptive, cross-sectional.
Methods. The Spanish-language translation used a translation model that incorporates a cultural adaptation process. The SPFI was administered to 175 bilingual participants residing in a midsize U.S.-Mexico border city. Data analysis included estimation of Cronbach's alpha, factor analysis, paired-samples t-test comparison, and multiple regression analysis using SPSS software, as well as measurement of the content validity and reading level of the SPFI.
Findings. The reliability estimate using Cronbach's alpha coefficient was 0.81 for the SPFI compared with 0.80 for the PFI in this study. Factor analysis extracted four factors, which explained 59% of the variance. Paired t-test comparison revealed no statistically significant differences between the SPFI and PFI total or individual item scores. The Content Validity Index was determined to be 1.0. The reading level was assessed to be below 6th grade. The correlation coefficient between the SPFI and PFI was 0.95.
Conclusions. This study provided strong psychometric evidence that the Spanish-translated, culturally adapted SPFI is equivalent to the English version of the PFI in measuring cancer fatalism. This indicates that the two forms of the instrument can be used interchangeably in a single study to accommodate the reading and speaking abilities of respondents.
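The abstract reports reliability as Cronbach's alpha without showing how the coefficient is obtained. The following is a minimal sketch of the standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), using only the Python standard library. The function name and the toy scores are illustrative assumptions; they are not the study's data or its SPSS procedure.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    Hypothetical helper for illustration; every respondent must answer
    the same k >= 2 items.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Toy data (invented): 4 respondents, 3 items on a 1-5 scale.
toy_scores = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [1, 2, 2],
]
alpha = cronbach_alpha(toy_scores)
```

Values closer to 1 indicate that the items vary together, which is the sense in which 0.81 and 0.80 are read as comparable reliability for the two language versions.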
High-resolution microarray analysis of chromosome 20q in human colon cancer metastasis model systems
Abstract:
Amplification of human chromosome 20q DNA is the most frequently occurring chromosomal abnormality detected in sporadic colorectal carcinomas and shows significant correlation with liver metastases. Through comprehensive high-resolution microarray comparative genomic hybridization and microarray gene expression profiling, we have characterized chromosome 20q amplicon genes associated with human colorectal cancer metastasis in two in vitro metastasis model systems. The results revealed increasing complexity of the 20q genomic profile from the primary tumor-derived cell lines to the lymph node and liver metastasis-derived cell lines. Expression analysis of chromosome 20q revealed a subset of overexpressed genes residing within the regions of genomic copy number gain in all the tumor cell lines, suggesting that these are chromosome 20q copy number responsive genes. Based on their preferential expression levels in the model system cell lines and known biological function, four of the overexpressed genes mapping to the common intervals of genomic copy gain were considered the most promising candidate colorectal metastasis-associated genes. Validation of genomic copy number and expression array data was carried out on these genes, with one gene, DNMT3B, standing out as expressed at relatively higher levels in the metastasis-derived cell lines compared with their primary-derived counterparts in both model systems analyzed. The data provide evidence for the role of chromosome 20q genes with low copy gain and elevated expression in the clonal evolution of metastatic cells and suggest that such genes may serve as early biomarkers of metastatic potential. The data also support the utility of combined microarray comparative genomic hybridization and expression array analysis for identifying copy number responsive genes in areas of low DNA copy gain in cancer cells.
Abstract:
This study aimed to develop and validate the Cancer Family Impact Scale (CFIS), an instrument for use in studies investigating relationships among family factors and colorectal cancer (CRC) screening when family history is a risk factor. We developed the measure using existing data from 1,285 participants (637 families) across the United States who were in the Johns Hopkins Colon Cancer Genetic Testing study. Participants were 94% white with an average age of 50.1 years, and 60% were women. None had a personal CRC history; 80% had one first-degree relative (FDR) with CRC, and 20% had more than one FDR with CRC. The study had three aims: (1) to identify the latent factors underlying the CFIS via exploratory factor analysis (EFA); (2) to confirm the findings of the EFA via confirmatory factor analysis (CFA); and (3) to assess the reliability of the scale via Cronbach's alpha. Exploratory analyses were performed on a split half of the sample, and the final model was confirmed on the other half. The EFA suggested the CFIS was an 18-item measure with 5 latent constructs: (1) NEGATIVE: negative effects of cancer on the family; (2) POSITIVE: positive effects of cancer on the family; (3) COMMUNICATE: how families communicate about cancer; (4) FLOW: how information about cancer is conveyed in families; and (5) NORM: how individuals react to family norms about cancer. CFA on the holdout sample showed the CFIS to have a reasonably good fit (chi-square = 389.977, df = 122, RMSEA = 0.058 (0.052-0.065), CFI = 0.902, TLI = 0.877, GFI = 0.939). The overall reliability of the scale was α = 0.65. The reliability of the subscales was: (1) NEGATIVE α = 0.682; (2) POSITIVE α = 0.686; (3) COMMUNICATE α = 0.723; (4) FLOW α = 0.467; and (5) NORM α = 0.732.
We concluded that the CFIS is a good measure, with most fit indices above 0.90. The CFIS could be used to compare theoretically driven hypotheses about the pathways through which family factors could influence health behavior among unaffected individuals at risk due to family history, and could also aid in the development and evaluation of cancer prevention interventions that include a family component.
Abstract:
Interaction effects are of scientific interest in many areas of research. A common approach for investigating the interaction effect of two continuous covariates on a response variable is a cross-product term in multiple linear regression. In epidemiological studies, a two-way analysis of variance (ANOVA) type of method has also been used to examine the interaction effect by replacing the continuous covariates with their discretized levels. However, the implications of the model assumptions of either approach have not been examined, and statistical validation has focused only on the general method, not specifically on the interaction effect.
In this dissertation, we investigated the validity of both approaches based on the mathematical assumptions for non-skewed data. We showed that linear regression may not be an appropriate model when the interaction effect exists because it implies a highly skewed distribution for the response variable. We also showed that the normality and constant variance assumptions required by ANOVA are not satisfied in the model where the continuous covariates are replaced with their discretized levels. Therefore, naïve application of the ANOVA method may lead to an incorrect conclusion.
Given the problems identified above, we proposed a novel method, modified from the traditional ANOVA approach, to rigorously evaluate the interaction effect. The analytical expression of the interaction effect was derived based on the conditional distribution of the response variable given the discretized continuous covariates. A testing procedure that combines the p-values from each level of the discretized covariates was developed to test the overall significance of the interaction effect. According to the simulation study, the proposed method is more powerful than least squares regression and the ANOVA method in detecting the interaction effect when the data come from a trivariate normal distribution. The proposed method was applied to a dataset from the National Institute of Neurological Disorders and Stroke (NINDS) tissue plasminogen activator (t-PA) stroke trial, and the baseline age-by-weight interaction effect was found to be significant in predicting the change from baseline in NIHSS at Month 3 among patients who received t-PA therapy.
Abstract:
Strategies are compared for the development of a linear regression model with stochastic (multivariate normal) regressor variables and the subsequent assessment of its predictive ability. Bias and mean squared error of four estimators of predictive performance are evaluated in samples simulated from 32 population correlation matrices. Models including all of the available predictors are compared with those obtained using selected subsets. The subset selection procedures investigated include two stopping rules, $C_p$ and $S_p$, each combined with an 'all possible subsets' or 'forward selection' of variables. The estimators of performance utilized include parametric ($\mathrm{MSEP}_m$) and non-parametric (PRESS) assessments in the entire sample, and two data splitting estimates restricted to a random or balanced (Snee's DUPLEX) 'validation' half sample. The simulations were performed as a designed experiment, with population correlation matrices representing a broad range of data structures.
The techniques examined for subset selection do not generally result in improved predictions relative to the full model. Approaches using 'forward selection' result in slightly smaller prediction errors and less biased estimators of predictive accuracy than 'all possible subsets' approaches, but no differences are detected between the performances of $C_p$ and $S_p$. In every case, prediction errors of models obtained by subset selection in either of the half splits exceed those obtained using all predictors and the entire sample.
Only the random split estimator is conditionally (on $\beta$) unbiased; however, $\mathrm{MSEP}_m$ is unbiased on average and PRESS is nearly so in unselected (fixed form) models. When subset selection techniques are used, $\mathrm{MSEP}_m$ and PRESS always underestimate prediction errors, by as much as 27 percent (on average) in small samples. Despite their bias, the mean squared errors (MSE) of these estimators are at least 30 percent less than that of the unbiased random split estimator. The DUPLEX split estimator suffers from large MSE as well as bias, and seems of little value within the context of stochastic regressor variables.
To maximize predictive accuracy while retaining a reliable estimate of that accuracy, it is recommended that the entire sample be used for model development, and a leave-one-out statistic (e.g. PRESS) be used for assessment.
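The recommended leave-one-out statistic can be illustrated with a short sketch. PRESS is the sum of squared prediction errors when each observation is predicted from a model fitted without it. The example below assumes a single predictor for brevity (the dissertation considers multiple stochastic regressors), and the function names and data are hypothetical. It computes PRESS both by refitting n times and by the standard leverage identity, in which each deleted residual equals the ordinary residual divided by (1 - h_ii).

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    slope = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    return ybar - slope * xbar, slope


def press_refit(xs, ys):
    """PRESS by explicitly refitting the model with each point left out."""
    total = 0.0
    for i in range(len(xs)):
        b0, b1 = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        total += (ys[i] - (b0 + b1 * xs[i])) ** 2
    return total


def press_leverage(xs, ys):
    """PRESS from one full-sample fit, using leverages h_ii."""
    n = len(xs)
    xbar = sum(xs) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    b0, b1 = fit_line(xs, ys)
    total = 0.0
    for x, y in zip(xs, ys):
        leverage = 1.0 / n + (x - xbar) ** 2 / sxx  # h_ii for simple regression
        residual = y - (b0 + b1 * x)
        total += (residual / (1.0 - leverage)) ** 2
    return total


# Invented data for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.1, 1.9, 3.2, 3.8, 5.3]
press_a = press_refit(xs, ys)
press_b = press_leverage(xs, ys)  # agrees with press_a up to rounding
```

The leverage form is why PRESS is cheap to compute on the entire sample: no data are held back from model development, which matches the recommendation above.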
Abstract:
Breast cancer is the most common non-skin cancer and the second leading cause of cancer-related death in women in the United States. Studies on ipsilateral breast tumor relapse (IBTR) status and disease-specific survival will help guide clinical treatment and predict patient prognosis.
After breast conservation therapy, patients with breast cancer may experience breast tumor relapse. This relapse is classified into two distinct types: true local recurrence (TR) and new ipsilateral primary tumor (NP). However, the methods used to classify the relapse types are imperfect and are prone to misclassification. In addition, some observed survival data (e.g., time to relapse and time from relapse to death) are strongly correlated with relapse types. The first part of this dissertation presents a Bayesian approach to (1) modeling the potentially misclassified relapse status and the correlated survival information, (2) estimating the sensitivity and specificity of the diagnostic methods, and (3) quantifying the covariate effects on event probabilities. A shared frailty was used to account for the within-subject correlation between survival times. The inference was conducted using a Bayesian framework via Markov Chain Monte Carlo simulation implemented in the software WinBUGS. Simulation was used to validate the Bayesian method and assess its frequentist properties. The new model has two important innovations: (1) it utilizes the additional survival times correlated with the relapse status to improve the parameter estimation, and (2) it provides tools to address the correlation between the two diagnostic methods conditional on the true relapse types.
Prediction of patients at highest risk for IBTR after local excision of ductal carcinoma in situ (DCIS) remains a clinical concern. The goals of the second part of this dissertation were to evaluate a published nomogram from Memorial Sloan-Kettering Cancer Center, to determine the risk of IBTR in patients with DCIS treated with local excision, and to determine whether there is a subset of patients at low risk of IBTR. Patients who had undergone local excision from 1990 through 2007 at MD Anderson Cancer Center with a final diagnosis of DCIS (n=794) were included in this part. Clinicopathologic factors and the performance of the Memorial Sloan-Kettering Cancer Center nomogram for prediction of IBTR were assessed for 734 patients with complete data. The nomogram's predictions of 5- and 10-year IBTR probabilities showed imperfect calibration and discrimination, with an area under the receiver operating characteristic curve of .63 and a concordance index of .63. In conclusion, predictive models for IBTR in DCIS patients treated with local excision are imperfect. Our current ability to accurately predict recurrence based on clinical parameters is limited.
The American Joint Committee on Cancer (AJCC) staging of breast cancer is widely used to determine prognosis, yet survival within each AJCC stage shows wide variation and remains unpredictable. For the third part of this dissertation, biologic markers were hypothesized to be responsible for some of this variation, and the addition of biologic markers to current AJCC staging was examined as a possible way to improve prognostication. The initial cohort included patients treated with surgery as first intervention at MDACC from 1997 to 2006. Cox proportional hazards models were used to create prognostic scoring systems. AJCC pathologic staging parameters and biologic tumor markers were investigated to devise the scoring systems. Surveillance Epidemiology and End Results (SEER) data were used as the external cohort to validate the scoring systems. Binary indicators for pathologic stage (PS), estrogen receptor status (E), and tumor grade (G) were summed to create PS+EG scoring systems devised to predict 5-year patient outcomes. These scoring systems facilitated separation of the study population into more refined subgroups than the current AJCC staging system. The ability of the PS+EG score to stratify outcomes was confirmed in both internal and external validation cohorts. The current study proposes and validates a new staging system by incorporating tumor grade and ER status into current AJCC staging. We recommend that biologic markers be incorporated into revised versions of the AJCC staging system for patients receiving surgery as the first intervention.
Chapter 1 focuses on developing a Bayesian method to handle misclassified relapse status and its application to breast cancer data. Chapter 2 focuses on the evaluation of a breast cancer nomogram for predicting the risk of IBTR in patients with DCIS after local excision and states the problem in clinical research. Chapter 3 focuses on validation of a novel staging system for disease-specific survival in patients with breast cancer treated with surgery as the first intervention.
Abstract:
The main objective of this study was to determine the external validity of a clinical prediction rule developed by the European Multicenter Study on Human Spinal Cord Injury (EM-SCI) to predict ambulation outcomes 12 months after traumatic spinal cord injury. Data from the North American Clinical Trials Network (NACTN) data registry, with approximately 500 SCI cases, were used for this validity study. The predictive accuracy of the EM-SCI prognostic model was evaluated using calibration and discrimination based on 231 NACTN cases. The area under the receiver operating characteristic (ROC) curve (AUC) was 0.927 (95% CI 0.894 – 0.959) for the EM-SCI model when applied to the NACTN population. This is lower than the AUC of 0.956 (95% CI 0.936 – 0.976) reported for the EM-SCI population, but it suggests that the EM-SCI clinical prediction rule distinguished well between those patients in the NACTN population who achieved independent ambulation and those who did not. The calibration curve suggests that the higher the prediction score, the higher the probability of walking, with the best prediction for AIS D patients. In conclusion, the EM-SCI clinical prediction rule was determined to be generalizable to the adult NACTN SCI population.
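Discrimination in this abstract is summarized by the AUC. As a rough illustration of what that number measures, the sketch below computes the AUC as the proportion of (walker, non-walker) pairs in which the walker received the higher prediction score, counting ties as one half. This pairwise definition is standard, but the function name, scores, and outcomes are invented for illustration and are not NACTN or EM-SCI data.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random positive outranks a random negative.

    labels use 1 for the outcome of interest (e.g. independent ambulation)
    and 0 otherwise; ties between a positive and a negative count as 0.5.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Invented prediction scores and outcomes for four hypothetical patients.
toy_scores = [0.10, 0.40, 0.35, 0.80]
toy_labels = [0, 0, 1, 1]
toy_auc = roc_auc(toy_scores, toy_labels)  # 3 of 4 pairs ordered correctly -> 0.75
```

An AUC of 0.5 corresponds to chance ordering and 1.0 to perfect separation, which is why 0.927 is read as good discrimination even though it is below the 0.956 reported for the development population.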