999 resultados para Biology, Biostatistics|Statistics
Resumo:
Cardiovascular disease (CVD) is a threat to public health. It has been reported to be the leading cause of death in United States. The invention of next generation sequencing (NGS) technology has revolutionized the biomedical research. To investigate NGS data of CVD related quantitative traits would contribute to address the unknown etiology and disease mechanism of CVD. NHLBI's Exome Sequencing Project (ESP) contains CVD related phenotypes and their associated NGS exomes sequence data. Initially, a subset of next generation sequencing data consisting of 13 CVD-related quantitative traits was investigated. Only 6 traits, systolic blood pressure (SBP), diastolic blood pressure (DBP), height, platelet counts, waist circumference, and weight, were analyzed by functional linear model (FLM) and 7 currently existing methods. FLM outperformed all currently existing methods by identifying the highest number of significant genes and had identified 96, 139, 756, 1162, 1106, and 298 genes associated with SBP, DBP, Height, Platelet, Waist, and Weight respectively. ^
New methods for quantification and analysis of quantitative real-time polymerase chain reaction data
Resumo:
Quantitative real-time polymerase chain reaction (qPCR) is a sensitive gene quantitation method that has been widely used in the biological and biomedical fields. The currently used methods for PCR data analysis, including the threshold cycle (CT) method, linear and non-linear model fitting methods, all require subtracting background fluorescence. However, the removal of background fluorescence is usually inaccurate, and therefore can distort results. Here, we propose a new method, the taking-difference linear regression method, to overcome this limitation. Briefly, for each two consecutive PCR cycles, we subtracted the fluorescence in the former cycle from that in the later cycle, transforming the n cycle raw data into n-1 cycle data. Then linear regression was applied to the natural logarithm of the transformed data. Finally, amplification efficiencies and the initial DNA molecular numbers were calculated for each PCR run. To evaluate this new method, we compared it in terms of accuracy and precision with the original linear regression method with three background corrections, being the mean of cycles 1-3, the mean of cycles 3-7, and the minimum. Three criteria, including threshold identification, max R2, and max slope, were employed to search for target data points. Considering that PCR data are time series data, we also applied linear mixed models. Collectively, when the threshold identification criterion was applied and when the linear mixed model was adopted, the taking-difference linear regression method was superior as it gave an accurate estimation of initial DNA amount and a reasonable estimation of PCR amplification efficiencies. When the criteria of max R2 and max slope were used, the original linear regression method gave an accurate estimation of initial DNA amount. Overall, the taking-difference linear regression method avoids the error in subtracting an unknown background and thus it is theoretically more accurate and reliable. This method is easy to perform and the taking-difference strategy can be extended to all current methods for qPCR data analysis.^
Resumo:
The genomic era brought by recent advances in the next-generation sequencing technology makes the genome-wide scans of natural selection a reality. Currently, almost all the statistical tests and analytical methods for identifying genes under selection was performed on the individual gene basis. Although these methods have the power of identifying gene subject to strong selection, they have limited power in discovering genes targeted by moderate or weak selection forces, which are crucial for understanding the molecular mechanisms of complex phenotypes and diseases. Recent availability and rapid completeness of many gene network and protein-protein interaction databases accompanying the genomic era open the avenues of exploring the possibility of enhancing the power of discovering genes under natural selection. The aim of the thesis is to explore and develop normal mixture model based methods for leveraging gene network information to enhance the power of natural selection target gene discovery. The results show that the developed statistical method, which combines the posterior log odds of the standard normal mixture model and the Guilt-By-Association score of the gene network in a naïve Bayes framework, has the power to discover moderate/weak selection gene which bridges the genes under strong selection and it helps our understanding the biology under complex diseases and related natural selection phenotypes.^
Resumo:
With most clinical trials, missing data presents a statistical problem in evaluating a treatment's efficacy. There are many methods commonly used to assess missing data; however, these methods leave room for bias to enter the study. This thesis was a secondary analysis on data taken from TIME, a phase 2 randomized clinical trial conducted to evaluate the safety and effect of the administration timing of bone marrow mononuclear cells (BMMNC) for subjects with acute myocardial infarction (AMI).^ We evaluated the effect of missing data by comparing the variance inflation factor (VIF) of the effect of therapy between all subjects and only subjects with complete data. Through the general linear model, an unbiased solution was made for the VIF of the treatment's efficacy using the weighted least squares method to incorporate missing data. Two groups were identified from the TIME data: 1) all subjects and 2) subjects with complete data (baseline and follow-up measurements). After the general solution was found for the VIF, it was migrated Excel 2010 to evaluate data from TIME. The resulting numerical value from the two groups was compared to assess the effect of missing data.^ The VIF values from the TIME study were considerably less in the group with missing data. By design, we varied the correlation factor in order to evaluate the VIFs of both groups. As the correlation factor increased, the VIF values increased at a faster rate in the group with only complete data. Furthermore, while varying the correlation factor, the number of subjects with missing data was also varied to see how missing data affects the VIF. When subjects with only baseline data was increased, we saw a significant rate increase in VIF values in the group with only complete data while the group with missing data saw a steady and consistent increase in the VIF. The same was seen when we varied the group with follow-up only data. This essentially showed that the VIFs steadily increased when missing data is not ignored. When missing data is ignored as with our comparison group, the VIF values sharply increase as correlation increases.^
Resumo:
Maximizing data quality may be especially difficult in trauma-related clinical research. Strategies are needed to improve data quality and assess the impact of data quality on clinical predictive models. This study had two objectives. The first was to compare missing data between two multi-center trauma transfusion studies: a retrospective study (RS) using medical chart data with minimal data quality review and the PRospective Observational Multi-center Major Trauma Transfusion (PROMMTT) study with standardized quality assurance. The second objective was to assess the impact of missing data on clinical prediction algorithms by evaluating blood transfusion prediction models using PROMMTT data. RS (2005-06) and PROMMTT (2009-10) investigated trauma patients receiving ≥ 1 unit of red blood cells (RBC) from ten Level I trauma centers. Missing data were compared for 33 variables collected in both studies using mixed effects logistic regression (including random intercepts for study site). Massive transfusion (MT) patients received ≥ 10 RBC units within 24h of admission. Correct classification percentages for three MT prediction models were evaluated using complete case analysis and multiple imputation based on the multivariate normal distribution. A sensitivity analysis for missing data was conducted to estimate the upper and lower bounds of correct classification using assumptions about missing data under best and worst case scenarios. Most variables (17/33=52%) had <1% missing data in RS and PROMMTT. Of the remaining variables, 50% demonstrated less missingness in PROMMTT, 25% had less missingness in RS, and 25% were similar between studies. Missing percentages for MT prediction variables in PROMMTT ranged from 2.2% (heart rate) to 45% (respiratory rate). For variables missing >1%, study site was associated with missingness (all p≤0.021). Survival time predicted missingness for 50% of RS and 60% of PROMMTT variables. MT models complete case proportions ranged from 41% to 88%. Complete case analysis and multiple imputation demonstrated similar correct classification results. Sensitivity analysis upper-lower bound ranges for the three MT models were 59-63%, 36-46%, and 46-58%. Prospective collection of ten-fold more variables with data quality assurance reduced overall missing data. Study site and patient survival were associated with missingness, suggesting that data were not missing completely at random, and complete case analysis may lead to biased results. Evaluating clinical prediction model accuracy may be misleading in the presence of missing data, especially with many predictor variables. The proposed sensitivity analysis estimating correct classification under upper (best case scenario)/lower (worst case scenario) bounds may be more informative than multiple imputation, which provided results similar to complete case analysis.^
Resumo:
The Phase I clinical trial is considered the "first in human" study in medical research to examine the toxicity of a new agent. It determines the maximum tolerable dose (MTD) of a new agent, i.e., the highest dose in which toxicity is still acceptable. Several phase I clinical trial designs have been proposed in the past 30 years. The well known standard method, so called the 3+3 design, is widely accepted by clinicians since it is the easiest to implement and it does not need a statistical calculation. Continual reassessment method (CRM), a design uses Bayesian method, has been rising in popularity in the last two decades. Several variants of the CRM design have also been suggested in numerous statistical literatures. Rolling six is a new method introduced in pediatric oncology in 2008, which claims to shorten the trial duration as compared to the 3+3 design. The goal of the present research was to simulate clinical trials and compare these phase I clinical trial designs. Patient population was created by discrete event simulation (DES) method. The characteristics of the patients were generated by several distributions with the parameters derived from a historical phase I clinical trial data review. Patients were then selected and enrolled in clinical trials, each of which uses the 3+3 design, the rolling six, or the CRM design. Five scenarios of dose-toxicity relationship were used to compare the performance of the phase I clinical trial designs. One thousand trials were simulated per phase I clinical trial design per dose-toxicity scenario. The results showed the rolling six design was not superior to the 3+3 design in terms of trial duration. The time to trial completion was comparable between the rolling six and the 3+3 design. However, they both shorten the duration as compared to the two CRM designs. Both CRMs were superior to the 3+3 design and the rolling six in accuracy of MTD estimation. The 3+3 design and rolling six tended to assign more patients to undesired lower dose levels. The toxicities were slightly greater in the CRMs.^
Resumo:
Background: The follow-up care for women with breast cancer requires an understanding of disease recurrence patterns and the follow-up visit schedule should be determined according to the times when the recurrence are most likely to occur, so that preventive measure can be taken to avoid or minimize the recurrence. Objective: To model breast cancer recurrence through stochastic process with an aim to generate a hazard function for determining a follow-up schedule. Methods: We modeled the process of disease progression as the time transformed Weiner process and the first-hitting-time was used as an approximation of the true failure time. The women's "recurrence-free survival time" or a "not having the recurrence event" is modeled by the time it takes Weiner process to cross a threshold value which represents a woman experiences breast cancer recurrence event. We explored threshold regression model which takes account of covariates that contributed to the prognosis of breast cancer following development of the first-hitting time model. Using real data from SEER-Medicare, we proposed models of follow-up visits schedule on the basis of constant probability of disease recurrence between consecutive visits. Results: We demonstrated that the threshold regression based on first-hitting-time modeling approach can provide useful predictive information about breast cancer recurrence. Our results suggest the surveillance and follow-up schedule can be determined for women based on their prognostic factors such as tumor stage and others. Women with early stage of disease may be seen less frequently for follow-up visits than those women with locally advanced stages. Our results from SEER-Medicare data support the idea of risk-controlled follow-up strategies for groups of women. Conclusion: The methodology we proposed in this study allows one to determine individual follow-up scheduling based on a parametric hazard function that incorporates known prognostic factors.^
Resumo:
The development of targeted therapy involve many challenges. Our study will address some of the key issues involved in biomarker identification and clinical trial design. In our study, we propose two biomarker selection methods, and then apply them in two different clinical trial designs for targeted therapy development. In particular, we propose a Bayesian two-step lasso procedure for biomarker selection in the proportional hazards model in Chapter 2. In the first step of this strategy, we use the Bayesian group lasso to identify the important marker groups, wherein each group contains the main effect of a single marker and its interactions with treatments. In the second step, we zoom in to select each individual marker and the interactions between markers and treatments in order to identify prognostic or predictive markers using the Bayesian adaptive lasso. In Chapter 3, we propose a Bayesian two-stage adaptive design for targeted therapy development while implementing the variable selection method given in Chapter 2. In Chapter 4, we proposed an alternate frequentist adaptive randomization strategy for situations where a large number of biomarkers need to be incorporated in the study design. We also propose a new adaptive randomization rule, which takes into account the variations associated with the point estimates of survival times. In all of our designs, we seek to identify the key markers that are either prognostic or predictive with respect to treatment. We are going to use extensive simulation to evaluate the operating characteristics of our methods.^
Resumo:
Prevalent sampling is an efficient and focused approach to the study of the natural history of disease. Right-censored time-to-event data observed from prospective prevalent cohort studies are often subject to left-truncated sampling. Left-truncated samples are not randomly selected from the population of interest and have a selection bias. Extensive studies have focused on estimating the unbiased distribution given left-truncated samples. However, in many applications, the exact date of disease onset was not observed. For example, in an HIV infection study, the exact HIV infection time is not observable. However, it is known that the HIV infection date occurred between two observable dates. Meeting these challenges motivated our study. We propose parametric models to estimate the unbiased distribution of left-truncated, right-censored time-to-event data with uncertain onset times. We first consider data from a length-biased sampling, a specific case in left-truncated samplings. Then we extend the proposed method to general left-truncated sampling. With a parametric model, we construct the full likelihood, given a biased sample with unobservable onset of disease. The parameters are estimated through the maximization of the constructed likelihood by adjusting the selection bias and unobservable exact onset. Simulations are conducted to evaluate the finite sample performance of the proposed methods. We apply the proposed method to an HIV infection study, estimating the unbiased survival function and covariance coefficients. ^
Resumo:
Phase I clinical trial is mainly designed to determine the maximum tolerated dose (MTD) of a new drug. Optimization of phase I trial design is crucial to minimize the number of enrolled patients exposed to unsafe dose levels and to provide reliable information to the later phases of clinical trials. Although it has been criticized about its inefficient MTD estimation, nowadays the traditional 3+3 method remains dominant in practice due to its simplicity and conservative estimation. There are many new designs that have been proven to generate more credible MTD estimation, such as the Continual Reassessment Method (CRM). Despite its accepted better performance, the CRM design is still not widely used in real trials. There are several factors that contribute to the difficulties of CRM adaption in practice. First, CRM is not widely accepted by the regulatory agencies such as FDA in terms of safety. It is considered to be less conservative and tend to expose more patients above the MTD level than the traditional design. Second, CRM is relatively complex and not intuitive for the clinicians to fully understand. Third, the CRM method take much more time and need statistical experts and computer programs throughout the trial. The current situation is that the clinicians still tend to follow the trial process that they are comfortable with. This situation is not likely to change in the near future. Based on this situation, we have the motivation to improve the accuracy of MTD selection while follow the procedure of the traditional design to maintain simplicity. We found that in 3+3 method, the dose transition and the MTD determination are relatively independent. Thus we proposed to separate the two stages. The dose transition rule remained the same as 3+3 method. After getting the toxicity information from the dose transition stage, we combined the isotonic transformation to ensure the monotonic increasing order before selecting the optimal MTD. To compare the operating characteristics of the proposed isotonic method and the other designs, we carried out 10,000 simulation trials under different dose setting scenarios to compare the design characteristics of the isotonic modified method with standard 3+3 method, CRM, biased coin design (BC) and k-in-a-row design (KIAW). The isotonic modified method improved MTD estimation of the standard 3+3 in 39 out of 40 scenarios. The improvement is much greater when the target is 0.3 other than 0.25. The modified design is also competitive when comparing with other selected methods. A CRM method performed better in general but was not as stable as the isotonic method throughout the different dose settings. The results demonstrated that our proposed isotonic modified method is not only easily conducted using the same procedure as 3+3 but also outperforms the conventional 3+3 design. It can also be applied to determine MTD for any given TTL. These features make the isotonic modified method of practical value in phase I clinical trials.^
Resumo:
This thesis project is motivated by the potential problem of using observational data to draw inferences about a causal relationship in observational epidemiology research when controlled randomization is not applicable. Instrumental variable (IV) method is one of the statistical tools to overcome this problem. Mendelian randomization study uses genetic variants as IVs in genetic association study. In this thesis, the IV method, as well as standard logistic and linear regression models, is used to investigate the causal association between risk of pancreatic cancer and the circulating levels of soluble receptor for advanced glycation end-products (sRAGE). Higher levels of serum sRAGE were found to be associated with a lower risk of pancreatic cancer in a previous observational study (255 cases and 485 controls). However, such a novel association may be biased by unknown confounding factors. In a case-control study, we aimed to use the IV approach to confirm or refute this observation in a subset of study subjects for whom the genotyping data were available (178 cases and 177 controls). Two-stage IV method using generalized method of moments-structural mean models (GMM-SMM) was conducted and the relative risk (RR) was calculated. In the first stage analysis, we found that the single nucleotide polymorphism (SNP) rs2070600 of the receptor for advanced glycation end-products (AGER) gene meets all three general assumptions for a genetic IV in examining the causal association between sRAGE and risk of pancreatic cancer. The variant allele of SNP rs2070600 of the AGER gene was associated with lower levels of sRAGE, and it was neither associated with risk of pancreatic cancer, nor with the confounding factors. It was a potential strong IV (F statistic = 29.2). However, in the second stage analysis, the GMM-SMM model failed to converge due to non- concaveness probably because of the small sample size. Therefore, the IV analysis could not support the causality of the association between serum sRAGE levels and risk of pancreatic cancer. Nevertheless, these analyses suggest that rs2070600 was a potentially good genetic IV for testing the causality between the risk of pancreatic cancer and sRAGE levels. A larger sample size is required to conduct a credible IV analysis.^
Resumo:
The main objective of this study was to determine the external validity of a clinical prediction rule developed by the European Multicenter Study on Human Spinal Cord Injury (EM-SCI) to predict the ambulation outcomes 12 months after traumatic spinal cord injury. Data from the North American Clinical Trials Network (NACTN) data registry with approximately 500 SCI cases were used for this validity study. The predictive accuracy of the EM-SCI prognostic model was evaluated using calibration and discrimination based on 231 NACTN cases. The area under the receiver-operating-characteristics curve (ROC) curve was 0.927 (95% CI 0.894 – 0.959) for the EM-SCI model when applied to NACTN population. This is lower than the AUC of 0.956 (95% CI 0.936 – 0.976) reported for the EM-SCI population, but suggests that the EM-SCI clinical prediction rule distinguished well between those patients in the NACTN population who were able to achieve independent ambulation and those who did not achieve independent ambulation. The calibration curve suggests that higher the prediction score is, the better the probability of walking with the best prediction for AIS D patients. In conclusion, the EM-SCI clinical prediction rule was determined to be generalizable to the adult NACTN SCI population.^
Resumo:
Head and Neck Squamous Cell Carcinoma (HNSCC) is the sixth common malignancy in the world, with high rates of developing second primary malignancy (SPM) and moderately low survival rates. This disease has become an enormous challenge in the cancer research and treatments. For HNSCC patients, a highly significant cause of post-treatment mortality and morbidity is the development of SPM. Hence, assessment of predicting the risk for the development of SPM would be very helpful for patients, clinicians and policy makers to estimate the survival of patients with HNSCC. In this study, we built a prognostic model to predict the risk of developing SPM in patients with newly diagnosed HNSCC. The dataset used in this research was obtained from The University of Texas MD Anderson Cancer Center. For the first aim, we used stepwise logistic regression to identify the prognostic factors for the development of SPM. Our final model contained cancer site and overall cancer stage as our risk factors for SPM. The Hosmer-Lemeshow test (p-value= 0.15>0.05) showed the final prognostic model fit the data well. The area under the ROC curve was 0.72 that suggested the discrimination ability of our model was acceptable. The internal validation confirmed the prognostic model was a good fit and the final prognostic model would not over optimistically predict the risk of SPM. This model needs external validation by using large data sample size before it can be generalized to predict SPM risk for other HNSCC patients. For the second aim, we utilized a multistate survival analysis approach to estimate the probability of death for HNSCC patients taking into consideration of the possibility of SPM. Patients without SPM were associated with longer survival. These findings suggest that the development of SPM could be a predictor of survival rates among the patients with HNSCC.^
Resumo:
Background: For most cytotoxic and biologic anti-cancer agents, the response rate of the drug is commonly assumed to be non-decreasing with an increasing dose. However, an increasing dose does not always result in an appreciable increase in the response rate. This may especially be true at high doses for a biologic agent. Therefore, in a phase II trial the investigators may be interested in testing the anti-tumor activity of a drug at more than one (often two) doses, instead of only at the maximum tolerated dose (MTD). This way, when the lower dose appears equally effective, this dose can be recommended for further confirmatory testing in a phase III trial under potential long-term toxicity and cost considerations. A common approach to designing such a phase II trial has been to use an independent (e.g., Simon's two-stage) design at each dose ignoring the prior knowledge about the ordering of the response probabilities at the different doses. However, failure to account for this ordering constraint in estimating the response probabilities may result in an inefficient design. In this dissertation, we developed extensions of Simon's optimal and minimax two-stage designs, including both frequentist and Bayesian methods, for two doses that assume ordered response rates between doses. ^ Methods: Optimal and minimax two-stage designs are proposed for phase II clinical trials in settings where the true response rates at two dose levels are ordered. We borrow strength between doses using isotonic regression and control the joint and/or marginal error probabilities. Bayesian two-stage designs are also proposed under a stochastic ordering constraint. ^ Results: Compared to Simon's designs, when controlling the power and type I error at the same levels, the proposed frequentist and Bayesian designs reduce the maximum and expected sample sizes. Most of the proposed designs also increase the probability of early termination when the true response rates are poor. ^ Conclusion: Proposed frequentist and Bayesian designs are superior to Simon's designs in terms of operating characteristics (expected sample size and probability of early termination, when the response rates are poor) Thus, the proposed designs lead to more cost-efficient and ethical trials, and may consequently improve and expedite the drug discovery process. The proposed designs may be extended to designs of multiple group trials and drug combination trials.^
Resumo:
Objective: The primary objective of our study was to study the effect of metformin in patients of metastatic renal cell cancer (mRCC) and diabetes who are on treatment with frontline therapy of tyrosine kinase inhibitors. The effect of therapy was described in terms of overall survival and progression free survival. Comparisons were made between group of patients receiving metformin versus group of patients receiving insulin in diabetic patients of metastatic renal cancer on frontline therapy. Exploratory analyses were also done comparing non-diabetic patients of metastatic renal cell cancer receiving frontline therapy compared to diabetic patients of metastatic renal cell cancer receiving metformin therapy. ^ Methods: The study design is a retrospective case series to elaborate the response rate of frontline therapy in combination with metformin for mRCC patients with type 2 diabetes mellitus. The cohort was selected from a database, which was generated for assessing the effect of tyrosine kinase inhibitor therapy associated hypertension in metastatic renal cell cancer at MD Anderson Cancer Center. Patients who had been started on frontline therapy for metastatic renal cell carcinoma from all ethnic and racial backgrounds were selected for the study. The exclusion criteria would be of patients who took frontline therapy for less than 3 months or were lost to follow-up. Our exposure variable was treatment with metformin, which comprised of patients who took metformin for the treatment of type 2 diabetes at any time of diagnosis of metastatic renal cell carcinoma. The outcomes assessed were last available follow-up or date of death for the overall survival and date of progression of disease from their radiological reports for time to progression. The response rates were compared by covariates that are known to be strongly associated with renal cell cancer. ^ Results: For our primary analyses between the insulin and metformin group, there were 82 patients, out of which 50 took insulin therapy and 32 took metformin therapy for type 2 diabetes. For our exploratory analysis, we compared 32 diabetic patients on metformin to 146 non-diabetic patients, not on metformin. Baseline characteristics were compared among the population. The time from the start of treatment until the date of progression of renal cell cancer and date of death or last follow-up were estimated for survival analysis. ^ In our primary analyses, there was a significant difference in the time to progression of patients receiving metformin therapy vs insulin therapy, which was also seen in our exploratory analyses. The median time to progression in primary analyses was 1259 days (95% CI: 659-1832 days) in patients on metformin therapy compared to 540 days (95% CI: 350-894) in patients who were receiving insulin therapy (p=0.024). The median time to progression in exploratory analyses was 1259 days (95% CI: 659-1832 days) in patients on metformin therapy compared to 279 days (95% CI: 202-372 days) in non-diabetic group (p-value <0.0001). ^ The median overall survival was 1004 days in metformin group (95% CI: 761-1212 days) compared to 816 days (95%CI: 558-1405 days) in insulin group (p-value<0.91). For the exploratory analyses, the median overall survival was 1004 days in metformin group (95% CI: 761-1212 days) compared to 766 days (95%CI: 649-965 days) in the non-diabetic group (p-value<0.78). Metformin was observed to increase the progression free survival in both the primary and exploratory analyses (HR=0.52 in metformin Vs insulin group and HR=0.36 in metformin Vs non-diabetic group, respectively). ^ Conclusion: In laboratory studies and a few clinical studies metformin has been proven to have dual benefits in patients suffering from cancer and type 2-diabetes via its action on the mammalian target of Rapamycin pathway and effect in decreasing blood sugar by increasing the sensitivity of the insulin receptors to insulin. Several studies in breast cancer patients have documented a beneficial effect (quantified by pathological remission of cancer) of metformin use in patients taking treatment for breast cancer therapy. Combination of metformin therapy in patients taking frontline therapy for renal cell cancer may provide a significant benefit in prolonging the overall survival in patients with metastatic renal cell cancer and diabetes. ^