945 results for Biology, Biostatistics
Abstract:
Improvements in the analysis of microarray images are critical for accurately quantifying gene expression levels, because the accuracy of spot intensities directly influences the results and interpretation of statistical analyses. This dissertation discusses the implementation of a novel approach to the analysis of cDNA microarray images. We use a stellar photometric model, the Moffat function, to quantify microarray spots from nylon microarray images. The inherent flexibility of the Moffat shape model makes it well suited to quantifying microarray spots. We apply our approach to a Wilms' tumor microarray study and compare our results with a fixed-circle segmentation approach for spot quantification. Our results suggest that the choice of spot feature extraction method can affect the ability of statistical methods to identify differentially expressed genes. We also used the Moffat function to simulate a series of microarray images under various experimental conditions, and we used these simulations to validate the performance of various statistical methods for identifying differentially expressed genes. Our simulation results indicate that tests that account for the dependency between mean spot intensity and variance estimation, such as the smoothed t-test, can better identify differentially expressed genes, especially when the number of replicates and the mean fold change are low. The simulations also showed that, overall, a rank sum test (Mann-Whitney) performed well at identifying differentially expressed genes, consistent with previous work suggesting the strengths of nonparametric approaches for this task. We also show that multivariate approaches, such as hierarchical and k-means cluster analysis along with principal components analysis, are effective at classifying samples only when replicate numbers and mean fold change are high.
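The Moffat function is not written out above. Below is a minimal sketch that assumes the standard circular form used in stellar photometry, I(r) = I0 (1 + (r/α)²)^(−β). It computes the profile and its closed-form total flux, which is one natural spot-intensity summary. The function names and the choice of total flux as the spot quantity are illustrative assumptions, not the dissertation's code.

```python
import math


def moffat(r, i0, alpha, beta):
    """Circular Moffat profile: I(r) = i0 * (1 + (r / alpha)**2) ** (-beta)."""
    return i0 * (1.0 + (r / alpha) ** 2) ** (-beta)


def moffat_total_flux(i0, alpha, beta):
    """Integral of the circular Moffat profile over the whole plane.

    Substituting u = 1 + r**2 / alpha**2 in the integral of I(r) * 2*pi*r dr
    gives pi * alpha**2 * i0 / (beta - 1), which is finite only for beta > 1.
    """
    if beta <= 1:
        raise ValueError("total flux diverges unless beta > 1")
    return math.pi * alpha ** 2 * i0 / (beta - 1.0)


def spot_intensity_on_grid(size, x0, y0, i0, alpha, beta, background=0.0):
    """Render a size x size pixel grid containing one Moffat-shaped spot
    centred at (x0, y0) on a flat background."""
    return [
        [background + moffat(math.hypot(x - x0, y - y0), i0, alpha, beta)
         for x in range(size)]
        for y in range(size)
    ]
```

Summing the rendered pixels approximately recovers πα²I0/(β − 1). This is why a fitted (I0, α, β) triple can stand in for a spot's integrated intensity. In practice the parameters would be fitted to the observed pixels, for example by nonlinear least squares.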
Finally, we show how our stellar shape model approach can be extended to the analysis of 2D-gel images by adapting the Moffat function to account for the elliptical shape of spots in such images. Our results indicate that stellar shape models offer a previously unexplored approach to the quantification of 2D-gel spots.
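The abstract does not give the adapted form. One common way to make a Moffat profile elliptical is to measure distance in a rotated frame, with a separate width along each axis; this is an assumption here, not necessarily the dissertation's parameterization. All names below are illustrative.

```python
import math


def elliptical_moffat(x, y, x0, y0, i0, alpha_x, alpha_y, theta, beta):
    """Moffat surface with separate widths along two rotated axes.

    (x0, y0) is the spot centre, alpha_x / alpha_y are the widths along the
    spot's own u / v axes, and theta (radians) rotates those axes
    counterclockwise from the image x / y axes.
    """
    dx, dy = x - x0, y - y0
    # Rotate the pixel offset into the spot's (u, v) frame.
    u = dx * math.cos(theta) + dy * math.sin(theta)
    v = -dx * math.sin(theta) + dy * math.cos(theta)
    r2 = (u / alpha_x) ** 2 + (v / alpha_y) ** 2
    return i0 * (1.0 + r2) ** (-beta)


def elliptical_moffat_flux(i0, alpha_x, alpha_y, beta):
    """Total volume: the circular result with alpha**2 replaced by
    alpha_x * alpha_y (the Jacobian of the axis rescaling)."""
    if beta <= 1:
        raise ValueError("total volume diverges unless beta > 1")
    return math.pi * alpha_x * alpha_y * i0 / (beta - 1.0)
```

Setting alpha_x equal to alpha_y recovers the circular profile, so the elliptical model nests the microarray case.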
Abstract:
When conducting a randomized comparative clinical trial, ethical, scientific, or economic considerations often motivate the use of interim decision rules after successive groups of patients have been treated. These decisions may pertain to the comparative efficacy or safety of the treatments under study, cost considerations, the desire to accelerate the drug evaluation process, or the likelihood of therapeutic benefit for future patients. At each interim decision, an important question is whether patient enrollment should continue or be terminated, either because of a high probability that one treatment is superior to the other or because of a low probability that the experimental treatment will ultimately prove superior. The use of frequentist group sequential decision rules has become routine in the conduct of phase III clinical trials. In this dissertation, we present a new Bayesian decision-theoretic approach to designing a randomized group sequential clinical trial, focusing on two-arm trials with time-to-failure outcomes. Forward simulation is used to obtain optimal decision boundaries for each of a set of possible models. At each interim analysis, we use Bayesian model selection to adaptively choose the model with the largest posterior probability of being correct, and we then make the interim decision based on the boundaries that are optimal under the chosen model. We provide a simulation study comparing this method, which we call Bayesian Doubly Optimal Group Sequential (BDOGS), with corresponding frequentist designs using either O'Brien-Fleming (OF) or Pocock boundaries, as obtained from EaSt 2000. Our simulation results show that, over a wide variety of cases, BDOGS either performs at least as well as both OF and Pocock or, on average, provides a much smaller trial.
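BDOGS's optimized boundaries and model selection step are beyond a short example. The sketch below only shows the kind of posterior quantity on which an interim Bayesian rule for time-to-failure outcomes can be based. It assumes exponential failure times and conjugate gamma priors on each arm's hazard. The function names, prior values, and the fixed 0.99/0.05 thresholds are hypothetical.

```python
import random


def posterior_prob_experimental_better(events_e, exposure_e, events_c, exposure_c,
                                       prior_shape=0.01, prior_rate=0.01,
                                       draws=20000, seed=1):
    """Monte Carlo estimate of Pr(hazard_E < hazard_C | data).

    Assumes exponential time-to-failure in each arm and a Gamma(shape, rate)
    prior on each hazard, so each posterior is
    Gamma(shape + events, rate + total follow-up time).
    """
    rng = random.Random(seed)
    better = 0
    for _ in range(draws):
        # random.gammavariate takes (shape, scale), where scale = 1 / rate.
        lam_e = rng.gammavariate(prior_shape + events_e, 1.0 / (prior_rate + exposure_e))
        lam_c = rng.gammavariate(prior_shape + events_c, 1.0 / (prior_rate + exposure_c))
        if lam_e < lam_c:
            better += 1
    return better / draws


def interim_decision(p_better, upper=0.99, lower=0.05):
    """Hypothetical fixed thresholds; BDOGS instead derives its boundaries
    by forward simulation under the model selected at each look."""
    if p_better >= upper:
        return "stop: experimental superior"
    if p_better <= lower:
        return "stop: futility"
    return "continue"
```

A design like BDOGS replaces the fixed thresholds with boundaries optimized per candidate model. The posterior computation at each look plays the same role, however.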
Abstract:
Many phase II clinical studies in oncology use a two-stage frequentist design such as Simon's optimal design. However, these designs share a logistical problem with patient accrual at the interim analysis. Strictly speaking, accrual at the end of the first stage may have to be suspended until every stage I patient has an event, either success or failure. For example, when the study endpoint is six-month progression-free survival, accrual has to be stopped until all stage I outcomes are observed. However, study investigators may be concerned about the loss of accrual momentum during this hiatus. We propose a two-stage phase II design that resolves the accrual problem caused by the interim analysis; it can be used as an alternative to frequentist two-stage phase II designs in oncology.
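For reference, the standard Simon-type design that this proposal replaces can be characterized with exact binomial sums. The sketch below uses illustrative function names. It returns the probability of early termination, the probability of declaring the drug promising, and the expected sample size. The stage I stopping rule is exactly the point at which accrual would otherwise pause.

```python
from math import comb


def binom_pmf(k, n, p):
    """Binomial probability of k responses among n patients."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def simon_operating_characteristics(r1, n1, r, n, p):
    """Operating characteristics of a two-stage design at true response rate p.

    Stop after stage I if responses <= r1; otherwise enrol n - n1 more patients
    and declare the drug promising if total responses > r.
    Returns (probability of early termination, probability of declaring
    promising, expected sample size).
    """
    n2 = n - n1
    pet = sum(binom_pmf(x1, n1, p) for x1 in range(r1 + 1))
    promising = 0.0
    for x1 in range(r1 + 1, n1 + 1):
        # Need at least r - x1 + 1 stage II responses (none if x1 > r already).
        needed = max(r - x1 + 1, 0)
        tail = sum(binom_pmf(x2, n2, p) for x2 in range(needed, n2 + 1))
        promising += binom_pmf(x1, n1, p) * tail
    expected_n = n1 + (1 - pet) * n2
    return pet, promising, expected_n
```

Consider the design r1/n1 = 1/10, r/n = 5/29, commonly tabulated as Simon's optimal design for p0 = 0.10 versus p1 = 0.30 with α = 0.05 and β = 0.20. For this design, `simon_operating_characteristics(1, 10, 5, 29, 0.10)` gives roughly (0.736, 0.047, 15.0).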
Abstract:
The joint modeling of longitudinal and survival data is a new approach with many applications, such as HIV, cancer vaccine trials, and quality-of-life studies. Recent methodological developments address each component of the joint model as well as the statistical processes that link them together. Among these, second-order polynomial random effects models and linear mixed effects models are the most commonly used for the longitudinal trajectory function. In this study, we first relax the parametric constraints of polynomial random effects models by using Dirichlet process priors, and we consider three longitudinal markers, rather than only one, in a single joint model. Second, we use a linear mixed effects model for the longitudinal process in a joint model analyzing the three markers. These methods were applied to the Primary Biliary Cirrhosis sequential data, collected from a clinical trial of primary biliary cirrhosis (PBC) of the liver conducted between 1974 and 1984 at the Mayo Clinic. The effects of three longitudinal markers, (1) total serum bilirubin, (2) serum albumin, and (3) serum glutamic-oxaloacetic transaminase (SGOT), on patients' survival were investigated. The proportion of treatment effect is also studied using the proposed joint modeling approaches.
Based on the results, we conclude that the proposed modeling approaches fit the data better and give less biased parameter estimates for these trajectory functions than previous methods. Model fit also improves when three longitudinal markers are considered instead of one. The analysis of the proportion of treatment effect from these joint models leads to the same conclusion as the final model of Fleming and Harrington (1991): bilirubin and albumin together have a stronger impact in predicting patients' survival and serve as surrogate endpoints for treatment.
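The abstract does not state which representation of the Dirichlet process prior is used. A common one is the truncated stick-breaking construction. It is sketched below with a standard normal base measure as a stand-in for the random-effect distribution. The truncation level, the base measure, and the function names are all assumptions.

```python
import random


def stick_breaking_weights(alpha, n_atoms, seed=0):
    """Truncated stick-breaking weights for a Dirichlet process.

    v_k ~ Beta(1, alpha) and w_k = v_k * prod_{j<k} (1 - v_j).
    Returns (weights, leftover stick mass beyond the truncation).
    Smaller alpha puts most mass on a few atoms (few distinct clusters).
    """
    rng = random.Random(seed)
    weights, remaining = [], 1.0
    for _ in range(n_atoms):
        v = rng.betavariate(1.0, alpha)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    return weights, remaining


def draw_from_dp(alpha, base_sampler, n, n_atoms=200, seed=0):
    """Draw n random effects from one truncated DP realization.

    base_sampler(rng) draws a single atom from the base measure G0.
    """
    rng = random.Random(seed)
    weights, _ = stick_breaking_weights(alpha, n_atoms, seed)
    atoms = [base_sampler(rng) for _ in range(n_atoms)]
    return rng.choices(atoms, weights=weights, k=n)
```

Because draws reuse atoms, subjects fall into clusters that share random-effect values. This is how a DP prior relaxes a single parametric (for example, normal) random-effect assumption.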
Abstract:
Ordinal logistic regression models, which analyze dependent variables whose multiple outcomes can be ranked, have been underutilized. In this study, we describe four logistic regression models for analyzing an ordinal response variable.
In this methodological study, four regression models are proposed: the first is the multinomial logistic model, the second is the adjacent-category logit model, the third is the proportional odds model, and the fourth is the continuation-ratio model. We illustrate and compare the fit of these models using data from a survey designed by the University of Texas School of Public Health research project PCCaSO (Promoting Colon Cancer Screening in people 50 and Over) to study patients' confidence in completing colorectal cancer screening (CRCS).
The purpose of this study is twofold: first, to provide a synthesized review of models for analyzing data with an ordinal response, and second, to evaluate their usefulness in epidemiological research, with particular emphasis on model formulation, interpretation of model coefficients, and their implications. The four models used in this study are (1) the multinomial logistic model, (2) the adjacent-category logistic model [9], (3) the continuation-ratio logistic model [10], and (4) the proportional odds logistic model [11]. We recommend that the analyst perform (1) goodness-of-fit tests and (2) sensitivity analyses by fitting and comparing different models.
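To make the differences between the ordinal models concrete, the sketch below maps a linear predictor η = xβ to category probabilities under three of the four models. Parameterizations (sign conventions, names) vary across texts. Here, purely as an assumption, a larger η shifts probability toward higher categories in all three models. The multinomial (baseline-category) model is omitted because it ignores the ordering.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def proportional_odds_probs(cutpoints, eta):
    """Cumulative logit: P(Y <= j) = logistic(theta_j - eta).

    cutpoints must be increasing; one fewer than the number of categories.
    """
    cum = [logistic(t - eta) for t in cutpoints] + [1.0]
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]


def adjacent_category_probs(intercepts, eta):
    """Adjacent-category logit: log(P(Y = j+1) / P(Y = j)) = alpha_j + eta."""
    log_unnorm = [0.0]
    for a in intercepts:
        log_unnorm.append(log_unnorm[-1] + a + eta)
    top = max(log_unnorm)  # subtract the max for numerical stability
    unnorm = [math.exp(v - top) for v in log_unnorm]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def continuation_ratio_probs(intercepts, eta):
    """Continuation ratio: P(Y = j | Y >= j) = logistic(alpha_j - eta)."""
    probs, at_risk = [], 1.0
    for a in intercepts:
        stop_here = logistic(a - eta)
        probs.append(at_risk * stop_here)
        at_risk *= 1.0 - stop_here
    probs.append(at_risk)  # last category absorbs the remaining mass
    return probs
```

Only the proportional odds model uses a single set of cutpoints on the cumulative scale. The other two model the ordering through adjacent or conditional comparisons, which changes how a coefficient in β is interpreted.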
Abstract:
Random Forests™ is reported to be one of the most accurate classification algorithms for complex data analysis. It performs well even when most predictors are noisy and the number of variables is much larger than the number of observations. In this thesis, Random Forests was applied to a large-scale lung cancer case-control study. A novel way of automatically selecting prognostic factors was proposed, and a synthetic positive control was used to validate the Random Forests method. Throughout this study, we showed that Random Forests can handle a large number of weak input variables without overfitting, can account for non-additive interactions between these input variables, and can be used for variable selection without being adversely affected by collinearity.
Random Forests can handle large-scale data sets without rigorous data preprocessing, and it provides a robust variable-importance ranking measure. We propose a novel variable selection method for Random Forests that uses the data noise level as the cut-off value for determining the subset of important predictors. This approach enhanced the ability of the Random Forests algorithm to automatically identify important predictors in complex data. The cut-off value can also be adjusted based on the results of the synthetic positive control experiments.
When the data set had a high ratio of variables to observations, Random Forests complemented the established logistic regression approach. This study suggests that Random Forests is recommended for such high-dimensional data. One can use Random Forests to select the important variables and then use logistic regression, or Random Forests itself, to estimate the effect sizes of the predictors and to classify new observations.
We also found that mean decrease in accuracy is a more reliable variable-ranking measure than mean decrease in Gini.
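The thesis's exact noise-level rule is not given. The sketch below pairs a generic permutation (mean decrease in accuracy) importance with one heuristic sometimes used for it: the magnitude of the most negative importance score is treated as the noise level. Both the heuristic and the function names are assumptions. The classifier is passed in as any callable, so no particular Random Forests library is implied.

```python
import random


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean decrease in accuracy when each predictor column is shuffled.

    predict: a fitted classifier wrapped as a callable mapping a list of rows
    (lists) to a list of labels. Random Forests computes this on each tree's
    out-of-bag cases; this generic version uses the data passed in.
    """
    rng = random.Random(seed)

    def accuracy(rows):
        preds = predict(rows)
        return sum(p == t for p, t in zip(preds, y)) / len(y)

    baseline = accuracy(X)
    scores = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(shuffled))
        scores.append(sum(drops) / n_repeats)
    return scores


def select_above_noise(importances):
    """Keep predictors whose importance exceeds an estimated noise level.

    Assumed heuristic: irrelevant predictors scatter around zero, so the
    magnitude of the most negative score is taken as the noise level.
    Returns (names sorted by decreasing importance, noise level).
    """
    most_negative = min(importances.values())
    noise_level = -most_negative if most_negative < 0 else 0.0
    kept = [name for name, score in importances.items() if score > noise_level]
    return sorted(kept, key=importances.get, reverse=True), noise_level
```

As the abstract notes, the cut-off could then be tuned against synthetic positive controls, that is, predictors constructed to be informative.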
Abstract:
Anticancer drugs are typically administered in the clinic as mixtures, sometimes called combinations. Only rarely, however, are mixtures approved as drugs; rather, research on mixtures tends to occur after single drugs have been approved. The goal of this research project was to develop modeling approaches that would encourage rational preclinical mixture design. To this end, a series of models was developed. First, several QSAR classification models were constructed to predict the cytotoxicity, oral clearance, and acute systemic toxicity of drugs. The QSAR models were applied to a set of over 115,000 natural compounds to identify promising candidates for testing in mixtures. Second, an improved method was developed to assess synergistic, antagonistic, and additive effects between drugs in a mixture. This method, dubbed the MixLow method, is similar to the Median-Effect method, the de facto standard for assessing drug interactions. The primary difference is that the MixLow method uses a nonlinear mixed-effects model, rather than an ordinary least squares procedure, to estimate the parameters of concentration-effect curves. Parameter estimators produced by the MixLow method were more precise than those produced by the Median-Effect method, and the coverage of Loewe index confidence intervals was superior. Third, a model was developed to predict drug interactions from scores obtained in virtual docking experiments. This is a novel approach to modeling drug mixtures, and it was more useful for the data modeled here than competing approaches. The model was applied to cytotoxicity data for 45 mixtures, each composed of up to 10 selected drugs. One drug, doxorubicin, was a standard chemotherapy agent; the others were well-known natural compounds, including curcumin, EGCG, quercetin, and rhein.
Predictions of synergism/antagonism were made for all possible fixed-ratio mixtures, the cytotoxicities of the 10 best-scoring mixtures were tested, and drug interactions were assessed. Predicted and observed responses were highly correlated (r² = 0.83). The results suggested that some mixtures allowed up to an 11-fold reduction in doxorubicin concentration without sacrificing efficacy. Taken together, the models developed in this project present a general approach to the rational design of mixtures during preclinical drug development.
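The Median-Effect quantities that the MixLow method re-estimates can be sketched directly. The sketch assumes the median-effect equation fa/(1 − fa) = (D/Dm)^m for each single drug. From it follow the dose needed for a given effect, the Loewe (combination) index, and a dose reduction index. The function names are illustrative. The sketch takes Dm and m as given rather than estimating them with a nonlinear mixed-effects model as MixLow does.

```python
def median_effect_dose(fa, dm, m):
    """Single-drug dose giving fractional effect fa under the median-effect
    equation fa / (1 - fa) = (D / Dm) ** m, i.e. D = Dm * (fa / (1 - fa)) ** (1 / m)."""
    if not 0.0 < fa < 1.0:
        raise ValueError("fa must lie strictly between 0 and 1")
    return dm * (fa / (1.0 - fa)) ** (1.0 / m)


def loewe_index(doses, fa, params):
    """Loewe (combination) index: sum over drugs of d_i / D_{x,i}.

    doses: concentration of each drug in the mixture that together produce fa.
    params: (Dm, m) for each drug acting alone.
    Below 1 suggests synergism, 1 additivity, above 1 antagonism.
    """
    return sum(d / median_effect_dose(fa, dm, m) for d, (dm, m) in zip(doses, params))


def dose_reduction_index(d, fa, dm, m):
    """How many times less of one drug the mixture needs for the same effect."""
    return median_effect_dose(fa, dm, m) / d
```

Under this sketch, a dose reduction index of 11 for doxorubicin would correspond to the 11-fold reduction reported above.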
Abstract:
Drinking water-related exposures among populations living in the United States-Mexico border region, particularly among Hispanics, remain largely unknown. In particular, perceptions that may affect water source selection have not been fully addressed. This study evaluates drinking water quality perceptions in a mostly Hispanic community living along the United States-Mexico border, a community that also faces water scarcity. Using a survey administered during two seasons (winter and summer), data were collected from a total of 608 participants, 303 living in the United States and 305 in Mexico. A (random) convenience sampling technique was used to select households, and all interviewees were over 18 years of age. Statistically significant differences were observed by country of residence (p = 0.002): those living in Mexico reported greater use of bottled water than those living in the United States. Perception factors, especially taste, were cited as the main reasons for not selecting unfiltered tap water as a primary drinking water source. Understanding what influences drinking water source preference can aid in developing risk communication strategies regarding water quality.
Abstract:
This study investigates a theoretical model in which a longitudinal process (a stationary Markov chain) and a Weibull survival process share a bivariate random effect. A quality-of-life-adjusted survival is calculated as a weighted sum of survival time. Theoretical values of the population mean adjusted survival under this model are computed numerically; the parameters of the bivariate random effect significantly affect these values. Maximum likelihood and Bayesian methods are applied to simulated data to estimate the model parameters. From the parameter estimates, the predicted population mean adjusted survival can then be calculated numerically and compared with the theoretical values. The Bayesian and maximum likelihood methods provide parameter estimates and population mean predictions of comparable accuracy; however, the Bayesian method suffers from poor convergence due to autocorrelation and inter-variable correlation.
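The sketch below gives a baseline for the theoretical population mean. It drops the shared bivariate random effect, an assumption that removes exactly the dependence the study investigates, and it uses a two-state chain. Suppose the chain starts in its stationary distribution π and is independent of survival time T. Then E[adjusted survival] = E[T] Σ_j π_j w_j, and E[T] = λΓ(1 + 1/ρ) for a Weibull with scale λ and shape ρ. All names and the two-state setup are illustrative.

```python
import math


def weibull_mean(scale, shape):
    """Mean of a Weibull(scale, shape) survival time: scale * Gamma(1 + 1/shape)."""
    return scale * math.gamma(1.0 + 1.0 / shape)


def stationary_two_state(p01, p10):
    """Stationary distribution of a two-state chain with
    P(0 -> 1) = p01 and P(1 -> 0) = p10."""
    total = p01 + p10
    return p10 / total, p01 / total


def mean_qal_survival_independent(scale, shape, p01, p10, weights):
    """Population mean QoL-adjusted survival assuming the chain and the
    survival time are independent (no shared random effect)."""
    pi = stationary_two_state(p01, p10)
    return weibull_mean(scale, shape) * sum(p * w for p, w in zip(pi, weights))


def adjusted_survival(visit_states, weights, interval, survival_time):
    """One subject's weighted sum of survival time: each visit interval before
    death contributes its length times the weight of the state occupied."""
    total = 0.0
    for i, state in enumerate(visit_states):
        start = i * interval
        if start >= survival_time:
            break
        total += weights[state] * min(interval, survival_time - start)
    return total
```

Comparing this independence baseline with the shared-random-effect values shows how much the random-effect parameters move the population mean.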
Abstract:
Hereditary nonpolyposis colorectal cancer (HNPCC) is an autosomal dominant disease caused by germline mutations in DNA mismatch repair (MMR) genes. The nucleotide excision repair (NER) pathway plays a very important role in cancer development. We systematically studied interactions between NER and MMR genes to identify NER gene single nucleotide polymorphism (SNP) risk factors that modify the effect of MMR mutations on cancer risk in HNPCC. We analyzed polymorphisms in 10 NER genes that had been genotyped in HNPCC patients carrying MSH2 and MLH1 gene mutations. The influence of the NER gene SNPs on time to onset of colorectal cancer (CRC) was assessed using survival analysis and a semiparametric proportional hazards model. Among MMR mutation carriers, the median age of CRC onset for those with the ERCC1 mutation was 3.9 years earlier than for patients with wild-type ERCC1 (median 47.7 vs. 51.6 years, log-rank test p = 0.035). The influence of the Rad23B A249V SNP on age of onset of HNPCC is age dependent (likelihood ratio test p = 0.0056). Using the likelihood ratio test, we also found evidence of genetic interactions between the MMR gene mutations and SNPs in the ERCC1 gene (C8092A) and the XPG/ERCC5 gene (D1104H), with p-values of 0.004 and 0.042, respectively. Tree-structured survival analysis (TSSA) showed distinct gene interactions in MLH1 mutation carriers and MSH2 mutation carriers: ERCC1 SNP genotypes greatly modified the age of onset of HNPCC in MSH2 mutation carriers, whereas no effect was detected in MLH1 mutation carriers. Because the NER genes in this study play different roles in the NER pathway, they may have distinct influences on the development of HNPCC. These findings are important for elucidating the molecular mechanism of colon cancer development and for understanding why some carriers of MSH2 and MLH1 mutations develop CRC early while others never develop CRC.
Overall, the findings also have important implications for the development of early detection and prevention strategies, as well as for understanding the mechanism of colorectal carcinogenesis in HNPCC.
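The ERCC1 comparison above uses a log-rank test on age at CRC onset. Below is a minimal, dependency-free sketch of the standard two-group log-rank statistic. Carriers who had not developed CRC would enter as censored observations. This is a generic implementation with illustrative names, not the authors' analysis code.

```python
import math


def log_rank_test(times, events, groups):
    """Two-group log-rank test.

    times: follow-up times (e.g. age at CRC onset or at last contact);
    events: 1 if the event was observed, 0 if censored; groups: 0 or 1.
    Returns (chi-square statistic with 1 df, p-value).
    """
    data = list(zip(times, events, groups))
    event_times = sorted({t for t, e, _ in data if e})
    o_minus_e = 0.0
    var = 0.0
    for t in event_times:
        n = sum(1 for ti, _, _ in data if ti >= t)                  # at risk
        n1 = sum(1 for ti, _, g in data if ti >= t and g == 1)      # at risk, group 1
        d = sum(1 for ti, e, _ in data if ti == t and e)            # events at t
        d1 = sum(1 for ti, e, g in data if ti == t and e and g == 1)
        o_minus_e += d1 - d * n1 / n                                # observed - expected
        if n > 1:
            # Hypergeometric variance of d1 given the risk set.
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if var == 0:
        raise ValueError("no information to compare the groups")
    stat = o_minus_e ** 2 / var
    # Upper tail of chi-square with 1 df: P(|Z| > sqrt(stat)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```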
Abstract:
With the growing recognition of the importance of evidence-based medicine, there is an emerging need for methods to systematically synthesize available data. In particular, methods that provide accurate estimates of the characteristics of diagnostic tests are needed to help physicians make better clinical decisions. To provide more flexible approaches to the meta-analysis of diagnostic tests, we developed three Bayesian generalized linear models. Two of these models, a bivariate normal model and a bivariate binomial model, analyzed pairs of sensitivity and specificity values while incorporating the correlation between these two outcome variables. Noninformative independent uniform priors were used for the variances of sensitivity and specificity and for their correlation. We also applied an inverse Wishart prior to check the sensitivity of the results to the prior. The third model was a multinomial model in which the test results were modeled as multinomial random variables. All three models can include specific imaging techniques as covariates in order to compare performance. Vague normal priors were assigned to the covariate coefficients. The computations were carried out with Markov chain Monte Carlo using BUGS (Bayesian inference Using Gibbs Sampling). We investigated the properties of the three proposed models through extensive simulation studies. We also applied these models to a previously published meta-analysis dataset on cervical cancer and to an unpublished melanoma dataset. In general, the point estimates of sensitivity and specificity were consistent among the Bayesian and frequentist bivariate normal and binomial models. However, in the simulation studies, the correlation coefficient estimates from the Bayesian bivariate models were not as good as those from frequentist estimation, regardless of which prior distribution was used for the covariance matrix.
The Bayesian multinomial model consistently underestimated sensitivity and specificity regardless of the sample size and correlation coefficient. In conclusion, the Bayesian bivariate binomial model provides the most flexible framework for future applications because of the following strengths: (1) it facilitates direct comparison between different tests; (2) it captures the variability in both sensitivity and specificity simultaneously, as well as the correlation between the two; and (3) it can be applied directly to sparse data without ad hoc correction.
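Strength (3) depends on how each study's 2×2 table enters the model. The sketch below uses illustrative names. The bivariate normal approach works with per-study logits and approximate variances, which are undefined when a cell is zero, so an ad hoc continuity correction (0.5 here, a common but assumed choice) is added. The binomial approach uses the exact binomial likelihood, which stays finite with zero cells.

```python
import math


def study_logits(tp, fn, tn, fp, correction=0.5):
    """Per-study logit sensitivity and specificity with approximate
    within-study variances, as used by a bivariate normal model.

    A continuity correction is added to every cell only when some cell is 0,
    because otherwise the logit or its variance is undefined.
    """
    if 0 in (tp, fn, tn, fp):
        tp, fn, tn, fp = [c + correction for c in (tp, fn, tn, fp)]
    logit_sens = math.log(tp / fn)      # log(sens / (1 - sens))
    logit_spec = math.log(tn / fp)      # log(spec / (1 - spec))
    var_sens = 1.0 / tp + 1.0 / fn      # delta-method variance of the logit
    var_spec = 1.0 / tn + 1.0 / fp
    return logit_sens, var_sens, logit_spec, var_spec


def binomial_loglik(tp, fn, tn, fp, logit_sens, logit_spec):
    """Exact binomial log-likelihood (up to constants) for one study given its
    study-level logit sensitivity and specificity; zero cells need no correction."""
    sens = 1.0 / (1.0 + math.exp(-logit_sens))
    spec = 1.0 / (1.0 + math.exp(-logit_spec))
    return (tp * math.log(sens) + fn * math.log(1.0 - sens)
            + tn * math.log(spec) + fp * math.log(1.0 - spec))
```

In a bivariate binomial model, the study-level logits would themselves be random effects drawn from a bivariate normal distribution. The likelihood above is the per-study piece.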
Abstract:
Several studies have examined the association between high glycemic index (GI) and glycemic load (GL) diets and the risk of coronary heart disease (CHD). However, most of these studies were conducted primarily in white populations. The primary aim of this study was to examine whether high-GI and high-GL diets are associated with an increased risk of developing CHD in whites and African Americans, in non-diabetics and diabetics, and within strata of body mass index (BMI) and hypertension (HTN). Baseline and 17-year follow-up data from the ARIC (Atherosclerosis Risk in Communities) study were used. The study population (n = 13,051) was 74% white, 26% African American, 89% non-diabetic, 11% diabetic, 43% male, and 57% female, aged 44 to 66 years at baseline. Data from the ARIC food frequency questionnaire at baseline were analyzed to provide GI and GL indices for each subject. Associations with incident CHD risk were expressed per 25-unit increase in GI and per 30-unit increase in GL. Propensity score-adjusted hazard ratios (HRs) with 95% confidence intervals (CIs) were used to assess associations. During 17 years of follow-up (1987 to 2004), 1,683 cases of CHD were recorded. GI was associated with a 2.12-fold (95% CI: 1.05, 4.30) increased incident CHD risk in all African Americans, and GL was associated with a 1.14-fold (95% CI: 1.04, 1.25) increased CHD risk in all whites. In addition, GL was an important CHD risk factor for white non-diabetics (HR = 1.59; 95% CI: 1.33, 1.90). Furthermore, within the BMI stratum of 23.0 to 29.9 kg/m² in non-diabetics, GI was associated with a hazard ratio of 11.99 (95% CI: 2.31, 62.18) for CHD in African Americans, and GL was associated with a 1.23-fold (95% CI: 1.08, 1.39) increased CHD risk in whites. BMI modified the effect of GI and GL on CHD risk in all whites and in white non-diabetics.
For HTN, both systolic and diastolic blood pressure modified the effect of GI and GL on CHD risk in all whites and African Americans, in white and African American non-diabetics, and in white diabetics. Further studies should examine other factors that could influence the effects of GI and GL on CHD risk, including dietary factors, physical activity, and diet-gene interactions.
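Hazard ratios reported per 25-unit GI or 30-unit GL increase are rescaled versions of a per-unit Cox coefficient. Assume a per-unit log-hazard coefficient β with standard error SE. The hazard ratio for an increment Δ is then exp(Δβ), with Wald 95% limits exp(Δβ ± 1.96 Δ SE). The helper name below is illustrative.

```python
import math


def hr_per_increment(beta, se, increment, z=1.959963984540054):
    """Rescale a per-unit Cox log-hazard coefficient to a larger exposure
    increment: returns (HR, lower 95% limit, upper 95% limit)."""
    log_hr = increment * beta
    half_width = z * increment * se
    return (math.exp(log_hr),
            math.exp(log_hr - half_width),
            math.exp(log_hr + half_width))
```

Because both the estimate and the standard error scale with Δ, the choice of increment changes how large an HR looks but not its Wald p-value.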
Abstract:
Monte Carlo simulation was conducted to investigate parameter estimation and hypothesis testing in several well-known adaptive randomization procedures. The four urn models studied are the Randomized Play-the-Winner (RPW), Randomized Pólya Urn (RPU), Birth and Death Urn with Immigration (BDUI), and Drop-the-Loser Urn (DL). Two sequential estimation methods, sequential maximum likelihood estimation (SMLE) and the doubly adaptive biased coin design (DABC), are simulated at three optimal allocation targets that minimize the expected number of failures under the assumption of constant variance of the simple difference (RSIHR), relative risk (ORR), and odds ratio (OOR), respectively. The log likelihood ratio test and three Wald-type tests (simple difference, log of relative risk, log of odds ratio) are compared across the adaptive procedures.
Simulation results indicate that although RPW is slightly better at assigning more patients to the superior treatment, the DL method is considerably less variable and its test statistics have better normality. Compared with SMLE, DABC has a slightly higher overall response rate with lower variance, but larger bias and variance in parameter estimation. Additionally, the test statistics under SMLE have better normality and a lower type I error rate, and their power is closer to that of equal randomization. Usually, RSIHR has the highest power among the three optimal allocation ratios. However, the ORR allocation has better power and a lower type I error rate when the log of the relative risk is the test statistic, and the expected number of failures under ORR is smaller than under RSIHR. The simple difference of response rates has the worst normality among all four test statistics, and the power of the hypothesis test is always inflated when the simple difference is used. On the other hand, the normality of the log likelihood ratio test statistic is robust to changes in the adaptive randomization procedure.
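To make the urn mechanics concrete, here is a minimal simulation of the randomized play-the-winner rule. It assumes immediate binary responses and a single ball added per patient by default (RPW(1, 1)). The function and parameter names are illustrative. For RPW, the long-run proportion of patients on A approaches q_B/(q_A + q_B), where q = 1 − p is each arm's failure probability.

```python
import random


def simulate_rpw(p_a, p_b, n_patients, u=1, beta=1, seed=0):
    """Randomized play-the-winner RPW(u, beta) with immediate binary responses.

    Start with u balls of each type. Each patient receives the treatment of a
    randomly drawn ball (drawn with replacement). A success adds beta balls of
    the same type; a failure adds beta balls of the other type.
    Returns (number of patients assigned to A, number of failures).
    """
    rng = random.Random(seed)
    balls = {"A": u, "B": u}
    success_prob = {"A": p_a, "B": p_b}
    other = {"A": "B", "B": "A"}
    n_a = failures = 0
    for _ in range(n_patients):
        total = balls["A"] + balls["B"]
        arm = "A" if rng.random() < balls["A"] / total else "B"
        n_a += arm == "A"
        if rng.random() < success_prob[arm]:
            balls[arm] += beta          # success: reinforce the same arm
        else:
            balls[other[arm]] += beta   # failure: reinforce the other arm
            failures += 1
    return n_a, failures
```

With p_A = 0.8 and p_B = 0.4, that limit is 0.6/0.8 = 0.75, and a long simulated trial lands close to it. Repeating the simulation many times is how the variability compared above would be estimated.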
Abstract:
Colorectal cancer is the second leading cause of cancer death in the United States. Obesity is an important risk factor for colorectal cancer (1). Detecting colorectal cancer early, while it is still localized, can effectively reduce colorectal cancer mortality and, with treatment, increase patient survival time. Previous studies have also shown that obese women were more likely than normal-weight women to delay breast and cervical cancer screening (2-5). However, results from prior studies of the relationship between obesity and colorectal cancer screening are inconsistent. This research conducted a meta-analysis of previous cross-sectional studies selected from the Medline database to evaluate the association between obesity and colorectal cancer screening. Under the random effects model, obese people were slightly less likely than normal-weight individuals to have colorectal cancer screening (OR, 0.93; 95% CI, 0.75-1.15), although this odds ratio was not statistically different from one. The meta-analysis was particularly sensitive to one individual study, that of Heo (6): after this study was removed, the effect of obesity on colorectal cancer screening was statistically significant (OR, 0.87; 95% CI, 0.81-0.92). Further systematic studies are suggested to examine whether the effect of obesity on colorectal cancer screening is limited to women.
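The pooled estimate and its sensitivity to a single study can be illustrated with the DerSimonian-Laird random-effects estimator. This is one common choice; the abstract does not name its estimator, so it is an assumption here. Inputs are per-study log odds ratios and their variances. The leave-one-out helper re-pools the data with each study removed in turn. Names are illustrative.

```python
import math


def dersimonian_laird(log_ors, variances, z=1.959963984540054):
    """Random-effects pooling of study log odds ratios (DerSimonian-Laird).

    Returns (pooled OR, lower 95% limit, upper 95% limit, tau^2).
    """
    w = [1.0 / v for v in variances]
    fixed = sum(wi * y for wi, y in zip(w, log_ors)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_ors))  # Cochran's Q
    df = len(log_ors) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * y for wi, y in zip(w_star, log_ors)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return math.exp(pooled), math.exp(pooled - z * se), math.exp(pooled + z * se), tau2


def leave_one_out(log_ors, variances):
    """Re-pool with each study omitted in turn (inputs are lists)."""
    return [dersimonian_laird(log_ors[:i] + log_ors[i + 1:],
                              variances[:i] + variances[i + 1:])
            for i in range(len(log_ors))]
```

A study whose omission moves the pooled OR across 1, as reported above for one study, shows up directly in the leave-one-out output.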
Abstract:
Standard methods for testing safety data are needed to ensure the safe conduct of clinical trials. In particular, objective rules for reliably identifying unsafe treatments need to be put in place to help protect patients from unnecessary harm. Data monitoring committees (DMCs) are uniquely qualified to evaluate accumulating unblinded data and make recommendations about the continuing safe conduct of a trial. However, it is the trial leadership who must make the difficult ethical decision to stop a trial, and they could benefit from objective statistical rules that help them judge the strength of evidence contained in the blinded data. We design early stopping rules for harm that act as continuous safety screens for randomized controlled clinical trials with blinded treatment information and that could be used by anyone, including trial investigators and trial leadership. A Bayesian framework, with emphasis on the likelihood function, is used to allow continuous monitoring without adjusting for multiple comparisons. Close collaboration between the statistician and the clinical investigators will be needed to design safety screens with good operating characteristics. Although the mathematics underlying this procedure may be computationally intensive, the statistical rules will be easy to implement, and the continuous screening will give suitably early warning if real problems emerge. Trial investigators and trial leadership need these safety screens to help them effectively monitor the ongoing safe conduct of clinical trials with blinded data.
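The abstract does not give the rules themselves. Below is a minimal sketch of the likelihood idea behind a blinded screen, under several stated assumptions: 1:1 randomization, a known control event rate, and a single hypothesized harm ratio. Under "no harm", the pooled (blinded) event probability equals the control rate. Under "harm", it is the average of the two arms' rates. Because a likelihood ratio does not depend on how often it is examined, it can be checked at every look. The names and the threshold of 8 are hypothetical choices for illustration.

```python
import math


def blinded_harm_likelihood_ratio(events, patients, control_rate, harm_ratio):
    """Likelihood ratio for 'harm' versus 'no harm' from blinded pooled data.

    Assumes 1:1 randomization and a known control event rate; under harm the
    experimental rate is harm_ratio * control_rate.
    """
    q0 = control_rate                                   # pooled rate, no harm
    q1 = control_rate * (1.0 + harm_ratio) / 2.0        # pooled rate, harm
    if not 0.0 < q0 < q1 < 1.0:
        raise ValueError("need 0 < control_rate < pooled harm rate < 1")
    log_lr = (events * math.log(q1 / q0)
              + (patients - events) * math.log((1.0 - q1) / (1.0 - q0)))
    return math.exp(log_lr)


def screen(events_by_look, patients_by_look, control_rate, harm_ratio, threshold=8.0):
    """Return the first look (0-based) at which the cumulative LR reaches
    the threshold, or None if it never does."""
    for look, (k, n) in enumerate(zip(events_by_look, patients_by_look)):
        if blinded_harm_likelihood_ratio(k, n, control_rate, harm_ratio) >= threshold:
            return look
    return None
```

As the abstract stresses, the control rate, harm ratio, and threshold would need to be chosen jointly with the clinical investigators and checked by simulation for acceptable operating characteristics.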