650 resultados para Biology, Biostatistics|Health Sciences, Nutrition|Health Sciences, Epidemiology|Health Sciences, Oncology
Resumo:
With the recognition of the importance of evidence-based medicine, there is an emerging need for methods to systematically synthesize available data. Specifically, methods to provide accurate estimates of test characteristics for diagnostic tests are needed to help physicians make better clinical decisions. To provide more flexible approaches for meta-analysis of diagnostic tests, we developed three Bayesian generalized linear models. Two of these models, a bivariate normal and a binomial model, analyzed pairs of sensitivity and specificity values while incorporating the correlation between these two outcome variables. Noninformative independent uniform priors were used for the variance of sensitivity, specificity and correlation. We also applied an inverse Wishart prior to check the sensitivity of the results. The third model was a multinomial model where the test results were modeled as multinomial random variables. All three models can include specific imaging techniques as covariates in order to compare performance. Vague normal priors were assigned to the coefficients of the covariates. The computations were carried out using the 'Bayesian inference using Gibbs sampling' implementation of Markov chain Monte Carlo techniques. We investigated the properties of the three proposed models through extensive simulation studies. We also applied these models to a previously published meta-analysis dataset on cervical cancer as well as to an unpublished melanoma dataset. In general, our findings show that the point estimates of sensitivity and specificity were consistent among Bayesian and frequentist bivariate normal and binomial models. However, in the simulation studies, the estimates of the correlation coefficient from Bayesian bivariate models are not as good as those obtained from frequentist estimation regardless of which prior distribution was used for the covariance matrix. The Bayesian multinomial model consistently underestimated the sensitivity and specificity regardless of the sample size and correlation coefficient. In conclusion, the Bayesian bivariate binomial model provides the most flexible framework for future applications because of its following strengths: (1) it facilitates direct comparison between different tests; (2) it captures the variability in both sensitivity and specificity simultaneously as well as the intercorrelation between the two; and (3) it can be directly applied to sparse data without ad hoc correction. ^
Resumo:
Of cancer death, colorectal cancer death ranks second in the United States. Obesity is an important risk factor for colorectal cancer (1). Early detection of colorectal cancer when it is localized can effectively reduce mortality of colorectal cancer and increase survival time of patients if they are treated. Also, previous studies showed that obese women were more likely to delay breast cancer screening and cervical cancer screening than normal weight women (2-5). However, results from prior studies demonstrating the relationship between obesity and colorectal cancer screening are not consistent. This research was done to conduct a meta-analysis of previous cross-sectional studies selected from the Medline database and to evaluate the association between obesity and colorectal cancer screening. While the odds ratio was not statistically different from one, the results from this meta-analysis under the random effects model showed that obese people are slightly less likely to have colorectal cancer screening compared to normal weight individuals (OR,0.93;95% CI 0.75-1.15). This meta-analysis was particularly sensitive to one individual study (6) and the effect of obesity on colorectal cancer screening was statistically significant (OR, 0.87; 95% CI, 0.81-0.92) after removing Heo's study. Further systematic studies focused on whether the effect of obesity on colorectal cancer screening is limited to women only are suggested. ^
Resumo:
Standard methods for testing safety data are needed to ensure the safe conduct of clinical trials. In particular, objective rules for reliably identifying unsafe treatments need to be put into place to help protect patients from unnecessary harm. DMCs are uniquely qualified to evaluate accumulating unblinded data and make recommendations about the continuing safe conduct of a trial. However, it is the trial leadership who must make the tough ethical decision about stopping a trial, and they could benefit from objective statistical rules that help them judge the strength of evidence contained in the blinded data. We design early stopping rules for harm that act as continuous safety screens for randomized controlled clinical trials with blinded treatment information, which could be used by anyone, including trial investigators (and trial leadership). A Bayesian framework, with emphasis on the likelihood function, is used to allow for continuous monitoring without adjusting for multiple comparisons. Close collaboration between the statistician and the clinical investigators will be needed in order to design safety screens with good operating characteristics. Though the math underlying this procedure may be computationally intensive, implementation of the statistical rules will be easy and the continuous screening provided will give suitably early warning when real problems were to emerge. Trial investigators and trial leadership need these safety screens to help them to effectively monitor the ongoing safe conduct of clinical trials with blinded data.^
Resumo:
In population studies, most current methods focus on identifying one outcome-related SNP at a time by testing for differences of genotype frequencies between disease and healthy groups or among different population groups. However, testing a great number of SNPs simultaneously has a problem of multiple testing and will give false-positive results. Although, this problem can be effectively dealt with through several approaches such as Bonferroni correction, permutation testing and false discovery rates, patterns of the joint effects by several genes, each with weak effect, might not be able to be determined. With the availability of high-throughput genotyping technology, searching for multiple scattered SNPs over the whole genome and modeling their joint effect on the target variable has become possible. Exhaustive search of all SNP subsets is computationally infeasible for millions of SNPs in a genome-wide study. Several effective feature selection methods combined with classification functions have been proposed to search for an optimal SNP subset among big data sets where the number of feature SNPs far exceeds the number of observations. ^ In this study, we take two steps to achieve the goal. First we selected 1000 SNPs through an effective filter method and then we performed a feature selection wrapped around a classifier to identify an optimal SNP subset for predicting disease. And also we developed a novel classification method-sequential information bottleneck method wrapped inside different search algorithms to identify an optimal subset of SNPs for classifying the outcome variable. This new method was compared with the classical linear discriminant analysis in terms of classification performance. Finally, we performed chi-square test to look at the relationship between each SNP and disease from another point of view. ^ In general, our results show that filtering features using harmononic mean of sensitivity and specificity(HMSS) through linear discriminant analysis (LDA) is better than using LDA training accuracy or mutual information in our study. Our results also demonstrate that exhaustive search of a small subset with one SNP, two SNPs or 3 SNP subset based on best 100 composite 2-SNPs can find an optimal subset and further inclusion of more SNPs through heuristic algorithm doesn't always increase the performance of SNP subsets. Although sequential forward floating selection can be applied to prevent from the nesting effect of forward selection, it does not always out-perform the latter due to overfitting from observing more complex subset states. ^ Our results also indicate that HMSS as a criterion to evaluate the classification ability of a function can be used in imbalanced data without modifying the original dataset as against classification accuracy. Our four studies suggest that Sequential Information Bottleneck(sIB), a new unsupervised technique, can be adopted to predict the outcome and its ability to detect the target status is superior to the traditional LDA in the study. ^ From our results we can see that the best test probability-HMSS for predicting CVD, stroke,CAD and psoriasis through sIB is 0.59406, 0.641815, 0.645315 and 0.678658, respectively. In terms of group prediction accuracy, the highest test accuracy of sIB for diagnosing a normal status among controls can reach 0.708999, 0.863216, 0.639918 and 0.850275 respectively in the four studies if the test accuracy among cases is required to be not less than 0.4. On the other hand, the highest test accuracy of sIB for diagnosing a disease among cases can reach 0.748644, 0.789916, 0.705701 and 0.749436 respectively in the four studies if the test accuracy among controls is required to be at least 0.4. ^ A further genome-wide association study through Chi square test shows that there are no significant SNPs detected at the cut-off level 9.09451E-08 in the Framingham heart study of CVD. Study results in WTCCC can only detect two significant SNPs that are associated with CAD. In the genome-wide study of psoriasis most of top 20 SNP markers with impressive classification accuracy are also significantly associated with the disease through chi-square test at the cut-off value 1.11E-07. ^ Although our classification methods can achieve high accuracy in the study, complete descriptions of those classification results(95% confidence interval or statistical test of differences) require more cost-effective methods or efficient computing system, both of which can't be accomplished currently in our genome-wide study. We should also note that the purpose of this study is to identify subsets of SNPs with high prediction ability and those SNPs with good discriminant power are not necessary to be causal markers for the disease.^
Resumo:
Glioblastoma, also known as glioblastoma multiform or GBM, is the most common and most malignant primary brain tumor. The clinical history of patients with glioblastoma is short, usually less than 3 months in more than 50% of cases after diagnosis. Currently, the methods of glioblastoma treatment are chemotherapy, radiotherapy and surgery. Even with the more effective treatment options, patients with glioblastoma most likely have a median survival time of 10 to 12 months. It is necessary to seek other treatment methods, including gene-targeted treatment. The success of gene-targeted treatment depends critically on the knowledge of genes that may be the cause of, or contribute to disease. To establish a correlate between glioblastoma survival timeline and micro RNA expression alteration, a study of 91 glioblastoma patients was conducted at the University of Texas M. D. Anderson Cancer Center. These 91 glioblastoma patients were newly diagnosed from 2002 to 2007. Statistical analysis was conducted to test the association of miRNA expression alteration between long-term survival and short-term survival glioblastoma. The completion of this proposed study will provide a better understanding of the regulatory role of miRNA in glioblastoma progression.^
Resumo:
Bayesian adaptive randomization (BAR) is an attractive approach to allocate more patients to the putatively superior arm based on the interim data while maintains good statistical properties attributed to randomization. Under this approach, patients are adaptively assigned to a treatment group based on the probability that the treatment is better. The basic randomization scheme can be modified by introducing a tuning parameter, replacing the posterior estimated response probability, setting a boundary to randomization probabilities. Under randomization settings comprised of the above modifications, operating characteristics, including type I error, power, sample size, imbalance of sample size, interim success rate, and overall success rate, were evaluated through simulation. All randomization settings have low and comparable type I errors. Increasing tuning parameter decreases power, but increases imbalance of sample size and interim success rate. Compared with settings using the posterior probability, settings using the estimated response rates have higher power and overall success rate, but less imbalance of sample size and lower interim success rate. Bounded settings have higher power but less imbalance of sample size than unbounded settings. All settings have better performance in the Bayesian design than in the frequentist design. This simulation study provided practical guidance on the choice of how to implement the adaptive design. ^
Resumo:
Numerous studies have been carried out to try to better understand the genetic predisposition for cardiovascular disease. Although it is widely believed that multifactorial diseases such as cardiovascular disease is the result from effects of many genes which working alone or interact with other genes, most genetic studies have been focused on identifying of cardiovascular disease susceptibility genes and usually ignore the effects of gene-gene interactions in the analysis. The current study applies a novel linkage disequilibrium based statistic for testing interactions between two linked loci using data from a genome-wide study of cardiovascular disease. A total of 53,394 single nucleotide polymorphisms (SNPs) are tested for pair-wise interactions, and 8,644 interactions are found to be significant with p-values less than 3.5×10-11. Results indicate that known cardiovascular disease susceptibility genes tend not to have many significantly interactions. One SNP in the CACNG1 (calcium channel, voltage-dependent, gamma subunit 1) gene and one SNP in the IL3RA (interleukin 3 receptor, alpha) gene are found to have the most significant pair-wise interactions. Findings from the current study should be replicated in other independent cohort to eliminate potential false positive results.^
Resumo:
A multivariate frailty hazard model is developed for joint-modeling of three correlated time-to-event outcomes: (1) local recurrence, (2) distant recurrence, and (3) overall survival. The term frailty is introduced to model population heterogeneity. The dependence is modeled by conditioning on a shared frailty that is included in the three hazard functions. Independent variables can be included in the model as covariates. The Markov chain Monte Carlo methods are used to estimate the posterior distributions of model parameters. The algorithm used in present application is the hybrid Metropolis-Hastings algorithm, which simultaneously updates all parameters with evaluations of gradient of log posterior density. The performance of this approach is examined based on simulation studies using Exponential and Weibull distributions. We apply the proposed methods to a study of patients with soft tissue sarcoma, which motivated this research. Our results indicate that patients with chemotherapy had better overall survival with hazard ratio of 0.242 (95% CI: 0.094 - 0.564) and lower risk of distant recurrence with hazard ratio of 0.636 (95% CI: 0.487 - 0.860), but not significantly better in local recurrence with hazard ratio of 0.799 (95% CI: 0.575 - 1.054). The advantages and limitations of the proposed models, and future research directions are discussed. ^
Resumo:
The difficulty of detecting differential gene expression in microarray data has existed for many years. Several correction procedures try to avoid the family-wise error rate in multiple comparison process, including the Bonferroni and Sidak single-step p-value adjustments, Holm's step-down correction method, and Benjamini and Hochberg's false discovery rate (FDR) correction procedure. Each multiple comparison technique has its advantages and weaknesses. We studied each multiple comparison method through numerical studies (simulations) and applied the methods to the real exploratory DNA microarray data, which detect of molecular signatures in papillary thyroid cancer (PTC) patients. According to our results of simulation studies, Benjamini and Hochberg step-up FDR controlling procedure is the best process among these multiple comparison methods and we discovered 1277 potential biomarkers among 54675 probe sets after applying the Benjamini and Hochberg's method to PTC microarray data.^
Resumo:
Although the area under the receiver operating characteristic (AUC) is the most popular measure of the performance of prediction models, it has limitations, especially when it is used to evaluate the added discrimination of a new biomarker in the model. Pencina et al. (2008) proposed two indices, the net reclassification improvement (NRI) and integrated discrimination improvement (IDI), to supplement the improvement in the AUC (IAUC). Their NRI and IDI are based on binary outcomes in case-control settings, which do not involve time-to-event outcome. However, many disease outcomes are time-dependent and the onset time can be censored. Measuring discrimination potential of a prognostic marker without considering time to event can lead to biased estimates. In this dissertation, we have extended the NRI and IDI to survival analysis settings and derived the corresponding sample estimators and asymptotic tests. Simulation studies were conducted to compare the performance of the time-dependent NRI and IDI with Pencina’s NRI and IDI. For illustration, we have applied the proposed method to a breast cancer study.^ Key words: Prognostic model, Discrimination, Time-dependent NRI and IDI ^
Resumo:
SNP genotyping arrays have been developed to characterize single-nucleotide polymorphisms (SNPs) and DNA copy number variations (CNVs). The quality of the inferences about copy number can be affected by many factors including batch effects, DNA sample preparation, signal processing, and analytical approach. Nonparametric and model-based statistical algorithms have been developed to detect CNVs from SNP genotyping data. However, these algorithms lack specificity to detect small CNVs due to the high false positive rate when calling CNVs based on the intensity values. Association tests based on detected CNVs therefore lack power even if the CNVs affecting disease risk are common. In this research, by combining an existing Hidden Markov Model (HMM) and the logistic regression model, a new genome-wide logistic regression algorithm was developed to detect CNV associations with diseases. We showed that the new algorithm is more sensitive and can be more powerful in detecting CNV associations with diseases than an existing popular algorithm, especially when the CNV association signal is weak and a limited number of SNPs are located in the CNV.^
Resumo:
Next-generation DNA sequencing platforms can effectively detect the entire spectrum of genomic variation and is emerging to be a major tool for systematic exploration of the universe of variants and interactions in the entire genome. However, the data produced by next-generation sequencing technologies will suffer from three basic problems: sequence errors, assembly errors, and missing data. Current statistical methods for genetic analysis are well suited for detecting the association of common variants, but are less suitable to rare variants. This raises great challenge for sequence-based genetic studies of complex diseases.^ This research dissertation utilized genome continuum model as a general principle, and stochastic calculus and functional data analysis as tools for developing novel and powerful statistical methods for next generation of association studies of both qualitative and quantitative traits in the context of sequencing data, which finally lead to shifting the paradigm of association analysis from the current locus-by-locus analysis to collectively analyzing genome regions.^ In this project, the functional principal component (FPC) methods coupled with high-dimensional data reduction techniques will be used to develop novel and powerful methods for testing the associations of the entire spectrum of genetic variation within a segment of genome or a gene regardless of whether the variants are common or rare.^ The classical quantitative genetics suffer from high type I error rates and low power for rare variants. To overcome these limitations for resequencing data, this project used functional linear models with scalar response to develop statistics for identifying quantitative trait loci (QTLs) for both common and rare variants. To illustrate their applications, the functional linear models were applied to five quantitative traits in Framingham heart studies. ^ This project proposed a novel concept of gene-gene co-association in which a gene or a genomic region is taken as a unit of association analysis and used stochastic calculus to develop a unified framework for testing the association of multiple genes or genomic regions for both common and rare alleles. The proposed methods were applied to gene-gene co-association analysis of psoriasis in two independent GWAS datasets which led to discovery of networks significantly associated with psoriasis.^
Resumo:
Common endpoints can be divided into two categories. One is dichotomous endpoints which take only fixed values (most of the time two values). The other is continuous endpoints which can be any real number between two specified values. Choices of primary endpoints are critical in clinical trials. If we only use dichotomous endpoints, the power could be underestimated. If only continuous endpoints are chosen, we may not obtain expected sample size due to occurrence of some significant clinical events. Combined endpoints are used in clinical trials to give additional power. However, current combined endpoints or composite endpoints in cardiovascular disease clinical trials or most clinical trials are endpoints that combine either dichotomous endpoints (total mortality + total hospitalization), or continuous endpoints (risk score). Our present work applied U-statistic to combine one dichotomous endpoint and one continuous endpoint, which has three different assessments and to calculate the sample size and test the hypothesis to see if there is any treatment effect. It is especially useful when some patients cannot provide the most precise measurement due to medical contraindication or some personal reasons. Results show that this method has greater power then the analysis using continuous endpoints alone. ^
Resumo:
Most studies of differential gene-expressions have been conducted between two given conditions. The two-condition experimental (TCE) approach is simple in that all genes detected display a common differential expression pattern responsive to a common two-condition difference. Therefore, the genes that are differentially expressed under the other conditions other than the given two conditions are undetectable with the TCE approach. In order to address the problem, we propose a new approach called multiple-condition experiment (MCE) without replication and develop corresponding statistical methods including inference of pairs of conditions for genes, new t-statistics, and a generalized multiple-testing method for any multiple-testing procedure via a control parameter C. We applied these statistical methods to analyze our real MCE data from breast cancer cell lines and found that 85 percent of gene-expression variations were caused by genotypic effects and genotype-ANAX1 overexpression interactions, which agrees well with our expected results. We also applied our methods to the adenoma dataset of Notterman et al. and identified 93 differentially expressed genes that could not be found in TCE. The MCE approach is a conceptual breakthrough in many aspects: (a) many conditions of interests can be conducted simultaneously; (b) study of association between differential expressions of genes and conditions becomes easy; (c) it can provide more precise information for molecular classification and diagnosis of tumors; (d) it can save lot of experimental resources and time for investigators.^
Resumo:
Systemic sclerosis (SSc) or Scleroderma is a complex disease and its etiopathogenesis remains unelucidated. Fibrosis in multiple organs is a key feature of SSc and studies have shown that transforming growth factor-β (TGF-β) pathway has a crucial role in fibrotic responses. For a complex disease such as SSc, expression quantitative trait loci (eQTL) analysis is a powerful tool for identifying genetic variations that affect expression of genes involved in this disease. In this study, a multilevel model is described to perform a multivariate eQTL for identifying genetic variation (SNPs) specifically associated with the expression of three members of TGF-β pathway, CTGF, SPARC and COL3A1. The uniqueness of this model is that all three genes were included in one model, rather than one gene being examined at a time. A protein might contribute to multiple pathways and this approach allows the identification of important genetic variations linked to multiple genes belonging to the same pathway. In this study, 29 SNPs were identified and 16 of them located in known genes. Exploring the roles of these genes in TGF-β regulation will help elucidate the etiology of SSc, which will in turn help to better manage this complex disease. ^