8 results for Computational Intelligence in data-driven and hybrid Models and Data Analysis
in DigitalCommons@The Texas Medical Center
Abstract:
An interim analysis is usually applied in later phase II or phase III trials to find convincing evidence of a significant treatment difference that may lead to trial termination at an earlier point than planned at the beginning. This can result in the saving of patient resources and shortening of drug development and approval time. In addition, ethics and economics are also the reasons to stop a trial earlier. In clinical trials of eyes, ears, knees, arms, kidneys, lungs, and other clustered treatments, data may include distribution-free random variables with matched and unmatched subjects in one study. It is important to properly include both subjects in the interim and the final analyses so that the maximum efficiency of statistical and clinical inferences can be obtained at different stages of the trials. So far, no publication has applied a statistical method for distribution-free data with matched and unmatched subjects in the interim analysis of clinical trials. In this simulation study, the hybrid statistic was used to estimate the empirical powers and the empirical type I errors among the simulated datasets with different sample sizes, different effect sizes, different correlation coefficients for matched pairs, and different data distributions, respectively, in the interim and final analysis with 4 different group sequential methods. Empirical powers and empirical type I errors were also compared to those estimated by using the meta-analysis t-test among the same simulated datasets. Results from this simulation study show that, compared to the meta-analysis t-test commonly used for data with normally distributed observations, the hybrid statistic has a greater power for data observed from normally, log-normally, and multinomially distributed random variables with matched and unmatched subjects and with outliers. Powers rose with the increase in sample size, effect size, and correlation coefficient for the matched pairs. 
In addition, the hybrid statistic yielded lower type I errors, which indicates that this test is also conservative for data with outliers in the interim analysis of clinical trials.
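As a rough illustration of how empirical type I error and power are estimated under a group sequential design, the following Python sketch simulates a two-look trial. It does not implement the hybrid statistic from the abstract: it swaps in a plain two-sample z-test on normally distributed data with known unit variance, and the function names, sample sizes, and effect size are hypothetical. The boundary constants are the standard two-look Pocock and O'Brien-Fleming critical values for a two-sided 0.05 level; the abstract does not say which four group sequential methods were used.

```python
import math
import random

# Two-look critical values for a two-sided overall alpha of 0.05.
POCOCK_2 = (2.178, 2.178)          # same boundary at both looks
OBRIEN_FLEMING_2 = (2.797, 1.977)  # strict at the interim, near-nominal at the end


def z_stat(a, b):
    """Two-sample z statistic, assuming unit variance and equal arm sizes."""
    n = len(a)
    return (sum(a) / n - sum(b) / n) / math.sqrt(2.0 / n)


def trial_rejects(effect, n_per_look, bounds, rng):
    """Simulate one trial; stop and reject at the first look crossing its bound."""
    a, b = [], []
    for bound in bounds:
        # Each look adds n_per_look new subjects per arm to the cumulative data.
        a += [rng.gauss(effect, 1.0) for _ in range(n_per_look)]
        b += [rng.gauss(0.0, 1.0) for _ in range(n_per_look)]
        if abs(z_stat(a, b)) >= bound:
            return True
    return False


def rejection_rate(effect, n_per_look, bounds, n_sim=2000, seed=1):
    """Fraction of simulated trials that reject the null hypothesis."""
    rng = random.Random(seed)
    hits = sum(trial_rejects(effect, n_per_look, bounds, rng) for _ in range(n_sim))
    return hits / n_sim


if __name__ == "__main__":
    for name, bounds in (("Pocock", POCOCK_2), ("O'Brien-Fleming", OBRIEN_FLEMING_2)):
        type1 = rejection_rate(0.0, 20, bounds)   # effect = 0: empirical type I error
        power = rejection_rate(0.8, 20, bounds)   # effect > 0: empirical power
        print(name, "type I error:", type1, "power:", power)
```

With `effect=0.0` the rejection rate estimates the type I error; with a nonzero effect it estimates the power. Replacing `z_stat` with another test statistic would be the natural place to compare methods.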
Abstract:
Complex diseases such as cancer result from multiple genetic changes and environmental exposures. Due to the rapid development of genotyping and sequencing technologies, we are now able to more accurately assess causal effects of many genetic and environmental factors. Genome-wide association studies have been able to localize many causal genetic variants predisposing to certain diseases. However, these studies only explain a small portion of variations in the heritability of diseases. More advanced statistical models are urgently needed to identify and characterize additional genetic and environmental factors and their interactions, which will enable us to better understand the causes of complex diseases. In the past decade, thanks to increasing computational capabilities and novel statistical developments, Bayesian methods have been widely applied in genetics and genomics research and have demonstrated superiority over some conventional approaches in certain research areas. Gene-environment and gene-gene interaction studies are among the areas where Bayesian methods can fully exert their functionality and advantages. This dissertation focuses on developing new Bayesian statistical methods for data analysis with complex gene-environment and gene-gene interactions, as well as extending some existing methods for gene-environment interactions to other related areas. It includes three sections: (1) deriving the Bayesian variable selection framework for hierarchical gene-environment and gene-gene interactions; (2) developing the Bayesian Natural and Orthogonal Interaction (NOIA) models for gene-environment interactions; and (3) extending the applications of two Bayesian statistical methods, which were developed for gene-environment interaction studies, to other related types of studies such as adaptive borrowing of historical data. 
We propose a Bayesian hierarchical mixture model framework that allows us to investigate the genetic and environmental effects, gene by gene interactions (epistasis) and gene by environment interactions in the same model. It is well known that, in many practical situations, there exists a natural hierarchical structure between the main effects and interactions in the linear model. Here we propose a model that incorporates this hierarchical structure into the Bayesian mixture model, such that the irrelevant interaction effects can be removed more efficiently, resulting in more robust, parsimonious and powerful models. We evaluate both of the 'strong hierarchical' and 'weak hierarchical' models, which specify that both or one of the main effects between interacting factors must be present for the interactions to be included in the model. The extensive simulation results show that the proposed strong and weak hierarchical mixture models control the proportion of false positive discoveries and yield a powerful approach to identify the predisposing main effects and interactions in the studies with complex gene-environment and gene-gene interactions. We also compare these two models with the 'independent' model that does not impose this hierarchical constraint and observe their superior performances in most of the considered situations. The proposed models are implemented in the real data analysis of gene and environment interactions in the cases of lung cancer and cutaneous melanoma case-control studies. The Bayesian statistical models enjoy the properties of being allowed to incorporate useful prior information in the modeling process. Moreover, the Bayesian mixture model outperforms the multivariate logistic model in terms of the performances on the parameter estimation and variable selection in most cases. 
Our proposed models impose the hierarchical constraints, which further improve the Bayesian mixture model by reducing the proportion of false positive findings among the identified interactions and by successfully identifying the reported associations. This is practically appealing for studies that investigate causal factors among a moderate number of candidate genetic and environmental factors along with a relatively large number of interactions. The natural and orthogonal interaction (NOIA) models of genetic effects have previously been developed to provide an analysis framework in which the estimates of effects for a quantitative trait are statistically orthogonal regardless of the existence of Hardy-Weinberg Equilibrium (HWE) within loci. Ma et al. (2012) recently developed a NOIA model for gene-environment interaction studies and have shown the advantages of using the model for detecting the true main effects and interactions, compared with the usual functional model. In this project, we propose a novel Bayesian statistical model that combines the Bayesian hierarchical mixture model with the NOIA statistical model and the usual functional model. The proposed Bayesian NOIA model demonstrates more power at detecting the non-null effects with higher marginal posterior probabilities. Also, we review two Bayesian statistical models (Bayesian empirical shrinkage-type estimator and Bayesian model averaging), which were developed for gene-environment interaction studies. Inspired by these Bayesian models, we develop two novel statistical methods that are able to handle related problems such as borrowing data from historical studies. The proposed methods are analogous to the methods for gene-environment interactions in that they balance statistical efficiency and bias in a unified model. 
Through extensive simulation studies, we compare the operating characteristics of the proposed models with the existing models, including the hierarchical meta-analysis model. The results show that the proposed approaches adaptively borrow the historical data in a data-driven way. These novel models may have a broad range of statistical applications in both genetic/genomic and clinical studies.
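The 'strong', 'weak', and 'independent' rules described in this abstract can be read as a simple admissibility check on interaction terms. The Python sketch below shows only that constraint logic, not the Bayesian mixture model itself; the function name, rule labels, and factor names are hypothetical.

```python
def interaction_allowed(main_effects, pair, rule):
    """Return True if the interaction `pair` may enter the model.

    main_effects: set of factor names whose main effects are in the model.
    pair: tuple of the two interacting factor names.
    rule: "strong" (both main effects required), "weak" (at least one
          required), or "independent" (no hierarchical constraint).
    """
    a, b = pair
    if rule == "strong":
        return a in main_effects and b in main_effects
    if rule == "weak":
        return a in main_effects or b in main_effects
    if rule == "independent":
        return True
    raise ValueError(f"unknown rule: {rule!r}")


if __name__ == "__main__":
    selected = {"gene_A", "smoking"}          # hypothetical selected main effects
    candidate = ("gene_A", "gene_B")          # hypothetical gene-gene interaction
    for rule in ("strong", "weak", "independent"):
        print(rule, interaction_allowed(selected, candidate, rule))
```

Under the strong rule the candidate is excluded because `gene_B` has no main effect in the model, while the weak and independent rules admit it.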
Abstract:
High-throughput assays, such as the yeast two-hybrid system, have generated a huge amount of protein-protein interaction (PPI) data in the past decade. This tremendously increases the need for developing reliable methods to systematically and automatically suggest protein functions and relationships between them. With the available PPI data, it is now possible to study the functions and relationships in the context of a large-scale network. To date, several network-based schemes have been provided to effectively annotate protein functions on a large scale. However, due to the inherent noise in high-throughput data generation, new methods and algorithms should be developed to increase the reliability of functional annotations. Previous work in a yeast PPI network (Samanta and Liang, 2003) has shown that the local connection topology, particularly for two proteins sharing an unusually large number of neighbors, can predict functional associations between proteins, and hence suggest their functions. One advantage of the work is that their algorithm is not sensitive to noise (false positives) in high-throughput PPI data. In this study, we improved their prediction scheme by developing a new algorithm and new methods, which we applied to a human PPI network to make a genome-wide functional inference. We used the new algorithm to measure and reduce the influence of hub proteins on detecting functionally associated proteins. We used the annotations of the Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) as independent and unbiased benchmarks to evaluate our algorithms and methods within the human PPI network. We showed that, compared with the previous work from Samanta and Liang, our algorithm and methods developed in this study improved the overall quality of functional inferences for human proteins. By applying the algorithms to the human PPI network, we obtained 4,233 significant functional associations among 1,754 proteins. 
Further comparisons of their KEGG and GO annotations allowed us to assign 466 KEGG pathway annotations to 274 proteins and 123 GO annotations to 114 proteins with estimated false discovery rates of <21% for KEGG and <30% for GO. We clustered 1,729 proteins by their functional associations and performed pathway analysis to identify several subclusters that are highly enriched in certain signaling pathways. In particular, we performed a detailed analysis of a subcluster enriched in the transforming growth factor β signaling pathway (P < 10⁻⁵⁰), which is important in cell proliferation and tumorigenesis. Analysis of another four subclusters also suggested potential new players in six signaling pathways worthy of further experimental investigation. Our study gives clear insight into the common neighbor-based prediction scheme and provides a reliable method for large-scale functional annotations in this post-genomic era.
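To make the idea of "an unusually large number of shared neighbors" concrete, the Python sketch below scores a protein pair with a hypergeometric tail probability. This is an assumption made for illustration: it is not necessarily the exact formula of Samanta and Liang, and it does not include the hub-protein correction developed in this study. The function name and the example counts are hypothetical.

```python
from math import comb


def shared_neighbor_pvalue(n_total, n1, n2, m):
    """Probability that two proteins share at least m neighbors by chance.

    Assumes the n2 neighbors of the second protein are drawn uniformly at
    random from n_total proteins, n1 of which are neighbors of the first.
    """
    upper = min(n1, n2)
    if m > upper:
        return 0.0
    total = comb(n_total, n2)
    # Sum the hypergeometric probabilities for k = m, ..., min(n1, n2) shared neighbors.
    tail = sum(comb(n1, k) * comb(n_total - n1, n2 - k) for k in range(m, upper + 1))
    return tail / total


if __name__ == "__main__":
    # Hypothetical network of 5,000 proteins; two proteins with 12 and 15 neighbors.
    for shared in (1, 3, 6):
        print(shared, shared_neighbor_pvalue(5000, 12, 15, shared))
```

Smaller values indicate a neighbor overlap that is harder to explain by chance, which is the signal the common neighbor-based scheme uses to propose a functional association.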
Abstract:
Most statistical analysis, in theory and practice, is concerned with static models: models with a proposed set of parameters whose values are fixed across observational units. Static models implicitly assume that the quantified relationships remain the same across the design space of the data. While this is reasonable under many circumstances, it can be a dangerous assumption when dealing with sequentially ordered data. The mere passage of time always brings fresh considerations, and the interrelationships among parameters, or subsets of parameters, may need to be continually revised.
When data are gathered sequentially, dynamic interim monitoring may be useful as new subject-specific parameters are introduced with each new observational unit. Sequential imputation via dynamic hierarchical models is an efficient strategy for handling missing data and analyzing longitudinal studies. Dynamic conditional independence models offer a flexible framework that exploits the Bayesian updating scheme for capturing the evolution of both the population and individual effects over time. While static models often describe aggregate information well, they often do not reflect conflicts in the information at the individual level. Dynamic models prove advantageous over static models in capturing both individual and aggregate trends. Computations for such models can be carried out via the Gibbs sampler. An application using small-sample, repeated-measures, normally distributed growth curve data is presented.
Abstract:
The poly-D-glutamic acid capsule of Bacillus anthracis is considered essential for lethal anthrax disease. Yet investigations of capsule function have been limited primarily to attenuated B. anthracis strains lacking certain genetic elements. In work presented in this thesis, I constructed and characterized a genetically complete (pXO1+ pXO2+) B. anthracis strain (UT500) and isogenic mutants deleted for two previously identified capsule gene regulators, atxA and acpA, and a newly identified regulator, acpB. Results of transcriptional analysis and microscopy revealed that atxA controls expression of the first gene of the capsule biosynthesis operon, capB, via positive transcriptional regulation of acpA and acpB. acpA and acpB appear to be partial functional homologs. Deletion of either gene alone has little effect on capsule synthesis. However, a mutant deleted for both acpA and acpB is noncapsulated. Thus, in contrast to previously published models, my results suggest that atxA is the master regulator of cap gene expression in a genetically complete strain. A detailed transcriptional analysis of capB and the regulatory genes was performed to establish the effects of the regulators and CO2/bicarbonate on specific mRNAs of target genes. CO2/bicarbonate is a well-established signal for B. anthracis capsule synthesis in culture. Taqman RT-PCR results indicated that growth in the presence of elevated CO2 greatly increased expression of acpA, acpB and capB but not atxA. 5′ end mapping of capB and acpA revealed atxA-regulated and atxA-independent transcriptional start sites for both genes. All atxA-regulated start sites were also CO2-regulated. A single atxA-independent start site was identified 5′ of acpB. However, RT-PCR analysis indicated that capD and acpB are co-transcribed. Thus, it is likely that atxA-mediated control of acpB expression occurs via transcriptional activation of the atxA-regulated start sites of capB. Finally, I examined the contribution of the B. anthracis capsule to virulence. The virulence of the parent strain, mutants deleted for the capsule biosynthesis genes (capBCAD), and mutants missing the capsule regulator genes was compared using a mouse model for inhalation anthrax. The data indicate that in this model, capsule is essential for virulence. Mice survived infection with the noncapsulated capBCAD and acpA acpB mutants. These mutants initiated germination in the lung, but did not disseminate to the spleen. The acpA mutant had an LD50 value similar to the parent strain and was able to disseminate and cause lethal infection. Unexpectedly, the acpB mutant had a higher LD50 and a reduced ability to disseminate. During in vitro culture, the acpB single mutant produces capsule and toxin similar to the parent strain. It is likely that acpB regulates the expression of downstream genes that contribute to the virulence of B. anthracis.
Abstract:
Tumor necrosis factor-related apoptosis-inducing ligand (Apo2L/TRAIL) is a member of the TNF superfamily of cytokines that can induce cell death through engagement of cognate death receptors. Unlike other death receptor ligands, it selectively kills tumor cells while sparing normal cells. Preclinical studies in non-human primates have generated much enthusiasm regarding its therapeutic potential. However, many human cancer cell lines exhibit significant resistance to TRAIL-induced apoptosis, and the molecular mechanisms underlying this are controversial. Possible explanations are typically cell-type dependent, but include alterations of receptor expression, enhancement of pro-apoptotic intracellular signaling molecules, and reductions in anti-apoptotic proteins. We show here that the proteasome inhibitor bortezomib (Velcade, PS-341) produces synergistic apoptosis in both bladder and prostate cancer cell lines within 4-6 hours when co-treated with recombinant human TRAIL, which is associated with accumulation of p21 and cdk1/2 inhibition. Our data suggest that bortezomib's mechanism of action involves a p21-dependent enhancement of caspase maturation. Furthermore, we found enhanced tumor cell death in in vivo models using athymic nude mice. This is associated with increases in caspase-8 and caspase-3 cleavage as well as significant reductions in microvessel density (MVD) and proliferation. Although TRAIL alone had less of an effect, its biological significance as a single agent requires further investigation. Toxicity studies reveal that the combination of bortezomib and rhTRAIL has fatal consequences that can be circumvented by altering treatment schedules. Based on our findings, we conclude that this strategy has significant therapeutic potential as an anti-cancer agent.
Abstract:
Microarray technology is a high-throughput method for genotyping and gene expression profiling. Limited sensitivity and specificity are among the essential problems for this technology. Most existing methods of microarray data analysis have an apparent limitation: they deal only with the numerical part of microarray data and make little use of gene sequence information. Because it is the gene sequences that precisely define the physical objects being measured by a microarray, it is natural to make the gene sequences an essential part of the data analysis. This dissertation focused on the development of free energy models to integrate sequence information in microarray data analysis. The models were used to characterize the mechanism of hybridization on microarrays and enhance sensitivity and specificity of microarray measurements.
Cross-hybridization is a major obstacle to the sensitivity and specificity of microarray measurements. In this dissertation, we evaluated the scope of the cross-hybridization problem on short-oligo microarrays. The results showed that cross-hybridization on arrays is mostly caused by oligo fragments with a run of 10 to 16 nucleotides complementary to the probes. Furthermore, a free-energy based model was proposed to quantify the amount of cross-hybridization signal on each probe. This model treats cross-hybridization as an integral effect of the interactions between a probe and various off-target oligo fragments. Using public spike-in datasets, the model showed high accuracy in predicting the cross-hybridization signals on those probes whose intended targets are absent in the sample.
Several prospective models were proposed to improve the Positional Dependent Nearest-Neighbor (PDNN) model for better quantification of gene expression and cross-hybridization.
The problem addressed in this dissertation is fundamental to microarray technology. We expect that this study will help us to understand the detailed mechanism that determines sensitivity and specificity on the microarrays. Consequently, this research will have a wide impact on how microarrays are designed and how the data are interpreted.
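The finding that cross-hybridization is mostly caused by fragments with a run of 10 to 16 nucleotides complementary to the probe suggests a simple screening step. The Python sketch below finds the longest contiguous stretch of an off-target fragment that is perfectly complementary to a probe. It is an illustration only, not the free-energy model from the dissertation; the function names and sequences are hypothetical, and it assumes DNA sequences written 5′ to 3′ using only the letters A, C, G, and T.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_complementary_run(probe, fragment):
    """Length of the longest contiguous run in `fragment` that can base-pair
    perfectly with a contiguous stretch of `probe`.

    A fragment pairs with the probe in antiparallel orientation, so the
    problem reduces to the longest common substring of the fragment and the
    probe's reverse complement (classic dynamic programming, row by row).
    """
    target = reverse_complement(probe)
    best = 0
    prev = [0] * (len(fragment) + 1)
    for i in range(1, len(target) + 1):
        cur = [0] * (len(fragment) + 1)
        for j in range(1, len(fragment) + 1):
            if target[i - 1] == fragment[j - 1]:
                cur[j] = prev[j - 1] + 1   # extend the run ending at (i-1, j-1)
                best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    probe = "ACGTTGCAAGCTTAGGCATCGATCG"          # hypothetical 25-mer probe
    off_target = "TTTT" + reverse_complement(probe)[5:17] + "GGGG"
    run = longest_complementary_run(probe, off_target)
    print("longest complementary run:", run, "flagged:", run >= 10)
```

A fragment whose longest run reaches the 10 to 16 nucleotide range reported in the abstract would be a candidate contributor to cross-hybridization signal on that probe.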
The determinants of improvements in health outcomes and of cost reduction in hospital inpatient care
Abstract:
This study aims to address two research questions. First, ‘Can we identify factors that are determinants both of improved health outcomes and of reduced costs for hospitalized patients with one of six common diagnoses?’ Second, ‘Can we identify other factors that are determinants of improved health outcomes for such hospitalized patients but which are not associated with costs?’ The Healthcare Cost and Utilization Project (HCUP) Nationwide Inpatient Sample (NIS) database from 2003 to 2006 was employed in this study. The total study sample consisted of hospitals which had at least 30 patients each year for the given diagnosis: 954 hospitals for acute myocardial infarction (AMI), 1552 hospitals for congestive heart failure (CHF), 1120 hospitals for stroke (STR), 1283 hospitals for gastrointestinal hemorrhage (GIH), 979 hospitals for hip fracture (HIP), and 1716 hospitals for pneumonia (PNE). This study used simultaneous equations models to investigate the determinants of improvement in health outcomes and of cost reduction in hospital inpatient care for these six common diagnoses. In addition, the study used instrumental variables and two-stage least squares random effect model for unbalanced panel data estimation. The study concluded that a few factors were determinants of high quality and low cost. Specifically, high specialty was the determinant of high quality and low costs for CHF patients; small hospital size was the determinant of high quality and low costs for AMI patients. Furthermore, CHF patients who were treated in Midwest, South, and West region hospitals had better health outcomes and lower hospital costs than patients who were treated in Northeast region hospitals. Gastrointestinal hemorrhage and pneumonia patients who were treated in South region hospitals also had better health outcomes and lower hospital costs than patients who were treated in Northeast region hospitals. 
This study found that six non-cost factors were related to health outcomes for a few diagnoses: hospital volume, percentage of emergency room admissions for a given diagnosis, hospital competition, specialty, bed size, and hospital region.
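The two-stage least squares step mentioned in this abstract can be illustrated in its simplest form, with one endogenous regressor and one instrument. The Python sketch below omits the simultaneous-equations, random-effects, and unbalanced-panel aspects of the study; the function names and the toy data are hypothetical and were constructed so that the instrument is uncorrelated with the unobserved confounder.

```python
def ols(x, y):
    """Simple linear regression of y on x; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def two_stage_least_squares(z, x, y):
    """2SLS slope of y on x using a single instrument z."""
    # Stage 1: project the endogenous regressor x onto the instrument z.
    a1, b1 = ols(z, x)
    x_hat = [a1 + b1 * zi for zi in z]
    # Stage 2: regress the outcome on the fitted values from stage 1.
    _, slope = ols(x_hat, y)
    return slope


if __name__ == "__main__":
    z = [-1.0, -1.0, 1.0, 1.0]                     # instrument
    u = [-1.0, 1.0, -1.0, 1.0]                     # unobserved confounder
    x = [zi + ui for zi, ui in zip(z, u)]          # endogenous regressor
    y = [2.0 * xi + 3.0 * ui for xi, ui in zip(x, u)]  # true effect of x is 2
    print("OLS slope: ", ols(x, y)[1])
    print("2SLS slope:", two_stage_least_squares(z, x, y))
```

In this toy data the ordinary least squares slope is pulled away from the true effect by the confounder, while the instrumented estimate recovers it, which is the motivation for using instrumental variables when cost and quality are determined jointly.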