12 results for Classical orthogonal polynomials of a discrete variable
in DigitalCommons@The Texas Medical Center
Abstract:
Mixture modeling is commonly used to model categorical latent variables that represent subpopulations in which population membership is unknown but can be inferred from the data. In recent years, finite mixture models have been applied to time-to-event data. However, the commonly used survival mixture model assumes that the effects of the covariates on failure times differ across latent classes while the covariate distribution is homogeneous. The aim of this dissertation is to develop a method for examining time-to-event data in the presence of unobserved heterogeneity within a mixture-modeling framework. A joint model is developed that incorporates the latent survival trajectory along with the observed information for the joint analysis of a time-to-event variable, its discrete and continuous covariates, and a latent class variable. It is assumed that both the effects of covariates on survival times and the distribution of the covariates vary across latent classes. The unobservable survival trajectories are identified by estimating the probability that a subject belongs to a particular class given the observed information. We applied this method to a Hodgkin lymphoma study with long-term follow-up and observed four distinct latent classes in terms of long-term survival and the distributions of prognostic factors. Results from simulation studies and from the Hodgkin lymphoma study demonstrate the superiority of the joint model over the conventional survival model. This flexible inference method provides more accurate estimation and accommodates unobservable heterogeneity among individuals while taking interactions between covariates into consideration.
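As a minimal illustration of the class-membership step, the sketch below computes posterior class probabilities by Bayes' rule in a hypothetical two-class joint model (normal covariate, exponential survival with a class-specific covariate effect). All parameter values and distributional choices are illustrative assumptions, not the dissertation's fitted model.

```python
# A minimal sketch of posterior class-membership estimation in a two-class
# joint survival mixture model.  All parameters below are hypothetical.
import numpy as np
from scipy import stats

# Hypothetical class-specific parameters: covariate distribution (normal)
# and an exponential survival rate that depends on the covariate.
params = [
    {"pi": 0.6, "mu": 0.0, "sigma": 1.0, "beta0": -2.0, "beta1": 0.5},
    {"pi": 0.4, "mu": 1.5, "sigma": 0.8, "beta0": -1.0, "beta1": 0.3},
]

def class_posteriors(t, delta, x):
    """P(class = k | t, delta, x) by Bayes' rule.

    t: observed time; delta: event indicator (1 = event, 0 = censored);
    x: continuous covariate.  The joint density multiplies the covariate
    density by the survival likelihood, so both the covariate distribution
    and the covariate effect on survival vary across latent classes.
    """
    lik = np.empty(len(params))
    for k, p in enumerate(params):
        rate = np.exp(p["beta0"] + p["beta1"] * x)     # class-specific hazard
        surv_lik = rate**delta * np.exp(-rate * t)     # exponential likelihood
        cov_lik = stats.norm.pdf(x, p["mu"], p["sigma"])
        lik[k] = p["pi"] * cov_lik * surv_lik
    return lik / lik.sum()

print(class_posteriors(t=4.0, delta=1, x=1.2))
```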
Abstract:
The purpose of this study is to investigate the effects of predictor-variable correlations and patterns of missingness with dichotomous and/or continuous data in small samples when missing data are multiply imputed. Missing predictor data are multiply imputed under three multivariate models: the multivariate normal model for continuous data, the multinomial model for dichotomous data, and the general location model for mixed dichotomous and continuous data. Following the multiple imputation process, Type I error rates of the regression coefficients obtained with logistic regression analysis are estimated under various conditions of correlation structure, sample size, data type, and pattern of missing data. The distributional properties of the mean, variance, and correlations among the predictor variables are assessed after multiple imputation.

For continuous predictor data under the multivariate normal model, Type I error rates are generally within the nominal values with samples of size n = 100. Smaller samples of size n = 50 resulted in more conservative estimates (i.e., lower than the nominal value). Correlation and variance estimates of the original data are retained after multiple imputation with less than 50% missing continuous predictor data. For dichotomous predictor data under the multinomial model, Type I error rates are generally conservative, in part because of the sparseness of the data. The correlation structure of the predictor variables is not well retained in multiply imputed data from small samples with more than 50% missing data under this model. For mixed continuous and dichotomous predictor data, the results are similar to those found under the multivariate normal model for continuous data and under the multinomial model for dichotomous data. With all data types, including a fully observed variable alongside the variables subject to missingness in the multiple imputation process and the subsequent statistical analysis produced liberal (larger than nominal) Type I error rates under a specific pattern of missing data. It is suggested that future studies focus on the effects of multiple imputation in multivariate settings with more realistic data characteristics and a variety of multivariate analyses, assessing both Type I error and power.
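The imputation-then-pooling pipeline can be sketched as follows for the continuous-data case: impute a partially missing predictor from its conditional normal given a complete predictor, refit the logistic regression on each completed dataset, and combine estimates by Rubin's rules. Sample size, missingness rate, and number of imputations are illustrative, and the draw step omits the parameter uncertainty that a fully proper imputation would include.

```python
# A minimal sketch of multiple imputation under a bivariate normal model for
# two continuous predictors, followed by logistic regression and Rubin's
# rules.  The design values below are illustrative, not the study's.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n, M, rho = 100, 10, 0.4
x = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=n)
y = rng.binomial(1, 1 / (1 + np.exp(-0.5 * x[:, 0])))  # null effect for x2
miss = rng.random(n) < 0.3                              # 30% MCAR in x2
x2_obs = np.where(miss, np.nan, x[:, 1])

betas, ses = [], []
for _ in range(M):
    # Impute x2 | x1 from the conditional normal fitted on complete cases
    # (a fully proper procedure would also draw the regression parameters
    # from their posterior; omitted here for brevity).
    cc = ~miss
    b1, b0 = np.polyfit(x[cc, 0], x2_obs[cc], 1)
    resid_sd = np.std(x2_obs[cc] - (b0 + b1 * x[cc, 0]), ddof=2)
    x2_imp = x2_obs.copy()
    x2_imp[miss] = b0 + b1 * x[miss, 0] + rng.normal(0, resid_sd, miss.sum())
    X = sm.add_constant(np.column_stack([x[:, 0], x2_imp]))
    fit = sm.Logit(y, X).fit(disp=0)
    betas.append(fit.params[2])
    ses.append(fit.bse[2])

# Rubin's rules: total variance = within + (1 + 1/M) * between
qbar = np.mean(betas)
W, B = np.mean(np.square(ses)), np.var(betas, ddof=1)
T = W + (1 + 1 / M) * B
print("pooled beta_x2 = %.3f, se = %.3f" % (qbar, np.sqrt(T)))
```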
Abstract:
Ordinal outcomes are frequently employed in diagnosis and clinical trials. Clinical trials of Alzheimer's disease (AD) treatments are a case in point, using mild, moderate, or severe disease status as outcome measures. As in many other outcome-oriented studies, the disease status may be misclassified. This study estimates the extent of misclassification in an ordinal outcome such as disease status, as well as the extent of misclassification in a predictor variable such as genotype status. An ordinal logistic regression model is commonly used to model the relationship between disease status, the effect of treatment, and other predictive factors. A simulation study was conducted: first, data were generated from a set of hypothetical parameters and hypothetical misclassification rates; next, the maximum likelihood method was employed to derive likelihood equations accounting for misclassification, and the Nelder-Mead simplex method was used to solve for the misclassification and model parameters. Finally, the method was applied to an AD dataset to estimate the amount of misclassification present. The estimates of the ordinal regression model parameters were close to the hypothetical parameters: β1 was hypothesized at 0.50 and the mean estimate was 0.488; β2 was hypothesized at 0.04 and the mean estimate was 0.04. Although the estimated misclassification rates of X1 were not as close as β1 and β2, they validate the method: the 0-1 misclassification of X1 was hypothesized at 2.98% with a mean simulated estimate of 1.54%, and, in the best case, the misclassification of k from high to medium was hypothesized at 4.87% with a sample mean of 3.62%. In the AD dataset, the estimated odds ratio for X1, having both copies of the APOE 4 allele, changed from 1.377 to 1.418, demonstrating that the odds ratio estimates change when the analysis adjusts for misclassification.
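A hedged sketch of the estimation idea follows: a proportional-odds likelihood is pre-multiplied by a misclassification matrix for the observed outcome, and the negative log-likelihood is minimized with Nelder-Mead as the abstract describes. The three-category outcome, single covariate, and single adjacent-category error rate eps are simplifying assumptions for illustration, not the study's full model.

```python
# A minimal sketch of a proportional-odds model with a misclassified
# outcome, fit by Nelder-Mead.  All parameter values are hypothetical.
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

rng = np.random.default_rng(1)
n = 2000
x = rng.normal(size=n)
alpha_true, beta_true, eps_true = np.array([-0.5, 1.0]), 0.5, 0.05

def category_probs(a1, a2, b):
    cum1, cum2 = expit(a1 - b * x), expit(a2 - b * x)  # P(Y<=0), P(Y<=1)
    return np.vstack([cum1, cum2 - cum1, 1 - cum2])    # shape (3, n)

def mis_matrix(eps):
    # M[true, observed]: adjacent-category errors at a single rate eps
    return np.array([[1 - eps, eps,     0      ],
                     [eps / 2, 1 - eps, eps / 2],
                     [0,       eps,     1 - eps]])

p = category_probs(*alpha_true, beta_true)
y_true = np.array([rng.choice(3, p=p[:, i]) for i in range(n)])
M_true = mis_matrix(eps_true)
y_obs = np.array([rng.choice(3, p=M_true[yt]) for yt in y_true])

def negloglik(theta):
    a1, gap, b, eps = theta
    if gap <= 0 or not 0 <= eps < 0.5:
        return np.inf                  # keep cutpoints ordered, eps valid
    q = mis_matrix(eps).T @ category_probs(a1, a1 + gap, b)
    return -np.log(q[y_obs, np.arange(n)]).sum()

fit = minimize(negloglik, x0=[-0.3, 1.2, 0.3, 0.10], method="Nelder-Mead")
print(fit.x)   # estimates of (a1, a2 - a1, beta, eps)
```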
Abstract:
The performance of the Hosmer-Lemeshow global goodness-of-fit statistic for logistic regression models was explored in a wide variety of conditions not previously fully investigated. Computer simulations, each consisting of 500 regression models, were run to assess the statistic in 23 different situations. The factors varied among the situations included the number of observations used in each regression, the number of covariates, the degree of dependence among the covariates, the combinations of continuous and discrete variables, and whether the values of the dependent variable were generated for model fit or lack of fit.

The study found that the $\hat{C}_g$ statistic was adequate in tests of significance for most situations. However, when testing data that deviate from a logistic model, the statistic has low power to detect such deviation. Although grouping the estimated probabilities into from 8 to 30 quantiles was studied, the deciles-of-risk approach was generally sufficient. Subdividing the estimated probabilities into more than 10 quantiles when there are many covariates in the model is unnecessary, despite theoretical arguments suggesting otherwise. Because it does not follow a $\chi^2$ distribution, the statistic is not recommended for models containing only categorical variables with a limited number of covariate patterns.

The statistic performed adequately when there were at least 10 observations per quantile. Large numbers of observations per quantile did not lead to incorrect conclusions that the model did not fit the data when it actually did. However, the statistic failed to detect lack of fit when it existed and should be supplemented with further tests for the influence of individual observations. Careful examination of the parameter estimates is also essential, since the statistic did not perform as desired when there was moderate to severe collinearity among covariates.

Two methods studied for handling tied values of the estimated probabilities made only a slight difference in conclusions about model fit. Neither method split observations with identical probabilities into different quantiles. Approaches that create equal-size groups by separating ties should be avoided.
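For readers who want the mechanics, a minimal sketch of the deciles-of-risk version of the statistic is given below; the data and fitted model are simulated and illustrative only.

```python
# A minimal sketch of the Hosmer-Lemeshow C-hat statistic with deciles of
# risk, the grouping the study found generally sufficient.
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(2)
n = 500
x = rng.normal(size=(n, 2))
p_true = 1 / (1 + np.exp(-(0.4 * x[:, 0] - 0.6 * x[:, 1])))
y = rng.binomial(1, p_true)

p_hat = sm.Logit(y, sm.add_constant(x)).fit(disp=0).predict()

def hosmer_lemeshow(y, p_hat, g=10):
    """C-hat: sum over g risk groups of (O - E)^2 / (E * (1 - pbar))."""
    order = np.argsort(p_hat)
    # Near-equal-size groups by sorted risk; note the study's caution that
    # heavily tied probabilities should not be split to force equal groups.
    groups = np.array_split(order, g)
    chi2 = 0.0
    for idx in groups:
        obs, exp = y[idx].sum(), p_hat[idx].sum()
        pbar = p_hat[idx].mean()
        chi2 += (obs - exp) ** 2 / (exp * (1 - pbar))
    return chi2, stats.chi2.sf(chi2, g - 2)   # df = g - 2 on fitting data

chi2, pval = hosmer_lemeshow(y, p_hat)
print("C-hat = %.2f, p = %.3f" % (chi2, pval))
```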
Abstract:
Congenital Adrenal Hyperplasia (CAH), due to 21-hydroxylase deficiency, has an estimated incidence of 1:15,000 births and can result in death, salt-wasting crisis, or impaired growth. It has been proposed that early diagnosis and treatment of infants detected by newborn screening for CAH will decrease mortality and morbidity in the affected population. The Texas Department of Health (TDH) began mandatory screening for CAH in June 1989, and Texas is one of fourteen states providing neonatal screening for the disorder.

The purpose of this study was to describe the cost and effect of screening for CAH in Texas during 1994 and to compare cases first detected by screen with cases first detected clinically between January 1, 1990 and December 31, 1994. The study used a longitudinal descriptive research design; the data were secondary, previously collected by the Texas Department of Health. Alongside the descriptive study, an economic analysis was performed in which the cost of the program was defined, measured, and valued for four phases of screening: specimen collection, specimen testing, follow-up, and diagnostic evaluation.

There were 103 infants diagnosed with Classical CAH during the study period, 71 of whom had the more serious salt-wasting form of the disease. Of the infants diagnosed with Classical CAH, 60% were first detected by screen and 40% were first detected because of clinical findings before the screening results were returned. The base-case cost of adding newborn screening to an existing program (excluding the cost of specimen collection) was $357,989 per 100,000 infants. The cost per case of Classical CAH diagnosed, based on the number of infants first detected by screen in 1994, was $126,892. There were 42 infants diagnosed with the more benign Nonclassical form of the disease; when these cases were included in the total, the cost per infant diagnosed with Congenital Adrenal Hyperplasia was $87,848.
Abstract:
T-cell lymphomas from AKR mice were studied to determine their potential as a model of T-cell differentiation. Homogeneous tumor cell lines have been used as models to study normal lymphocyte subpopulations, including differentiation lineages, functional properties, and inducibility to maturation. The underlying concept is that each lymphoid tumor represents a monoclonal neoplastic proliferation of a discrete lymphoid subpopulation arrested at a particular differentiation stage.

Individual tumors were analyzed to determine the extent of intertumor heterogeneity, and to determine whether the lymphomas represented different thymocyte subsets, by characterizing the cell-surface antigenic phenotype, PNA-binding capacity, and terminal deoxynucleotidyl transferase (TdT) activity. Splenic and thymic tumor cells were compared to determine whether the particular lymphoid microenvironment influenced T-cell marker expression. Several of the lymphomas were passaged in syngeneic hosts to verify the original tumor phenotype and to assess the stability of the cell-surface and TdT phenotypes after transplantation.

Lymphomas were adapted to in vitro culture to determine whether the T-cell phenotype was maintained in the absence of the host microenvironment. Clonal progeny were analyzed and compared with each other and with the parent cell lines to determine the extent of intratumor heterogeneity in this lymphoma system. Parent and cloned cell lines were passaged in vivo to determine whether alterations in surface phenotype occurred after transplantation.

Our investigation verified that most spontaneous AKR lymphomas phenotypically resemble known T-cell subsets, including both immature and mature thymic subpopulations. The in vitro lines, however, expressed a highly unstable phenotype in culture, including loss of Ly-1 and Ly-2 antigen expression. After transplantation in vivo, the in vitro lines exhibited alterations in phenotype, including re-expression of Ly antigens on some lymphomas. The inducibility of T-cell antigen markers on tumor cell lines passaged in vivo suggests that the in vitro lines may serve as a model system for studying the molecular events involved in gene expression in the T-cell system.
Abstract:
Public preferences for policy are formed in a little-understood process that is not adequately described by traditional economic theory of choice. In this paper I suggest that U.S. aggregate support for health reform can be modeled as tradeoffs among a small number of behavioral values and the stage of policy development. The theory underlying the model is based on Samuelson et al.'s (1986) work and Wilke's (1991) elaboration of it as the Greed/Efficiency/Fairness (GEF) hypothesis of motivation in the management of resource dilemmas, and on behavioral economics informed by Kahneman and Tversky's prospect theory.

The model employs ordered-probit econometric techniques applied to data from U.S. polls taken between 1990 and mid-2003 that measured support for health reform proposals. Outcome data are four-tier Likert counts; independent variables are dummies representing the presence or absence of operationalizations of each behavioral value, along with an integer representing the stage of the policy process. Marginal effects of each independent variable predict how support levels change when that variable is triggered. Model estimation results indicate a vanishingly small likelihood that all coefficients are zero, and all variables have the signs expected from the model theory.

Three hypotheses were tested: support will drain from health reform policy as it becomes increasingly well articulated and approaches enactment; reforms appealing to fairness through universal health coverage will enjoy a higher degree of support than those targeted more narrowly; and health reforms calling for government operation of the health finance system will achieve lower support than those that do not. Model results support the first and last hypotheses. Contrary to expectations, universal health care proposals did not provide incremental support beyond proposals targeted to "deserving" populations: children, the elderly, and working families. In addition, loss of autonomy (e.g., restrictions on choice of caregiver) is found to be the "third rail" of health reform, with significantly reduced support. When applied to a hypothetical health reform centered on an employer-mandated Medical Savings Account policy, the model predicts support that may be insufficient for enactment. These results indicate that the method developed in this paper may prove valuable to health policy designers.
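A minimal ordered-probit sketch in the spirit of the model is shown below, using statsmodels' OrderedModel with hypothetical dummy triggers, a policy-stage integer, and simulated four-tier outcomes; the variable names and data are invented, not the paper's poll dataset.

```python
# A minimal sketch of an ordered-probit model of four-tier Likert support,
# with marginal effects computed as changes in predicted tier probabilities.
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(3)
n = 400
df = pd.DataFrame({
    "fairness": rng.integers(0, 2, n),   # dummy: universal-coverage appeal
    "gov_run": rng.integers(0, 2, n),    # dummy: government-operated finance
    "stage": rng.integers(1, 5, n),      # policy development stage
})
latent = (0.6 * df.fairness - 0.8 * df.gov_run - 0.3 * df.stage
          + rng.normal(size=n))
df["support"] = pd.cut(latent, [-np.inf, -2, -1, 0, np.inf], labels=False)

model = OrderedModel(df["support"], df[["fairness", "gov_run", "stage"]],
                     distr="probit")
res = model.fit(method="bfgs", disp=0)
print(res.summary())

# Marginal effect of a dummy: change in predicted tier probabilities when
# the variable is switched on, averaged over the sample.
cols = ["fairness", "gov_run", "stage"]
p0 = res.model.predict(res.params, exog=df.assign(gov_run=0)[cols])
p1 = res.model.predict(res.params, exog=df.assign(gov_run=1)[cols])
print((p1 - p0).mean(axis=0))
```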
Abstract:
Gastrointestinal stromal tumors (GIST) represent 80% of sarcomas arising from the GI tract. The inciting event in tumor progression is mutation of the kit or, rarely, the platelet-derived growth factor receptor-α (PDGFR) gene. These mutations encode ligand-independent, constitutively active proteins: Kit or PDGFR.

These tumors are notoriously chemo- and radioresistant. Historically, patients with advanced disease realized a median overall survival of 9 months. With modern management of GIST using imatinib mesylate (Novartis), a small-molecule inhibitor of the Kit, PDGFR, and Abl tyrosine kinases, patients now realize a median overall survival greater than 30 months. However, almost half of patients present with surgically resectable GIST, and the utility of imatinib in this context has not been prospectively studied. Moreover, the therapeutic benefit of imatinib varies from patient to patient, and alternative targeted therapies are emerging as potential alternatives to imatinib. Thus, elucidating prognostic factors for patients with GIST in the imatinib era is crucial to providing optimal care to each patient. The exact mechanism of action of imatinib in GIST is also not fully understood, so physicians find it difficult to predict accurately which patients will benefit from imatinib, how to assess response to therapy, and when to assess response.

I hypothesized that imatinib is tolerable and clinically beneficial in the context of surgery; that VEGF expression and kit non-exon 11 genotypes portend poor survival on imatinib therapy; and that imatinib's mechanism of action is due in part to anti-vascular effects and inhibition of the Kit/SCF signaling axis of tumor-associated endothelial cells.

Results herein demonstrate that imatinib is safe and increases the duration of disease-free survival when combined with surgery. Radiographic and molecular (namely, apoptotic) changes occur within 3 days of imatinib initiation. I show that non-exon 11 mutant genotypes and VEGF expression are poor prognostic factors for patients treated with imatinib; these findings may allow patient stratification to emerging therapies rather than imatinib. I also show that imatinib has anti-vascular effects, inducing tumor endothelial cell apoptosis, perhaps through abrogation of the Kit/SCF signaling axis.
Abstract:
Objectives. This paper assesses the effect of regression model misspecification on statistical power in a variety of situations.

Methods and results. The effect of misspecification in regression can be approximated by evaluating the correlation between the correct specification and the misspecification of the outcome variable (Harris 2010). In this paper, three misspecified models (linear, categorical, and fractional polynomial) were considered. In the first section, the mathematical method of calculating the correlation between correct and misspecified models with simple mathematical forms is derived and demonstrated. In the second section, data from the National Health and Nutrition Examination Survey (NHANES 2007-2008) are used to examine such correlations. Our study shows that, compared with linear or categorical models, the fractional polynomial models, with their higher correlations, provided a better approximation of the true relationship, as illustrated by LOESS regression. In the third section, we present the results of simulation studies demonstrating that misspecification in regression can produce marked decreases in power with small sample sizes. However, the categorical model had the greatest power, ranging from 0.877 to 0.936 depending on sample size and outcome variable. The power of the fractional polynomial model was close to that of the linear model, ranging from 0.69 to 0.83, and appeared to be affected by the increased degrees of freedom of this model.

Conclusion. Correlations between alternative model specifications can provide a good approximation of the effect of misspecification on statistical power when the sample size is large. When model specifications have known simple mathematical forms, such correlations can be calculated mathematically. Actual public health data from NHANES 2007-2008 were used as examples to demonstrate situations with unknown or complex correct model specifications. Simulation of power for misspecified models confirmed the results based on the correlation methods, and also illustrated the effect of model degrees of freedom on power.
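The correlation idea can be sketched as follows: compute the correlation between a hypothetical correct specification (here a square root) and each candidate misspecification, then simulate power for the misspecified linear fit. The functional forms, effect size, and sample sizes are assumptions for illustration, not the paper's NHANES analysis.

```python
# A minimal sketch: correlation between correct and misspecified covariate
# forms, then simulated power for the misspecified linear model.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
x = rng.uniform(1, 100, 5000)
true = np.sqrt(x)                      # hypothetical correct specification

# Categorical model's best approximation: quartile-group means of the truth
cats = np.digitize(x, np.percentile(x, [25, 50, 75]))
cat_means = {c: true[cats == c].mean() for c in np.unique(cats)}
forms = {
    "linear": x,
    "categorical (quartile means)": np.array([cat_means[c] for c in cats]),
    "fractional polynomial": x**0.5 + 0.1 * np.log(x),
}
for name, z in forms.items():
    print("%s: corr = %.3f" % (name, np.corrcoef(true, z)[0, 1]))

# Simulated power when a linear fit is used for a square-root relationship
n, reps, alpha, hits = 100, 2000, 0.05, 0
for _ in range(reps):
    xi = rng.uniform(1, 100, n)
    yi = 0.1 * np.sqrt(xi) + rng.normal(0, 1, n)
    fit = sm.OLS(yi, sm.add_constant(xi)).fit()
    hits += fit.pvalues[1] < alpha
print("power of misspecified linear model:", hits / reps)
```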
Abstract:
Interaction effects are of scientific interest in many areas of research. A common approach to investigating the interaction effect of two continuous covariates on a response variable is a cross-product term in multiple linear regression. In epidemiological studies, the two-way analysis of variance (ANOVA) method has also been used to examine interaction effects by replacing the continuous covariates with their discretized levels. However, the implications of the model assumptions of either approach have not been examined, and statistical validation has focused only on the general methods, not specifically on the interaction effect.

In this dissertation, we investigated the validity of both approaches based on their mathematical assumptions for non-skewed data. We showed that linear regression may not be an appropriate model when an interaction effect exists, because it implies a highly skewed distribution for the response variable. We also showed that the normality and constant-variance assumptions required by ANOVA are not satisfied in the model in which the continuous covariates are replaced with their discretized levels. Therefore, naïve application of the ANOVA method may lead to incorrect conclusions.

Given the problems identified above, we proposed a novel method, modified from the traditional ANOVA approach, to rigorously evaluate the interaction effect. The analytical expression of the interaction effect was derived from the conditional distribution of the response variable given the discretized continuous covariates. A testing procedure that combines the p-values from each level of the discretized covariates was developed to test the overall significance of the interaction effect. In simulation studies, the proposed method is more powerful than least squares regression and the ANOVA method in detecting the interaction effect when data come from a trivariate normal distribution. The proposed method was applied to a dataset from the National Institute of Neurological Disorders and Stroke (NINDS) tissue plasminogen activator (t-PA) stroke trial, and a baseline age-by-weight interaction was found to be significant in predicting the change from baseline in NIHSS at Month 3 among patients who received t-PA therapy.
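The two conventional approaches the abstract contrasts can be sketched side by side on simulated data: a cross-product term in linear regression, and a two-way ANOVA after discretizing the covariates into tertiles. The data-generating values below are illustrative, not the NINDS trial data.

```python
# A minimal sketch of the two common interaction tests: cross-product term
# versus two-way ANOVA on discretized continuous covariates.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n = 300
age = rng.normal(60, 10, n)
weight = rng.normal(80, 15, n)
y = 0.2 * age + 0.1 * weight + 0.004 * age * weight + rng.normal(0, 5, n)
df = pd.DataFrame({"y": y, "age": age, "weight": weight})

# Approach 1: cross-product term in multiple linear regression
lm = smf.ols("y ~ age * weight", data=df).fit()
print("cross-product p-value:", lm.pvalues["age:weight"])

# Approach 2: two-way ANOVA with tertile-discretized covariates
df["age_c"] = pd.qcut(df.age, 3, labels=False)
df["wt_c"] = pd.qcut(df.weight, 3, labels=False)
aov = smf.ols("y ~ C(age_c) * C(wt_c)", data=df).fit()
print(sm.stats.anova_lm(aov, typ=2).loc["C(age_c):C(wt_c)"])
```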
Abstract:
In Part One, the foundations of Bayesian inference are reviewed and the technicalities of the Bayesian method are illustrated. Part Two applies a Bayesian meta-analysis program, the Confidence Profile Method (CPM), to clinical trial data and evaluates the merits of using Bayesian meta-analysis for overviews of clinical trials.

The Bayesian meta-analysis produced results similar to the classical results because of the large sample size together with the use of a non-preferential prior probability distribution. These results were anticipated by the explanations of the mechanics of the Bayesian approach given in Part One.
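The mechanics behind that result can be illustrated with a small sketch: conjugate normal updating with an essentially flat prior reproduces classical inverse-variance pooling of trial log odds ratios. The trial estimates below are hypothetical, and this simple stand-in is not the CPM software itself.

```python
# A minimal sketch: with a vague ("non-preferential") normal prior and large
# samples, Bayesian fixed-effect pooling matches the classical result.
import numpy as np

log_or = np.array([-0.25, -0.10, -0.30, -0.18])  # hypothetical trial log ORs
se = np.array([0.10, 0.08, 0.12, 0.09])

# Classical fixed-effect (inverse-variance) pooling
w = 1 / se**2
classical = np.sum(w * log_or) / np.sum(w)

# Conjugate normal updating: prior N(0, 100^2), essentially flat
prior_mean, prior_var = 0.0, 100.0**2
post_var = 1 / (1 / prior_var + np.sum(w))
post_mean = post_var * (prior_mean / prior_var + np.sum(w * log_or))

print("classical = %.4f, Bayesian posterior mean = %.4f"
      % (classical, post_mean))
# The two agree to several decimals, as Part Two anticipates.
```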
Abstract:
Complex diseases such as cancer result from multiple genetic changes and environmental exposures. With the rapid development of genotyping and sequencing technologies, we are now able to assess the causal effects of many genetic and environmental factors more accurately. Genome-wide association studies have localized many causal genetic variants predisposing to certain diseases, but these studies explain only a small portion of the heritability of disease. More advanced statistical models are needed to identify and characterize additional genetic and environmental factors and their interactions, enabling a better understanding of the causes of complex diseases. In the past decade, thanks to increasing computational capability and novel statistical developments, Bayesian methods have been widely applied in genetics and genomics research, demonstrating superiority over standard approaches in certain areas. Gene-environment and gene-gene interaction studies are among the areas where Bayesian methods can fully exert their advantages.

This dissertation focuses on developing new Bayesian statistical methods for data with complex gene-environment and gene-gene interactions, and on extending existing methods for gene-environment interactions to related areas. It comprises three parts: (1) deriving a Bayesian variable selection framework for hierarchical gene-environment and gene-gene interactions; (2) developing Bayesian Natural and Orthogonal Interaction (NOIA) models for gene-environment interactions; and (3) extending two Bayesian statistical methods developed for gene-environment interaction studies to related problems such as adaptive borrowing of historical data.

We propose a Bayesian hierarchical mixture model framework that allows us to investigate genetic and environmental effects, gene-by-gene interactions (epistasis), and gene-by-environment interactions in the same model. In many practical situations there is a natural hierarchical structure between the main effects and interactions in the linear model. We propose a model that incorporates this hierarchical structure into the Bayesian mixture model, so that irrelevant interaction effects can be removed more efficiently, resulting in more robust, parsimonious, and powerful models. We evaluate both the 'strong hierarchical' and 'weak hierarchical' models, which specify that both, or at least one, of the main effects of the interacting factors must be present for the interaction to be included in the model. Extensive simulation results show that the proposed strong and weak hierarchical mixture models control the proportion of false positive discoveries and provide a powerful approach to identifying the predisposing main effects and interactions in studies with complex gene-environment and gene-gene interactions. We also compare these two models with an 'independent' model that does not impose the hierarchical constraint and observe their superior performance in most of the situations considered. The proposed models are applied to real data analyses of gene-environment interactions in lung cancer and cutaneous melanoma case-control studies. Bayesian statistical models have the advantage of incorporating useful prior information into the modeling process. Moreover, the Bayesian mixture model outperforms the multivariate logistic model in parameter estimation and variable selection in most cases. Our proposed models impose the hierarchical constraints, further improving the Bayesian mixture model by reducing the proportion of false positive findings among the identified interactions while successfully recovering reported associations. This is practically appealing for investigating causal factors among a moderate number of candidate genetic and environmental factors together with a relatively large number of interactions.

The natural and orthogonal interaction (NOIA) models of genetic effects were previously developed to provide an analysis framework in which the estimated effects for a quantitative trait are statistically orthogonal regardless of whether Hardy-Weinberg Equilibrium (HWE) holds within loci. Ma et al. (2012) recently developed a NOIA model for gene-environment interaction studies and showed its advantages for detecting true main effects and interactions compared with the usual functional model. In this project, we propose a novel Bayesian statistical model that combines the Bayesian hierarchical mixture model with the NOIA statistical model and the usual functional model. The proposed Bayesian NOIA model demonstrates more power to detect non-null effects, with higher marginal posterior probabilities.

Finally, we review two Bayesian statistical models (a Bayesian empirical shrinkage-type estimator and Bayesian model averaging) developed for gene-environment interaction studies. Inspired by these models, we develop two novel statistical methods that handle related problems such as borrowing data from historical studies. The proposed methods are analogous to the gene-environment interaction methods in that they balance statistical efficiency and bias within a unified model. Through extensive simulation studies, we compare the operating characteristics of the proposed models with those of existing models, including the hierarchical meta-analysis model. The results show that the proposed approaches adaptively borrow historical data in a data-driven way. These novel models may have a broad range of applications in both genetic/genomic and clinical studies.
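The heredity constraint at the core of the strong and weak hierarchical models can be illustrated with a small sketch. In the actual Bayesian mixture model the constraint operates on latent inclusion indicators during variable selection; here it is shown as a plain filter over candidate interactions, with invented factor names.

```python
# A minimal sketch of strong and weak heredity ("hierarchical") constraints:
# an interaction enters only if both (strong) or at least one (weak) of its
# main effects is included.
from itertools import combinations

def allowed_interactions(included_mains, candidates, rule="strong"):
    """Filter candidate interaction pairs by a heredity rule."""
    keep = []
    for a, b in candidates:
        n_in = (a in included_mains) + (b in included_mains)
        if (rule == "strong" and n_in == 2) or (rule == "weak" and n_in >= 1):
            keep.append((a, b))
    return keep

mains = ["G1", "G2", "E1"]                 # e.g. two genes, one exposure
candidates = list(combinations(mains + ["G3"], 2))
print("strong:", allowed_interactions({"G1", "E1"}, candidates, "strong"))
print("weak:  ", allowed_interactions({"G1", "E1"}, candidates, "weak"))
```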