10 resultados para ESTIMATOR
em DigitalCommons@The Texas Medical Center
Resumo:
Do siblings of centenarians tend to have longer life spans? To answer this question, life spans of 184 siblings for 42 centenarians have been evaluated. Two important questions have been addressed in analyzing the sibling data. First, a standard needs to be established, to which the life spans of 184 siblings are compared. In this report, an external reference population is constructed from the U.S. life tables. Its estimated mortality rates are treated as baseline hazards from which the relative mortality of the siblings are estimated. Second, the standard survival models which assume independent observations are invalid when correlation within family exists, underestimating the true variance. Methods that allow correlations are illustrated by three different methods. First, the cumulative relative excess mortality between siblings and their comparison group is calculated and used as an effective graphic tool, along with the Product Limit estimator of the survival function. The variance estimator of the cumulative relative excess mortality is adjusted for the potential within family correlation using Taylor linearization approach. Second, approaches that adjust for the inflated variance are examined. They are adjusted one-sample log-rank test using design effect originally proposed by Rao and Scott in the correlated binomial or Poisson distribution setting and the robust variance estimator derived from the log-likelihood function of a multiplicative model. Nether of these two approaches provide correlation estimate within families, but the comparison with the comparison with the standard remains valid under dependence. Last, using the frailty model concept, the multiplicative model, where the baseline hazards are known, is extended by adding a random frailty term that is based on the positive stable or the gamma distribution. Comparisons between the two frailty distributions are performed by simulation. Based on the results from various approaches, it is concluded that the siblings of centenarians had significant lower mortality rates as compared to their cohorts. The frailty models also indicate significant correlations between the life spans of the siblings. ^
Resumo:
Many public health agencies and researchers are interested in comparing hospital outcomes, for example, morbidity, mortality, and hospitalization across areas and hospitals. However, since there is variation of rates in clinical trials among hospitals because of several biases, we are interested in controlling for the bias and assessing real differences in clinical practices. In this study, we compared the variations between hospitals in rates of severe Intraventricular Haemorrhage (IVH) infant using Frequentist statistical approach vs. Bayesian hierarchical model through simulation study. The template data set for simulation study was included the number of severe IVH infants of 24 intensive care units in Australian and New Zealand Neonatal Network from 1995 to 1997 in severe IVH rate in preterm babies. We evaluated the rates of severe IVH for 24 hospitals with two hierarchical models in Bayesian approach comparing their performances with the shrunken rates in Frequentist method. Gamma-Poisson (BGP) and Beta-Binomial (BBB) were introduced into Bayesian model and the shrunken estimator of Gamma-Poisson (FGP) hierarchical model using maximum likelihood method were calculated as Frequentist approach. To simulate data, the total number of infants in each hospital was kept and we analyzed the simulated data for both Bayesian and Frequentist models with two true parameters for severe IVH rate. One was the observed rate and the other was the expected severe IVH rate by adjusting for five predictors variables for the template data. The bias in the rate of severe IVH infant estimated by both models showed that Bayesian models gave less variable estimates than Frequentist model. We also discussed and compared the results from three models to examine the variation in rate of severe IVH by 20th centile rates and avoidable number of severe IVH cases. ^
Resumo:
Strategies are compared for the development of a linear regression model with stochastic (multivariate normal) regressor variables and the subsequent assessment of its predictive ability. Bias and mean squared error of four estimators of predictive performance are evaluated in simulated samples of 32 population correlation matrices. Models including all of the available predictors are compared with those obtained using selected subsets. The subset selection procedures investigated include two stopping rules, C$\sb{\rm p}$ and S$\sb{\rm p}$, each combined with an 'all possible subsets' or 'forward selection' of variables. The estimators of performance utilized include parametric (MSEP$\sb{\rm m}$) and non-parametric (PRESS) assessments in the entire sample, and two data splitting estimates restricted to a random or balanced (Snee's DUPLEX) 'validation' half sample. The simulations were performed as a designed experiment, with population correlation matrices representing a broad range of data structures.^ The techniques examined for subset selection do not generally result in improved predictions relative to the full model. Approaches using 'forward selection' result in slightly smaller prediction errors and less biased estimators of predictive accuracy than 'all possible subsets' approaches but no differences are detected between the performances of C$\sb{\rm p}$ and S$\sb{\rm p}$. In every case, prediction errors of models obtained by subset selection in either of the half splits exceed those obtained using all predictors and the entire sample.^ Only the random split estimator is conditionally (on $\\beta$) unbiased, however MSEP$\sb{\rm m}$ is unbiased on average and PRESS is nearly so in unselected (fixed form) models. When subset selection techniques are used, MSEP$\sb{\rm m}$ and PRESS always underestimate prediction errors, by as much as 27 percent (on average) in small samples. Despite their bias, the mean squared errors (MSE) of these estimators are at least 30 percent less than that of the unbiased random split estimator. The DUPLEX split estimator suffers from large MSE as well as bias, and seems of little value within the context of stochastic regressor variables.^ To maximize predictive accuracy while retaining a reliable estimate of that accuracy, it is recommended that the entire sample be used for model development, and a leave-one-out statistic (e.g. PRESS) be used for assessment. ^
Resumo:
The need for timely population data for health planning and Indicators of need has Increased the demand for population estimates. The data required to produce estimates is difficult to obtain and the process is time consuming. Estimation methods that require less effort and fewer data are needed. The structure preserving estimator (SPREE) is a promising technique not previously used to estimate county population characteristics. This study first uses traditional regression estimation techniques to produce estimates of county population totals. Then the structure preserving estimator, using the results produced in the first phase as constraints, is evaluated.^ Regression methods are among the most frequently used demographic methods for estimating populations. These methods use symptomatic indicators to predict population change. This research evaluates three regression methods to determine which will produce the best estimates based on the 1970 to 1980 indicators of population change. Strategies for stratifying data to improve the ability of the methods to predict change were tested. Difference-correlation using PMSA strata produced the equation which fit the data the best. Regression diagnostics were used to evaluate the residuals.^ The second phase of this study is to evaluate use of the structure preserving estimator in making estimates of population characteristics. The SPREE estimation approach uses existing data (the association structure) to establish the relationship between the variable of interest and the associated variable(s) at the county level. Marginals at the state level (the allocation structure) supply the current relationship between the variables. The full allocation structure model uses current estimates of county population totals to limit the magnitude of county estimates. The limited full allocation structure model has no constraints on county size. The 1970 county census age - gender population provides the association structure, the allocation structure is the 1980 state age - gender distribution.^ The full allocation model produces good estimates of the 1980 county age - gender populations. An unanticipated finding of this research is that the limited full allocation model produces estimates of county population totals that are superior to those produced by the regression methods. The full allocation model is used to produce estimates of 1986 county population characteristics. ^
Resumo:
In geographical epidemiology, maps of disease rates and disease risk provide a spatial perspective for researching disease etiology. For rare diseases or when the population base is small, the rate and risk estimates may be unstable. Empirical Bayesian (EB) methods have been used to spatially smooth the estimates by permitting an area estimate to "borrow strength" from its neighbors. Such EB methods include the use of a Gamma model, of a James-Stein estimator, and of a conditional autoregressive (CAR) process. A fully Bayesian analysis of the CAR process is proposed. One advantage of this fully Bayesian analysis is that it can be implemented simply by using repeated sampling from the posterior densities. Use of a Markov chain Monte Carlo technique such as Gibbs sampler was not necessary. Direct resampling from the posterior densities provides exact small sample inferences instead of the approximate asymptotic analyses of maximum likelihood methods (Clayton & Kaldor, 1987). Further, the proposed CAR model provides for covariates to be included in the model. A simulation demonstrates the effect of sample size on the fully Bayesian analysis of the CAR process. The methods are applied to lip cancer data from Scotland, and the results are compared. ^
Resumo:
One of the difficulties in the practical application of ridge regression is that, for a given data set, it is unknown whether a selected ridge estimator has smaller squared error than the least squares estimator. The concept of the improvement region is defined, and a technique is developed which obtains approximate confidence intervals for the value of ridge k which produces the maximum reduction in mean squared error. Two simulation experiments were conducted to investigate how accurate these approximate confidence intervals might be. ^
Resumo:
In regression analysis, covariate measurement error occurs in many applications. The error-prone covariates are often referred to as latent variables. In this proposed study, we extended the study of Chan et al. (2008) on recovering latent slope in a simple regression model to that in a multiple regression model. We presented an approach that applied the Monte Carlo method in the Bayesian framework to the parametric regression model with the measurement error in an explanatory variable. The proposed estimator applied the conditional expectation of latent slope given the observed outcome and surrogate variables in the multiple regression models. A simulation study was presented showing that the method produces estimator that is efficient in the multiple regression model, especially when the measurement error variance of surrogate variable is large.^
Resumo:
Background: Overall objectives of this dissertation are to examine the geographic variation and socio-demographic disparities (by age, race and gender) in the utilization and survival of newly FDA-approved chemotherapy agents (Oxaliplatin-containing regimens) as well as to determine the cost-effectiveness of Oxaliplatin in a large nationwide and population-based cohort of Medicare patients with resected stage-III colon cancer. Methods: A retrospective cohort of 7,654 Medicare patients was identified from the Surveillance, Epidemiology and End Results – Medicare linked database. Multiple logistic regression was performed to examine the relationship between receipt of Oxaliplatin-containing chemotherapy and geographic regions while adjusting for other patient characteristics. Cox proportional hazard model was used to estimate the effect of Oxaliplatin-containing chemotherapy on the survival variation across regions using 2004-2005 data. Propensity score adjustments were also made to control for potential bias related to non-random allocation of the treatment group. We used Kaplan-Meier sample average estimator to calculate the cost of disease after cancer-specific surgery to death, loss-to follow-up or censorship. Results: Only 51% of the stage-III patients received adjuvant chemotherapy within three to six months of colon-cancer specific surgery. Patients in the rural regions were approximately 30% less likely to receive Oxaliplatin chemotherapy than those residing in a big metro region (OR=0.69, p=0.033). The hazard ratio for patients residing in metro region was comparable to those residing in big metro region (HR: 1.05, 95% CI: 0.49-2.28). Patients who received Oxalipaltin chemotherapy were 33% less likely to die than those received 5-FU only chemotherapy (adjusted HR=0.67, 95% CI: 0.41-1.11). KMSA-adjusted mean payments were almost 2.5 times higher in the Oxaliplatin-containing group compared to 5-FU only group ($45,378 versus $17,856). When compared to no chemotherapy group, ICER of 5-FU based regimen was $12,767 per LYG, and ICER of Oxaliplatin-chemotherapy was $60,863 per LYG. Oxaliplatin was found economically dominated by 5-FU only chemotherapy in this study population. Conclusion: Chemotherapy use varies across geographic regions. We also observed considerable survival differences across geographic regions; the difference remained even after adjusting for socio-demographic characteristics. The cost-effectiveness of Oxaliplatin in Medicare patients may be over-estimated in the clinical trials. Our study found 5-FU only chemotherapy cost-effective in adjuvant settings in patients with stage-III colon cancer.^
Resumo:
Complex diseases such as cancer result from multiple genetic changes and environmental exposures. Due to the rapid development of genotyping and sequencing technologies, we are now able to more accurately assess causal effects of many genetic and environmental factors. Genome-wide association studies have been able to localize many causal genetic variants predisposing to certain diseases. However, these studies only explain a small portion of variations in the heritability of diseases. More advanced statistical models are urgently needed to identify and characterize some additional genetic and environmental factors and their interactions, which will enable us to better understand the causes of complex diseases. In the past decade, thanks to the increasing computational capabilities and novel statistical developments, Bayesian methods have been widely applied in the genetics/genomics researches and demonstrating superiority over some regular approaches in certain research areas. Gene-environment and gene-gene interaction studies are among the areas where Bayesian methods may fully exert its functionalities and advantages. This dissertation focuses on developing new Bayesian statistical methods for data analysis with complex gene-environment and gene-gene interactions, as well as extending some existing methods for gene-environment interactions to other related areas. It includes three sections: (1) Deriving the Bayesian variable selection framework for the hierarchical gene-environment and gene-gene interactions; (2) Developing the Bayesian Natural and Orthogonal Interaction (NOIA) models for gene-environment interactions; and (3) extending the applications of two Bayesian statistical methods which were developed for gene-environment interaction studies, to other related types of studies such as adaptive borrowing historical data. We propose a Bayesian hierarchical mixture model framework that allows us to investigate the genetic and environmental effects, gene by gene interactions (epistasis) and gene by environment interactions in the same model. It is well known that, in many practical situations, there exists a natural hierarchical structure between the main effects and interactions in the linear model. Here we propose a model that incorporates this hierarchical structure into the Bayesian mixture model, such that the irrelevant interaction effects can be removed more efficiently, resulting in more robust, parsimonious and powerful models. We evaluate both of the 'strong hierarchical' and 'weak hierarchical' models, which specify that both or one of the main effects between interacting factors must be present for the interactions to be included in the model. The extensive simulation results show that the proposed strong and weak hierarchical mixture models control the proportion of false positive discoveries and yield a powerful approach to identify the predisposing main effects and interactions in the studies with complex gene-environment and gene-gene interactions. We also compare these two models with the 'independent' model that does not impose this hierarchical constraint and observe their superior performances in most of the considered situations. The proposed models are implemented in the real data analysis of gene and environment interactions in the cases of lung cancer and cutaneous melanoma case-control studies. The Bayesian statistical models enjoy the properties of being allowed to incorporate useful prior information in the modeling process. Moreover, the Bayesian mixture model outperforms the multivariate logistic model in terms of the performances on the parameter estimation and variable selection in most cases. Our proposed models hold the hierarchical constraints, that further improve the Bayesian mixture model by reducing the proportion of false positive findings among the identified interactions and successfully identifying the reported associations. This is practically appealing for the study of investigating the causal factors from a moderate number of candidate genetic and environmental factors along with a relatively large number of interactions. The natural and orthogonal interaction (NOIA) models of genetic effects have previously been developed to provide an analysis framework, by which the estimates of effects for a quantitative trait are statistically orthogonal regardless of the existence of Hardy-Weinberg Equilibrium (HWE) within loci. Ma et al. (2012) recently developed a NOIA model for the gene-environment interaction studies and have shown the advantages of using the model for detecting the true main effects and interactions, compared with the usual functional model. In this project, we propose a novel Bayesian statistical model that combines the Bayesian hierarchical mixture model with the NOIA statistical model and the usual functional model. The proposed Bayesian NOIA model demonstrates more power at detecting the non-null effects with higher marginal posterior probabilities. Also, we review two Bayesian statistical models (Bayesian empirical shrinkage-type estimator and Bayesian model averaging), which were developed for the gene-environment interaction studies. Inspired by these Bayesian models, we develop two novel statistical methods that are able to handle the related problems such as borrowing data from historical studies. The proposed methods are analogous to the methods for the gene-environment interactions on behalf of the success on balancing the statistical efficiency and bias in a unified model. By extensive simulation studies, we compare the operating characteristics of the proposed models with the existing models including the hierarchical meta-analysis model. The results show that the proposed approaches adaptively borrow the historical data in a data-driven way. These novel models may have a broad range of statistical applications in both of genetic/genomic and clinical studies.
Resumo:
The problem of analyzing data with updated measurements in the time-dependent proportional hazards model arises frequently in practice. One available option is to reduce the number of intervals (or updated measurements) to be included in the Cox regression model. We empirically investigated the bias of the estimator of the time-dependent covariate while varying the effect of failure rate, sample size, true values of the parameters and the number of intervals. We also evaluated how often a time-dependent covariate needs to be collected and assessed the effect of sample size and failure rate on the power of testing a time-dependent effect.^ A time-dependent proportional hazards model with two binary covariates was considered. The time axis was partitioned into k intervals. The baseline hazard was assumed to be 1 so that the failure times were exponentially distributed in the ith interval. A type II censoring model was adopted to characterize the failure rate. The factors of interest were sample size (500, 1000), type II censoring with failure rates of 0.05, 0.10, and 0.20, and three values for each of the non-time-dependent and time-dependent covariates (1/4,1/2,3/4).^ The mean of the bias of the estimator of the coefficient of the time-dependent covariate decreased as sample size and number of intervals increased whereas the mean of the bias increased as failure rate and true values of the covariates increased. The mean of the bias of the estimator of the coefficient was smallest when all of the updated measurements were used in the model compared with two models that used selected measurements of the time-dependent covariate. For the model that included all the measurements, the coverage rates of the estimator of the coefficient of the time-dependent covariate was in most cases 90% or more except when the failure rate was high (0.20). The power associated with testing a time-dependent effect was highest when all of the measurements of the time-dependent covariate were used. An example from the Systolic Hypertension in the Elderly Program Cooperative Research Group is presented. ^