8 results for Length biased models
in DigitalCommons@The Texas Medical Center
Abstract:
Prevalent sampling is an efficient and focused approach to the study of the natural history of disease. Right-censored time-to-event data observed from prospective prevalent cohort studies are often subject to left-truncated sampling. Left-truncated samples are not randomly selected from the population of interest and therefore carry a selection bias. Extensive studies have focused on estimating the unbiased distribution given left-truncated samples. However, in many applications, the exact date of disease onset is not observed. For example, in an HIV infection study, the exact HIV infection time is not observable; it is known only that the infection occurred between two observable dates. Meeting these challenges motivated our study. We propose parametric models to estimate the unbiased distribution of left-truncated, right-censored time-to-event data with uncertain onset times. We first consider data from length-biased sampling, a special case of left-truncated sampling. Then we extend the proposed method to general left-truncated sampling. With a parametric model, we construct the full likelihood, given a biased sample with unobservable onset of disease. The parameters are estimated by maximizing the constructed likelihood, which adjusts for the selection bias and the unobservable exact onset. Simulations are conducted to evaluate the finite sample performance of the proposed methods. We apply the proposed method to an HIV infection study, estimating the unbiased survival function and covariance coefficients.
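As a minimal illustration of why a length-biased sample needs a bias-adjusted likelihood, the sketch below uses a deliberately simplified setting that is not the authors' model: exponential survival times with rate lam, no censoring, and exactly known onset. Under length-biased sampling the observed density is t f(t)/mu, which for the exponential is a Gamma(shape 2, rate lam) density, so the maximum likelihood estimate becomes 2n/sum(t) rather than the naive n/sum(t). All function names and simulation settings are assumptions made for illustration.

```python
import random


def simulate_length_biased_exponential(lam, n, seed=0):
    """Draw n length-biased observations from an Exponential(rate=lam) population.

    The length-biased density t*f(t)/mu is a Gamma(shape=2, rate=lam) density,
    which is the distribution of the sum of two independent Exponential(lam) draws.
    """
    rng = random.Random(seed)
    return [rng.expovariate(lam) + rng.expovariate(lam) for _ in range(n)]


def naive_rate_mle(times):
    """Exponential MLE that ignores the sampling bias: n / sum(t)."""
    return len(times) / sum(times)


def length_biased_rate_mle(times):
    """MLE under the length-biased likelihood prod(lam**2 * t * exp(-lam * t)): 2n / sum(t)."""
    return 2 * len(times) / sum(times)


if __name__ == "__main__":
    true_rate = 0.5  # assumed value for the simulation
    sample = simulate_length_biased_exponential(true_rate, n=20000, seed=1)
    print("naive estimate:    ", round(naive_rate_mle(sample), 3))
    print("corrected estimate:", round(length_biased_rate_mle(sample), 3))
```

In this toy setting the naive estimator converges to half the true rate, because length-biased sampling over-represents long survival times; handling censoring and uncertain onset, as the abstract describes, requires a fuller likelihood than this sketch shows.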
Abstract:
Evidence for an RNA gain-of-function toxicity has now been provided for an increasing number of human pathologies. Myotonic dystrophies (DM) belong to a class of RNA-dominant diseases that result from RNA repeat expansion toxicity. Specifically, DM of type 1 (DM1) is caused by an expansion of CUG repeats in the 3'UTR of the DMPK protein kinase mRNA, while DM of type 2 (DM2) is linked to an expansion of CCUG repeats in an intron of the ZNF9 transcript (ZNF9 encodes a zinc finger protein). In both pathologies the mutant RNA forms nuclear foci. The mechanisms that underlie the RNA pathogenicity seem to be rather complex and are not yet completely understood. Here, we describe Drosophila models that might help unravel the molecular mechanisms of DM1-associated CUG expansion toxicity. We generated transgenic flies that express inducible repeats of different types (CUG or CAG) and lengths (16, 240, 480 repeats) and then analyzed transgene localization, RNA expression and toxicity as assessed by induced lethality and eye neurodegeneration. The only line that expressed a toxic RNA has a (CTG)(240) insertion. Moreover, our analysis shows that its level of expression cannot account for its toxicity. In this line, (CTG)(240.4), the expansion is inserted in the first intron of CG9650, a gene encoding a zinc finger protein. Interestingly, CG9650 and (CUG)(240.4) expansion RNAs were found in the same nuclear foci. In conclusion, we suggest that the insertion context is the primary determinant for expansion toxicity in Drosophila models. This finding should contribute to the still open debate on the role of the expansions per se in Drosophila and in human pathogenesis of RNA-dominant diseases.
Abstract:
Models of DNA sequence evolution and methods for estimating evolutionary distances are needed for studying the rate and pattern of molecular evolution and for inferring the evolutionary relationships of organisms or genes. In this dissertation, several new models and methods are developed. The rate variation among nucleotide sites: To obtain unbiased estimates of evolutionary distances, the rate heterogeneity among nucleotide sites of a gene should be considered. Commonly, it is assumed that the substitution rate varies among sites according to a gamma distribution (gamma model) or, more generally, an invariant+gamma model which includes some invariable sites. A maximum likelihood (ML) approach was developed for estimating the shape parameter of the gamma distribution $(\alpha)$ and/or the proportion of invariable sites $(\theta)$. Computer simulation showed that (1) under the gamma model, $\alpha$ can be well estimated from 3 or 4 sequences if the sequences are long; and (2) the distance estimate is unbiased and robust against violations of the assumptions of the invariant+gamma model. However, this ML method requires a huge amount of computational time and is useful only for fewer than 6 sequences. Therefore, I developed a fast method for estimating $\alpha$, which is easy to implement and requires no knowledge of the tree. A computer program was developed for estimating $\alpha$ and evolutionary distances, which can handle as many as 30 sequences. Evolutionary distances under the stationary, time-reversible (SR) model: The SR model is a general model of nucleotide substitution, which assumes (i) stationary nucleotide frequencies and (ii) time-reversibility. It can be extended to the SRV model, which allows rate variation among sites. I developed a method for estimating the distance under the SR or SRV model, as well as the variance-covariance matrix of distances.
Computer simulation showed that the SR method is better than a simpler method when the sequence length $L>1,000$ bp and is robust against deviations from time-reversibility. As expected, when the rate varies among sites, the SRV method is much better than the SR method. The evolutionary distances under nonstationary nucleotide frequencies: The statistical properties of the paralinear and LogDet distances under nonstationary nucleotide frequencies were studied. First, I developed formulas for correcting the estimation biases of the paralinear and LogDet distances. The performances of these formulas and the formulas for sampling variances were examined by computer simulation. Second, I developed a method for estimating the variance-covariance matrix of the paralinear distance, so that statistical tests of phylogenies can be conducted when the nucleotide frequencies are nonstationary. Third, a new method for testing the molecular clock hypothesis was developed in the nonstationary case.
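To make the effect of gamma-distributed rate variation on a distance estimate concrete, the sketch below uses the Jukes-Cantor substitution model, which is far simpler than the SR and SRV models of the dissertation and is swapped in here purely as an assumption for illustration. It compares the standard Jukes-Cantor distance with its gamma-corrected form for an observed proportion p of differing sites and a gamma shape parameter alpha.

```python
import math


def jc_distance(p):
    """Jukes-Cantor distance for a proportion p of differing sites (equal rates across sites)."""
    if not 0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


def jc_gamma_distance(p, alpha):
    """Jukes-Cantor distance when site rates follow a gamma distribution with shape alpha."""
    if not 0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75)")
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return 0.75 * alpha * ((1 - 4 * p / 3) ** (-1 / alpha) - 1)


if __name__ == "__main__":
    p = 0.3  # assumed proportion of differing sites
    print("equal rates:       ", jc_distance(p))
    print("gamma, alpha = 0.5:", jc_gamma_distance(p, 0.5))
    print("gamma, alpha = 1e6:", jc_gamma_distance(p, 1e6))
```

A small alpha (strong rate heterogeneity) gives a larger corrected distance than the equal-rates formula, and the two agree as alpha grows large, which is why ignoring rate variation tends to underestimate distances.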
Abstract:
Coronary artery bypass graft (CABG) surgery is among the most common operations performed in the United States and accounts for more resources expended in cardiovascular medicine than any other single procedure. CABG surgery patients initially recover in the Cardiovascular Intensive Care Unit (CVICU). The post-procedure CVICU length of stay (LOS) goal is two days or less. A longer ICU LOS is associated with a prolonged hospital LOS, poor health outcomes, greater use of limited resources, and increased medical costs. Research has shown that experienced clinicians can predict LOS no better than chance. Current CABG surgery LOS risk models differ greatly in generalizability and ease of use in the clinical setting. A predictive model that identified modifiable pre- and intra-operative risk factors for CVICU LOS greater than two days could have major public health implications, as modification of these identified factors could decrease CVICU LOS and potentially minimize morbidity and mortality, optimize use of limited health care resources, and decrease medical costs. The primary aim of this study was to identify modifiable pre- and intra-operative predictors of CVICU LOS greater than two days for CABG surgery patients with cardiopulmonary bypass (CPB). A secondary aim was to build a probability equation for CVICU LOS greater than two days. Data were extracted from 416 medical records of CABG surgery patients with CPB, 50 to 80 years of age, who recovered in the CVICU of a large teaching, referral hospital in southeastern Texas during the calendar year 2004 and the first quarter of 2005. Exclusion criteria included Diagnosis Related Group (DRG) 106, CABG surgery without CPB, CABG surgery with other procedures, and operative deaths. The data were analyzed using multivariate logistic regression with alpha=0.05, power=0.80, and correlation=0.26.
This study found age, history of peripheral arterial disease, and total operative time of four hours or more to be independent predictors of CVICU LOS greater than two days. The probability of CVICU LOS greater than two days can be calculated from the following equation: -2.872941 + 0.0323081 (age in years) + 0.8177223 (history of peripheral arterial disease) + 0.70379 (operative time).
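Because the model was fitted by logistic regression, the expression above is most naturally read as a linear predictor (log-odds) rather than a probability in itself. The sketch below applies the logistic function to it under that reading. It also assumes, since the abstract does not state the coding, that history of peripheral arterial disease is coded 1 for yes and 0 for no, and that operative time is coded 1 for four hours or more and 0 otherwise. The function and constant names are hypothetical.

```python
import math

# Coefficients as reported in the abstract.
INTERCEPT = -2.872941
COEF_AGE = 0.0323081
COEF_PAD = 0.8177223
COEF_OPERATIVE_TIME = 0.70379


def cvicu_los_probability(age_years, has_pad, long_operative_time):
    """Estimated probability of CVICU LOS greater than two days.

    Assumes the reported expression is the log-odds, with has_pad and
    long_operative_time (four hours or more) coded as 0/1 indicators.
    """
    log_odds = (
        INTERCEPT
        + COEF_AGE * age_years
        + COEF_PAD * int(has_pad)
        + COEF_OPERATIVE_TIME * int(long_operative_time)
    )
    return 1.0 / (1.0 + math.exp(-log_odds))


if __name__ == "__main__":
    print(cvicu_los_probability(65, has_pad=False, long_operative_time=False))
    print(cvicu_los_probability(65, has_pad=True, long_operative_time=True))
```

The study enrolled patients aged 50 to 80, so inputs outside that range would be extrapolation.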
Abstract:
Although the area under the receiver operating characteristic curve (AUC) is the most popular measure of the performance of prediction models, it has limitations, especially when it is used to evaluate the added discrimination of a new biomarker in the model. Pencina et al. (2008) proposed two indices, the net reclassification improvement (NRI) and integrated discrimination improvement (IDI), to supplement the improvement in the AUC (IAUC). Their NRI and IDI are based on binary outcomes in case-control settings, which do not involve time-to-event outcomes. However, many disease outcomes are time-dependent and the onset time can be censored. Measuring the discrimination potential of a prognostic marker without considering time to event can lead to biased estimates. In this dissertation, we have extended the NRI and IDI to survival analysis settings and derived the corresponding sample estimators and asymptotic tests. Simulation studies were conducted to compare the performance of the time-dependent NRI and IDI with Pencina’s NRI and IDI. For illustration, we have applied the proposed method to a breast cancer study. Key words: Prognostic model, Discrimination, Time-dependent NRI and IDI
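For orientation, the sketch below computes the binary-outcome versions of the two indices: the category-based NRI and the IDI. It does not implement the time-dependent extension developed in the dissertation. The risk cutoffs, function names, and toy data are assumptions made for illustration.

```python
from bisect import bisect_right


def _mean(values):
    return sum(values) / len(values)


def idi(old_probs, new_probs, events):
    """Integrated discrimination improvement for a binary outcome.

    Gain in mean predicted risk among events minus the gain among non-events.
    """
    ev = [(o, n) for o, n, e in zip(old_probs, new_probs, events) if e]
    ne = [(o, n) for o, n, e in zip(old_probs, new_probs, events) if not e]
    gain_events = _mean([n for _, n in ev]) - _mean([o for o, _ in ev])
    gain_nonevents = _mean([n for _, n in ne]) - _mean([o for o, _ in ne])
    return gain_events - gain_nonevents


def nri(old_probs, new_probs, events, cutoffs):
    """Category-based net reclassification improvement for a binary outcome.

    cutoffs is a sorted list of risk thresholds defining the risk categories.
    """
    up_ev = down_ev = up_ne = down_ne = 0
    n_ev = n_ne = 0
    for o, n, e in zip(old_probs, new_probs, events):
        move = bisect_right(cutoffs, n) - bisect_right(cutoffs, o)
        if e:
            n_ev += 1
            up_ev += move > 0
            down_ev += move < 0
        else:
            n_ne += 1
            up_ne += move > 0
            down_ne += move < 0
    # Upward moves are good for events; downward moves are good for non-events.
    return (up_ev - down_ev) / n_ev + (down_ne - up_ne) / n_ne


if __name__ == "__main__":
    events = [1, 1, 0, 0]           # toy outcomes (assumed)
    old = [0.2, 0.6, 0.3, 0.4]      # risks from the model without the marker
    new = [0.5, 0.7, 0.1, 0.5]      # risks from the model with the marker
    print("IDI:", idi(old, new, events))
    print("NRI:", nri(old, new, events, cutoffs=[0.25, 0.45]))
```

Both functions treat every subject's event status as fully observed, which is exactly the limitation the abstract raises for censored time-to-event data.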
Abstract:
Strategies are compared for the development of a linear regression model with stochastic (multivariate normal) regressor variables and the subsequent assessment of its predictive ability. Bias and mean squared error of four estimators of predictive performance are evaluated in samples simulated from 32 population correlation matrices. Models including all of the available predictors are compared with those obtained using selected subsets. The subset selection procedures investigated include two stopping rules, $C_p$ and $S_p$, each combined with an 'all possible subsets' or 'forward selection' of variables. The estimators of performance utilized include parametric (MSEP$_m$) and non-parametric (PRESS) assessments in the entire sample, and two data splitting estimates restricted to a random or balanced (Snee's DUPLEX) 'validation' half sample. The simulations were performed as a designed experiment, with population correlation matrices representing a broad range of data structures. The techniques examined for subset selection do not generally result in improved predictions relative to the full model. Approaches using 'forward selection' result in slightly smaller prediction errors and less biased estimators of predictive accuracy than 'all possible subsets' approaches, but no differences are detected between the performances of $C_p$ and $S_p$. In every case, prediction errors of models obtained by subset selection in either of the half splits exceed those obtained using all predictors and the entire sample. Only the random split estimator is conditionally (on $\beta$) unbiased; however, MSEP$_m$ is unbiased on average and PRESS is nearly so in unselected (fixed form) models. When subset selection techniques are used, MSEP$_m$ and PRESS always underestimate prediction errors, by as much as 27 percent (on average) in small samples.
Despite their bias, the mean squared errors (MSE) of these estimators are at least 30 percent less than that of the unbiased random split estimator. The DUPLEX split estimator suffers from large MSE as well as bias, and seems of little value within the context of stochastic regressor variables. To maximize predictive accuracy while retaining a reliable estimate of that accuracy, it is recommended that the entire sample be used for model development, and a leave-one-out statistic (e.g., PRESS) be used for assessment.
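The leave-one-out statistic recommended above can be computed without refitting the model n times. The sketch below shows this for the single-predictor case only, which is a simplification of the multiple-regression setting studied in the dissertation: each ordinary residual is divided by one minus its leverage, and the squared results are summed. A brute-force version that refits the line with each observation removed is included for comparison. Function names and the toy data are assumptions.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for a single predictor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - slope * x_bar, slope


def press_shortcut(x, y):
    """PRESS from one fit: sum of (residual / (1 - leverage))**2."""
    n = len(x)
    intercept, slope = fit_line(x, y)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    total = 0.0
    for xi, yi in zip(x, y):
        leverage = 1 / n + (xi - x_bar) ** 2 / sxx
        residual = yi - (intercept + slope * xi)
        total += (residual / (1 - leverage)) ** 2
    return total


def press_refit(x, y):
    """PRESS by explicitly refitting with each observation left out in turn."""
    total = 0.0
    for i in range(len(x)):
        intercept, slope = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        total += (y[i] - (intercept + slope * x[i])) ** 2
    return total


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # toy data (assumed)
    y = [1.1, 1.9, 3.2, 3.8, 5.1, 6.3]
    print(press_shortcut(x, y), press_refit(x, y))
```

The two functions agree up to floating-point error, so the shortcut makes leave-one-out assessment on the full sample no more expensive than a single fit.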
Abstract:
Studies have suggested that acculturation is related to diabetes prevalence and risk factors among immigrant groups in the United States (U.S.); however, scant data are available to investigate this relationship among Asian Americans and Asian American subgroups. The objective of this cross-sectional study was to examine the association between length of stay in the U.S. and type 2 diabetes prevalence and its risk factors among Chinese Americans in Houston, Texas. Data were obtained from the 2004-2005 Asian-American Health Needs Assessment in Houston, Texas (N=409 Chinese Americans) for secondary analysis in this study. Diabetes prevalence and risk factors (overweight/obesity and access to medical care) were based on self-report. Descriptive statistics summarized demographic characteristics, diabetes prevalence, and reasons for not seeing a doctor. Logistic regression, using an incremental modeling approach, was used to measure the association between length of stay and diabetes prevalence and related risk factors, while adjusting for the potential confounding factors of age, gender, education level, and income level. Although the prevalence of type 2 diabetes was highest among those living in the U.S. for more than 20 years, there was no significant association between length of stay in the U.S. and diabetes prevalence among these Chinese Americans after adjustment for confounding factors. Likewise, no association was found between length of stay in the U.S. and overweight/obese status in this population after adjusting for confounding factors. On the other hand, a longer length of stay was significantly associated with increased health insurance coverage in both unadjusted and adjusted models. The findings of this study suggest that length of stay in the U.S. alone may not be an indicator for diabetes risk among Chinese Americans.
Future research should consider alternative models to measure acculturation (e.g., models that reflect acculturation as a multi-dimensional, not uni-dimensional process), which may more accurately depict its effect on diabetes prevalence and related risk factors.
Abstract:
Better morbidity and mortality outcomes associated with increased hospital procedural volume have been demonstrated across a number of different medical procedures. Existence of such a volume-outcome relationship is posited to lead to increased specialization of care, such that patients requiring specific procedures are funneled to physicians and hospitals that achieve a minimum volume of such procedures each year. In this study, the 2009 Nationwide Inpatient Sample is used to examine the relationship between hospital volume and patient outcome among patients undergoing procedures related to malignant brain cancer. Multiple regression models were used to examine the impact of hospital volume on length of inpatient stay and cost of inpatient stay; logistic regression was used to examine the impact of hospital volume on morbidity. Hospital volume was found to be a significant predictor of both length of stay and cost of stay. Higher hospital volume was associated with a shorter length of stay, but also with increased costs. Hospital volume was not found to be a statistically significant predictor of morbidity, though less than three percent of this sample died while in the hospital. Volume is indeed a significant predictor of outcome for procedures related to brain malignancies, though further research regarding the cost of such procedures is recommended.