8 resultados para large sample distributions
em Collection Of Biostatistics Research Archive
Resumo:
Outcome-dependent, two-phase sampling designs can dramatically reduce the costs of observational studies by judicious selection of the most informative subjects for purposes of detailed covariate measurement. Here we derive asymptotic information bounds and the form of the efficient score and influence functions for the semiparametric regression models studied by Lawless, Kalbfleisch, and Wild (1999) under two-phase sampling designs. We show that the maximum likelihood estimators for both the parametric and nonparametric parts of the model are asymptotically normal and efficient. The efficient influence function for the parametric part aggress with the more general information bound calculations of Robins, Hsieh, and Newey (1995). By verifying the conditions of Murphy and Van der Vaart (2000) for a least favorable parametric submodel, we provide asymptotic justification for statistical inference based on profile likelihood.
Resumo:
Suppose that we are interested in establishing simple, but reliable rules for predicting future t-year survivors via censored regression models. In this article, we present inference procedures for evaluating such binary classification rules based on various prediction precision measures quantified by the overall misclassification rate, sensitivity and specificity, and positive and negative predictive values. Specifically, under various working models we derive consistent estimators for the above measures via substitution and cross validation estimation procedures. Furthermore, we provide large sample approximations to the distributions of these nonsmooth estimators without assuming that the working model is correctly specified. Confidence intervals, for example, for the difference of the precision measures between two competing rules can then be constructed. All the proposals are illustrated with two real examples and their finite sample properties are evaluated via a simulation study.
Resumo:
In this paper, we study panel count data with informative observation times. We assume nonparametric and semiparametric proportional rate models for the underlying recurrent event process, where the form of the baseline rate function is left unspecified and a subject-specific frailty variable inflates or deflates the rate function multiplicatively. The proposed models allow the recurrent event processes and observation times to be correlated through their connections with the unobserved frailty; moreover, the distributions of both the frailty variable and observation times are considered as nuisance parameters. The baseline rate function and the regression parameters are estimated by maximizing a conditional likelihood function of observed event counts and solving estimation equations. Large sample properties of the proposed estimators are studied. Numerical studies demonstrate that the proposed estimation procedures perform well for moderate sample sizes. An application to a bladder tumor study is presented to illustrate the use of the proposed methods.
Resumo:
Simulation-based assessment is a popular and frequently necessary approach to evaluation of statistical procedures. Sometimes overlooked is the ability to take advantage of underlying mathematical relations and we focus on this aspect. We show how to take advantage of large-sample theory when conducting a simulation using the analysis of genomic data as a motivating example. The approach uses convergence results to provide an approximation to smaller-sample results, results that are available only by simulation. We consider evaluating and comparing a variety of ranking-based methods for identifying the most highly associated SNPs in a genome-wide association study, derive integral equation representations of the pre-posterior distribution of percentiles produced by three ranking methods, and provide examples comparing performance. These results are of interest in their own right and set the framework for a more extensive set of comparisons.
Resumo:
Studies of chronic life-threatening diseases often involve both mortality and morbidity. In observational studies, the data may also be subject to administrative left truncation and right censoring. Since mortality and morbidity may be correlated and mortality may censor morbidity, the Lynden-Bell estimator for left truncated and right censored data may be biased for estimating the marginal survival function of the non-terminal event. We propose a semiparametric estimator for this survival function based on a joint model for the two time-to-event variables, which utilizes the gamma frailty specification in the region of the observable data. Firstly, we develop a novel estimator for the gamma frailty parameter under left truncation. Using this estimator, we then derive a closed form estimator for the marginal distribution of the non-terminal event. The large sample properties of the estimators are established via asymptotic theory. The methodology performs well with moderate sample sizes, both in simulations and in an analysis of data from a diabetes registry.
Resumo:
A number of authors have studies the mixture survival model to analyze survival data with nonnegligible cure fractions. A key assumption made by these authors is the independence between the survival time and the censoring time. To our knowledge, no one has studies the mixture cure model in the presence of dependent censoring. To account for such dependence, we propose a more general cure model which allows for dependent censoring. In particular, we derive the cure models from the perspective of competing risks and model the dependence between the censoring time and the survival time using a class of Archimedean copula models. Within this framework, we consider the parameter estimation, the cure detection, and the two-sample comparison of latency distribution in the presence of dependent censoring when a proportion of patients is deemed cured. Large sample results using the martingale theory are obtained. We applied the proposed methodologies to the SEER prostate cancer data.
Resumo:
In epidemiological work, outcomes are frequently non-normal, sample sizes may be large, and effects are often small. To relate health outcomes to geographic risk factors, fast and powerful methods for fitting spatial models, particularly for non-normal data, are required. We focus on binary outcomes, with the risk surface a smooth function of space. We compare penalized likelihood models, including the penalized quasi-likelihood (PQL) approach, and Bayesian models based on fit, speed, and ease of implementation. A Bayesian model using a spectral basis representation of the spatial surface provides the best tradeoff of sensitivity and specificity in simulations, detecting real spatial features while limiting overfitting and being more efficient computationally than other Bayesian approaches. One of the contributions of this work is further development of this underused representation. The spectral basis model outperforms the penalized likelihood methods, which are prone to overfitting, but is slower to fit and not as easily implemented. Conclusions based on a real dataset of cancer cases in Taiwan are similar albeit less conclusive with respect to comparing the approaches. The success of the spectral basis with binary data and similar results with count data suggest that it may be generally useful in spatial models and more complicated hierarchical models.
Resumo:
Use of microarray technology often leads to high-dimensional and low- sample size data settings. Over the past several years, a variety of novel approaches have been proposed for variable selection in this context. However, only a small number of these have been adapted for time-to-event data where censoring is present. Among standard variable selection methods shown both to have good predictive accuracy and to be computationally efficient is the elastic net penalization approach. In this paper, adaptation of the elastic net approach is presented for variable selection both under the Cox proportional hazards model and under an accelerated failure time (AFT) model. Assessment of the two methods is conducted through simulation studies and through analysis of microarray data obtained from a set of patients with diffuse large B-cell lymphoma where time to survival is of interest. The approaches are shown to match or exceed the predictive performance of a Cox-based and an AFT-based variable selection method. The methods are moreover shown to be much more computationally efficient than their respective Cox- and AFT- based counterparts.