4 resultados para Data selection

em Collection Of Biostatistics Research Archive


Relevância:

40.00% 40.00%

Publicador:

Resumo:

High-throughput gene expression technologies such as microarrays have been utilized in a variety of scientific applications. Most of the work has been on assessing univariate associations between gene expression with clinical outcome (variable selection) or on developing classification procedures with gene expression data (supervised learning). We consider a hybrid variable selection/classification approach that is based on linear combinations of the gene expression profiles that maximize an accuracy measure summarized using the receiver operating characteristic curve. Under a specific probability model, this leads to consideration of linear discriminant functions. We incorporate an automated variable selection approach using LASSO. An equivalence between LASSO estimation with support vector machines allows for model fitting using standard software. We apply the proposed method to simulated data as well as data from a recently published prostate cancer study.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Recurrent event data are largely characterized by the rate function but smoothing techniques for estimating the rate function have never been rigorously developed or studied in statistical literature. This paper considers the moment and least squares methods for estimating the rate function from recurrent event data. With an independent censoring assumption on the recurrent event process, we study statistical properties of the proposed estimators and propose bootstrap procedures for the bandwidth selection and for the approximation of confidence intervals in the estimation of the occurrence rate function. It is identified that the moment method without resmoothing via a smaller bandwidth will produce curve with nicks occurring at the censoring times, whereas there is no such problem with the least squares method. Furthermore, the asymptotic variance of the least squares estimator is shown to be smaller under regularity conditions. However, in the implementation of the bootstrap procedures, the moment method is computationally more efficient than the least squares method because the former approach uses condensed bootstrap data. The performance of the proposed procedures is studied through Monte Carlo simulations and an epidemiological example on intravenous drug users.