5 resultados para binary hyperplane
em Collection Of Biostatistics Research Archive
Resumo:
Motivation: Array CGH technologies enable the simultaneous measurement of DNA copy number for thousands of sites on a genome. We developed the circular binary segmentation (CBS) algorithm to divide the genome into regions of equal copy number (Olshen {\it et~al}, 2004). The algorithm tests for change-points using a maximal $t$-statistic with a permutation reference distribution to obtain the corresponding $p$-value. The number of computations required for the maximal test statistic is $O(N^2),$ where $N$ is the number of markers. This makes the full permutation approach computationally prohibitive for the newer arrays that contain tens of thousands markers and highlights the need for a faster. algorithm. Results: We present a hybrid approach to obtain the $p$-value of the test statistic in linear time. We also introduce a rule for stopping early when there is strong evidence for the presence of a change. We show through simulations that the hybrid approach provides a substantial gain in speed with only a negligible loss in accuracy and that the stopping rule further increases speed. We also present the analysis of array CGH data from a breast cancer cell line to show the impact of the new approaches on the analysis of real data. Availability: An R (R Development Core Team, 2006) version of the CBS algorithm has been implemented in the ``DNAcopy'' package of the Bioconductor project (Gentleman {\it et~al}, 2004). The proposed hybrid method for the $p$-value is available in version 1.2.1 or higher and the stopping rule for declaring a change early is available in version 1.5.1 or higher.
Resumo:
The positive and negative predictive value are standard measures used to quantify the predictive accuracy of binary biomarkers when the outcome being predicted is also binary. When the biomarkers are instead being used to predict a failure time outcome, there is no standard way of quantifying predictive accuracy. We propose a natural extension of the traditional predictive values to accommodate censored survival data. We discuss not only quantifying predictive accuracy using these extended predictive values, but also rigorously comparing the accuracy of two biomarkers in terms of their predictive values. Using a marginal regression framework, we describe how to estimate differences in predictive accuracy and how to test whether the observed difference is statistically significant.
Resumo:
We consider inference in randomized studies, in which repeatedly measured outcomes may be informatively missing due to drop out. In this setting, it is well known that full data estimands are not identified unless unverified assumptions are imposed. We assume a non-future dependence model for the drop-out mechanism and posit an exponential tilt model that links non-identifiable and identifiable distributions. This model is indexed by non-identified parameters, which are assumed to have an informative prior distribution, elicited from subject-matter experts. Under this model, full data estimands are shown to be expressed as functionals of the distribution of the observed data. To avoid the curse of dimensionality, we model the distribution of the observed data using a Bayesian shrinkage model. In a simulation study, we compare our approach to a fully parametric and a fully saturated model for the distribution of the observed data. Our methodology is motivated and applied to data from the Breast Cancer Prevention Trial.
Resumo:
Many seemingly disparate approaches for marginal modeling have been developed in recent years. We demonstrate that many current approaches for marginal modeling of correlated binary outcomes produce likelihoods that are equivalent to the proposed copula-based models herein. These general copula models of underlying latent threshold random variables yield likelihood based models for marginal fixed effects estimation and interpretation in the analysis of correlated binary data. Moreover, we propose a nomenclature and set of model relationships that substantially elucidates the complex area of marginalized models for binary data. A diverse collection of didactic mathematical and numerical examples are given to illustrate concepts.