6 resultados para Gauss-Seidel algorithm
em Collection Of Biostatistics Research Archive
Resumo:
Generalized linear mixed models (GLMMs) provide an elegant framework for the analysis of correlated data. Due to the non-closed form of the likelihood, GLMMs are often fit by computational procedures like penalized quasi-likelihood (PQL). Special cases of these models are generalized linear models (GLMs), which are often fit using algorithms like iterative weighted least squares (IWLS). High computational costs and memory space constraints often make it difficult to apply these iterative procedures to data sets with very large number of cases. This paper proposes a computationally efficient strategy based on the Gauss-Seidel algorithm that iteratively fits sub-models of the GLMM to subsetted versions of the data. Additional gains in efficiency are achieved for Poisson models, commonly used in disease mapping problems, because of their special collapsibility property which allows data reduction through summaries. Convergence of the proposed iterative procedure is guaranteed for canonical link functions. The strategy is applied to investigate the relationship between ischemic heart disease, socioeconomic status and age/gender category in New South Wales, Australia, based on outcome data consisting of approximately 33 million records. A simulation study demonstrates the algorithm's reliability in analyzing a data set with 12 million records for a (non-collapsible) logistic regression model.
Resumo:
We derive a new class of iterative schemes for accelerating the convergence of the EM algorithm, by exploiting the connection between fixed point iterations and extrapolation methods. First, we present a general formulation of one-step iterative schemes, which are obtained by cycling with the extrapolation methods. We, then square the one-step schemes to obtain the new class of methods, which we call SQUAREM. Squaring a one-step iterative scheme is simply applying it twice within each cycle of the extrapolation method. Here we focus on the first order or rank-one extrapolation methods for two reasons, (1) simplicity, and (2) computational efficiency. In particular, we study two first order extrapolation methods, the reduced rank extrapolation (RRE1) and minimal polynomial extrapolation (MPE1). The convergence of the new schemes, both one-step and squared, is non-monotonic with respect to the residual norm. The first order one-step and SQUAREM schemes are linearly convergent, like the EM algorithm but they have a faster rate of convergence. We demonstrate, through five different examples, the effectiveness of the first order SQUAREM schemes, SqRRE1 and SqMPE1, in accelerating the EM algorithm. The SQUAREM schemes are also shown to be vastly superior to their one-step counterparts, RRE1 and MPE1, in terms of computational efficiency. The proposed extrapolation schemes can fail due to the numerical problems of stagnation and near breakdown. We have developed a new hybrid iterative scheme that combines the RRE1 and MPE1 schemes in such a manner that it overcomes both stagnation and near breakdown. The squared first order hybrid scheme, SqHyb1, emerges as the iterative scheme of choice based on our numerical experiments. It combines the fast convergence of the SqMPE1, while avoiding near breakdowns, with the stability of SqRRE1, while avoiding stagnations. The SQUAREM methods can be incorporated very easily into an existing EM algorithm. They only require the basic EM step for their implementation and do not require any other auxiliary quantities such as the complete data log likelihood, and its gradient or hessian. They are an attractive option in problems with a very large number of parameters, and in problems where the statistical model is complex, the EM algorithm is slow and each EM step is computationally demanding.
Resumo:
Motivation: Array CGH technologies enable the simultaneous measurement of DNA copy number for thousands of sites on a genome. We developed the circular binary segmentation (CBS) algorithm to divide the genome into regions of equal copy number (Olshen {\it et~al}, 2004). The algorithm tests for change-points using a maximal $t$-statistic with a permutation reference distribution to obtain the corresponding $p$-value. The number of computations required for the maximal test statistic is $O(N^2),$ where $N$ is the number of markers. This makes the full permutation approach computationally prohibitive for the newer arrays that contain tens of thousands markers and highlights the need for a faster. algorithm. Results: We present a hybrid approach to obtain the $p$-value of the test statistic in linear time. We also introduce a rule for stopping early when there is strong evidence for the presence of a change. We show through simulations that the hybrid approach provides a substantial gain in speed with only a negligible loss in accuracy and that the stopping rule further increases speed. We also present the analysis of array CGH data from a breast cancer cell line to show the impact of the new approaches on the analysis of real data. Availability: An R (R Development Core Team, 2006) version of the CBS algorithm has been implemented in the ``DNAcopy'' package of the Bioconductor project (Gentleman {\it et~al}, 2004). The proposed hybrid method for the $p$-value is available in version 1.2.1 or higher and the stopping rule for declaring a change early is available in version 1.5.1 or higher.