2 resultados para HDLSS
Resumo:
This research evaluates pattern recognition techniques on a subclass of big data where the dimensionality of the input space (p) is much larger than the number of observations (n). Specifically, we evaluate massive gene expression microarray cancer data where the ratio κ is less than one. We explore the statistical and computational challenges inherent in these high dimensional low sample size (HDLSS) problems and present statistical machine learning methods used to tackle and circumvent these difficulties. Regularization and kernel algorithms were explored in this research using seven datasets where κ < 1. These techniques require special attention to tuning necessitating several extensions of cross-validation to be investigated to support better predictive performance. While no single algorithm was universally the best predictor, the regularization technique produced lower test errors in five of the seven datasets studied.
Resumo:
This thesis is concerned with change point analysis for time series, i.e. with detection of structural breaks in time-ordered, random data. This long-standing research field regained popularity over the last few years and is still undergoing, as statistical analysis in general, a transformation to high-dimensional problems. We focus on the fundamental »change in the mean« problem and provide extensions of the classical non-parametric Darling-Erdős-type cumulative sum (CUSUM) testing and estimation theory within highdimensional Hilbert space settings. In the first part we contribute to (long run) principal component based testing methods for Hilbert space valued time series under a rather broad (abrupt, epidemic, gradual, multiple) change setting and under dependence. For the dependence structure we consider either traditional m-dependence assumptions or more recently developed m-approximability conditions which cover, e.g., MA, AR and ARCH models. We derive Gumbel and Brownian bridge type approximations of the distribution of the test statistic under the null hypothesis of no change and consistency conditions under the alternative. A new formulation of the test statistic using projections on subspaces allows us to simplify the standard proof techniques and to weaken common assumptions on the covariance structure. Furthermore, we propose to adjust the principal components by an implicit estimation of a (possible) change direction. This approach adds flexibility to projection based methods, weakens typical technical conditions and provides better consistency properties under the alternative. In the second part we contribute to estimation methods for common changes in the means of panels of Hilbert space valued time series. We analyze weighted CUSUM estimates within a recently proposed »high-dimensional low sample size (HDLSS)« framework, where the sample size is fixed but the number of panels increases. We derive sharp conditions on »pointwise asymptotic accuracy« or »uniform asymptotic accuracy« of those estimates in terms of the weighting function. Particularly, we prove that a covariance-based correction of Darling-Erdős-type CUSUM estimates is required to guarantee uniform asymptotic accuracy under moderate dependence conditions within panels and that these conditions are fulfilled, e.g., by any MA(1) time series. As a counterexample we show that for AR(1) time series, close to the non-stationary case, the dependence is too strong and uniform asymptotic accuracy cannot be ensured. Finally, we conduct simulations to demonstrate that our results are practically applicable and that our methodological suggestions are advantageous.