Permutation Tests for Classification


Author(s): Mukherjee, Sayan; Golland, Polina; Panchenko, Dmitry
Date(s)

08/10/2004

08/10/2004

28/08/2003

Abstract

We introduce and explore an approach to estimating the statistical significance of classification accuracy, which is particularly useful in scientific applications of machine learning where the high dimensionality of the data and the small number of training examples render most standard convergence bounds too loose to yield a meaningful guarantee of the generalization ability of the classifier. Instead, we estimate the statistical significance of the observed classification accuracy, that is, the likelihood of observing such accuracy by chance due to spurious correlations of the high-dimensional data patterns with the class labels in the given training set. We adopt permutation testing, a non-parametric technique previously developed in classical statistics for hypothesis testing in the generative setting (i.e., comparing two probability distributions). We demonstrate the method on real examples from neuroimaging studies and DNA microarray analysis and suggest a theoretical analysis of the procedure that relates the asymptotic behavior of the test to the existing convergence bounds.
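
The general idea of a label-permutation test for classification accuracy can be illustrated with a minimal Python sketch. It is not the authors' implementation. The nearest-centroid classifier, the leave-one-out accuracy estimate, the function names, the toy data, and the add-one correction in the p-value are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import random


def loo_accuracy(X, y):
    """Leave-one-out accuracy of a nearest-centroid classifier.

    Both the classifier and the leave-one-out estimate are stand-ins
    (assumptions); any accuracy estimate could be plugged in instead.
    """
    n = len(X)
    correct = 0
    for i in range(n):
        # Accumulate per-class feature sums over all examples except i.
        sums, counts = {}, {}
        for j in range(n):
            if j == i:
                continue
            label = y[j]
            if label not in sums:
                sums[label] = [0.0] * len(X[j])
                counts[label] = 0
            sums[label] = [s + v for s, v in zip(sums[label], X[j])]
            counts[label] += 1
        # Predict the label whose centroid is closest to the held-out example.
        best_label, best_dist = None, float("inf")
        for label, total in sums.items():
            centroid = [s / counts[label] for s in total]
            dist = sum((a - b) ** 2 for a, b in zip(X[i], centroid))
            if dist < best_dist:
                best_label, best_dist = label, dist
        correct += best_label == y[i]
    return correct / n


def permutation_p_value(X, y, accuracy_fn, n_permutations=200, seed=0):
    """Estimate how often shuffled labels do at least as well as the real ones."""
    rng = random.Random(seed)
    observed = accuracy_fn(X, y)
    labels = list(y)
    at_least_as_good = 0
    for _ in range(n_permutations):
        rng.shuffle(labels)  # break any real link between data and labels
        if accuracy_fn(X, labels) >= observed:
            at_least_as_good += 1
    # Add-one correction (a common convention, assumed here) keeps p above zero.
    p_value = (at_least_as_good + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical toy data: two well-separated groups of ten 3-dimensional points.
X = [[0.1 * i] * 3 for i in range(10)] + [[10 + 0.1 * i] * 3 for i in range(10)]
y = [0] * 10 + [1] * 10

observed, p_value = permutation_p_value(X, y, loo_accuracy)
print(f"observed accuracy = {observed:.2f}, permutation p-value = {p_value:.4f}")
```

A small p-value indicates that the observed accuracy is rarely matched when the labels are shuffled, which is the sense of significance described in the abstract. The smallest value this sketch can report is 1 / (n_permutations + 1), so the number of permutations bounds the resolution of the estimate.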

Format

22 p.

1135156 bytes

662639 bytes

application/postscript

application/pdf

Identifier

AIM-2003-019

http://hdl.handle.net/1721.1/6723

Language(s)

en_US

Relation

AIM-2003-019

Keywords #AI #Classification #Permutation testing #Statistical significance