A Comparative Analysis of Predictive Learning Algorithms on High-Dimensional Microarray Cancer Data


Author(s): Bill, Jo; Fokoue, Ernest
Date(s)

06/04/2015

06/04/2015

2014

Abstract

This research evaluates pattern recognition techniques on a subclass of big data in which the dimensionality of the input space (p) is much larger than the number of observations (n). Specifically, we evaluate massive gene expression microarray cancer data for which the ratio κ = n/p is less than one. We explore the statistical and computational challenges inherent in these high-dimensional, low-sample-size (HDLSS) problems and present statistical machine learning methods used to tackle and circumvent these difficulties. Regularization and kernel algorithms were explored using seven datasets with κ < 1. These techniques require special attention to tuning, so several extensions of cross-validation were investigated to support better predictive performance. While no single algorithm was universally the best predictor, the regularization technique produced lower test errors in five of the seven datasets studied.
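
The tuning issue raised in the abstract can be illustrated with a minimal sketch. It is not the authors' method or data: it assumes a synthetic two-class dataset with n = 40 and p = 500 (so n/p < 1), and it swaps in a simplified nearest-centroid classifier whose class centroids are soft-thresholded toward the overall mean, a regularization idea related to shrunken centroids but without per-feature standardization. All function names (make_hdlss, soft_threshold, fit_centroids, predict, cv_error) and the grid of thresholds are hypothetical. The shrinkage amount is chosen by plain k-fold cross-validation.

```python
import random


def make_hdlss(n=40, p=500, informative=10, shift=1.5, seed=0):
    """Synthetic two-class data with p >> n (an assumption, not the paper's data)."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2  # alternate labels so the classes are balanced
        row = [rng.gauss(0.0, 1.0) for _ in range(p)]
        if label == 1:
            for j in range(informative):  # only a few features carry signal
                row[j] += shift
        X.append(row)
        y.append(label)
    return X, y


def soft_threshold(value, delta):
    """Shrink value toward zero by delta; anything within delta of zero becomes 0."""
    if value > delta:
        return value - delta
    if value < -delta:
        return value + delta
    return 0.0


def fit_centroids(X, y, delta):
    """Class centroids whose offsets from the overall mean are soft-thresholded."""
    p = len(X[0])
    overall = [sum(row[j] for row in X) / len(X) for j in range(p)]
    centroids = {}
    for label in sorted(set(y)):
        rows = [row for row, lab in zip(X, y) if lab == label]
        mean = [sum(r[j] for r in rows) / len(rows) for j in range(p)]
        centroids[label] = [
            overall[j] + soft_threshold(mean[j] - overall[j], delta)
            for j in range(p)
        ]
    return centroids


def predict(centroids, row):
    """Assign the label of the closest centroid in squared Euclidean distance."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(row, c))
    return min(centroids, key=lambda label: sqdist(centroids[label]))


def cv_error(X, y, delta, k=5):
    """Misclassification rate estimated by k-fold cross-validation."""
    n = len(X)
    errors = 0
    for fold in range(k):
        test_idx = set(range(fold, n, k))  # every k-th sample is held out
        X_tr = [X[i] for i in range(n) if i not in test_idx]
        y_tr = [y[i] for i in range(n) if i not in test_idx]
        centroids = fit_centroids(X_tr, y_tr, delta)
        errors += sum(predict(centroids, X[i]) != y[i] for i in test_idx)
    return errors / n


X, y = make_hdlss()
deltas = [0.0, 0.25, 0.5, 1.0]  # hypothetical grid of shrinkage thresholds
errors = {d: cv_error(X, y, d) for d in deltas}
best_delta = min(deltas, key=lambda d: errors[d])
print("CV error by threshold:", errors)
print("selected threshold:", best_delta)
```

With delta = 0 the classifier uses all 500 features, most of which are pure noise in this synthetic setup. Larger thresholds zero out small centroid differences, and cross-validation is what picks how aggressively to shrink.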

Identifier

Serdica Journal of Computing, Vol. 8, No. 2 (2014), pp. 137-168

1312-6555

http://hdl.handle.net/10525/2437

Language(s)

en

Publisher

Institute of Mathematics and Informatics Bulgarian Academy of Sciences

Keywords

#HDLSS #Machine Learning Algorithm #Pattern Recognition #Classification #Prediction #Regularization #Discriminant Analysis #Support Vector Machine #Kernels #Cross Validation #Microarray Cancer Data

Type

Article