Classification Using Generalized Partial Least Squares


Author(s): Ding, Beiying; Gentleman, Robert
Date(s)

01/05/2004

Abstract

Advances in computational biology have made it possible to monitor thousands of features simultaneously. High-throughput technologies not only provide a much richer context in which to study various aspects of gene function, but also present the challenge of analyzing data with a large number of covariates and few samples. As an integral part of machine learning, classification of samples into two or more categories is almost always of interest to scientists. In this paper, we address classification in this setting by extending partial least squares (PLS), a popular dimension reduction tool in chemometrics, to generalized linear regression, building on a previous approach, Iteratively Reweighted Partial Least Squares (IRWPLS; Marx, 1996). We compare our results with two-stage PLS (Nguyen and Rocke, 2002A; Nguyen and Rocke, 2002B) and other classifiers. We show that by phrasing the problem in a generalized linear model setting and by applying a bias correction to the likelihood to avoid (quasi)separation, we often obtain lower classification error rates.
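The abstract describes the method only at a high level. As an illustration, the general IRWPLS idea for two-class logistic regression can be sketched as follows: at each iteratively reweighted step, the weighted least squares fit is replaced by a weighted PLS fit of the working response, and the score is adjusted in the spirit of Firth's procedure, y - p + h(1/2 - p), so that estimates stay finite under (quasi)separation. This is a minimal sketch under stated assumptions, not the authors' implementation: the NIPALS-style weighted PLS1 step, taking the hat diagonal h from the previous iteration's component design, the clipping of the linear predictor at ±30, and the function names `weighted_pls1`, `hat_diag`, `irwpls_firth` and `predict_proba` are all hypothetical choices made for the example.

```python
import numpy as np


def weighted_pls1(X, z, w, n_comp):
    """Weighted PLS1 (NIPALS-style) of a working response z on X.

    Returns scores T, a matrix W_star with T = (X - x_mean) @ W_star,
    and the weighted column means x_mean.
    """
    sw = w / w.sum()
    x_mean = sw @ X
    Xa = X - x_mean                      # weighted-centred predictors
    za = z - sw @ z                      # weighted-centred working response
    n, p = X.shape
    T = np.zeros((n, n_comp))
    R = np.zeros((p, n_comp))            # weight vectors
    P = np.zeros((p, n_comp))            # loadings
    for a in range(n_comp):
        r = Xa.T @ (w * za)              # direction of maximal weighted covariance
        r /= np.linalg.norm(r)
        t = Xa @ r
        tw = (w * t) @ t
        load = Xa.T @ (w * t) / tw
        Xa = Xa - np.outer(t, load)      # deflate X against t (W inner product)
        za = za - t * ((w * t) @ za / tw)
        T[:, a], R[:, a], P[:, a] = t, r, load
    W_star = R @ np.linalg.inv(P.T @ R)  # maps centred X directly to scores
    return T, W_star, x_mean


def hat_diag(D, w):
    """Diagonal of W^{1/2} D (D^T W D)^{-1} D^T W^{1/2}."""
    M = D.T @ (w[:, None] * D)
    return w * np.einsum("ij,ji->i", D, np.linalg.solve(M, D.T))


def irwpls_firth(X, y, n_comp=2, n_iter=50, tol=1e-8):
    """Hypothetical IRWPLS for logistic regression with a Firth-type adjusted score."""
    n = X.shape[0]
    eta = np.zeros(n)
    D_prev = np.ones((n, 1))             # intercept-only design for the first hat matrix
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-eta))
        w = np.maximum(p * (1.0 - p), 1e-10)
        h = hat_diag(D_prev, w)
        # Firth-type adjusted score y - p + h(1/2 - p), turned into a working response
        z = eta + (y - p + h * (0.5 - p)) / w
        T, W_star, x_mean = weighted_pls1(X, z, w, n_comp)
        D = np.column_stack([np.ones(n), T])
        coef = np.linalg.solve(D.T @ (w[:, None] * D), D.T @ (w * z))
        eta_new = np.clip(D @ coef, -30.0, 30.0)   # numerical guard (assumption)
        done = np.max(np.abs(eta_new - eta)) < tol
        eta, D_prev = eta_new, D
        if done:
            break
    beta = W_star @ coef[1:]             # coefficients on the original features
    intercept = coef[0] - x_mean @ beta
    return intercept, beta


def predict_proba(X, intercept, beta):
    return 1.0 / (1.0 + np.exp(-(intercept + X @ beta)))


# Usage on synthetic data with many more features than samples (p > n),
# where plain logistic regression would face complete separation.
rng = np.random.default_rng(0)
n, p = 20, 50
y = np.repeat([0.0, 1.0], n // 2)
X = rng.normal(size=(n, p))
X[:, :5] += np.where(y[:, None] == 1, 2.0, -2.0)
b0, b = irwpls_firth(X, y, n_comp=2)
pred = (predict_proba(X, b0, b) > 0.5).astype(float)
```

Mapping the component coefficients back through `W_star` yields a coefficient vector on the original features, so new samples can be classified without recomputing the PLS components. In practice, the number of components would be chosen by cross-validation rather than fixed at 2.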

Format

application/pdf

Identifier

http://biostats.bepress.com/bioconductor/paper5

http://biostats.bepress.com/cgi/viewcontent.cgi?article=1004&context=bioconductor

Publisher

Collection of Biostatistics Research Archive

Source

Bioconductor Project Working Papers

Keywords

#Cross-validation #Firth's procedure #gene expression #Iteratively Reweighted Partial Least Squares #(Quasi)separation #Two-stage PLS #Bioinformatics #Computational Biology #Genetics #Microarrays #Multivariate Analysis #Statistical Models

Type

text