A Combinatorial Approach to the Variable Selection in Multiple Linear Regression: Analysis of Selwood et al. Data Set - A Case Study


Author(s): Prabhakar, Yenamandra S
Date(s)

25/02/2008

25/02/2008

2003

Abstract

A combinatorial protocol (CP) is introduced and interfaced with multiple linear regression (MLR) for variable selection. The efficiency of CP-MLR rests primarily on preventing correlated variables from entering the model development stage. The method has been applied to the Selwood et al. data set [16], and the resulting models are compared with those reported from the GFA [8] and MUSEUM [9] approaches. For this data set, CP-MLR identified three highly independent models (27, 28 and 31) with Q² values in the range of 0.632-0.518. These models are also divergent and unique. Although the present study shares no models with the GFA [8] and MUSEUM [9] results, several descriptors are common to all these studies, including the present one. A simulation is also carried out on the same data set to explain model formation in CP-MLR. The results suggest that the proposed method should be able to handle data sets with 50 to 60 descriptors in a reasonable time frame. By carefully selecting the inter-parameter correlation cutoff values in CP-MLR, one can identify divergent models and handle data sets larger than the present one without excessive computer time.
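
The idea described in the abstract, enumerating descriptor combinations while barring correlated descriptors from entering a model, can be illustrated with a minimal sketch. This is not the author's CP-MLR implementation: the function names (`loo_q2`, `cp_mlr_sketch`), the default thresholds, and the use of leave-one-out cross-validation for Q² are assumptions made only for illustration, and the actual protocol may apply further filters that the abstract does not describe.

```python
from itertools import combinations

import numpy as np


def loo_q2(X, y):
    """Leave-one-out cross-validated Q^2 of an OLS model with intercept.

    Assumption: Q^2 = 1 - PRESS / total sum of squares.
    """
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    # Hat-matrix shortcut: LOO residual_i = residual_i / (1 - h_ii).
    H = A @ np.linalg.pinv(A)
    resid = y - H @ y
    press = np.sum((resid / (1.0 - np.diag(H))) ** 2)
    return 1.0 - press / np.sum((y - y.mean()) ** 2)


def cp_mlr_sketch(X, y, k, corr_cutoff=0.3, q2_min=0.5):
    """Hypothetical helper: score every k-descriptor subset whose members
    are pairwise uncorrelated (|r| <= corr_cutoff) and keep those that
    reach q2_min. Both thresholds are illustrative values."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    models = []
    for subset in combinations(range(X.shape[1]), k):
        # Reject the subset before any regression is fitted if two of
        # its descriptors are correlated above the cutoff.
        if any(corr[i, j] > corr_cutoff for i, j in combinations(subset, 2)):
            continue
        q2 = loo_q2(X[:, list(subset)], y)
        if q2 >= q2_min:
            models.append((subset, q2))
    # Best cross-validated models first.
    return sorted(models, key=lambda m: -m[1])
```

Because the correlation filter runs before any regression is fitted, a tighter `corr_cutoff` discards more subsets up front. This is one plausible reading of how the choice of cutoff keeps computing time manageable as the number of descriptors grows.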

Format

360884 bytes

application/pdf

Identifier

QSAR & Combinatorial Chemistry Science (2003), 22, 538

http://hdl.handle.net/123456789/81

Language(s)

en

Relation

CDRI Communication Number 6225

Keywords #Regression analysis #variable selection #combinatorial approach #antimycin A1 analogues #antifilarial
Type

Article