A Combinatorial Approach to the Variable Selection in Multiple Linear Regression: Analysis of Selwood et al. Data Set - A Case Study.
Date(s) | 25/02/2008; 2003 |
Abstract |
A combinatorial protocol (CP) is introduced here and interfaced with multiple linear regression (MLR) for variable selection. The efficiency of CP-MLR derives primarily from restricting the entry of correlated variables into the model development stage. The method has been used to analyse the Selwood et al. data set [16], and the resulting models are compared with those reported from the GFA [8] and MUSEUM [9] approaches. For this data set, CP-MLR identified three highly independent models (27, 28 and 31) with Q² values in the range 0.518-0.632. These models are also divergent and unique. Although the present study shares no models with the GFA [8] and MUSEUM [9] results, several descriptors are common to all these studies, including the present one. A simulation was also carried out on the same data set to explain model formation in CP-MLR. The results indicate that the proposed method should be able to handle data sets with 50 to 60 descriptors within a reasonable time. By carefully selecting the inter-parameter correlation cutoff values in CP-MLR, one can identify divergent models and handle data sets larger than the present one without excessive computing time. |
Format |
360884 bytes application/pdf |
Identifier |
QSAR & Combinatorial Science (2003), 22, 538
Language(s) |
en |
Relation |
CDRI Communication Number 6225 |
Keywords | #Regression analysis #variable selection #combinatorial approach #antimycin A1 analogues #antifilarial |
Type |
Article |
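The central idea of the abstract is to admit a descriptor into a model only if it is not too strongly correlated with the descriptors already present, and then to judge the surviving models by Q². That idea can be illustrated with a small, dependency-free sketch. This is not the authors' CP-MLR program. The function names (`cp_mlr_sketch`, `loo_q2`, `fit_mlr`, etc.), the pairwise |r| test, the cutoff of 0.5, and the toy descriptors `a`, `b`, `c` are all hypothetical. The sketch also assumes that Q² is the leave-one-out statistic 1 - PRESS/SS. Any additional filtering used in the paper is not reproduced here.

```python
from itertools import combinations
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_mlr(rows, y):
    """Ordinary least squares with an intercept, via the normal equations."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    XtX = [[sum(xi[j] * xi[k] for xi in X) for k in range(p)] for j in range(p)]
    Xty = [sum(xi[j] * yi for xi, yi in zip(X, y)) for j in range(p)]
    return solve(XtX, Xty)


def predict(coef, row):
    """Apply fitted coefficients (intercept first) to one observation."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))


def loo_q2(rows, y):
    """Leave-one-out Q^2 = 1 - PRESS / sum((y - mean(y))^2)."""
    press = 0.0
    for i in range(len(y)):
        coef = fit_mlr(rows[:i] + rows[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(coef, rows[i])) ** 2
    ybar = sum(y) / len(y)
    return 1.0 - press / sum((v - ybar) ** 2 for v in y)


def cp_mlr_sketch(descriptors, y, size, r_cutoff):
    """Enumerate descriptor subsets of one size; a subset reaches model fitting
    only if every pair in it has |r| <= r_cutoff. Returns (Q^2, subset) pairs,
    best first."""
    results = []
    for subset in combinations(sorted(descriptors), size):
        if any(abs(pearson(descriptors[u], descriptors[v])) > r_cutoff
               for u, v in combinations(subset, 2)):
            continue  # a correlated pair blocks entry; no regression is fitted
        rows = [list(obs) for obs in zip(*(descriptors[name] for name in subset))]
        results.append((loo_q2(rows, y), subset))
    return sorted(results, reverse=True)


# Hypothetical toy data: y = 2*a + 3*b + 1 exactly; c is nearly 2*a.
descriptors = {
    "a": [1, 2, 3, 4, 5, 6],
    "b": [1, -1, 1, -1, 1, -1],
    "c": [2, 4, 6, 8, 10, 13],
}
y = [6, 2, 10, 6, 14, 10]
ranking = cp_mlr_sketch(descriptors, y, size=2, r_cutoff=0.5)
for q2, subset in ranking:
    print(subset, round(q2, 3))
```

In the toy data, `y` is an exact linear function of `a` and `b`, so that pair ranks first with Q² close to 1. The strongly correlated pair (`a`, `c`) never reaches the fitting stage. Checking correlations before fitting keeps the enumeration cheap, because a rejected subset costs only a few correlation computations rather than a full leave-one-out loop. This is one plausible reading of why the correlation cutoff controls computing time.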