
PARENT SESSION

1F - QSAR
Poster Hall
8:30 AM - Tuesday, 29 April 2003
Chair: Schüürmann, G.
Co-chairs: Verhaar, H.J.M.; Cronin, M.

(TUP/43) Selection of Training and Test Sets for Aquatic Toxicity QSARs.

Netzeva, Tatiana (1); Schultz, Terry (2); Cronin, Mark (1)
(1) Liverpool John Moores University, Liverpool, Merseyside, England
(2) The University of Tennessee, Knoxville, TN, USA

ABSTRACT - A number of techniques for the Design of Optimal Experiments (DOE) have been described, and they have been applied successfully in the validation of QSAR analyses for pharmacological and toxicological endpoints. The aim of this study was to apply the distance-based optimality method to select a small, but diverse, subset of chemicals from a large database of toxicity values. The subsets were then used to develop QSAR models, which were subsequently validated against an external test set. A database comprising the acute toxicity of more than 450 aliphatic compounds to Tetrahymena pyriformis was the source of compounds for this study. Principal Component Analysis (PCA) was used to select the 120 most diverse chemicals in terms of physico-chemical descriptor space. A two-dimensional (8-descriptor) model was developed using Partial Least Squares (PLS) on these 120 compounds. The significance of the model was assessed by a random permutation test. External validation was performed by splitting the compound set into training and test sets at training-to-test ratios of 1:3, 1:2 and 2:1. It was concluded that a model derived from 25% of the existing data (i.e. 120 compounds), selected to cover the maximum descriptor space, can be used confidently for prediction, thus significantly reducing the number of experiments required. However, for successful validation even of such a multivariate model, it is suggested that there should be a minimum of 10 observations for each variable in a QSAR. [This work was supported in part by the European Union IMAGETOX Research Training Network (HPRN-CT-1999-00015).]

Key words: compound selection, QSAR, validation, molecular diversity
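
Illustrative sketch (not part of the original abstract): the abstract does not specify the exact selection algorithm, so the following is only one plausible way to pick a diverse subset by distance in a reduced descriptor space. It assumes the PCA scores have already been computed and uses a greedy maximin rule (each new compound is the one farthest from all compounds already chosen). The function name `maximin_select` and the randomly generated scores are hypothetical stand-ins, not the authors' data or method.

```python
import math
import random


def maximin_select(points, k):
    """Greedily pick k indices so that each new pick is the point
    farthest from its nearest already-selected point (maximin rule).
    This is an assumed stand-in for the distance-based method."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    dim = len(points[0])
    centroid = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    # Seed the subset with the point farthest from the centroid.
    first = max(range(len(points)), key=lambda i: math.dist(points[i], centroid))
    selected = [first]
    # min_dist[i] is the distance from point i to its nearest selected point.
    min_dist = [math.dist(p, points[first]) for p in points]
    min_dist[first] = -1.0  # mark as used so it is never picked again
    while len(selected) < k:
        nxt = max(range(len(points)), key=lambda i: min_dist[i])
        selected.append(nxt)
        for i, p in enumerate(points):
            d = math.dist(p, points[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
        min_dist[nxt] = -1.0
    return selected


# Hypothetical example: 450 compounds described by two PCA scores each.
random.seed(0)
scores = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(450)]
training_idx = maximin_select(scores, 120)
```

The remaining indices (those not in `training_idx`) would then form the external test set; other seeding rules or optimality criteria would give different, equally defensible subsets.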