PARENT SESSION

1F - QSAR
Poster Hall
8:30 AM - Tuesday, 29 April 2003
Chair: Schüürmann, G.
Co-chairs: Verhaar, H.J.M.; Cronin, M.

(TUP/49) Chemometric methodologies for the modelling of heterogeneous chemical toxicity: dataset representativity as the absolute essential.

Gramatica, Paola (1); Consonni, Viviana (2); Pavan, Manuela (2); Pilutti, Pamela (1); Papa, Ester (1)
(1) QSAR and Environmental Chemistry Research Unit, DBSF, University of Insubria, Varese, Lombardia, Italy
(2) Milano Chemometrics and QSAR Research Group, DISAT, Milano Bicocca University, Milano, Lombardia, Italy

ABSTRACT - The BEAM EU research project focuses on the risk assessment of mixture toxicity. A data set of 126 heterogeneous chemicals of high concern as environmental pollutants was studied for toxicity to Scenedesmus vacuolatus. Several chemometric techniques were applied to the experimental toxicity data with the aim of developing a "universal" QSAR able to describe and predict the toxicity of structurally heterogeneous and dissimilarly acting chemicals. The chemical structures of the compounds were described with several types of theoretical molecular descriptors (DRAGON software). A Genetic Algorithm was used for variable subset selection in OLS regression. To verify the predictive capability of the developed QSAR models, a training set was selected by Experimental Design. OLS models were developed on the 70 chemicals selected as the training set for the two parameters of the Weibull model, "a" (correlated with EC50 values) and "b" (steepness). Regression trees and Counter Propagation Artificial Neural Networks (CP-ANN) were also used to assess the utility of non-linear techniques. Several classification methods were applied to the categorised toxicity data: classification trees, k-Nearest Neighbors (k-NN) and CP-ANN. None of the methodologies performed satisfactorily in validation, demonstrating that a "universal" QSAR model is not possible when chemicals differ significantly in structure and mode of action. This highlights the essential need for data set representativity for the successful application of QSAR. Moreover, QSAR models built on the limited data sets of compounds that are more similar in both structure and mode of action show high predictive performance.
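
The abstract names the Weibull model and its two parameters but does not give the equation. As a minimal sketch, assuming one common parameterization of the Weibull concentration-response curve, effect = 1 - exp(-exp(a + b * log10(c))), the following shows why "a" tracks EC50 and "b" sets the steepness. The functional form, the function names and the parameter values are assumptions for illustration, not taken from the study.

```python
import math


def weibull_effect(conc, a, b):
    """Effect fraction (0..1) at concentration `conc`.

    Assumed form: 1 - exp(-exp(a + b * log10(conc))).
    """
    if conc <= 0:
        raise ValueError("concentration must be positive")
    return 1.0 - math.exp(-math.exp(a + b * math.log10(conc)))


def weibull_ec(fraction, a, b):
    """Concentration giving the requested effect fraction (inverse of weibull_effect)."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must lie strictly between 0 and 1")
    if b == 0:
        raise ValueError("b must be non-zero")
    return 10.0 ** ((math.log(-math.log(1.0 - fraction)) - a) / b)


# Hypothetical parameter values, chosen only to illustrate the behaviour.
a, b = 1.0, 2.0
ec50 = weibull_ec(0.5, a, b)

# Width of the curve, expressed as the EC90/EC10 ratio; it depends on b only.
spread = weibull_ec(0.9, a, b) / weibull_ec(0.1, a, b)
```

Under this assumed form, log10(EC50) = (ln(ln 2) - a) / b, so at fixed "b" a larger "a" means a lower EC50, while a larger "b" narrows the concentration range between low and high effect levels.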

Key words: chemometrics, QSAR, representativity, toxicity