Mutagenesis vol. 19 no. 5 © UK Environmental Mutagen Society 2004; all rights reserved.
Three new consensus QSAR models for the prediction of Ames genotoxicity
ChemSilico LLC, 48 Baldwin Street, Tewksbury, MA 01876, USA, 1Department of Chemistry, Eastern Nazarene College, Quincy, MA 02170, USA, 2Department of Medicinal Chemistry, School of Pharmacy, Virginia Commonwealth University, Richmond, VA 23298, USA, 3Laboratory for Molecular Modeling, Division of Medicinal Chemistry and Natural Products, School of Pharmacy, University of North Carolina, Chapel Hill, NC 27599, USA and 4Center for Toxicoinformatics, Division of Biometry and Risk Assessment, National Center for Toxicological Research, Jefferson, AR 72079, USA
Three QSAR methods, artificial neural net (ANN), k-nearest neighbors (kNN) and Decision Forest (DF), were applied to 3363 diverse compounds tested for their Ames genotoxicity. The ratio of mutagens to non-mutagens was 60/40 for this dataset. This group of compounds includes >300 therapeutic drugs. All models were developed using the same initial set of 148 topological indices: molecular connectivity indices and electrotopological state indices (atom-type, bond-type and group-type E-state), as well as binary indicators. While previous studies have found logP to be a determining factor in genotoxicity, it was not found to be important by any modeling method employed in this study. The three models yielded an average training/test concordance value of 88%, with a low percentage of false positives and false negatives. External validation testing on 400 compounds not used for QSAR model development gave an average concordance of 82%. This value increased to 92% upon removal of less reliable outcomes, as determined by a reliability criterion used within each model. The ANN model showed the best performance in predicting drug compounds, yielding 97% concordance (34/35 drugs) after the removal of less reliable predictions. The appreciable commonality found among the top 10 ranked descriptors from each model is of particular interest because of the diversity in the learning algorithms and descriptor selection techniques employed in this study. Forty percent of the most important descriptors in any one model are found in one or both of the other models. Fourteen of the most important descriptors relate directly to known toxicophores involved in potent genotoxic responses in Salmonella typhimurium. A comparison of the validation results with those of MULTICASE and DEREK indicated that the new models presented in this work perform substantially better than the former models in predicting genotoxicity of therapeutic drugs. The new models achieved substantially higher specificity than MULTICASE or DEREK, with comparable sensitivities among all models.
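The performance figures quoted above (concordance, sensitivity, specificity, and the gain in concordance once less reliable predictions are set aside) follow from simple counts of true and false calls. The Python sketch below illustrates those definitions on made-up toy data only. The `Prediction` class, its `reliable` flag and the helper names are hypothetical stand-ins; the actual reliability criterion used within each model is not specified here, and none of the numbers in the toy list come from the study.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Prediction:
    observed: bool         # True if the compound is an Ames mutagen (experimental call)
    predicted: bool        # True if the model predicts a mutagen
    reliable: bool = True  # hypothetical flag standing in for a model's reliability criterion


def _ratio(numerator: int, denominator: int) -> float:
    # Return NaN rather than raising when a class is absent from the data.
    return numerator / denominator if denominator else float("nan")


def classification_metrics(preds: Iterable[Prediction]) -> Dict[str, float]:
    """Concordance, sensitivity and specificity from binary calls."""
    items = list(preds)
    tp = sum(1 for p in items if p.observed and p.predicted)
    tn = sum(1 for p in items if not p.observed and not p.predicted)
    fp = sum(1 for p in items if not p.observed and p.predicted)
    fn = sum(1 for p in items if p.observed and not p.predicted)
    return {
        "concordance": _ratio(tp + tn, len(items)),  # fraction of correct calls overall
        "sensitivity": _ratio(tp, tp + fn),          # fraction of mutagens correctly flagged
        "specificity": _ratio(tn, tn + fp),          # fraction of non-mutagens correctly cleared
    }


def reliable_only(preds: Iterable[Prediction]) -> List[Prediction]:
    """Keep only the predictions that pass the (assumed) reliability flag."""
    return [p for p in preds if p.reliable]


# Toy data for illustration only; not taken from the study.
toy = [
    Prediction(True, True),
    Prediction(True, True),
    Prediction(True, True),
    Prediction(True, False, reliable=False),   # false negative, flagged as less reliable
    Prediction(False, False),
    Prediction(False, False),
    Prediction(False, False),
    Prediction(False, True, reliable=False),   # false positive, flagged as less reliable
]

all_metrics = classification_metrics(toy)
filtered_metrics = classification_metrics(reliable_only(toy))
```

Under these definitions, the reported 34 correct calls out of 35 drugs corresponds to 34/35, or about 97% concordance. Filtering on a reliability flag raises concordance only if the discarded predictions are disproportionately the wrong ones, which is what the toy data is constructed to show.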
Conflict of interest statement
J.R.Votano, M.Parham, L.H.Hall and L.B.Kier are partners in ChemSilico LLC. One of the models described in this paper was developed by ChemSilico LLC.
5 To whom correspondence should be addressed. Tel: +1 978 501 0633; Fax: +1 781 275 5197; Email: jvotano@chemsilico.com