Prediction of genotoxicity of chemical compounds by statistical learning methods

Hu Li, C. Y. Ung, C. W. Yap, Y. Xue, Z. R. Li, Z. W. Cao, Y. Z. Chen

Research output: Contribution to journal › Article

67 Citations (Scopus)

Abstract

Various toxicological profiles, such as genotoxic potential, need to be studied in drug discovery processes and submitted to the drug regulatory authorities for drug safety evaluation. As part of the effort to develop low-cost and efficient adverse drug reaction testing tools, several statistical learning methods have been used to develop genotoxicity prediction systems with an accuracy of up to 73.8% for genotoxic (GT+) and 92.8% for nongenotoxic (GT-) agents. These systems have been developed and tested by using fewer than 400 known GT+ and GT- agents, a set significantly smaller and less diverse than the 860 GT+ and GT- agents known at present. There is a need to examine whether a similar level of accuracy can be achieved for the more diverse set of molecules and to evaluate other statistical learning methods not yet applied to genotoxicity prediction. This work tests several statistical learning methods by using 860 GT+ and GT- agents; the methods include support vector machines (SVM), probabilistic neural network (PNN), k-nearest neighbor (k-NN), and C4.5 decision tree (DT). A feature selection method, recursive feature elimination, is used for selecting molecular descriptors relevant to genotoxicity study. The overall accuracies of SVM, k-NN, and PNN are comparable to the results from earlier studies, whereas those of DT are lower, with SVM giving the highest accuracies of 77.8% for GT+ and 92.7% for GT- agents. Our study suggests that statistical learning methods, particularly SVM, k-NN, and PNN, are useful for facilitating the prediction of genotoxic potential of a diverse set of molecules.
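The abstract reports accuracy separately for GT+ and GT- agents. As a rough illustration of what those two numbers mean, the following minimal Python sketch classifies a handful of compounds with a plain k-nearest-neighbor vote and then computes the per-class accuracies. It is not the authors' implementation: the two-dimensional descriptor vectors, the labels, the choice of k = 3, and the function names knn_predict and per_class_accuracy are all invented for this example, and the study itself used molecular descriptors chosen by recursive feature elimination.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+


def knn_predict(train_X, train_y, x, k=3):
    """Label x by majority vote among its k nearest training points."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: dist(pair[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def per_class_accuracy(y_true, y_pred, positive="GT+", negative="GT-"):
    """Return (fraction of positives predicted correctly,
    fraction of negatives predicted correctly)."""
    pairs = list(zip(y_true, y_pred))
    n_pos = sum(t == positive for t, _ in pairs)
    n_neg = sum(t == negative for t, _ in pairs)
    hit_pos = sum(t == positive and p == positive for t, p in pairs)
    hit_neg = sum(t == negative and p == negative for t, p in pairs)
    return hit_pos / n_pos, hit_neg / n_neg


# Hypothetical training set: invented 2-D "descriptor" vectors, not real data.
train_X = [(0.9, 0.8), (0.8, 0.9), (0.7, 0.7), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3)]
train_y = ["GT+", "GT+", "GT+", "GT-", "GT-", "GT-"]

# Hypothetical test set with known labels.
test_X = [(0.85, 0.75), (0.6, 0.6), (0.45, 0.4), (0.15, 0.15), (0.25, 0.2)]
test_y = ["GT+", "GT+", "GT+", "GT-", "GT-"]

preds = [knn_predict(train_X, train_y, x, k=3) for x in test_X]
acc_pos, acc_neg = per_class_accuracy(test_y, preds)

print("predictions:", preds)
print(f"GT+ accuracy: {acc_pos:.1%}, GT- accuracy: {acc_neg:.1%}")
```

Reporting the two class accuracies separately, as the abstract does, matters when the classes behave differently: a model can score well on GT- agents while still missing a sizable share of GT+ agents, and a single overall accuracy would hide that gap.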

Original language: English (US)
Pages (from-to): 1071-1080
Number of pages: 10
Journal: Chemical Research in Toxicology
Volume: 18
Issue number: 6
DOI: https://doi.org/10.1021/tx049652h
State: Published - Jun 2005
Externally published: Yes

Fingerprint

Chemical compounds
Support vector machines
Learning
Decision trees
Neural networks
Decision Trees
Pharmaceutical Preparations
Molecules
Testing
Drug Evaluation
Feature extraction
Drug Discovery
Drug-Related Side Effects and Adverse Reactions
Toxicology
Safety
Costs and Cost Analysis
Support Vector Machine
Costs

ASJC Scopus subject areas

  • Drug Discovery
  • Organic Chemistry
  • Chemistry (all)
  • Toxicology
  • Health, Toxicology and Mutagenesis

Cite this

Li, H., Ung, C. Y., Yap, C. W., Xue, Y., Li, Z. R., Cao, Z. W., & Chen, Y. Z. (2005). Prediction of genotoxicity of chemical compounds by statistical learning methods. Chemical Research in Toxicology, 18(6), 1071-1080. https://doi.org/10.1021/tx049652h

@article{4a4efe587b7f4231809dd8686d2800cd,
title = "Prediction of genotoxicity of chemical compounds by statistical learning methods",
abstract = "Various toxicological profiles, such as genotoxic potential, need to be studied in drug discovery processes and submitted to the drug regulatory authorities for drug safety evaluation. As part of the effort to develop low-cost and efficient adverse drug reaction testing tools, several statistical learning methods have been used to develop genotoxicity prediction systems with an accuracy of up to 73.8{\%} for genotoxic (GT+) and 92.8{\%} for nongenotoxic (GT-) agents. These systems have been developed and tested by using fewer than 400 known GT+ and GT- agents, a set significantly smaller and less diverse than the 860 GT+ and GT- agents known at present. There is a need to examine whether a similar level of accuracy can be achieved for the more diverse set of molecules and to evaluate other statistical learning methods not yet applied to genotoxicity prediction. This work tests several statistical learning methods by using 860 GT+ and GT- agents; the methods include support vector machines (SVM), probabilistic neural network (PNN), k-nearest neighbor (k-NN), and C4.5 decision tree (DT). A feature selection method, recursive feature elimination, is used for selecting molecular descriptors relevant to genotoxicity study. The overall accuracies of SVM, k-NN, and PNN are comparable to the results from earlier studies, whereas those of DT are lower, with SVM giving the highest accuracies of 77.8{\%} for GT+ and 92.7{\%} for GT- agents. Our study suggests that statistical learning methods, particularly SVM, k-NN, and PNN, are useful for facilitating the prediction of genotoxic potential of a diverse set of molecules.",
author = "Hu Li and Ung, {C. Y.} and Yap, {C. W.} and Y. Xue and Li, {Z. R.} and Cao, {Z. W.} and Chen, {Y. Z.}",
year = "2005",
month = "6",
doi = "10.1021/tx049652h",
language = "English (US)",
volume = "18",
pages = "1071--1080",
journal = "Chemical Research in Toxicology",
issn = "0893-228X",
publisher = "American Chemical Society",
number = "6",

}

TY - JOUR

T1 - Prediction of genotoxicity of chemical compounds by statistical learning methods

AU - Li, Hu

AU - Ung, C. Y.

AU - Yap, C. W.

AU - Xue, Y.

AU - Li, Z. R.

AU - Cao, Z. W.

AU - Chen, Y. Z.

PY - 2005/6

Y1 - 2005/6

AB - Various toxicological profiles, such as genotoxic potential, need to be studied in drug discovery processes and submitted to the drug regulatory authorities for drug safety evaluation. As part of the effort to develop low-cost and efficient adverse drug reaction testing tools, several statistical learning methods have been used to develop genotoxicity prediction systems with an accuracy of up to 73.8% for genotoxic (GT+) and 92.8% for nongenotoxic (GT-) agents. These systems have been developed and tested by using fewer than 400 known GT+ and GT- agents, a set significantly smaller and less diverse than the 860 GT+ and GT- agents known at present. There is a need to examine whether a similar level of accuracy can be achieved for the more diverse set of molecules and to evaluate other statistical learning methods not yet applied to genotoxicity prediction. This work tests several statistical learning methods by using 860 GT+ and GT- agents; the methods include support vector machines (SVM), probabilistic neural network (PNN), k-nearest neighbor (k-NN), and C4.5 decision tree (DT). A feature selection method, recursive feature elimination, is used for selecting molecular descriptors relevant to genotoxicity study. The overall accuracies of SVM, k-NN, and PNN are comparable to the results from earlier studies, whereas those of DT are lower, with SVM giving the highest accuracies of 77.8% for GT+ and 92.7% for GT- agents. Our study suggests that statistical learning methods, particularly SVM, k-NN, and PNN, are useful for facilitating the prediction of genotoxic potential of a diverse set of molecules.

UR - http://www.scopus.com/inward/record.url?scp=21144435586&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=21144435586&partnerID=8YFLogxK

U2 - 10.1021/tx049652h

DO - 10.1021/tx049652h

M3 - Article

C2 - 15962942

AN - SCOPUS:21144435586

VL - 18

SP - 1071

EP - 1080

JO - Chemical Research in Toxicology

JF - Chemical Research in Toxicology

SN - 0893-228X

IS - 6

ER -