Classification of a diverse set of Tetrahymena pyriformis toxicity chemical compounds from molecular descriptors by statistical learning methods

Y. Xue, Hu Li, C. Y. Ung, C. W. Yap, Y. Z. Chen

Research output: Contribution to journal › Article

49 Citations (Scopus)

Abstract

The toxicity of various compounds has been measured in many studies by their toxic effects against Tetrahymena pyriformis. Efforts have also been made to use computational quantitative structure-activity relationship (QSAR) and statistical learning methods (SLMs) to predict Tetrahymena pyriformis toxicity (TPT) with impressive accuracy. Because of the diversity of compounds and toxicity mechanisms, it is desirable to explore additional methods and to examine whether these methods are applicable to more diverse sets of compounds. We tested several SLMs (logistic regression, C4.5 decision tree, k-nearest neighbor, probabilistic neural network, support vector machines) for their capability to predict TPT by using 1129 compounds (841 TPT and 288 non-TPT agents), which are more diverse than those in other studies. A feature selection method was used to improve prediction performance and to select the molecular descriptors responsible for distinguishing TPT from non-TPT agents. The prediction accuracies are 86.9% to 94.2% for TPT and 71.2% to 87.5% for non-TPT agents based on 5-fold cross-validation studies, which are comparable to those of some earlier studies despite the use of more diverse sets of compounds. The selected molecular descriptors are consistent with those used in other studies and with experimental findings. These results suggest that SLMs are useful for predicting the TPT potential of diverse sets of compounds and for characterizing the molecular descriptors associated with TPT.
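The evaluation described in the abstract (separate accuracies for TPT and non-TPT agents under 5-fold cross-validation) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' code: the data are synthetic two-dimensional points rather than molecular descriptors, the class sizes are arbitrary, the fold assignment is stratified by class, and a plain k-nearest neighbor vote stands in for the several SLMs compared in the paper.

```python
import random
from collections import Counter


def stratified_folds(labels, k, seed=0):
    """Assign each sample index to one of k folds, preserving class ratios."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training points (squared Euclidean)."""
    nearest = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), y)
        for row, y in zip(train_X, train_y)
    )[:k]
    return Counter(y for _, y in nearest).most_common(1)[0][0]


def cross_validate(X, y, k_folds=5, k_neighbors=3):
    """Return (accuracy on class 1, accuracy on class 0) pooled over folds."""
    tp = fn = tn = fp = 0
    for held_out in stratified_folds(y, k_folds):
        held = set(held_out)
        train_X = [X[i] for i in range(len(X)) if i not in held]
        train_y = [y[i] for i in range(len(y)) if i not in held]
        for i in held_out:
            pred = knn_predict(train_X, train_y, X[i], k_neighbors)
            if y[i] == 1:
                tp += pred == 1
                fn += pred == 0
            else:
                tn += pred == 0
                fp += pred == 1
    return tp / (tp + fn), tn / (tn + fp)


def make_toy_data(n_pos, n_neg, seed=0):
    """Hypothetical stand-in data: two Gaussian clusters, label 1 = 'toxic'."""
    rng = random.Random(seed)
    X = [[rng.gauss(1.0, 1.0), rng.gauss(1.0, 1.0)] for _ in range(n_pos)]
    X += [[rng.gauss(-1.0, 1.0), rng.gauss(-1.0, 1.0)] for _ in range(n_neg)]
    y = [1] * n_pos + [0] * n_neg
    return X, y


X, y = make_toy_data(120, 40)
sens, spec = cross_validate(X, y)
print(f"accuracy on positives={sens:.3f}, accuracy on negatives={spec:.3f}")
```

Reporting the two class accuracies separately, as the abstract does, matters when the classes are imbalanced (841 versus 288 compounds in the study), because a single overall accuracy would be dominated by the larger class.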

Original language: English (US)
Pages (from-to): 1030-1039
Number of pages: 10
Journal: Chemical Research in Toxicology
Volume: 19
Issue number: 8
DOI: 10.1021/tx0600550
State: Published - Aug 2006
Externally published: Yes

Fingerprint

Tetrahymena pyriformis
Chemical compounds
Toxicity
Learning
Decision Trees
Quantitative Structure-Activity Relationship
Validation Studies
Poisons
Logistic Models
Decision trees
Support vector machines
Logistics
Feature extraction

ASJC Scopus subject areas

  • Drug Discovery
  • Organic Chemistry
  • Chemistry (all)
  • Toxicology
  • Health, Toxicology and Mutagenesis

Cite this

Classification of a diverse set of Tetrahymena pyriformis toxicity chemical compounds from molecular descriptors by statistical learning methods. / Xue, Y.; Li, Hu; Ung, C. Y.; Yap, C. W.; Chen, Y. Z.

In: Chemical Research in Toxicology, Vol. 19, No. 8, 08.2006, p. 1030-1039.

@article{59356f93edb64746a0e17a43e76494c5,
title = "Classification of a diverse set of Tetrahymena pyriformis toxicity chemical compounds from molecular descriptors by statistical learning methods",
abstract = "Toxicity of various compounds has been measured in many studies by their toxic effects against Tetrahymena pyriformis. Efforts have also been made to use computational quantitative structure-activity relationship (QSAR) and statistical learning methods (SLMs) for predicting Tetrahymena pyriformis toxicity (TPT) at impressive accuracies. Because of the diversity of compounds and toxicity mechanisms, it is desirable to explore additional methods and to examine if these methods are applicable to more diverse sets of compounds. We tested several SLMs (logistic regression, C4.5 decision tree, k-nearest neighbor, probabilistic neural network, support vector machines) for their capability in predicting TPT by using 1129 compounds (841 TPT and 288 non-TPT agents) which are more diverse than those in other studies. A feature selection method was used for improving prediction performance and selecting molecular descriptors responsible for distinguishing TPT and non-TPT agents. The prediction accuracies are 86.9{\%}∼94.2{\%} for TPT and 71.2{\%}∼87.5{\%} for non-TPT agents based on 5-fold cross-validation studies, which are comparable to some of earlier studies despite the use of more diverse sets of compounds. The selected molecular descriptors are consistent with those used in other studies and experimental findings. These suggest that SLMs are useful for predicting TPT potential of diverse sets of compounds and for characterizing the molecular descriptors associated with TPT.",
author = "Xue, Y. and Li, Hu and Ung, {C. Y.} and Yap, {C. W.} and Chen, {Y. Z.}",
year = "2006",
month = "8",
doi = "10.1021/tx0600550",
language = "English (US)",
volume = "19",
pages = "1030--1039",
journal = "Chemical Research in Toxicology",
issn = "0893-228X",
publisher = "American Chemical Society",
number = "8",
}

TY - JOUR

T1 - Classification of a diverse set of Tetrahymena pyriformis toxicity chemical compounds from molecular descriptors by statistical learning methods

AU - Xue, Y.

AU - Li, Hu

AU - Ung, C. Y.

AU - Yap, C. W.

AU - Chen, Y. Z.

PY - 2006/8

Y1 - 2006/8

N2 - Toxicity of various compounds has been measured in many studies by their toxic effects against Tetrahymena pyriformis. Efforts have also been made to use computational quantitative structure-activity relationship (QSAR) and statistical learning methods (SLMs) for predicting Tetrahymena pyriformis toxicity (TPT) at impressive accuracies. Because of the diversity of compounds and toxicity mechanisms, it is desirable to explore additional methods and to examine if these methods are applicable to more diverse sets of compounds. We tested several SLMs (logistic regression, C4.5 decision tree, k-nearest neighbor, probabilistic neural network, support vector machines) for their capability in predicting TPT by using 1129 compounds (841 TPT and 288 non-TPT agents) which are more diverse than those in other studies. A feature selection method was used for improving prediction performance and selecting molecular descriptors responsible for distinguishing TPT and non-TPT agents. The prediction accuracies are 86.9%∼94.2% for TPT and 71.2%∼87.5% for non-TPT agents based on 5-fold cross-validation studies, which are comparable to some of earlier studies despite the use of more diverse sets of compounds. The selected molecular descriptors are consistent with those used in other studies and experimental findings. These suggest that SLMs are useful for predicting TPT potential of diverse sets of compounds and for characterizing the molecular descriptors associated with TPT.

AB - Toxicity of various compounds has been measured in many studies by their toxic effects against Tetrahymena pyriformis. Efforts have also been made to use computational quantitative structure-activity relationship (QSAR) and statistical learning methods (SLMs) for predicting Tetrahymena pyriformis toxicity (TPT) at impressive accuracies. Because of the diversity of compounds and toxicity mechanisms, it is desirable to explore additional methods and to examine if these methods are applicable to more diverse sets of compounds. We tested several SLMs (logistic regression, C4.5 decision tree, k-nearest neighbor, probabilistic neural network, support vector machines) for their capability in predicting TPT by using 1129 compounds (841 TPT and 288 non-TPT agents) which are more diverse than those in other studies. A feature selection method was used for improving prediction performance and selecting molecular descriptors responsible for distinguishing TPT and non-TPT agents. The prediction accuracies are 86.9%∼94.2% for TPT and 71.2%∼87.5% for non-TPT agents based on 5-fold cross-validation studies, which are comparable to some of earlier studies despite the use of more diverse sets of compounds. The selected molecular descriptors are consistent with those used in other studies and experimental findings. These suggest that SLMs are useful for predicting TPT potential of diverse sets of compounds and for characterizing the molecular descriptors associated with TPT.

UR - http://www.scopus.com/inward/record.url?scp=33748702895&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33748702895&partnerID=8YFLogxK

U2 - 10.1021/tx0600550

DO - 10.1021/tx0600550

M3 - Article

VL - 19

SP - 1030

EP - 1039

JO - Chemical Research in Toxicology

JF - Chemical Research in Toxicology

SN - 0893-228X

IS - 8

ER -