Prediction of estrogen receptor agonists and characterization of associated molecular descriptors by statistical learning methods

Hu Li, C. Y. Ung, C. W. Yap, Y. Xue, Z. R. Li, Y. Z. Chen

Research output: Contribution to journal (Article)

36 Citations (Scopus)

Abstract

Specific estrogen receptor (ER) agonists have been used for hormone replacement therapy, contraception, osteoporosis prevention, and prostate cancer treatment. Some ER agonists and partial-agonists induce cancer and endocrine function disruption. Methods for predicting ER agonists are useful for facilitating drug discovery and chemical safety evaluation. Structure-activity relationships and rule-based decision forest models have been derived for predicting ER binders at impressive accuracies of 87.1-97.6% for ER binders and 80.2-96.0% for ER non-binders. However, these models are not designed for identifying ER agonists, and they were developed from a subset of known ER binders. This work explored several statistical learning methods (support vector machines, k-nearest neighbor, probabilistic neural network and C4.5 decision tree) for predicting ER agonists from a comprehensive set of known ER agonists and other compounds. The corresponding prediction systems were developed and tested using 243 ER agonists and 463 ER non-agonists, a data set significantly larger in number and structural diversity than those in previous studies. A feature selection method was used for selecting molecular descriptors responsible for distinguishing ER agonists from non-agonists, some of which are consistent with those used in other studies and with the findings from X-ray crystallography data. The prediction accuracies of these methods are comparable to those of earlier studies despite the use of a significantly more diverse range of compounds. SVM gives the best accuracy of 88.9% for ER agonists and 98.1% for non-agonists. Our study suggests that statistical learning methods such as SVM are potentially useful for facilitating the prediction of ER agonists and for characterizing the molecular descriptors associated with ER agonists.
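The abstract reports accuracy separately for ER agonists and for non-agonists. A minimal sketch of how such per-class accuracies are commonly computed is given below, assuming they correspond to sensitivity (fraction of true agonists predicted as agonists) and specificity (fraction of true non-agonists predicted as non-agonists); the abstract does not state this explicitly. The function name `class_accuracies` and the toy labels are hypothetical and are not data from the study.

```python
def class_accuracies(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for a binary classifier.

    Assumption: `positive` marks the agonist class; every other label
    is treated as non-agonist.
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1  # agonist correctly predicted
            else:
                fn += 1  # agonist missed
        else:
            if pred == positive:
                fp += 1  # non-agonist wrongly flagged
            else:
                tn += 1  # non-agonist correctly predicted
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy labels for illustration only (1 = agonist, 0 = non-agonist).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
sens, spec = class_accuracies(y_true, y_pred)
print(f"agonist accuracy: {sens:.3f}, non-agonist accuracy: {spec:.3f}")
```

Reporting the two rates separately matters for an unbalanced data set such as 243 agonists versus 463 non-agonists, because a single overall accuracy can hide poor performance on the smaller class.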

Original language: English (US)
Pages (from-to): 313-323
Number of pages: 11
Journal: Journal of Molecular Graphics and Modelling
Volume: 25
Issue number: 3
DOI: 10.1016/j.jmgm.2006.01.007
State: Published - Nov 2006
Externally published: Yes

Fingerprint

Estrogens
Learning
Predictions
Estrogen Receptors
Binders
Oncology
X ray crystallography
Cancer
Decision trees
Osteoporosis
Support vector machines
Feature extraction
Hormones

Keywords

  • Classification
  • Estrogen receptor (ER)
  • Estrogen receptor agonists
  • Molecular descriptors
  • Statistical learning methods (SLMs)
  • Support vector machine (SVM)

ASJC Scopus subject areas

  • Physical and Theoretical Chemistry
  • Spectroscopy
  • Atomic and Molecular Physics, and Optics

Cite this

Prediction of estrogen receptor agonists and characterization of associated molecular descriptors by statistical learning methods. / Li, Hu; Ung, C. Y.; Yap, C. W.; Xue, Y.; Li, Z. R.; Chen, Y. Z.

In: Journal of Molecular Graphics and Modelling, Vol. 25, No. 3, 11.2006, p. 313-323.

@article{7f544e0fa2bd42d9b1315369c3e8928e,
title = "Prediction of estrogen receptor agonists and characterization of associated molecular descriptors by statistical learning methods",
abstract = "Specific estrogen receptor (ER) agonists have been used for hormone replacement therapy, contraception, osteoporosis prevention, and prostate cancer treatment. Some ER agonists and partial-agonists induce cancer and endocrine function disruption. Methods for predicting ER agonists are useful for facilitating drug discovery and chemical safety evaluation. Structure-activity relationships and rule-based decision forest models have been derived for predicting ER binders at impressive accuracies of 87.1-97.6{\%} for ER binders and 80.2-96.0{\%} for ER non-binders. However, these models are not designed for identifying ER agonists, and they were developed from a subset of known ER binders. This work explored several statistical learning methods (support vector machines, k-nearest neighbor, probabilistic neural network and C4.5 decision tree) for predicting ER agonists from a comprehensive set of known ER agonists and other compounds. The corresponding prediction systems were developed and tested using 243 ER agonists and 463 ER non-agonists, a data set significantly larger in number and structural diversity than those in previous studies. A feature selection method was used for selecting molecular descriptors responsible for distinguishing ER agonists from non-agonists, some of which are consistent with those used in other studies and with the findings from X-ray crystallography data. The prediction accuracies of these methods are comparable to those of earlier studies despite the use of a significantly more diverse range of compounds. SVM gives the best accuracy of 88.9{\%} for ER agonists and 98.1{\%} for non-agonists. Our study suggests that statistical learning methods such as SVM are potentially useful for facilitating the prediction of ER agonists and for characterizing the molecular descriptors associated with ER agonists.",
keywords = "Classification, Estrogen receptor (ER), Estrogen receptor agonists, Molecular descriptors, Statistical learning methods (SLMs), Support vector machine (SVM)",
author = "Li, Hu and Ung, {C. Y.} and Yap, {C. W.} and Xue, Y. and Li, {Z. R.} and Chen, {Y. Z.}",
year = "2006",
month = "11",
doi = "10.1016/j.jmgm.2006.01.007",
language = "English (US)",
volume = "25",
pages = "313--323",
journal = "Journal of Molecular Graphics and Modelling",
issn = "1093-3263",
publisher = "Elsevier Inc.",
number = "3",

}

TY - JOUR

T1 - Prediction of estrogen receptor agonists and characterization of associated molecular descriptors by statistical learning methods

AU - Li, Hu

AU - Ung, C. Y.

AU - Yap, C. W.

AU - Xue, Y.

AU - Li, Z. R.

AU - Chen, Y. Z.

PY - 2006/11

Y1 - 2006/11

AB - Specific estrogen receptor (ER) agonists have been used for hormone replacement therapy, contraception, osteoporosis prevention, and prostate cancer treatment. Some ER agonists and partial-agonists induce cancer and endocrine function disruption. Methods for predicting ER agonists are useful for facilitating drug discovery and chemical safety evaluation. Structure-activity relationships and rule-based decision forest models have been derived for predicting ER binders at impressive accuracies of 87.1-97.6% for ER binders and 80.2-96.0% for ER non-binders. However, these models are not designed for identifying ER agonists, and they were developed from a subset of known ER binders. This work explored several statistical learning methods (support vector machines, k-nearest neighbor, probabilistic neural network and C4.5 decision tree) for predicting ER agonists from a comprehensive set of known ER agonists and other compounds. The corresponding prediction systems were developed and tested using 243 ER agonists and 463 ER non-agonists, a data set significantly larger in number and structural diversity than those in previous studies. A feature selection method was used for selecting molecular descriptors responsible for distinguishing ER agonists from non-agonists, some of which are consistent with those used in other studies and with the findings from X-ray crystallography data. The prediction accuracies of these methods are comparable to those of earlier studies despite the use of a significantly more diverse range of compounds. SVM gives the best accuracy of 88.9% for ER agonists and 98.1% for non-agonists. Our study suggests that statistical learning methods such as SVM are potentially useful for facilitating the prediction of ER agonists and for characterizing the molecular descriptors associated with ER agonists.

KW - Classification

KW - Estrogen receptor (ER)

KW - Estrogen receptor agonists

KW - Molecular descriptors

KW - Statistical learning methods (SLMs)

KW - Support vector machine (SVM)

UR - http://www.scopus.com/inward/record.url?scp=33750982700&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33750982700&partnerID=8YFLogxK

U2 - 10.1016/j.jmgm.2006.01.007

DO - 10.1016/j.jmgm.2006.01.007

M3 - Article

C2 - 16497524

AN - SCOPUS:33750982700

VL - 25

SP - 313

EP - 323

JO - Journal of Molecular Graphics and Modelling

JF - Journal of Molecular Graphics and Modelling

SN - 1093-3263

IS - 3

ER -