Predicting small ligand binding sites in proteins using backbone structure

Andrew J. Bordner

doi:10.1093/bioinformatics/btn543

Biochemistry and Molecular Biology

Research output: Contribution to journal › Article › peer-review

38 Scopus citations

Abstract

Motivation: Specific non-covalent binding of metal ions and ligands, such as nucleotides and cofactors, is essential for the function of many proteins. Computational methods are useful for predicting the location of such binding sites when experimental information is lacking. Methods that use structural information, when available, are particularly promising since they can potentially identify non-contiguous binding motifs that cannot be found using only the amino acid sequence. Furthermore, a prediction method that can utilize low-resolution models is advantageous because high-resolution structures are available for only a relatively small fraction of proteins. Results: SitePredict is a machine learning-based method for predicting binding sites in protein structures for specific metal ions or small molecules. The method uses Random Forest classifiers trained on diverse residue-based site properties including spatial clustering of residue types and evolutionary conservation. SitePredict was tested by cross-validation on a set of known binding sites for six different metal ions and five different small molecules in a non-redundant set of protein-ligand complex structures. The prediction performance was good for all ligands considered, as reflected by AUC values of at least 0.8. Furthermore, a more realistic test on unbound structures showed only a slight decrease in the accuracy. The properties that contribute the most to the prediction accuracy of each ligand were also examined. Finally, examples of predicted binding sites in homology models and uncharacterized proteins are discussed.
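As a reading aid for the AUC figure quoted in the abstract, the following is a minimal sketch of how an area under the ROC curve can be computed from classifier scores, using the pairwise (Mann-Whitney) formulation. It is not SitePredict's code; the function name `auc` and the example labels and scores are hypothetical and are not taken from the paper.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    labels: 1 for a binding-site residue, 0 otherwise.
    scores: classifier scores, higher meaning more likely to bind.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical scores for eight residues (illustrative only, not paper data).
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.8, 0.3, 0.2, 0.2, 0.1]
print(auc(labels, scores))
```

Under this definition an AUC of 0.5 corresponds to random ranking and 1.0 to a perfect separation of binding from non-binding residues, which gives context for the reported values of at least 0.8.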

Original language	English (US)
Pages (from-to)	2865-2871
Number of pages	7
Journal	Bioinformatics
Volume	24
Issue number	24
DOIs	https://doi.org/10.1093/bioinformatics/btn543
State	Published - Dec 2008

ASJC Scopus subject areas

Statistics and Probability
Biochemistry
Molecular Biology
Computer Science Applications
Computational Theory and Mathematics
Computational Mathematics

Cite this

@article{3d3b5e62b0b14b47a8e7549411408e96,

title = "Predicting small ligand binding sites in proteins using backbone structure",

abstract = "Motivation: Specific non-covalent binding of metal ions and ligands, such as nucleotides and cofactors, is essential for the function of many proteins. Computational methods are useful for predicting the location of such binding sites when experimental information is lacking. Methods that use structural information, when available, are particularly promising since they can potentially identify non-contiguous binding motifs that cannot be found using only the amino acid sequence. Furthermore, a prediction method that can utilize low-resolution models is advantageous because high-resolution structures are available for only a relatively small fraction of proteins. Results: SitePredict is a machine learning-based method for predicting binding sites in protein structures for specific metal ions or small molecules. The method uses Random Forest classifiers trained on diverse residue-based site properties including spatial clustering of residue types and evolutionary conservation. SitePredict was tested by cross-validation on a set of known binding sites for six different metal ions and five different small molecules in a non-redundant set of protein-ligand complex structures. The prediction performance was good for all ligands considered, as reflected by AUC values of at least 0.8. Furthermore, a more realistic test on unbound structures showed only a slight decrease in the accuracy. The properties that contribute the most to the prediction accuracy of each ligand were also examined. Finally, examples of predicted binding sites in homology models and uncharacterized proteins are discussed.",

author = "Bordner, {Andrew J.}",

note = "Funding Information: Funding: Mayo Clinic and a Biopilot project from the DOE Office of Advanced Scientific Computing Research; ERKP558 {\textquoteleft}An integrated knowledge base for the Shewanella Federation{\textquoteright} from the DOE Office of Biological and Environmental Research.",

year = "2008",

month = dec,

doi = "10.1093/bioinformatics/btn543",

language = "English (US)",

volume = "24",

pages = "2865--2871",

journal = "Bioinformatics",

issn = "1367-4803",

publisher = "Oxford University Press",

number = "24",

}

TY - JOUR

T1 - Predicting small ligand binding sites in proteins using backbone structure

AU - Bordner, Andrew J.

N1 - Funding Information: Funding: Mayo Clinic and a Biopilot project from the DOE Office of Advanced Scientific Computing Research; ERKP558 ‘An integrated knowledge base for the Shewanella Federation’ from the DOE Office of Biological and Environmental Research.

PY - 2008/12

Y1 - 2008/12

N2 - Motivation: Specific non-covalent binding of metal ions and ligands, such as nucleotides and cofactors, is essential for the function of many proteins. Computational methods are useful for predicting the location of such binding sites when experimental information is lacking. Methods that use structural information, when available, are particularly promising since they can potentially identify non-contiguous binding motifs that cannot be found using only the amino acid sequence. Furthermore, a prediction method that can utilize low-resolution models is advantageous because high-resolution structures are available for only a relatively small fraction of proteins. Results: SitePredict is a machine learning-based method for predicting binding sites in protein structures for specific metal ions or small molecules. The method uses Random Forest classifiers trained on diverse residue-based site properties including spatial clustering of residue types and evolutionary conservation. SitePredict was tested by cross-validation on a set of known binding sites for six different metal ions and five different small molecules in a non-redundant set of protein-ligand complex structures. The prediction performance was good for all ligands considered, as reflected by AUC values of at least 0.8. Furthermore, a more realistic test on unbound structures showed only a slight decrease in the accuracy. The properties that contribute the most to the prediction accuracy of each ligand were also examined. Finally, examples of predicted binding sites in homology models and uncharacterized proteins are discussed.

AB - Motivation: Specific non-covalent binding of metal ions and ligands, such as nucleotides and cofactors, is essential for the function of many proteins. Computational methods are useful for predicting the location of such binding sites when experimental information is lacking. Methods that use structural information, when available, are particularly promising since they can potentially identify non-contiguous binding motifs that cannot be found using only the amino acid sequence. Furthermore, a prediction method that can utilize low-resolution models is advantageous because high-resolution structures are available for only a relatively small fraction of proteins. Results: SitePredict is a machine learning-based method for predicting binding sites in protein structures for specific metal ions or small molecules. The method uses Random Forest classifiers trained on diverse residue-based site properties including spatial clustering of residue types and evolutionary conservation. SitePredict was tested by cross-validation on a set of known binding sites for six different metal ions and five different small molecules in a non-redundant set of protein-ligand complex structures. The prediction performance was good for all ligands considered, as reflected by AUC values of at least 0.8. Furthermore, a more realistic test on unbound structures showed only a slight decrease in the accuracy. The properties that contribute the most to the prediction accuracy of each ligand were also examined. Finally, examples of predicted binding sites in homology models and uncharacterized proteins are discussed.

UR - http://www.scopus.com/inward/record.url?scp=57249116902&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=57249116902&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/btn543

DO - 10.1093/bioinformatics/btn543

M3 - Article

C2 - 18940825

AN - SCOPUS:57249116902

SN - 1367-4803

VL - 24

SP - 2865

EP - 2871

JO - Bioinformatics

JF - Bioinformatics

IS - 24

ER -
