Factors influencing the ability of knowledge-based potentials to identify native sequence-structure matches

Jean-Pierre Kocher, Marianne J. Rooman, Shoshana J. Wodak

Research output: Contribution to journal › Article

193 Citations (Scopus)

Abstract

Several types of potentials are derived from a dataset of known protein structures by computing statistical relations between amino acid sequence and different descriptions of the protein conformation. These potentials formulate in different ways backbone dihedral angle preferences, pairwise distance-dependent interactions between amino acid residues, and solvation effects based on accessible surface area calculations. Parameters affecting the characteristics and the performance of the potentials are critically assessed by monitoring recognition of the native fold in a strict screening test, where each sequence in the dataset is threaded through a repertoire of motifs, generated from all corresponding structures. Sequence gaps are not allowed, to avoid additional approximations. Results show that residue interaction potentials computed from distances between average side-chain centroids perform significantly better on this test than those computed considering inter-C(α) or inter-C(β) distances. Combining potentials that are based on different structural descriptions and different interactions is also beneficial. The performance of some of these potentials is in fact so good that they recognize the correct fold for all the tested proteins, including subunits known to be unstable in the absence of quaternary interactions. Most strikingly, potentials representing backbone dihedral angle preferences recognize as many as 68 protein chains out of a total of 74, even though they consider solely local interactions along the chain, which, being the same as those considered in secondary structure prediction methods, are well known to be incapable of determining the full three-dimensional fold. This leads us to question the ability of procedures that screen a limited repertoire of structures to act as a stringent test for the potentials. 
We concede, however, that they are useful and fast tests, capable of revealing gross shortcomings of the potentials, or possible biases towards native recognition due, for example, to effects of sequence memory.
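
The gapless screening test described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the two-letter residue alphabet (`H`, `P`), the environment labels (`B` for buried, `E` for exposed), the `TOY_POTENTIAL` table, and all function names are assumptions made for the example, and the toy potential stands in for the dihedral, distance-dependent, and solvation potentials studied in the paper.

```python
from typing import Dict, Iterator, Tuple

# Hypothetical toy potential: energy of a residue type in an environment.
# H = hydrophobic, P = polar; B = buried, E = exposed (all assumed labels).
TOY_POTENTIAL: Dict[Tuple[str, str], float] = {
    ("H", "B"): -1.0, ("H", "E"): 1.0,
    ("P", "B"): 1.0, ("P", "E"): -1.0,
}

# Hypothetical dataset: each chain has a sequence and a same-length
# string of per-residue environments standing in for its structure.
SEQUENCES = {"a": "HHPP", "b": "PHPHPH"}
STRUCTURES = {"a": "BBEE", "b": "EBEBEB"}


def motifs(structure: str, length: int) -> Iterator[Tuple[int, str]]:
    """Yield every contiguous window of the given length (no gaps)."""
    for start in range(len(structure) - length + 1):
        yield start, structure[start:start + length]


def thread_energy(seq: str, motif: str,
                  potential: Dict[Tuple[str, str], float]) -> float:
    """Sum per-position energies of a sequence mounted on a motif."""
    return sum(potential[(aa, env)] for aa, env in zip(seq, motif))


def native_rank(name: str) -> int:
    """Rank of the native match: 1 + number of motifs with lower energy."""
    seq = SEQUENCES[name]
    native = thread_energy(seq, STRUCTURES[name], TOY_POTENTIAL)
    better = 0
    for struct in STRUCTURES.values():
        # Structures shorter than the sequence yield no motifs.
        for _, motif in motifs(struct, len(seq)):
            if thread_energy(seq, motif, TOY_POTENTIAL) < native:
                better += 1
    return better + 1


if __name__ == "__main__":
    for chain in SEQUENCES:
        print(chain, native_rank(chain))
```

A chain counts as recognized when its native structure ranks first among all motifs of matching length. With a repertoire this small, even a crude potential ranks the native first, which mirrors the abstract's concern that screening a limited set of structures may not be a stringent test.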

Original language: English (US)
Pages (from-to): 1598-1613
Number of pages: 16
Journal: Journal of Molecular Biology
Volume: 235
Issue number: 5
DOI: 10.1006/jmbi.1994.1109
State: Published - 1994
Externally published: Yes

Fingerprint

Aptitude
Mathematical Computing
Protein Conformation
Protein Subunits
Amino Acid Sequence
Proteins
Amino Acids
Recognition (Psychology)
Datasets

Keywords

  • Potential functions
  • Protein data bases
  • Protein structure prediction

ASJC Scopus subject areas

  • Virology

Cite this

Factors influencing the ability of knowledge-based potentials to identify native sequence-structure matches. / Kocher, Jean-Pierre; Rooman, Marianne J.; Wodak, Shoshana J.

In: Journal of Molecular Biology, Vol. 235, No. 5, 1994, p. 1598-1613.

@article{f99f6ae85a9848aa9609bbbbc87597c4,
title = "Factors influencing the ability of knowledge-based potentials to identify native sequence-structure matches",
abstract = "Several types of potentials are derived from a dataset of known protein structures by computing statistical relations between amino acid sequence and different descriptions of the protein conformation. These potentials formulate in different ways backbone dihedral angle preferences, pairwise distance-dependent interactions between amino acid residues, and solvation effects based on accessible surface area calculations. Parameters affecting the characteristics and the performance of the potentials are critically assessed by monitoring recognition of the native fold in a strict screening test, where each sequence in the dataset is threaded through a repertoire of motifs, generated from all corresponding structures. Sequence gaps are not allowed, to avoid additional approximations. Results show that residue interaction potentials computed from distances between average side-chain centroids perform significantly better on this test than those computed considering inter-C(α) or inter-C(β) distances. Combining potentials that are based on different structural descriptions and different interactions is also beneficial. The performance of some of these potentials is in fact so good that they recognize the correct fold for all the tested proteins, including subunits known to be unstable in the absence of quaternary interactions. Most strikingly, potentials representing backbone dihedral angle preferences recognize as many as 68 protein chains out of a total of 74, even though they consider solely local interactions along the chain, which, being the same as those considered in secondary structure prediction methods, are well known to be incapable of determining the full three-dimensional fold. This leads us to question the ability of procedures that screen a limited repertoire of structures to act as a stringent test for the potentials. 
We concede, however, that they are useful and fast tests, capable of revealing gross shortcomings of the potentials, or possible biases towards native recognition due, for example, to effects of sequence memory.",
keywords = "Potential functions, Protein data bases, Protein structure prediction",
author = "Kocher, Jean-Pierre and Rooman, {Marianne J.} and Wodak, {Shoshana J.}",
year = "1994",
doi = "10.1006/jmbi.1994.1109",
language = "English (US)",
volume = "235",
pages = "1598--1613",
journal = "Journal of Molecular Biology",
issn = "0022-2836",
publisher = "Academic Press Inc.",
number = "5",

}

TY - JOUR

T1 - Factors influencing the ability of knowledge-based potentials to identify native sequence-structure matches

AU - Kocher, Jean-Pierre

AU - Rooman, Marianne J.

AU - Wodak, Shoshana J.

PY - 1994

Y1 - 1994

N2 - Several types of potentials are derived from a dataset of known protein structures by computing statistical relations between amino acid sequence and different descriptions of the protein conformation. These potentials formulate in different ways backbone dihedral angle preferences, pairwise distance-dependent interactions between amino acid residues, and solvation effects based on accessible surface area calculations. Parameters affecting the characteristics and the performance of the potentials are critically assessed by monitoring recognition of the native fold in a strict screening test, where each sequence in the dataset is threaded through a repertoire of motifs, generated from all corresponding structures. Sequence gaps are not allowed, to avoid additional approximations. Results show that residue interaction potentials computed from distances between average side-chain centroids perform significantly better on this test than those computed considering inter-C(α) or inter-C(β) distances. Combining potentials that are based on different structural descriptions and different interactions is also beneficial. The performance of some of these potentials is in fact so good that they recognize the correct fold for all the tested proteins, including subunits known to be unstable in the absence of quaternary interactions. Most strikingly, potentials representing backbone dihedral angle preferences recognize as many as 68 protein chains out of a total of 74, even though they consider solely local interactions along the chain, which, being the same as those considered in secondary structure prediction methods, are well known to be incapable of determining the full three-dimensional fold. This leads us to question the ability of procedures that screen a limited repertoire of structures to act as a stringent test for the potentials. 
We concede, however, that they are useful and fast tests, capable of revealing gross shortcomings of the potentials, or possible biases towards native recognition due, for example, to effects of sequence memory.

KW - Potential functions

KW - Protein data bases

KW - Protein structure prediction

UR - http://www.scopus.com/inward/record.url?scp=0028318094&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0028318094&partnerID=8YFLogxK

U2 - 10.1006/jmbi.1994.1109

DO - 10.1006/jmbi.1994.1109

M3 - Article

C2 - 8107094

AN - SCOPUS:0028318094

VL - 235

SP - 1598

EP - 1613

JO - Journal of Molecular Biology

JF - Journal of Molecular Biology

SN - 0022-2836

IS - 5

ER -