TY - JOUR
T1 - Factors influencing the ability of knowledge-based potentials to identify native sequence-structure matches
AU - Kocher, Jean-Pierre A.
AU - Rooman, Marianne J.
AU - Wodak, Shoshana J.
PY - 1994/2/3
Y1 - 1994/2/3
AB - Several types of potentials are derived from a dataset of known protein structures by computing statistical relations between amino acid sequence and different descriptions of the protein conformation. These potentials formulate in different ways backbone dihedral angle preferences, pairwise distance-dependent interactions between amino acid residues, and solvation effects based on accessible surface area calculations. Parameters affecting the characteristics and the performance of the potentials are critically assessed by monitoring recognition of the native fold in a strict screening test, where each sequence in the dataset is threaded through a repertoire of motifs, generated from all corresponding structures. Sequence gaps are not allowed, to avoid additional approximations. Results show that residue interaction potentials computed from distances between average side-chain centroids perform significantly better on this test than those computed considering inter-C(α) or inter-C(β) distances. Combining potentials that are based on different structural descriptions and different interactions is also beneficial. The performance of some of these potentials is in fact so good that they recognize the correct fold for all the tested proteins, including subunits known to be unstable in the absence of quaternary interactions. Most strikingly, potentials representing backbone dihedral angle preferences recognize as many as 68 protein chains out of a total of 74, even though they consider solely local interactions along the chain, which, being the same as those considered in secondary structure prediction methods, are well known to be incapable of determining the full three-dimensional fold. This leads us to question the ability of procedures that screen a limited repertoire of structures to act as a stringent test for the potentials. We concede, however, that they are useful and fast tests, capable of revealing gross shortcomings of the potentials, or possible biases towards native recognition due, for example, to effects of sequence memory.
KW - Potential functions
KW - Protein data bases
KW - Protein structure prediction
UR - http://www.scopus.com/inward/record.url?scp=0028318094&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0028318094&partnerID=8YFLogxK
U2 - 10.1006/jmbi.1994.1109
DO - 10.1006/jmbi.1994.1109
M3 - Article
C2 - 8107094
AN - SCOPUS:0028318094
SN - 0022-2836
VL - 235
SP - 1598
EP - 1613
JO - Journal of Molecular Biology
JF - Journal of Molecular Biology
IS - 5
ER -