Background: Empirical scoring functions have proven useful in protein structure modeling. Most such scoring functions depend on protein side chain conformations. However, backbone-only scoring functions do not require computationally intensive structure optimization and so are well suited to protein design, which requires fast score evaluation. Furthermore, scoring functions that account for the distinctive relative position and orientation preferences of residue pairs are expected to be more accurate than those that depend only on the separation distance.Results: Residue pair scoring functions for fixed backbone protein design were derived using only backbone geometry. Unlike previous studies that used spherical harmonics to fit 2D angular distributions, Gaussian Mixture Models were used to fit the full 3D (position only) and 6D (position and orientation) distributions of residue pairs. The performance of the 1D (residue separation only), 3D, and 6D scoring functions were compared by their ability to identify correct threading solutions for a non-redundant benchmark set of protein backbone structures. The threading accuracy was found to steadily increase with increasing dimension, with the 6D scoring function achieving the highest accuracy. Furthermore, the 3D and 6D scoring functions were shown to outperform side chain-dependent empirical potentials from three other studies. Next, two computational methods that take advantage of the speed and pairwise form of these new backbone-only scoring functions were investigated. The first is a procedure that exploits available sequence data by averaging scores over threading solutions for homologs. This was evaluated by applying it to the challenging problem of identifying interacting transmembrane alpha-helices and found to further improve prediction accuracy. The second is a protein design method for determining the optimal sequence for a backbone structure by applying Belief Propagation optimization using the 6D scoring functions. The sensitivity of this method to backbone structure perturbations was compared with that of fixed-backbone all-atom modeling by determining the similarities between optimal sequences for two different backbone structures within the same protein family. The results showed that the design method using 6D scoring functions was more robust to small variations in backbone structure than the all-atom design method.Conclusions: Backbone-only residue pair scoring functions that account for all six relative degrees of freedom are the most accurate and including the scores of homologs further improves the accuracy in threading applications. The 6D scoring function outperformed several side chain-dependent potentials while avoiding time-consuming and error prone side chain structure prediction. These scoring functions are particularly useful as an initial filter in protein design problems before applying all-atom modeling.
ASJC Scopus subject areas
- Structural Biology
- Molecular Biology
- Computer Science Applications
- Applied Mathematics