Entry Date:
April 12, 2003

Computational Analysis and Prediction of Phosphopeptide Binding Sites


A machine-learning technique has been developed to analyze the properties of phosphopeptide binding sites on protein surfaces, and it is applied predictively to the surface of a phosphopeptide binding domain whose ligand binding site is unknown. While previous techniques for ligand binding site prediction have focused on the properties of relatively large regions of proteins, this technique focuses on the local chemical and physical properties of extremely small portions of the protein surface.

Protein phosphorylation leads to the regulation of signaling, often by the generation of a binding site for a phosphopeptide binding domain. Though all phosphopeptide binding domains are capable of binding to phosphorylated peptides, there is relatively little structural similarity in the binding sites. An understanding of the commonalities among the natural phosphopeptide binding domains should be useful in continuing efforts to design new phosphopeptide binding domains with novel specificity. Moreover, an understanding of what constitutes a phosphopeptide binding site lends predictive ability in cases where a protein of known structure is known to function as a phosphopeptide binding domain at an unknown surface location.

A set of nine crystal structures of phosphopeptide binding domains in complex with phosphopeptides was surfaced with a triangulated mesh. At each mesh vertex, a set of physical and chemical properties, including amino acid identity, local surface curvature, and solvated electrostatic potential, was calculated. The enrichment of these properties at sites bound to phosphorylated amino acid side chains, relative to the entire protein surface, was then determined.
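The enrichment calculation can be illustrated with a minimal sketch. It assumes the propensity of a property value is the ratio of its frequency among binding-site vertices to its frequency among all surface vertices; the abstract does not give the exact formula. The function name, the optional pseudocount, and the toy data are hypothetical, and continuous properties such as curvature or electrostatic potential are assumed to have been binned into discrete labels beforehand.

```python
from collections import Counter


def propensities(values, is_site, pseudocount=0.0):
    """Enrichment of each property value at binding-site vertices.

    values      -- one discrete property label per mesh vertex
    is_site     -- one bool per vertex: True if it contacts the phosphoresidue
    pseudocount -- optional smoothing term (an assumption, not from the source)

    Returns {value: (site frequency) / (whole-surface frequency)}.
    """
    if len(values) != len(is_site):
        raise ValueError("values and is_site must have the same length")

    all_counts = Counter(values)
    site_counts = Counter(v for v, bound in zip(values, is_site) if bound)
    n_all = len(values)
    n_site = sum(site_counts.values())
    n_labels = len(all_counts)

    if n_site + pseudocount * n_labels == 0:
        raise ValueError("no binding-site vertices to learn from")

    result = {}
    for label, count in all_counts.items():
        site_freq = (site_counts[label] + pseudocount) / (n_site + pseudocount * n_labels)
        surface_freq = count / n_all
        result[label] = site_freq / surface_freq
    return result


# Toy example: amino acid identity at eight vertices, three of them in the site.
residues = ["ARG", "ARG", "LYS", "GLU", "GLU", "GLU", "ARG", "LEU"]
in_site = [True, True, True, False, False, False, False, False]
residue_propensity = propensities(residues, in_site)
```

A value above 1 marks a property that is over-represented at the binding site, and a value below 1 marks one that is depleted there.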

To determine the predictive capacity of this propensity information, a jack-knife validation procedure was used in which each crystal structure in turn was removed from the training data and the propensities were recalculated. The learned propensities were then painted onto the surface of the removed protein, under the assumption that propensities based on the three characteristics studied combine independently. Finally, the propensities learned from the entire training set were applied in the same way to the surface of the BRCT domain of BRCA1, which was recently identified as a phosphopeptide binding domain but for which the binding location is unknown.
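The jack-knife procedure and the independence assumption can be sketched as follows. This is an illustration under stated assumptions, not the implementation used in the study: the property names in `PROPS`, the dictionary layout of each structure, the pseudocount smoothing, and the use of summed log-propensities (equivalent to multiplying independent propensities) are all hypothetical choices.

```python
import math
from collections import Counter

# Hypothetical names for the three per-vertex properties (already discretized).
PROPS = ("residue", "curvature_bin", "potential_bin")


def learn_propensities(structures, pseudocount=1.0):
    """Pool vertices from the training structures; return {prop: {value: propensity}}."""
    table = {}
    for prop in PROPS:
        all_counts, site_counts = Counter(), Counter()
        for s in structures:
            for vertex, bound in zip(s["vertices"], s["site"]):
                all_counts[vertex[prop]] += 1
                if bound:
                    site_counts[vertex[prop]] += 1
        n_all = sum(all_counts.values())
        n_site = sum(site_counts.values())
        k = len(all_counts)
        # Smoothed site frequency divided by smoothed surface frequency.
        table[prop] = {
            value: ((site_counts[value] + pseudocount) / (n_site + pseudocount * k))
            / ((count + pseudocount) / (n_all + pseudocount * k))
            for value, count in all_counts.items()
        }
    return table


def score_vertex(vertex, table):
    """Combine the per-property propensities as if independent (sum of logs).

    Property values never seen in training are treated as neutral (propensity 1).
    """
    return sum(math.log(table[p].get(vertex[p], 1.0)) for p in PROPS)


def leave_one_out(structures):
    """Train on all structures but one, then score every vertex of the held-out one."""
    results = []
    for i, held_out in enumerate(structures):
        training = structures[:i] + structures[i + 1:]
        table = learn_propensities(training)
        results.append([score_vertex(v, table) for v in held_out["vertices"]])
    return results


def toy_structure():
    """A three-vertex toy surface whose first vertex is the binding site."""
    return {
        "vertices": [
            {"residue": "ARG", "curvature_bin": "concave", "potential_bin": "positive"},
            {"residue": "GLU", "curvature_bin": "convex", "potential_bin": "negative"},
            {"residue": "LEU", "curvature_bin": "flat", "potential_bin": "neutral"},
        ],
        "site": [True, False, False],
    }


scores = leave_one_out([toy_structure() for _ in range(3)])
```

Each inner list in `scores` holds one score per vertex of the held-out structure. Mapping these scores to colors on the mesh corresponds to the "painting" step, and the same `learn_propensities` and `score_vertex` calls applied with the full training set would score a protein whose binding site is unknown.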

Visual inspection of the jack-knife validation results indicates that the current surface-element model of phosphopeptide binding is predictive, with little tendency toward false negative predictions but a somewhat higher tendency toward false positives. Because this error profile still allows the generation of experimentally testable hypotheses, it is significantly more useful than the converse.

Such hypotheses were generated for the BRCT domain of the protein BRCA1 and for the kinase Chk1. Mutations to BRCA1, including one that abrogates binding to phosphopeptides, are commonly associated with breast and ovarian cancer in women. When the model developed here was applied to the surface of the rat BRCA1 structure, two putative phosphopeptide binding sites were found. One of these was crystallographically determined to be a correct prediction of the site of phosphopeptide binding. Likewise, two potential binding sites were found on the surface of Chk1, and experimental evidence indicates that one of them may be the correct site.

Model improvements are being considered in which non-phosphopeptide ligand binding propensities are calculated and used to distinguish phosphopeptide binding sites from sites that are more generically "sticky". Such sites are not explicitly selected against in the current model and may contribute to the rate of false positives. This improvement might allow the method to be used prospectively to mine the Protein Data Bank for novel phosphopeptide binding domains.