Identifying the Determinants of Conserved Protein Solvation

Michael L. Raymer, Paul C. Sanschagrin, and Leslie A. Kuhn

Protein-bound water molecules play important roles in ligand binding, catalysis, and structural stabilization. Nevertheless, the structural and chemical characteristics that determine favorable water binding sites have been only partially characterized. To address this gap, we trained a hybrid genetic algorithm/k-nearest-neighbor classifier to predict which first-shell water molecules are conserved between independently solved crystallographic structures for 30 non-homologous proteins. The prediction is based on eight features reflecting the environment of the water molecule: the crystallographic temperature factor (B-value) and mobility (B-value normalized by occupancy) of the water molecule, the number of hydrogen bonds to the protein and to other water molecules, and the local atomic density, atomic hydrophilicity, and average and summed B-values of neighboring protein atoms. Maximal predictive accuracy was attained using a subset of four of the eight features: B-value, mobility, the number of hydrogen bonds to protein atoms, and the number of hydrogen bonds to water molecules. The relative weights determined for these features (0.413, 0.315, 0.135, and 0.137, respectively) indicate that B-value and mobility are more important predictors of water site conservation than the number of hydrogen bonds. This weighted set of features was sufficient to predict conservation of first-shell water molecules with an accuracy of 64.2% in cross-validation tests, which is significantly greater than the accuracy obtained by unweighted k-nearest-neighbor classification or discriminant analysis.
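
To illustrate how feature weights can enter a k-nearest-neighbor classifier, the following is a minimal sketch and not the authors' implementation. It assumes that each feature has already been scaled to the range 0 to 1 and that a weight simply stretches or shrinks its feature axis before a Euclidean distance is computed; the abstract does not state how the weights are applied. The function names, the choice of k = 3, and the toy training points are hypothetical. Only the four weights are taken from the abstract, and the genetic algorithm search that produced them is not shown.

```python
import math
from collections import Counter

# Feature order: B-value, mobility, H-bonds to protein, H-bonds to water.
# These four weights are the values reported in the abstract.
WEIGHTS = (0.413, 0.315, 0.135, 0.137)


def weighted_distance(a, b, weights=WEIGHTS):
    """Euclidean distance after scaling each feature axis by its weight.

    Assumption: weights act as per-axis scale factors; the abstract does
    not specify the exact form of the distance.
    """
    return math.sqrt(sum((w * (x - y)) ** 2 for w, x, y in zip(weights, a, b)))


def knn_predict(train, query, k=3, weights=WEIGHTS):
    """Return the majority label among the k training points nearest to query.

    train is a list of (feature_tuple, label) pairs.
    """
    nearest = sorted(
        train, key=lambda item: weighted_distance(item[0], query, weights)
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical, pre-normalized toy data (not from the study).
TRAIN = [
    ((0.10, 0.12, 0.75, 0.25), "conserved"),
    ((0.15, 0.20, 0.50, 0.50), "conserved"),
    ((0.20, 0.18, 1.00, 0.00), "conserved"),
    ((0.80, 0.85, 0.25, 0.25), "nonconserved"),
    ((0.90, 0.75, 0.00, 0.50), "nonconserved"),
    ((0.70, 0.95, 0.25, 0.00), "nonconserved"),
]

if __name__ == "__main__":
    # A low-B-value, low-mobility query site.
    print(knn_predict(TRAIN, (0.12, 0.15, 0.50, 0.25)))
```

Under this assumed scheme, a unit difference in B-value contributes about three times as much to the distance as a unit difference in either hydrogen-bond count, which is one way the larger weights could translate into greater influence on the prediction.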

Poster presented at the Twelfth Protein Society Symposium, San Diego, CA, July 25-29, 1998