Discovering the Chemistry of Conserved First-Shell and Active-Site Hydration in Proteins Using Pattern Classification with a Genetic Algorithm

Michael L. Raymer, William F. Punch, Paul C. Sanschagrin, Erik D. Goodman, and Leslie A. Kuhn

Water molecules are bound in the active sites of nearly all proteins. While some waters are displaced upon ligand binding, those that remain form an essential part of the protein surface for ligand docking and design. Despite their importance, the chemistry governing the conservation of protein-bound water molecules remains largely unexplored. To investigate it, we used the physical and chemical environments of first-hydration-shell waters in 20 non-homologous protein structures to train a hybrid k-nearest-neighbors classifier/genetic algorithm to predict the conserved/displaced status of first-shell water molecules. The genetic algorithm determined which environmental features were important for classification and provided the weights applied to those features during classification. The predictive accuracy of this approach for first-shell water molecules was approximately 65%, greater than that obtained by unweighted classification or discriminant analysis. Furthermore, the features selected by the genetic algorithm and their weights were consistent over multiple experiments. Analysis of the results allows us to identify the environmental features that predict conserved hydration and those that do not contribute. This method for identifying the relative importance of environmental features for conserved binding can be applied directly to other problems, such as the prediction of ion or ligand binding sites.
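The general idea of a k-nearest-neighbors classifier whose feature weights are tuned by a genetic algorithm can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the weight encoding, the genetic operators, the value of k, or the fitness function, so everything below is an assumption chosen for illustration (real-valued weights, truncation selection, one-point crossover, Gaussian mutation, leave-one-out accuracy as fitness, k = 3). The function names and the synthetic two-feature data set are hypothetical and stand in for the water-environment features.

```python
import random


def weighted_knn_predict(train_X, train_y, x, weights, k=3):
    """Majority vote among the k nearest neighbors under a weighted squared distance."""
    dists = sorted(
        (sum(w * (a - b) ** 2 for w, a, b in zip(weights, row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = [label for _, label in dists[:k]]
    return max(set(votes), key=votes.count)


def loo_accuracy(X, y, weights, k=3):
    """Leave-one-out accuracy, used here as the (assumed) GA fitness."""
    correct = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        if weighted_knn_predict(rest_X, rest_y, X[i], weights, k) == y[i]:
            correct += 1
    return correct / len(X)


def evolve_weights(X, y, n_features, pop_size=20, generations=15, k=3, seed=0):
    """Toy GA over non-negative feature weights (requires n_features >= 2)."""
    rng = random.Random(seed)
    # Start from the unweighted baseline plus random weight vectors.
    pop = [[1.0] * n_features]
    pop += [[rng.random() for _ in range(n_features)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        # Truncation selection: the better half survives unchanged (elitism).
        ranked = sorted(pop, key=lambda w: loo_accuracy(X, y, w, k), reverse=True)
        parents = ranked[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)      # one-point crossover
            child = a[:cut] + b[cut:]
            j = rng.randrange(n_features)           # Gaussian mutation of one gene
            child[j] = max(0.0, child[j] + rng.gauss(0, 0.2))
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda w: loo_accuracy(X, y, w, k))


def make_toy_data(n=40, seed=1):
    """Synthetic data: feature 0 tracks the label, feature 1 is pure noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([label + rng.gauss(0, 0.3), rng.gauss(0, 3.0)])
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    best = evolve_weights(X, y, n_features=2)
    print("unweighted accuracy:", loo_accuracy(X, y, [1.0, 1.0]))
    print("evolved weights:", best)
    print("weighted accuracy:", loo_accuracy(X, y, best))
```

A weight near zero effectively removes a feature from the distance calculation, so in a scheme like this the evolved weights double as a rough indication of which features matter for classification. Because the unweighted vector is in the starting population and the best individuals are carried over unchanged, the evolved weights in this sketch can never score below the unweighted baseline on the training data.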

Poster presented at the Eleventh Protein Society Symposium,
Boston, MA, July 12-16, 1997