Electrostatic interactions are often a key factor determining the properties of biomolecules (1–5), including their biological function such as catalytic activity (6,7), ligand binding (8), complex formation (9) and proton transport (10), as well as their structure and stability (11,12).
The electrostatic properties of a molecule can change dramatically depending on the ionization (protonation) states of its titratable groups. These states depend on each group's type, its location within the macromolecule, the ionization states of other titratable sites, and the pH and ionic strength of the surrounding solvent.
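To make the pH dependence concrete, the following minimal sketch applies the Henderson–Hasselbalch relation to a single, isolated titratable group. It deliberately ignores the coupling between sites and the influence of the macromolecular environment, which are exactly the effects that pKa prediction methods must account for. The function name `protonated_fraction` and the approximate model-compound pKa values are illustrative assumptions, not taken from this paper or from any particular software package.

```python
def protonated_fraction(pH: float, pKa: float) -> float:
    """Fraction of an isolated titratable group that holds its proton.

    Henderson-Hasselbalch for a single, non-interacting site:
        fraction = 1 / (1 + 10**(pH - pKa))
    """
    return 1.0 / (1.0 + 10.0 ** (pH - pKa))


# Approximate pKa values for isolated side chains in water.
# ASSUMPTION: illustrative textbook-style numbers; values inside a folded
# protein can be shifted substantially by the local environment.
MODEL_PKA = {
    "Asp": 4.0,
    "Glu": 4.4,
    "His": 6.3,
    "Cys": 8.3,
    "Tyr": 9.6,
    "Lys": 10.4,
    "Arg": 12.0,
}


def protonation_table(pH: float) -> dict:
    """Protonated fraction of each model group at the given pH."""
    return {name: protonated_fraction(pH, pKa) for name, pKa in MODEL_PKA.items()}


if __name__ == "__main__":
    for name, fraction in protonation_table(7.0).items():
        print(f"{name}: {fraction:.3f} protonated at pH 7.0")
```

In this isolated-site picture a group is half protonated when the pH equals its pKa, and the fraction changes by roughly a factor of ten per pH unit away from that point. A group whose pKa lies close to the pH of interest, such as His near neutral pH, is therefore the most sensitive to environment-induced pKa shifts.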
On one hand, experimental determination of protonation equilibria, usually by NMR, is expensive and often cannot be performed for every group of interest. On the other hand, individual protons are usually not resolved by ‘standard’ X-ray crystallography, so most structures in the Protein Data Bank (PDB) are incomplete in that they lack hydrogen atoms. Coordinates of most of the missing protons, e.g. those on CH3 groups, are relatively easy to reconstruct from a set of straightforward chemical rules; however, predicting the protonation states of titratable groups such as Asp, Glu, Arg, Lys, Tyr, His or Cys is not trivial. This matters in practice because a complete, all-atom structural model is usually required as input for many common molecular modeling techniques such as molecular dynamics (MD) simulations.
A number of theoretical methods exist that predict pKa and protonation states of ionizable groups; see e.g. (13–27). Most of these methods are based on the ‘implicit solvent’ model, in which individual water molecules and mobile solvent ions are replaced by a continuous medium with the average properties of the solvent. Some approaches go beyond this and explicitly take into account the solvent's degrees of freedom (19,21,27), albeit at a significantly larger computational expense. Since electrostatic interactions are the key factor determining the protonation equilibria, considerable effort has been spent on improving the accuracy of their estimation. Apart from the very early approaches (13,28), which represented a molecule as a low dielectric sphere and made mostly qualitative predictions, all modern methods use atom-detail information from high-resolution PDB structures. Generally, higher resolution data yield more accurate predictions.
Although these methods vary in the details of the underlying physical models, they share one common feature: computational and algorithmic complexity. This complexity stems, in general, from the sensitivity of the computed electrostatic interactions to the approximations involved and to the details of the input structure. Hence, the computational process usually involves multiple non-trivial steps. An additional complication often arises from irregularities within the input PDB structures, such as naming inconsistencies and missing or duplicate atom records, so significant ‘pre-processing’ of structures is required. As a result, modern methods that predict protonation equilibria and add missing hydrogens to PDB structures are frequently associated with a rather steep learning curve, which often precludes novices from using them. Even for experts, the manual set-up of such calculations is often time-consuming, and potentially useful variations of the input parameters and/or structural models remain unexplored.
This paper describes the freely available web server http://biophysics.cs.vt.edu/H++, which is designed to automate prediction of pKa and protonation states of ionizable residues in macromolecules, using atomic resolution structures as input. The output structure contains missing hydrogens added according to calculated protonation states and is available in several formats used by a number of popular molecular modeling packages. The calculations are based on the standard continuum solvent methodology (15), within the frameworks of either the generalized Born (GB) or the Poisson–Boltzmann (PB) models (user-specified). All steps of the computational process are fully automated. Commonly used input parameters are accessible via a simple interface that provides reasonable defaults. The server is intended for both experts and non-experts.