Pre-processing of submitted structures
Structures are pre-processed differently depending upon their input format (see the flowchart in Figure 1). Two input formats are supported by the H++ website, PDB and PQR. These files differ in that PQR files already have charges and radii assigned to each atom whereas PDB files do not.
If the input file is in PQR format, H++ makes minimal changes because it is assumed that the format is already suitable for electrostatic calculations. Changes made are as follows: atom names of all titratable amino acids are brought into accordance with the format adopted by the AMBER (29
) package and consistency checks are performed. These checks ensure that the total charge of the system is an integer (within a ±0.05 unit charge tolerance per amino acid) and the atomic radii are between 0.5 and 3 Å. If any of the above checks fail, the sequence of residues is discontinuous or the atom names are different from the PDB standard and cannot be recognized, execution terminates.
For a structure submitted in the conventional PDB format, H++ deletes all HETATM records; that is, only those atoms that belong to amino acids or nucleotides are kept. This is the ‘clean-up’ step in Figure 1. Removal of explicit water molecules and mobile counterions is generally consistent with the implicit solvent framework in which solvation effects are accounted for in the mean-field manner. If necessary, removed ligands can be included in the calculations by submitting the complete structure in the PQR format, avoiding the ‘clean-up’ step. Sequence continuity is verified and all atom names are brought into accordance with the format adopted by the AMBER package. Deuterium atoms are replaced with equivalent hydrogens.
Addition of missing atoms and optimization of hydrogens
This section applies only to input structures in PDB format. Missing heavy atoms and protons (assuming standard protonation states of titratable groups) are added, and atomic partial charges and radii (Bondi) are assigned using the PROTONATE and LEAP modules of AMBER. This is followed by an MD-based optimization of the positions of the added hydrogens. The protocol was tested earlier (10) in a similar context; it consists of three consecutive stages during which only hydrogens are allowed to move: first, 100 steps of conjugate gradient minimization; second, 500 steps of MD at 300 K, with all torsional potentials involving hydrogens set to zero; and third, 100 steps of conjugate gradient minimization with the torsional potential returned to normal values. The AMBER parm99 force-field is used, where the integration time-step is 1 fs and the charge–charge interactions are computed in a uniform ‘vacuum’ of dielectric out = 4.
The continuum electrostatics methodology widely used to calculate the energetics of proton transfer is described elsewhere (15,30); the model is available in the free software package MEAD (31). The H++ calculations rely upon the single-conformer version of MEAD; conformational variability is partially accounted for by the ‘smeared charge’ representation of titratable groups (see below) and the simulated annealing of protons described above. Although it is not the most systematic or exhaustive way of incorporating conformational variability, we believe, based on previous experiences (10,22,30), that this particular model is a reasonable balance between speed and accuracy, and is therefore a good choice for web-based calculations. In this model, the molecule is treated as a low dielectric medium in, and the surrounding solvent is assigned a high dielectric constant out. The electrostatic screening effects of (monovalent) salt enter via the Debye–Huckel screening parameter . The salt concentration, in and out are accessible to the user, and reasonable defaults are provided. The difference between the pKa of a sidechain and the pKa of the corresponding model compound in free solution is determined by the combined effect of two distinct contributions to the total electrostatic (free) energy change. The first is the ‘Born term’ or desolvation penalty, which always penalizes burial of a charge inside a low dielectric medium. The second is the background term, which represents the electrostatic interactions of the group in question with all other fixed charges in the molecule not belonging to any titratable groups. These energy terms, as well as the matrix of site–site interactions, are estimated through a sequence of calculations in which sites in the protein and their corresponding model compounds have their charge distributions set to those of the protonated or deprotonated form, and suitable energy differences are taken. For the protonated states of Asp and Glu, in which the correct location of the proton is not known a priori, a ‘smeared charge’ representation is employed, in which the neutralizing positive charge is symmetrically distributed: 0.45 on each carbonyl oxygen atom, and 0.1 on the carbon atom. The electrostatic calculations are based either on a GB or a PB model, as requested by the user. The particular GB model we are using was found earlier (32) to work reasonably well in pK calculations on proteins; here its improved version (33) is used. The set-up and finite-difference solution of the PB problems or analytical GB calculations are carried out using the MEAD program package. In the finite-difference lattices, two levels of focusing are used. In the coarsest level the bounding box is set to twice the molecule's maximum extent and the grid points are spaced 2 Å apart. The finest lattice is focused on the region of interest, and the grid points are 0.5 Å apart. The probe radius for defining the molecular surface, which is used as the boundary between the interior and exterior dielectric regions, is set to 1.4 Å.
Calculating titration curves, pK1/2 and protonation states
The electrostatic calculations outlined above provide (free) energies of each of the 2N protonation microstates (10) in the system, where N is the number of ionizable sites. To make the subsequent calculation of the partition functions (and pK1/2) manageable, a fast variant of a clustering approach is used (34). The approach subdivides the interacting sites into independent clusters based upon the strength of electrostatic site–site interactions between them. All electrostatic interactions for each ionizable site are sorted from highest to lowest; the top Cmax sites are then selected to contribute to the calculation, and all others are ignored. The partition function for the site is then factored into computationally manageable components of maximum size Cmax. Here, Cmax = 17 is used: in tests on 600 representative proteins (35), we found (J. Myers, G. Grothaus and A. Onufriev, manuscript submitted) that Cmax = 17 resulted in average errors of 0.2 pK units, compared with a standard treatment based upon a Monte Carlo approach (16).
The probability of protonation is computed for every site over a range of pH values equally spaced by 0.1 pH units apart. Individual curves can be displayed for user-selected residues, and the total protonation curve is generated, showing the computed isoelectric point of the molecule. A diagram showing the 10 lowest protonation states and their relative free energies is also generated. These diagrams were found useful (10) for analysis of proton transfer events in biomolecular systems.
Generating the PDB structure in its predicted protonation state
The computed titration curves provide an estimate of the probability of protonation of each titratable site at the (user-specified) pH of the solvent. A simple scheme is used for assignment of protonation states: if the estimated probability is the site is considered deprotonated; otherwise, the site is protonated. We follow AMBER conventions in placement of new hydrogen atoms and selection of hydrogen atoms to be removed. Deprotonated (neutral) states of Arg and Tyr, not available in the AMBER databases, are obtained from the protonated forms by removal of the HH22 and HH protons, respectively; their partial charges are brought into accordance with the appropriate model compound values supplied by the MEAD package. The structures submitted in PQR format do not undergo re-assignment of protonation states, allowing greater flexibility for this input format.