Construction of the CABS decoy sets
We tested the proposed model ranking protocol on large sets of near-native decoys. We constructed a benchmark of 7 proteins that are representative with respect to their length and secondary structure content (Table 1). None of these proteins was present, or had detectable homologs, in the library of protein fragments used for the backbone reconstruction (the test structures were added very recently to the PDB).
Several studies have utilized different energy functions to discriminate the native structure among sets of decoys built in different ways [9,22-25]. Typically, the decoys have been generated by means of various threading procedures. Unfortunately, decoy sets built by threading contain many incorrect structures, mainly because of alignment problems in the threading algorithms, which result in incorrectly paired tertiary contacts or wrong secondary structure assignments. In contrast, the Park & Levitt decoy set was generated by means of lattice modeling with constrained native secondary structure and covered a wide range of RMSD values. The decoys built by Rosetta in the work of Lee et al. exhibited varying topologies with locally optimized structure. The size of this set (a small number of small proteins) is probably not sufficient for a clear estimation of the correlation between RMSD and energy. This makes such decoy sets less challenging than those investigated in the current work. While threading methods are limited by the availability of structural analogs, high-resolution lattice models can be used efficiently in comparative modeling as well as in de novo structure prediction.
In this work, long Monte Carlo simulations with the CABS model were performed in order to generate protein-like near-native decoys with RMSD in the range of 0.35 – 3 Å from the native (0.35 Å is the average accuracy of the Cα-trace projections onto the underlying CABS lattice grid). The CABS model is computationally efficient and can either cover the near-native conformational space (given some constraints extracted from a correctly aligned template) or be used in de novo structure prediction. As can be seen from the example given in Figure 1 and Figure 2, the near-native decoys generated by CABS differ mainly in the most flexible regions, such as loops or the ends of the secondary structure elements. We limited the decoy diversity to about 0.35 – 3 Å from the native because this is a typical accuracy range for comparative modeling.
The main objectives of using a Molecular Mechanics force field after the coarse-grained modeling stage in our work are: improving the filtering of the crude models, providing a better correlation with similarity to the native, and finally identifying the best model (the one closest to the native). To reliably verify the correlation between the Molecular Mechanics energy and RMSD, we divided each protein subset (60 – 150 thousand structures, depending on the protein) into 30 bins, using RMSD from the native structure as the classification criterion (from 0 to 3 Å, with a bin size of 0.1 Å). From each bin, 30 models (or fewer, if the bin contained fewer) were randomly selected. In this manner approximately 800 decoys were selected for each protein, covering a broad spectrum of model quality (in the sense of similarity to the native structure).
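The binning and random selection described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the script used in this work: the function name `select_decoys`, its parameters, and the convention that a bin covers the half-open interval [lower, upper) are all hypothetical choices made here.

```python
import random
from collections import defaultdict


def select_decoys(rmsds, n_bins=30, bin_width=0.1, per_bin=30, seed=0):
    """Return indices of decoys sampled evenly over RMSD bins.

    rmsds    : list of Calpha RMSD values (in Angstroms) from the native
    n_bins   : number of bins (30 bins of 0.1 A cover 0 to 3 A)
    per_bin  : maximum number of decoys drawn from each bin
    seed     : hypothetical parameter, only for reproducible sampling
    """
    rng = random.Random(seed)
    bins = defaultdict(list)
    for idx, rmsd in enumerate(rmsds):
        if rmsd < 0:
            continue  # RMSD cannot be negative; skip malformed input
        b = int(rmsd / bin_width)  # assumed half-open bins [lower, upper)
        if b < n_bins:  # decoys beyond the last bin (>= 3 A) are ignored
            bins[b].append(idx)

    selected = []
    for b in sorted(bins):
        members = bins[b]
        # Take 30 models, or all of them if the bin holds fewer.
        k = min(per_bin, len(members))
        selected.extend(rng.sample(members, k))
    return selected
```

With 30 bins and at most 30 models per bin, the upper limit is 900 decoys per protein; sparsely populated bins reduce this number, which is consistent with the approximately 800 decoys per protein reported above.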
From simplified to all-atom representation
Employing a simplified protein representation for exploration of the vast conformational space (including de novo structure prediction, various comparative modeling techniques, or hybrid methods utilizing different kinds of restraints from experimental data) makes it necessary to reconstruct the reduced models into an all-atom representation compatible with the classical all-atom modeling tools. A rebuilding procedure may also be beneficial when structures from different sources (and of different quality) are being compared. Recently, during extensive tests of available methods for reconstruction of the protein backbone from the Cα-trace, we found that for high-accuracy models (better than 1.5 Å) the best performance is achieved by a procedure that inserts well-adjusted fragments from known protein structures (implemented in the Sybyl software from Tripos Inc., St. Louis, MO). Such a procedure improves the local geometry of the backbone. To assure the best possible reconstruction, we applied this method to our benchmark set. As with the main-chain backbone atoms, the side chains were reconstructed and their conformations optimized using Sybyl. It is worth noting that the increasing number of experimentally determined high-resolution structures in the Protein Data Bank (PDB) may lead to further improvements in all-atom reconstruction methods that use protein fragments from the PDB.
CABS decoys evaluation by all-atom minimizations
Evaluation of the protein models was done by all-atom minimization with frozen alpha carbons, using the Amber7 FF99 force field and Amber charges implemented in Sybyl. The effect of solvent was neglected and a uniform dielectric constant of 1 was used. Because the positions of the alpha carbons are frozen, this is probably an acceptable approximation. Importantly, it is often unknown whether the target sequence is part of a larger oligomeric structure. If it is, the solvation energy term would unnecessarily penalize the exposed binding part of the protein surface. Moreover, ranking large sets of decoys of relatively large molecules requires computations that are as fast as possible, which would be impossible, or very difficult, with an explicit treatment of the surrounding solvent. The fixed positions of the alpha carbons also prevent the all-atom systems from drifting toward non-native local minima. On the other hand, a significant repacking of the model structures is rather unlikely within the frozen Cα approximation. The underlying assumption is that the set of decoys contains a fraction of near-native structures with good geometry.
The results of the minimization are illustrated in Figure 3. For each protein, the decoy energies after 1000 iterations of the Sybyl minimization were plotted as a function of the Cα-trace RMSD. For all proteins, the resulting energies divide into two ensembles: wedge-shaped low energy values (Figure 3, right panels) and abnormally high energy values (Figure 3, left panels). The abnormally high energy values, observed for a fraction of the decoys, resulted mainly from bond stretching and from the repulsive van der Waals contribution caused by rebuilding inaccuracies that led to overlaps of some atoms. The decoys were produced by a low-resolution search with a very simplified representation of the side chains. This flattens the energy landscape, but it may also result in a distorted geometry of the Cα-trace. This is in agreement with the observation that physics-based energy functions, as opposed to statistical energy functions, are sensitive to small displacements. The rebuilding procedure aimed to fit protein fragments as closely as possible to the initial Cα-trace, and consequently it was not always capable of constructing structures without some local defects. Structures with such small errors can be easily filtered out by a short minimization of about 200 iterations, regardless of the protein length. This is sufficient to reject the decoys with local defects (energy > 0), and it takes about 5 minutes per structure for a large protein domain (2CJPA) on a single Linux machine. Interestingly, in all cases such a short minimization leads to practically the same correlation between energy and RMSD as a minimization 5 times longer (e.g. for the 2GR8A models that scored in the negative energy range after 200 iterations, the Pearson correlation coefficient was r = 0.79, and for the same models r = 0.80 after the 1000-iteration procedure).
However, we found that while a longer minimization did not substantially change the ranking of the high-accuracy decoys, it did help to separate the medium-accuracy decoys (in the range from 2 Å to 3 Å) from the worse models: a longer minimization (1000 iterations) led to better results in identifying the best model (see the section on Evaluation of the MOULDER testing set).
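The filtering step and the energy-RMSD correlation discussed above can be illustrated with a short Python sketch. It assumes that each decoy is available as a pair (RMSD, energy after the short minimization); the function names, the input layout, and the choice to keep decoys with energy exactly equal to the cutoff are hypothetical and are not taken from the Sybyl workflow itself.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def filter_and_correlate(decoys, energy_cutoff=0.0):
    """Reject decoys with energy above the cutoff, then correlate.

    decoys : list of (rmsd, energy) pairs; the energy is assumed to be
             the value obtained after the short minimization
    Returns (kept_decoys, r), where r is the Pearson correlation between
    energy and RMSD over the kept decoys.
    """
    # Decoys with local defects show up as energy > 0 and are rejected.
    kept = [(rmsd, e) for rmsd, e in decoys if e <= energy_cutoff]
    r = pearson_r([e for _, e in kept], [rmsd for rmsd, _ in kept])
    return kept, r
```

Running the same function on the energies obtained after 200 and after 1000 iterations, for the same set of kept decoys, would give the kind of comparison quoted above for 2GR8A (r = 0.79 versus r = 0.80).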
The number of steric errors grows with the protein length (Table 1, Figure 3). The exception is 2GRRB, an all-alpha protein, which was rebuilt the most accurately of the whole set. While clashes could be easily removed by a short relaxation of the entire structures, constructing a larger number of reduced-space decoys instead (and rejecting those with clashes) seems to be a more effective option.
Figure 3 clearly shows that the proposed procedure leads to a proper ranking of the models: there is a very good correlation between energy and the RMSD from the native structure. The lowest energy models are always very close to the native, and in most cases the best decoys have been selected.
At this point it should be added that there is nothing specific about the decoys generated by CABS with the subsequent all-atom rebuilding. The CASP6 assessments have shown that the local geometry of the CABS models was on average the same as the local geometry of the models built by other high-performance methods for protein structure prediction. This is mainly because various all-atom reconstruction procedures employ, in a similar fashion, protein fragments extracted from high-resolution crystallographic structures. Thus, the proposed method should work similarly well for decoys generated by different modeling algorithms.
Evaluation of the MOULDER testing set
To test the ability of our protocol to discriminate medium-accuracy models (better than 3 Å) from low-accuracy models, we used the MOULDER decoy set, evaluated by Eramian et al. using 24 individual assessment scores, including physics-based energy functions, statistical potentials, and machine-learning scoring functions. Each of the targets from the set was modeled using a template with 95% of identically aligned positions. For each target, 300 different target-template alignments were generated using MOULDER with MODELLER.
Of the 20 target subsets, only 7 contain models better than 3 Å (for the RMSD range and median RMSD see Table 2). We decided to reduce the testing set to these 7 proteins, since the sensitivity to small structural displacements makes physical force fields less suitable for the assessment of models with larger errors. Before the minimization procedure, the coordinates of the alpha carbons of the models were extracted and subjected to the rebuilding procedure, identical to the one applied to the CABS decoy set.
The performance of the methods on the MOULDER decoy set was measured by the average RMSD difference (ΔRMSD) between the model identified as the best of the set and the model with the lowest RMSD. Each of the sets of 300 models was split into 2000 randomly populated smaller sets of 75 models. The purpose of this division was to reduce the impact of individual target sets on the final ranking and to increase the robustness of the benchmark. For each 75-model set, the model with the lowest Cα RMSD (after superposition with the native structure) was used as a reference to calculate the ΔRMSD measure. We followed the same rules, despite the fact that the average ΔRMSD value of a 75-model set is not well suited for evaluating our procedure, which by its nature is aimed at assessing much larger sets. Our method narrows down the number of models in the testing set by rejecting a fraction of them (those with extremely high energy values) because of their small inaccuracies. Sensitivity to small displacements is the price of the high discriminatory power. For the same reason, it is desirable to provide a few copies of each model with small differences, to maximize the chance of accurate scoring. Such sets of models (clusters), representative of various types of conformations, can be easily extracted from reduced modeling trajectories by structural clustering and subjected to the evaluation procedure.
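The ΔRMSD benchmark described above can be sketched as follows. This is a minimal illustration and not the code used by Eramian et al. or in this work: the function name, the input layout of (RMSD, score) pairs with lower scores being better, and the use of sampling without replacement within each subset are assumptions made here. How models rejected at the filtering stage enter the subsets is not specified in the text and is therefore not modeled.

```python
import random


def average_delta_rmsd(models, n_subsets=2000, subset_size=75, seed=0):
    """Average dRMSD over randomly populated subsets of one target set.

    models : list of (rmsd, score) pairs for one target; a lower score
             is assumed to mean a better model (e.g. a lower energy)
    seed   : hypothetical parameter, only for reproducible sampling
    """
    if subset_size > len(models):
        raise ValueError("subset_size exceeds the number of models")
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_subsets):
        # Each subset is drawn without replacement from the full set.
        subset = rng.sample(models, subset_size)
        # Reference: the model with the lowest RMSD in this subset.
        best_rmsd = min(rmsd for rmsd, _ in subset)
        # Model that the scoring method identifies as the best.
        picked_rmsd = min(subset, key=lambda m: m[1])[0]
        total += picked_rmsd - best_rmsd
    return total / n_subsets
```

A perfect scoring function, one that always ranks the lowest-RMSD model first, gives an average ΔRMSD of exactly zero, so smaller values indicate better model selection.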
The two worst cases, which are the two largest proteins in the set, illustrate the effects of narrowing down the number of models and of an insufficient number of good models in the set. The former is observed for the subset of 2fjbL models, where only one third of the models were subjected to the 1000-iteration minimization, while the rest were rejected after the 200-step minimization (E > 0). The latter is observed for the 2cmd subset, which has the smallest number of models better than 3 Å (16 out of 300).
Omitting these two worst cases, our method performed similarly to the Rosetta scoring function, which is apparently the most successful in de novo high-resolution structure prediction of small proteins. The ΔRMSD value averaged over the 5 protein subsets was 0.47 for our procedure (for the values of individual subsets see Table 2) and 0.49 for Rosetta. The corresponding values for the two physics-based approaches used in the study by Eramian et al. were 0.56 and 0.51 for GB (CHARMM with the Generalized Born solvent model) and EEF1 (CHARMM EEF1), respectively. The authors noted that in selecting the best model from a set of very similar models, EEF1 and GB were more accurate than many of the statistical potentials tested. They suggested that different relaxation schemes might have produced better results. They also noted that, because a solvent model was included, the oligomeric proteins were presumably harder to evaluate than monomers.
The ability of all the tested methods to identify native-like models varied greatly across different targets. The most accurate methods, which obtained the best results for the 7 targets considered here (see Table 2 for the best avg. ΔRMSD), were DFIRE, MODCHECK and MODPIPE_COMBI, which implement different kinds of statistical potentials, and PSIPRED/DSSP (a score based on the secondary structure predicted by PSIPRED, compared with the model secondary structure assigned by the DSSP algorithm).
As noted earlier, the minimization with frozen Cα has to be performed on a sufficient number of models. The number of models in the studied MOULDER subsets (300) seems to be sufficient (Table 2 and Figure 4). The ΔRMSD values obtained on the whole subsets correlate surprisingly well with the minimum values of the RMSD (Table 2), confirming the usefulness of our procedure in a high-accuracy modeling protocol. Clearly, the performance of our method improves with increasing average quality of the decoys. Thus, the analysis of the MOULDER decoys indicates that the best results of the proposed procedure are expected for sets of relatively good models. This is a favorable property, since ranking very poor models is of little practical use. It is also worth mentioning that the final selection can likely be improved by structural clustering of the best-scored models.