Overall, the Pull & Grab-based algorithm ranked highest: it explained 72% of the variance in the testing set, despite having been calibrated only on the training set. Linear regression ranked second, explaining 68% of the variance. Interpolation (with *z* = 1.5) ranked third, explaining 64% of the variance, and the Garki model explained 58% of the variance. The results are plotted in Figure 4, summarized in Table 1, and discussed in detail in the following sections. Because the Pull & Grab-based algorithm ranked highest, and because it was never fitted to the testing set, the issue of parsimony was not relevant. Had the algorithm based on linear regression ranked highest, parsimony would have been an important concern.

By itself, PR_{2} explained only 5% of the variance in PR_{1} (i.e. r^{2} = 0.05). Linear regression using the formula PR_{1} = a + b PR_{2} improved the r^{2} to 26%. Linear regression with the formula PR_{1} = a + b PR_{2} + c L_{1} + d L_{2} + e U_{1} + f U_{2} improved the r^{2} to 68%, the same as with the slightly non-linear version that always returned a value between 0 and 1. All of the coefficients were statistically significant at the 95% level except for c, the coefficient corresponding to L_{1}.
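The multi-predictor regression above can be sketched as an ordinary least-squares fit with an intercept. The data below are synthetic stand-ins (the real training set is not reproduced here), and the coefficient values are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Synthetic stand-ins for the predictors: PR_2 plus the lower and upper
# age limits (L_1, L_2, U_1, U_2) of the two paired surveys.
PR2 = rng.uniform(0.05, 0.9, n)
L1, L2 = rng.uniform(0, 5, n), rng.uniform(0, 5, n)
U1, U2 = rng.uniform(5, 15, n), rng.uniform(5, 15, n)
PR1 = np.clip(0.1 + 0.5 * PR2 - 0.01 * L1 + 0.02 * U1
              + rng.normal(0, 0.05, n), 0, 1)

# Design matrix with an intercept column:
# PR_1 = a + b*PR_2 + c*L_1 + d*L_2 + e*U_1 + f*U_2
X = np.column_stack([np.ones(n), PR2, L1, L2, U1, U2])
coef, *_ = np.linalg.lstsq(X, PR1, rcond=None)

# Variance explained (r^2) by the fitted values
fitted = X @ coef
r2 = 1 - np.sum((PR1 - fitted) ** 2) / np.sum((PR1 - PR1.mean()) ** 2)
print(coef, r2)
```

A bounded "slightly non-linear" variant could pass the same linear predictor through a link function (e.g. logistic) so that predictions always fall between 0 and 1.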

#### Interpolation

Several different values of *z* were evaluated: *z* = 0.5, 1, 1.5, 2, 3, 4, 6, 10. Of these values, *z* = 1.5 minimized the sum of squared errors and explained 65% of the variance.
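The selection of *z* amounts to a grid search minimizing the sum of squared errors over the candidate values. The interpolation rule itself is not reproduced in this section, so `predict` below is a hypothetical stand-in; only the grid-search logic is the point of the sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
obs = rng.uniform(0.1, 0.9, 50)            # observed PfPR values (toy data)
aux = obs + rng.normal(0, 0.1, 50)         # auxiliary predictor (toy data)

def predict(z, x):
    # Hypothetical stand-in for the interpolation rule: the exponent z
    # controls the shape of the prediction.  Illustrative only.
    return np.clip(x, 0, 1) ** z

candidates = [0.5, 1, 1.5, 2, 3, 4, 6, 10]
sse = {z: np.sum((obs - predict(z, aux)) ** 2) for z in candidates}
best = min(sse, key=sse.get)               # z minimizing squared error
print(best, sse[best])
```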

#### Pull & Grab

The trimmed mean value of *α* corresponded to a decline in PfPR beginning around age 9.5. The trimmed mean value of *s* corresponded to a decline in sensitivity to approximately 36% of P', but the trimmed mean value of *c* was low, implying that PfPR declines slowly, so the apparent PfPR does not approach 36% of the PfPR in children between two and ten years of age until late in life. There was substantial variability in *b*, but by the trimmed mean, PfPR was within 90% of P' by age two; some individual datasets differed from this pattern, but the majority reached this value by age two. The parameter names, interpretations, and values are summarized in Table 2. The family of curves described by the algorithm is illustrated in Figure 5. This algorithm explained 72% of the variance.
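The trimmed means reported above discard the most extreme per-dataset estimates before averaging, which limits the influence of outlying fits. A minimal sketch (the 10% trimming fraction and the parameter values are assumptions for illustration, not the paper's values):

```python
import numpy as np

def trimmed_mean(values, trim_frac=0.1):
    """Mean after dropping the lowest and highest trim_frac of values."""
    v = np.sort(np.asarray(values, dtype=float))
    k = int(len(v) * trim_frac)
    return v[k:len(v) - k].mean() if k > 0 else v.mean()

# Toy per-dataset estimates of a parameter such as alpha; the two extreme
# values (1.2 and 25.0) are excluded by the trimming.
alpha_estimates = [8.1, 9.0, 9.4, 9.6, 9.8, 10.2, 10.5, 25.0, 1.2, 9.5]
print(trimmed_mean(alpha_estimates, 0.1))
```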

The algorithm based on the Garki model [14] and calibrated during the Garki Project [31] was recently used to generate endemicity maps [20], so it was included as a candidate for comparison; its relationship between age and PfPR is qualitatively very similar to that of the modified Pull & Grab model. It was used as-is, without further fitting to either the training set or the testing set. It explained 58% of the variance, the lowest of the four algorithms considered.

### Analysis of the sub-sampled data

To evaluate the algorithms further, and to allow the linear regression analysis to be assessed as an algorithm, the testing set was sub-sampled 100 times. The Pull & Grab-based algorithm ranked first 80% of the time, second or better 94% of the time, and third or better 99% of the time. The corresponding ranking frequencies for linear regression were 17%, 70%, and 95%, and for interpolation 3%, 28%, and 50%. The Garki-based algorithm ranked in the top three only 5% of the time.
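The sub-sampling comparison above can be sketched as a bootstrap ranking: resample the testing set with replacement, rank the algorithms by total squared error on each resample, and tally how often each algorithm achieves each rank. The per-site errors below are invented for illustration; only the resample-and-rank logic reflects the procedure described:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200

# Hypothetical per-site squared errors for each candidate algorithm on a
# testing set; smaller is better.  Purely illustrative numbers.
errors = {
    "pull_grab": rng.gamma(1.0, 0.01, n),
    "regression": rng.gamma(1.2, 0.01, n),
    "interpolation": rng.gamma(1.5, 0.01, n),
    "garki": rng.gamma(2.0, 0.01, n),
}

rank_counts = {name: np.zeros(len(errors), dtype=int) for name in errors}
for _ in range(100):
    idx = rng.choice(n, size=n, replace=True)   # one sub-sample
    sse = {name: e[idx].sum() for name, e in errors.items()}
    for rank, name in enumerate(sorted(sse, key=sse.get)):
        rank_counts[name][rank] += 1

for name, counts in rank_counts.items():
    print(name, counts)   # times ranked 1st, 2nd, 3rd, 4th
```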

The algorithms were developed and evaluated for standardizing PfPR from any age range to any other; it is possible that they have different skill at standardizing to the PfPR in 2–10 year olds. To further evaluate the algorithm for standardization specifically to the 2–10 year old age classes, 87 datasets were identified in which the age limits were between two and ten (Figure 6). The PfPR pairs were, again, used as a standard for comparison. When PR_{1} was used instead of its paired PfPR estimate, PR_{2}, the categorical description was wrong in 38% of the cases; virtually all of the hyperendemic populations were misclassified as mesoendemic, and many mesoendemic areas were misclassified as hypoendemic. By contrast, the standardized PfPR gave the wrong categorical description in 18% of the cases. Standardization reduced the number of mesoendemic populations that would have been classified as hypoendemic, but it also misclassified two hypoendemic populations as mesoendemic. Standardization correctly classified most, but not all, of the misclassified hyperendemic populations, although two mesoendemic populations were misclassified as hyperendemic. Given the natural scatter in the estimates, errors are inevitable when continuous data are placed into categories. All other candidate algorithms performed more poorly (Table 1).
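The categorical comparison can be sketched as follows. The cut-points used here (10%, 50%, 75%) are the conventionally cited endemicity bands and are an assumption of this sketch, as are the toy (reference, raw) PfPR pairs:

```python
def endemicity(pfpr):
    """Classify a PfPR(2-10) value into endemicity classes using the
    conventional cut-points (an assumption of this sketch)."""
    if pfpr < 0.10:
        return "hypoendemic"
    if pfpr < 0.50:
        return "mesoendemic"
    if pfpr < 0.75:
        return "hyperendemic"
    return "holoendemic"

# Toy (reference PfPR, unstandardized PfPR) pairs; illustrative only.
pairs = [(0.55, 0.42), (0.30, 0.33), (0.08, 0.12)]
wrong = sum(endemicity(ref) != endemicity(raw) for ref, raw in pairs)
print(wrong / len(pairs))   # fraction with the wrong categorical label
```

Repeating the same tally with standardized rather than raw PfPR values gives the misclassification rates compared in the text.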