#### Experimental approach and data

The purpose of this analysis was to develop an algorithm to age-standardize PfPR data; in other words, to find a function, *F*, that transforms an estimate of PfPR over any age range *[x,y)* into a PfPR over a standard age range *[L,U)*, i.e. *F: PR[x,y)* → *PR[L,U)*. This was done for a number of candidate statistical and biological deterministic models developed from 21 studies that report very detailed PfPR by age (the training set). The skill of the models was then evaluated by inter-conversion of 121 pairs of crude PfPR estimates, where both members of a pair were taken from the same population but aggregated over different age ranges (the testing set).

The data selection criteria and summary details are given in Additional File 1 and Additional File 2 for the training and testing sets, respectively. Briefly, the data spanned all potential *P. falciparum* transmission intensity categories [18]. The algorithms developed from the training set were validated against the 121 pairs of the testing set by comparing the conversions from one age group to the other, usually from adults to children, and *vice versa*. The best performing algorithm was selected using goodness of fit, measured as the proportion of the variance explained: one minus the ratio of the variance of the residual error to the variance of the observed PfPR.
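The goodness-of-fit measure can be illustrated with a minimal Python sketch. It is not the authors' R code; the function name `variance_explained` and the toy values are hypothetical. The text does not say whether population or sample variances were used; the ratio is the same either way, provided the same estimator is applied to both terms.

```python
from statistics import pvariance

def variance_explained(observed, predicted):
    """Proportion of variance explained: 1 - Var(residuals) / Var(observed)."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    residuals = [o - p for o, p in zip(observed, predicted)]
    return 1.0 - pvariance(residuals) / pvariance(observed)

# Toy PfPR values, invented for illustration only.
observed = [0.10, 0.35, 0.60, 0.80]
predicted = [0.15, 0.30, 0.55, 0.85]
print(variance_explained(observed, predicted))
```

A perfect conversion gives a value of 1, and predicting the mean of the observed values for every pair gives 0.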

The algorithms differ from each other in significant ways. The linear regression algorithm was based only on the testing set, whereas the other algorithms were based only on the training set and did not use the testing set for their development. The linear regression algorithm does not predict a relationship between age and PfPR. The interpolation algorithm predicts PfPR in the testing set by direct interpolation of the training set data; it also does not predict an age-PfPR relationship. The two parametric algorithms were fitted to the training set but were used only for prediction on the testing set, so parsimony is not an important concern for them. The most important measure of performance is skill: a more complicated model that did better at standardizing PfPR would be preferred regardless of its complexity, as long as the algorithm had not been fitted to the testing set. Since the linear regression algorithm is a statistical description of the testing set, it sets a standard for judging the skill of the other algorithms. The predictions based on linear regression of the testing set, on the other hand, do raise important issues of parsimony. Because this statistical analysis was also one of the candidate algorithms, it was repeated on a sub-sample of the testing set: two-thirds of the data pairs were used for fitting and one-third for validation. Based on these criteria, one algorithm was selected as the most appropriate evidence-based method to age-standardize PfPR for future comparisons between studies reporting parasite prevalence across different age ranges.

#### Statistical candidate algorithms

#### Linear regression

Linear regression was used to describe the relationship between the pairs of PfPR estimates from the testing set. It differs from the other algorithms in that it is based on a statistical analysis of the testing set; it does not predict a relationship between PfPR and age, and it does not rely on the training set. Let PR_{1} and PR_{2} denote the two estimates made on the intervals [L_{1}, U_{1}) and [L_{2}, U_{2}), respectively. The full regression included the age limits, i.e. PR_{1} = c_{0} PR_{2} + c_{1} L_{1} + c_{2} U_{1} + c_{3} L_{2} + c_{4} U_{2} + ε. This formula sometimes returns predicted values of PR_{1} that lie outside the interval [0,1], so the analysis was repeated as a non-linear regression in which predictions outside this range were set to 0 or 1. The regression analysis included each pair twice; each member was PR_{1} once and PR_{2} once. Linear regression of the full testing set provided a standard for the other algorithms, and the predicted values from the linear regression were also evaluated as a potential algorithm.

#### Interpolation

A general class of interpolation algorithms was based on the training set: (i) given the PfPR from a focal study, *PR*_{f}*[x,y)*, compute the PfPR for the training sets over the same interval, *PR*_{i}*[x,y)*, *i* = 1...21; (ii) let *W*_{i} = |*PR*_{f}*[x,y)* - *PR*_{i}*[x,y)*|^{-z}, and let *ω*_{i} = *W*_{i}/∑_{i}*W*_{i} denote the weight of the i^{th} PfPR estimate; and (iii) the standardized PfPR is then ∑_{i}*ω*_{i}*PR*_{i}*[L,U)*. Linear interpolation was also considered and was very similar to the general interpolation algorithm with *z* = 6.
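Steps (i) to (iii) amount to inverse-distance weighting of the training studies, which can be sketched in Python as follows. This is an illustration, not the authors' R routine. The function name, the toy values and the handling of an exact match (where the weight would otherwise be infinite) are assumptions that do not come from the text.

```python
def interpolate_standard_pr(pr_focal, train_same_range, train_standard_range, z):
    """Inverse-distance-weighted estimate of PfPR over the standard range [L,U).

    pr_focal:              PR_f[x,y), the crude estimate from the focal study
    train_same_range:      PR_i[x,y) for each training study
    train_standard_range:  PR_i[L,U) for each training study
    z:                     exponent controlling how fast weights decay
    """
    distances = [abs(pr_focal - p) for p in train_same_range]
    # Assumption: if a training study matches the focal value exactly, its
    # weight is unbounded, so the matching studies are simply averaged.
    exact = [s for d, s in zip(distances, train_standard_range) if d == 0.0]
    if exact:
        return sum(exact) / len(exact)
    weights = [d ** (-z) for d in distances]          # W_i
    total = sum(weights)
    # omega_i = W_i / sum(W_i); standardized PfPR = sum(omega_i * PR_i[L,U))
    return sum(w / total * s for w, s in zip(weights, train_standard_range))

# Toy numbers for three hypothetical training studies.
same_range = [0.20, 0.40, 0.70]
standard_range = [0.30, 0.50, 0.80]
print(interpolate_standard_pr(0.35, same_range, standard_range, z=2.0))
```

Larger values of *z* concentrate the weight on the training studies whose PfPR over *[x,y)* is closest to the focal estimate.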

#### Pull & Grab-based Algorithm

Let *P(A)* denote the true prevalence and *F(A)* the sensitivity of microscopy, a standard method for estimating PfPR, as a function of age. The function *F(A)* was motivated by the notion that sensitivity declines with age as blood-stage immunity reduces parasite densities to a point where they are often below the detection thresholds of microscopy [26,27]. Therefore, the apparent PfPR, the one that a study would find using microscopy, is *p(A) = P(A) F(A)*. Crude PfPR estimates also depend on the age-distribution of the sampled population, *S(A)*; when PfPR is reported crudely, the age-distribution is generally not reported, so it must also be inferred. Given *p(A)* and *S(A)*, an estimate of crude PfPR would be *PR*_{f}*[x,y)* = ∫_{x}^{y} *p(A) S(A) dA* / ∫_{x}^{y} *S(A) dA*. The standardized estimate would be *PR*_{f}*[L,U)* = ∫_{L}^{U} *p(A) S(A) dA* / ∫_{L}^{U} *S(A) dA*.

The estimate of *S(A)* was based on the age-distributions in the training set (Figure 1). Each of the 21 studies reports the number of people sampled by age or by age class. Typically, ages were binned by year through childhood, then by 5-year age classes up to age 65, and finally the elderly, a class that was arbitrarily truncated at age 85. When a study binned several years of age into one class, the observations were divided equally among the years in that class. For example, the sample for 27 year olds was counted as 1/5 of the total observations for 25–30 year olds. In sum, to get an estimate of *S(A)*: (i) for each study, the proportion of all people sampled that belonged to each one-year age class was computed and (ii) the proportion for each age class was the average for all the studies, where each study was weighted equally.
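The two steps for estimating *S(A)* can be sketched in Python as follows. This is a sketch and not the authors' R code; the function names, the `(lower, upper, count)` bin format and the two toy studies are assumptions made for illustration.

```python
def yearly_proportions(bins, max_age=85):
    """Spread binned counts evenly over one-year classes, then normalize.

    bins: list of (lower, upper, count) tuples for age classes [lower, upper),
          with integer ages (a hypothetical input format).
    """
    counts = [0.0] * max_age
    for lower, upper, n in bins:
        if not 0 <= lower < upper <= max_age:
            raise ValueError("age class outside [0, max_age)")
        for age in range(lower, upper):
            counts[age] += n / (upper - lower)   # equal share per year
    total = sum(counts)
    return [c / total for c in counts]

def average_age_distribution(studies, max_age=85):
    """Average the per-study proportions, weighting each study equally."""
    per_study = [yearly_proportions(b, max_age) for b in studies]
    return [sum(col) / len(per_study) for col in zip(*per_study)]

# Two invented studies: the first reports 50 people aged 25-30 in one bin.
study_a = [(0, 1, 10), (1, 2, 10), (25, 30, 50)]
study_b = [(0, 5, 40), (5, 15, 60)]
S = average_age_distribution([study_a, study_b])
print(S[27])  # share of 27 year olds, averaged over the two studies
```

Because each study is normalized before averaging, a study with a very large sample does not dominate the estimate.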

To generate p(A), the curves P(A) and F(A) were generated separately. P(A) was based on Pull & Grab's equations [28], which were motivated by the Ross model [29] and previous work by Muench [30]. The change in PfPR with age is described by the equation:

*dP/dA = h (1-P) - r P;*

where *h* denotes the force of infection (i.e., the "happenings" rate), and *r* is the rate at which infections clear. When *P(0) = 0*, this equation has the solution:

*P(A) = P' (1 - e*^{-bA}*)*.

Here, *P' = h/(h+r)* is the PfPR at equilibrium, and *b = h+r* describes the rate at which PfPR approaches equilibrium.

A three-parameter function was used for *F(A)*: 1 - *s* [1 - min(1, e^{-c(A-α)})]. Beginning at age *α*, this function declines from 1 towards 1-*s*; the parameter *c* sets the rate of the decline as a function of age. Conceptually, this function can be thought of as the decline in the probability of detecting an active infection, although the real reason for the decline in PfPR might be that immunity leads to real declines in *h* or real increases in *r*. For the purposes of standardization, the biological reasons for the decline are not relevant.

The modified Pull & Grab model was fitted to all 21 datasets using maximum likelihood estimation (Figure 2). The algorithm used the average estimates of *b*, *α*, *s*, and *c*. The 21 best-fit parameter values for *α*, *b*, *c*, and *s* were uncorrelated with *P'* (the slope, *b*, is plotted against the plateau, *P'*, in Figure 3). A few of the parameter values were clearly extreme, but a careful examination suggested that these values were irrelevant (i.e. changing them by a factor of 10 did not change the shape of the fitted curve because of the other fitted parameters), so their influence was removed by taking trimmed means. This generated a family of age-PR curves that depended only on *P'*.

The modified Pull & Grab equations generate a curve that describes PfPR as a function of age for each value of *P'*. To turn this function into a standardization algorithm, the function need only be inverted. In other words, the algorithm found a value *P** such that *PR*_{f}*[x,y)* = ∫_{x}^{y} *p(A|P*) S(A) dA* / ∫_{x}^{y} *S(A) dA*. The standardized value was then *PR*_{f}*[L,U)* = ∫_{L}^{U} *p(A|P*) S(A) dA* / ∫_{L}^{U} *S(A) dA*.
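The inversion can be sketched in Python as follows. This is an illustration under stated assumptions and not the authors' R routine. The values in `PARAMS` are placeholders, because the fitted trimmed means of *b*, *α*, *s* and *c* are not given in the text. The integrals are approximated by sums over one-year age classes evaluated at mid-year ages, a uniform *S(A)* stands in for the training-set age distribution, and bisection is one possible root-finder among several.

```python
import math

# Placeholder values for (b, alpha, s, c); NOT the fitted trimmed means.
PARAMS = (1.8, 10.0, 0.5, 0.1)

def apparent_pr(age, p_eq, b, alpha, s, c):
    """p(A) = P(A) F(A), with P(A) = P'(1 - e^{-bA}) and the three-parameter F(A)."""
    true_pr = p_eq * (1.0 - math.exp(-b * age))
    sensitivity = 1.0 - s * (1.0 - min(1.0, math.exp(-c * (age - alpha))))
    return true_pr * sensitivity

def crude_pr(lower, upper, p_eq, params, S):
    """Crude PfPR over [lower, upper), weighting mid-year p(A) by S(A)."""
    ages = range(lower, upper)
    num = sum(apparent_pr(a + 0.5, p_eq, *params) * S[a] for a in ages)
    return num / sum(S[a] for a in ages)

def standardize(pr_obs, obs_range, std_range, params, S, iterations=60):
    """Find P* matching pr_obs on obs_range, then evaluate on std_range."""
    lo, hi = 0.0, 1.0                       # P' is a prevalence, so it lies in [0, 1]
    for _ in range(iterations):             # crude_pr increases with P'
        mid = 0.5 * (lo + hi)
        if crude_pr(obs_range[0], obs_range[1], mid, params, S) < pr_obs:
            lo = mid
        else:
            hi = mid
    p_star = 0.5 * (lo + hi)
    return crude_pr(std_range[0], std_range[1], p_star, params, S)

S_UNIFORM = [1.0] * 85                      # stand-in for the training-set S(A)
print(standardize(0.40, (5, 15), (2, 10), PARAMS, S_UNIFORM))
```

If the observed PfPR exceeds what the curve can produce at *P'* = 1, this sketch returns the value for *P'* = 1; how the original routines handled that case is not stated in the text.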

#### The Garki model

Another parametric method was based on the Garki model [14], using parameter values that were fitted during the Garki project [31]. The original model is a system of seven coupled difference equations [14]. Here, *P(A)* was computed using an analogue of the Garki model consisting of seven coupled ordinary differential equations. To convert the parameter values to continuous time, *-log(1-x)* was used in place of each parameter *x*. Instead of a fixed delay, the pre-patent period was assumed to be exponentially distributed with the same mean as in the Garki model (i.e. individuals leave the pre-patent state at rate *1/N*). The differences between the ordinary differential equations and the difference equations are negligible; the comparison is available upon request. Births and deaths were ignored (i.e. δ = 0) because the quantity of interest was prevalence in the survivors; the initial value of x_{1} was set to 1, and all other variables were initially set to 0. The equations were:

dx_{1}/dA = R_{1} y_{2} - h x_{1}

dx_{2}/dA = h x_{1} - x_{2}/N

dy_{1}/dA = x_{2}/N - α_{1} y_{1}

dy_{2}/dA = α_{1} y_{1} - (α_{2} + R_{1}) y_{2}

dx_{3}/dA = R_{2} y_{3} - h x_{3}

dx_{4}/dA = h x_{3} - x_{4}/N

dy_{3}/dA = α_{2} y_{2} + x_{4}/N - R_{2} y_{3}

The apparent PfPR was *p(A) = q*_{1}*y*_{1}*(A) + q*_{2}*y*_{2}*(A) + q*_{3}*y*_{3}*(A)*, where the *y*_{i}*(A)* were found by choosing a value for *h* and then numerically solving the system of equations. The algorithm used the age distribution from the training set, *S(A)*. All other parameters except *h* were taken from the original paper (note that the values of R_{1} and R_{2} are fixed by *h* and other parameters). To turn this function into a standardization algorithm, the function, again, need only be inverted. In other words, the algorithm found a value *h** such that *PR*_{f}*[x,y)* = ∫_{x}^{y} *p(A|h*) S(A) dA* / ∫_{x}^{y} *S(A) dA*. The standardized value was then *PR*_{f}*[L,U)* = ∫_{L}^{U} *p(A|h*) S(A) dA* / ∫_{L}^{U} *S(A) dA*.
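The Garki-based procedure can be sketched in Python as follows: the seven equations are solved numerically for a trial *h*, and *h* is then adjusted until the crude PfPR matches the observation. This is an illustration and not the authors' R code. All numerical parameter values are placeholders and are not the fitted Garki values. The dependence of R_{1} and R_{2} on *h* is assumed to take the form R = h/(e^{h/r} - 1), which the text does not state. The sketch also assumes that crude PfPR increases with *h* over the search interval, as bisection requires, and does not check this.

```python
import math

# Placeholder rates per year, for illustration only (NOT the fitted values).
N = 15.0 / 365.0        # mean pre-patent period
A1, A2 = 0.7, 0.07      # alpha_1, alpha_2
R_BASE = (0.8, 8.0)     # baseline recovery rates behind R_1 and R_2
Q = (1.0, 0.7, 0.7)     # detectabilities q_1, q_2, q_3

def recovery_rate(h, r):
    """ASSUMED form R = h / (exp(h/r) - 1); it tends to r as h tends to 0."""
    return r if h == 0 else h / math.expm1(h / r)

def derivs(state, h, R1, R2):
    x1, x2, y1, y2, x3, x4, y3 = state
    return [R1 * y2 - h * x1,
            h * x1 - x2 / N,
            x2 / N - A1 * y1,
            A1 * y1 - (A2 + R1) * y2,
            R2 * y3 - h * x3,
            h * x3 - x4 / N,
            A2 * y2 + x4 / N - R2 * y3]

def apparent_pr_by_year(h, max_age, steps_per_year=50):
    """Fourth-order Runge-Kutta; returns p(A) at ages 0.5, 1.5, ..., max_age - 0.5."""
    R1, R2 = recovery_rate(h, R_BASE[0]), recovery_rate(h, R_BASE[1])
    state = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]   # x1 = 1, all others 0
    dt = 1.0 / steps_per_year
    p = []
    for step in range(1, max_age * steps_per_year + 1):
        k1 = derivs(state, h, R1, R2)
        k2 = derivs([s + 0.5 * dt * k for s, k in zip(state, k1)], h, R1, R2)
        k3 = derivs([s + 0.5 * dt * k for s, k in zip(state, k2)], h, R1, R2)
        k4 = derivs([s + dt * k for s, k in zip(state, k3)], h, R1, R2)
        state = [s + dt / 6.0 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4)]
        if step % steps_per_year == steps_per_year // 2:   # mid-year age
            p.append(Q[0] * state[2] + Q[1] * state[3] + Q[2] * state[6])
    return p

def crude_pr(p, lower, upper, S):
    ages = range(lower, upper)
    return sum(p[a] * S[a] for a in ages) / sum(S[a] for a in ages)

def standardize_garki(pr_obs, obs_range, std_range, S, h_max=20.0, iterations=40):
    """Bisection on h so that crude PfPR over obs_range matches pr_obs."""
    max_age = max(obs_range[1], std_range[1])
    lo, hi = 0.0, h_max
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if crude_pr(apparent_pr_by_year(mid, max_age), *obs_range, S) < pr_obs:
            lo = mid
        else:
            hi = mid
    p_star = apparent_pr_by_year(0.5 * (lo + hi), max_age)
    return crude_pr(p_star, *std_range, S)

# Uniform S(A) as a stand-in for the training-set age distribution.
print(standardize_garki(0.30, (1, 5), (2, 10), [1.0] * 10))
```

The step size and the upper bound `h_max` were chosen only to keep this example fast and numerically stable for the placeholder rates; other parameter values may call for a smaller step or a stiff solver.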

All the routines described were written in R [32], and are available upon request.