Phylogenetic Analysis Reveals a Marked Separation Between Strains Common to NA and E vs. SA. To define the divergence among T. gondii strains, we analyzed the frequency of SNPs within eight introns of five unlinked loci that collectively constituted 3,780 bp per strain [supporting information (SI) Table 1]. We compared exemplars of clonal groups I, II, and III with isolates previously considered "exotic" because of their dissimilarity from the clonal lineages and a select group of strains from SA (central and southern Brazil and French Guiana) (SI Table 2) (7). Neighbor-joining and parsimony analyses of the intron sequence data grouped the 46 different strains into 11 distinct haplogroups (Fig. 1A). The haplogroups occupy strikingly distinct geographic distributions: most strains from groups 1–3 (previously defined lineages I, II, and III) occurred almost exclusively in NA and E, whereas groups 4, 5, and 8–10 occurred primarily in SA, and group 6 was widespread, being found in E, SA, and Africa (3). Very few exceptions to this geographic separation were noted. The genetic makeups of NA and E populations were highly similar [pairwise fixation index (FST) = 0.0524, P < 0.05 overall and FST = 0.029, not significant, after removal of rare variants]. By contrast, SA isolates were strongly differentiated from both NA (pairwise FST = 0.211) and E (pairwise FST = 0.185). These findings indicate that the same haplogroups predominate in NA and E, and that these differ markedly from those endemic to SA. Strains previously considered "exotic" turn out to be common lineages in SA and appear unusual only by comparison to the well-studied clonal lineages from NA and E.
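As an illustration of how a pairwise fixation index of the kind reported above can be obtained, the following minimal sketch computes Wright's FST from allele frequencies at biallelic sites in two populations. It is not the estimator used in this study, which is not specified in this section. The function names, the equal weighting of the two populations, and the ratio-of-sums averaging across sites are assumptions made for illustration.

```python
def gene_diversity(p):
    """Expected diversity at a biallelic site with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def pairwise_fst(freqs_a, freqs_b):
    """Wright's FST = (HT - HS) / HT, summed over biallelic sites.

    freqs_a, freqs_b: frequency of the same reference allele at each site
    in population A and population B (hypothetical inputs).
    Assumes the two populations are weighted equally.
    """
    if len(freqs_a) != len(freqs_b):
        raise ValueError("both populations must cover the same sites")
    hs_sum = 0.0  # mean within-population diversity, summed over sites
    ht_sum = 0.0  # diversity of the pooled population, summed over sites
    for p_a, p_b in zip(freqs_a, freqs_b):
        hs_sum += (gene_diversity(p_a) + gene_diversity(p_b)) / 2.0
        ht_sum += gene_diversity((p_a + p_b) / 2.0)
    if ht_sum == 0.0:
        return 0.0  # no variation at all, so no measurable differentiation
    return (ht_sum - hs_sum) / ht_sum
```

Under this definition, two populations with identical allele frequencies give FST = 0, and two populations fixed for alternative alleles at every site give FST = 1.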
Although sexual recombination would be expected to reassort allelic variation into many multilocus genotypes, the observed repertoire in NA and E is restricted to several lineages characterized by strong genome-wide linkage among physically unlinked alleles (SI Fig. 5). Certain multilocus genotypes endemic to SA were also repeatedly sampled, providing evidence that they also propagate clonally (Fig. 1A and SI Fig. 5). To examine linkage disequilibrium (LD) within and between loci, we joined the intron sequences end to end and analyzed pairwise associations of polymorphism across these sequences. LD was estimated by using the Zns statistic, which expresses the average correlation among alleles in all pairs of polymorphic sites (13). Zns was elevated not only for the clonal haplogroups 1, 2, and 3 from NA and E but also for SA groups 8 and 9 (SI Table 3). Other SA isolates (i.e., group 4) exhibited lower values of Zns (SI Table 3), as would be expected in lineages experiencing more frequent sexual recombination. A more conventional measure of LD, the D' statistic, which expresses the difference between the observed two-site haplotype frequencies and those expected under the null hypothesis of random mating, supported similar conclusions (SI Fig. 5). Remarkably, some lineages (i.e., groups 1, 2, and 3) showed complete LD even between unlinked loci, whereas others (i.e., groups 5, 8, 9, and 10) showed high LD within a given locus, with less association among unlinked loci (SI Fig. 5).
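The two LD measures described above can be sketched as follows for haploid sequences. This is a minimal illustration, not the software used in the study. Haplotypes are assumed to be encoded as equal-length strings of "0" and "1" (one character per biallelic site), and the function names are hypothetical.

```python
from itertools import combinations


def pair_ld(haplotypes, i, j):
    """Return (r^2, D') for polymorphic biallelic sites i and j."""
    n = len(haplotypes)
    p_a = sum(h[i] == "1" for h in haplotypes) / n
    p_b = sum(h[j] == "1" for h in haplotypes) / n
    p_ab = sum(h[i] == "1" and h[j] == "1" for h in haplotypes) / n
    d = p_ab - p_a * p_b  # observed minus expected haplotype frequency
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    # D' rescales |D| by its maximum possible value given the frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return r2, abs(d) / d_max


def zns(haplotypes):
    """Average r^2 over all pairs of polymorphic sites."""
    n = len(haplotypes)
    length = len(haplotypes[0])
    polymorphic = [
        s for s in range(length)
        if 0 < sum(h[s] == "1" for h in haplotypes) < n
    ]
    pairs = list(combinations(polymorphic, 2))
    if not pairs:
        raise ValueError("need at least two polymorphic sites")
    return sum(pair_ld(haplotypes, i, j)[0] for i, j in pairs) / len(pairs)
```

For example, `zns(["00", "00", "11", "11"])` gives 1.0 (complete association between the two sites), whereas `zns(["00", "01", "10", "11"])` gives 0.0 (all four two-site haplotypes equally frequent).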
Distinct Biallelic Polymorphisms Have Accumulated in the North and South Over Approximately Equivalent Time Periods. Previous studies have noted that NA and E isolates from type I, II, and III (haplogroups 1, 2, and 3) are mixtures of biallelic polymorphisms that were inherited as large blocks across the genome, indicating they arose from only a few genetic crosses between highly similar parental strains (6). With the notable exception of ChrIa (discussed below), SNPs found in NA and E were fixed in SA isolates. Surprisingly, the converse is also true; SA strains also show striking biallelic haplotypes, yet these polymorphisms occurred at distinct positions from those seen in the NA/E strains (SI Table 4). Collectively, these patterns indicate that northern and southern strains have mutually exclusive biallelic polymorphic haplotypes.
The most parsimonious explanation of these data is that northern and southern strains have a common origin but have accumulated separate characteristic mutations during an extended period of isolation. To test this model, we estimated the most recent common ancestry (MRCA) between strains from NA and E and those from SA based on the frequencies of SNPs in each population. Analysis of the frequency of biallelic polymorphism indicates that strains from both the North and South have a similar predicted MRCA of 10^6 yr (SI Table 5) (Fig. 1B). The exception to this pattern was the strain COUG, which has an MRCA with other strains of close to 10^7 yr, indicating it diverged before the North–South split.
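To make the reasoning behind such age estimates concrete, the sketch below applies a simple molecular clock: two lineages that split t years ago each accumulate mutations independently, so their per-site divergence d is approximately 2μt, and t ≈ d / (2μ). This is a generic illustration and not necessarily the estimator used for SI Table 5. The function name and the mutation rate passed to it are hypothetical, and no values from this study are assumed.

```python
def divergence_time_years(differences, sites, rate_per_site_per_year):
    """Estimate time since two lineages shared a common ancestor.

    differences: number of SNP differences between the two sequences
    sites: number of aligned sites compared
    rate_per_site_per_year: assumed neutral mutation rate (hypothetical input)

    Uses d = 2 * mu * t, which ignores multiple hits at the same site and
    therefore suits only low divergence, such as closely related strains.
    """
    if sites <= 0 or rate_per_site_per_year <= 0:
        raise ValueError("sites and mutation rate must be positive")
    divergence = differences / sites  # per-site divergence d
    return divergence / (2.0 * rate_per_site_per_year)
```

Because the estimate scales inversely with the assumed mutation rate, a tenfold uncertainty in the rate translates directly into a tenfold uncertainty in the inferred age.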
Recognizing that polymorphisms in clonal lineages are derived from a limited number of parental lineages, we then investigated the duration over which "new mutations" (i.e., those not already present in parental strains) have accumulated. Thus, we performed additional analyses that excluded the typical biallelic SNPs and considered only those polymorphisms that appear sporadically in particular isolates. In both NA and SA, such clonal haplogroups coalesce 10,000 yr ago (SI Table 5) (Fig. 1B). Collectively, these patterns indicate that northern and southern populations diverged 1 Mya, and that a small number of clonal groups have expanded rapidly within the past 10,000 yr.
Mixing of Four Ancestral Groups Can Explain the Current Population Structure. A Bayesian statistical approach was used to infer population structure from allelic variation in the intron sequences by using STRUCTURE (14). Because T. gondii appears to be composed of relatively few genotypes that have historically undergone limited but important admixture (6–8), we explored these data using a linkage model. The model most compatible with the current assemblage of strains suggests that they were derived from admixture of four ancestral lineages (SI Fig. 6). Eleven extant groups, corresponding closely to the haplogroups identified by phylogenetic analysis (Fig. 1A), were identified using STRUCTURE (Fig. 2A). Each can be derived by limited admixture of the four inferred ancestral lineages, which most closely correspond to haplogroups 2, 4, 6, and 9 (Fig. 2B).
Most Strains of T. gondii Share a Monomorphic ChrIa. A recent comparison of whole-genome sequence of ChrIa and ChrIb from members of the three clonal lineages (1, 2, and 3) revealed they share a monomorphic version of ChrIa (8) (Mono-ChrIa). To determine how widespread this pattern might be, we sequenced 12 blocks scattered across ChrIa from 30 representative strains; these analyses revealed remarkably few polymorphisms for the majority of strains (Fig. 3A). Members of groups 4 (except strain CASTELLS), 7, 8, and 9 all contained a nearly identical Mono-ChrIa (Fig. 3A). Exceptions to this pattern included groups 5 and 10, which often contained a separate shared allele that differed from the Mono-ChrIa version by conserved biallelic polymorphisms ("Alternative"; Fig. 3A). Additionally, groups 5 and 10 contained regions that differed substantially among isolates ("Divergent"; Fig. 3A). Groups 6 and 11 had chimeric versions of ChrIa, in which approximately half of the chromosome was identical to Mono-ChrIa (Fig. 3A).
A neighbor-joining tree reconstructed from variation in the sequenced blocks depicts the divergence among strains based on ChrIa (SI Fig. 7). With the sole exception of COUG, strains in NA and E are characterized by Mono-ChrIa, as are isolates belonging to the SA haplogroups 4, 6b, 8, and 9. Notably, most groups characterized by Mono-ChrIa were also markedly clonal when analyzed at other loci in the genome (see Fig. 1 A and B). In contrast, groups 5, 10, and CASTELLS contained highly divergent versions of ChrIa (SI Fig. 7).
Inheritance of the Apicoplast Supports a Simple Recent Ancestry of Strains. Apicomplexans contain a 35-kb circular genome that is the remnant of a secondary endosymbiont; this organellar genome is inherited maternally (14) and therefore does not undergo genetic recombination. Coalescent analysis of polymorphisms in three regions of the apicoplast genome sampled from 35 representative strains was used to infer the ancestral origin by using statistical parsimony (15). Network analysis of the apicoplast haplotypes showed a striking correlation between the Mono-ChrIa and inheritance from just a few matrilineages (Fig. 3B) (SI Table 6). Haplogroups 1, 2, 4, and 8 were derived from a single common matrilineage, whereas groups 3, 6, and 9 were descended from a second distinct matrilineage (Fig. 3B). In contrast, strains that lacked Mono-ChrIa were spread across divergent nodes of the network (Fig. 3B). These results indicate that Mono-ChrIa may have arisen in a single genetic background and subsequently spread to the majority of haplogroups through very few genetic crosses.
Acute Virulence and Oral Transmission. The extremely widespread success of a small number of T. gondii lineages suggests they have a strong selective advantage. Previous studies suggested that improved transmissibility via oral transmission of tissue cysts between intermediate hosts may explain the predominance of the type I, II, and III lineages (1, 2, and 3 here) (7). This trait favors clonal dissemination via carnivorous or omnivorous feeding between intermediate hosts. Additionally, recent studies have mapped genes responsible for acute virulence in the type I lineage in mice, and this trait may also constitute a selective advantage (16). We were therefore curious to understand how broadly such traits were distributed among strains representing greater geographic and genetic diversity.
Representative strains from each haplogroup (SI Table 2) were tested for oral transmissibility by feeding mice tissue cysts that had developed in the brains of chronically infected mice, as defined previously (7). Efficient oral transmission proved to be a widespread trait in strains from all SA haplogroups, similar to the previously described clonal strains from NA and E (SI Table 2) (Fig. 1B). Rare exceptions occurred in specific strains from haplogroups 4, 6a, and 7 (* in Fig. 1B). In addition to confirming the lack of oral transmissibility of tissue cysts from several previously studied strains [i.e., CAST (group 7) and MAS (group 4)] (7), we identified two additional isolates lacking this trait [i.e., FOU (group 6a) and GPHT (group 6a)]. However, highly similar strains from groups 4, 6a, and 7 readily caused infection in mice fed tissue cysts (Fig. 1B) (SI Table 2). Our studies demonstrate that T. gondii tissue cysts of various ancestries (and not just those that have experienced clonal expansion) are infectious to mice upon oral ingestion, although mutation or genetic recombination may occasionally compromise tissue cyst development (17).
Type I strains are extremely virulent, and the effective LD100 is one viable organism (infection always leads to death) in outbred mice, whereas types II and III are relatively nonvirulent (4). We tested the acute virulence of select strains representing different haplogroups. Acute virulence, comparable to that of type I lineages, was widespread among SA haplogroups, including groups 4, 5, 6b, 8, and 9 (SI Table 2) (Fig. 1B). Given the ancient divergence of southern from northern strains, it appears that acute virulence in mice is an ancestral trait and does not uniquely characterize recently evolved, clonally expanding T. gondii lineages.