Chloroplast Sequence Variability. Sequences of the chloroplast trnG–trnS spacer resulted in 610 aligned bases with 16 substitutions and 13 indels that varied in size from 1 to 219 bp (Table 3, which is published as supporting information on the PNAS web site). Sequence divergence, as measured with the Kimura two-parameter algorithm in paup* (62), ranged from 0% to 3.37% for all Spondias samples, and from 0% to 0.86% within S. purpurea. Sequences of the trnG–trnS spacer in Spondias conform to the expectation of neutrality by Tajima's criterion (D = –1.32250, P > 0.10) and by Fu and Li's D* (D* = –2.41966, P > 0.05) and F* (F* = –2.36326, P > 0.05) criteria. Thirty unique haplotypes were identified from the trnG–trnS sequences. The number of individuals carrying a given haplotype varied from 1 to 60. Pairwise Fst values ranged from 0 to 0.66379. Tests for isolation by distance based on Nm and Fst were not significant (popgene, version 1.31, ref. 63).
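For readers unfamiliar with the Kimura two-parameter distance reported above, the following is a minimal sketch of the standard formula, d = –(1/2) ln[(1 – 2P – Q) √(1 – 2Q)], where P and Q are the proportions of sites that differ by a transition and by a transversion, respectively. It is not the paup* implementation; the function name and the choice to skip sites containing a gap or an ambiguous base are assumptions made for illustration.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a: str, seq_b: str) -> float:
    """Kimura two-parameter distance between two aligned sequences.

    Assumption for this sketch: any site with a gap or a non-ACGT
    symbol in either sequence is skipped (pairwise deletion).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    compared = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # gap or ambiguity code: not comparable
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; every other change is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError for saturated comparisons, where the
    # distance is undefined under this model.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

Multiplying the returned value by 100 gives a percentage comparable in form to the divergence values quoted above; identical sequences give a distance of zero.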
Ancestors of Cultivated S. purpurea Based on the Distribution of trnG–trnS Haplotypes. The 30 trnG–trnS haplotypes recovered in this study fell into two distinct groups (groups 1 and 2, Fig. 2A). Haplotypes in the two groups differed by at least 10 mutations. The 13 haplotypes found in group 1 were carried exclusively by S. mombin, S. radlkoferi, and S. testudinus trees from Central and South America; none of the ≥100 S. purpurea phenotypes surveyed carried any of the alleles of group 1. All individuals identified as either wild or cultivated S. purpurea carried a group 2 haplotype (Fig. 2A), reflecting the close relationship of wild and cultivated S. purpurea trees. In addition, four of the group 2 haplotypes (haplotypes AA, AC, R, and Z) were recovered in five Mesoamerican trees identified as either S. mombin var. mombin or S. radlkoferi. The S. mombin var. mombin and S. radlkoferi trees carrying group 2 haplotypes occurred in southern Central America (southwestern Nicaragua and northwestern Costa Rica). Shared haplotypes among species can be attributed to secondary gene flow (hybridization) or to incomplete lineage sorting, where branching events in gene genealogies do not correspond to branching in population history (57). Based on the trnG–trnS sequence data, Mesoamerican S. purpurea trees either share a common ancestor with other Spondias taxa in southern Central America, are experiencing ongoing gene flow with sympatric congeners in this region, or both.
Haplotype Distribution in Cultivated and Wild Mesoamerican Jocotes. Of the 17 haplotypes detected in S. purpurea trees, 12 (71% of the total allele diversity) were recovered in wild populations, and nine (53% of the total allele diversity) were carried by cultivated individuals. During the course of jocote domestication, both the number of trnG–trnS alleles and the relative abundance of those alleles have changed under the influence of human selection (Fig. 3).
Four alleles (24% of total allele diversity) were found in both wild and cultivated populations. Three of the four shared alleles (R, V, and Z) were recovered sporadically in wild populations, living fences, and backyard gardens in a region spanning from southern Mexico to Panama. Allele AC was found in wild and cultivated populations throughout Mesoamerica. This mosaic-like geographic pattern of shared alleles in cultivated and wild populations is consistent with the idea that genetically distinct individuals from different geographic regions were taken into cultivation and subsequently distributed by humans.
In most cases, the variation detected in cultivated populations is a subset of the total haplotype diversity recorded for a species (10, 64–66). In S. purpurea, five haplotypes were found in cultivated populations but were not detected in wild populations. The presence of unique haplotypes in agricultural habitats may be the result of incomplete sampling of wild populations, or it may be the result of new alleles that have arisen in cultivation. Alternatively, it may reflect contemporary extinction of the tropical dry forests of Mexico and Central America and, consequently, the extinction of alleles carried by S. purpurea trees in these forests. Four of the five unique alleles were recovered from informal agricultural habitats. These data provide evidence for previous claims that traditional agricultural habitats may be acting as important reservoirs of genetic variation (67–69).
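The haplotype counts given above for S. purpurea (17 in total, 12 in wild populations, nine in cultivated populations) determine the numbers of shared and unique haplotypes by inclusion-exclusion. The sketch below reproduces that arithmetic; the function name is hypothetical, and the calculation assumes that every S. purpurea haplotype was recovered in at least one of the two categories (wild or cultivated).

```python
def partition_counts(total: int, wild: int, cultivated: int) -> dict:
    """Split haplotype counts into shared, wild-only, and cultivated-only.

    Assumes every haplotype occurs in at least one of the two categories,
    so that total = wild + cultivated - shared.
    """
    shared = wild + cultivated - total
    if shared < 0 or shared > min(wild, cultivated):
        raise ValueError("counts are inconsistent with the assumption")
    return {
        "shared": shared,
        "wild_only": wild - shared,
        "cultivated_only": cultivated - shared,
    }


TOTAL = 17  # haplotypes detected in S. purpurea (from the text)
counts = partition_counts(total=TOTAL, wild=12, cultivated=9)

# Each class as a rounded percentage of total allele diversity.
percent = {name: round(100 * n / TOTAL) for name, n in counts.items()}
```

Under this assumption the counts give four shared haplotypes (24% of the total) and five haplotypes found only in cultivation, in agreement with the figures reported in the text.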
The trnG–trnS Haplotype Network. The haplotypes recovered in S. purpurea trees were organized into haplotype networks based on their mutational differences (Fig. 2B). Two most parsimonious networks were identified. Both contain three homoplasious sites and are two steps shorter than the next most parsimonious network. Analyses were conducted with both networks, one of which is shown in Figs. 2B and 4. The other most parsimonious network differed in the placement of the clade that included haplotypes AA, AB, S, Q, Y, and Z. In the alternative network, this clade was attached to haplotype O instead of haplotype R. The placement of this clade does not affect the overall conclusions of this paper. The distribution of the trnG–trnS alleles (Fig. 2B) conforms to the predictions of coalescent theory (70).
The trnG–trnS spacer had 13 indels, each of which was coded as a single binary character following the six rules described in ref. 71 (see also refs. 72 and 73) (Tables 3 and 4, which are published as supporting information on the PNAS web site). One of the gaps (gap six) was mapped on the haplotype network twice (indicated with an asterisk on the network, Fig. 2B). Of particular interest is the region from position 169 to 229, which comprises a series of three adjacent sequences (each 19–21 bp in length). The gaps created by the missing nucleotides in this region were labeled 4, 5, and 6 (Table 4). Alleles recovered from wild and cultivated S. purpurea individuals contained nucleotides in some or all of the regions corresponding to gaps 4, 5, and 6, whereas those carried by S. mombin and S. radlkoferi individuals lacked nucleotides in this region. All alleles common in S. purpurea contained nucleotides at 169–188 (gap 4), and all but three had an insertion in gap 6 (210–228). The three alleles lacking sequence in gap 6 (O, P, and T) were carried exclusively by S. purpurea trees from wild populations in the states of Jalisco, Michoacan, and Nayarit in western Central Mexico. In addition to their restricted geographic distribution and their absence from cultivated S. purpurea trees, the status of alleles O, P, and T as putatively primitive within S. purpurea is further substantiated by their interior status in the trnG–trnS haplotype network (see below for detailed discussion).
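To illustrate what coding each indel as a single binary character means in practice, the sketch below scores every distinct gap in a toy alignment as present (1) or absent (0) in each sequence. It is a simplified presence/absence scheme only: it does not implement the six rules of ref. 71, the sequences and names are invented for illustration, and the direction of coding (1 = gap present) is an assumption.

```python
import re


def gap_runs(seq: str) -> list:
    """Return (start, end) for every run of '-' in one aligned sequence.

    Positions are 0-based and the end is exclusive.
    """
    return [m.span() for m in re.finditer(r"-+", seq)]


def code_indels(alignment: dict) -> tuple:
    """Code each distinct gap as one binary character.

    alignment maps a sequence name to its aligned sequence. Returns the
    sorted list of gap spans (the characters) and, for each sequence, a
    string of 1s and 0s giving presence or absence of each gap.
    """
    if len({len(seq) for seq in alignment.values()}) != 1:
        raise ValueError("all sequences must have the same aligned length")

    runs = {name: set(gap_runs(seq)) for name, seq in alignment.items()}
    # A gap is treated as the same character only if both ends coincide.
    characters = sorted(set().union(*runs.values()))
    matrix = {
        name: "".join("1" if span in runs[name] else "0" for span in characters)
        for name in alignment
    }
    return characters, matrix


# Hypothetical toy alignment, not data from this study.
toy = {
    "hap_1": "ACGTACGTACGT",
    "hap_2": "ACG---GTACGT",
    "hap_3": "ACG---GTA-GT",
}
characters, matrix = code_indels(toy)
```

In the toy alignment two gap characters are recovered, a 3-bp gap and a 1-bp gap, so that a multi-base indel contributes one character rather than one per missing nucleotide.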
Distribution of Haplotypes in S. purpurea Populations. Ancestral haplotypes. Coalescent theory predicts that older alleles will occupy interior positions in the haplotype network (70). In the trnG–trnS network, haplotypes R, O, and P are the most interior haplotypes (Fig. 2B). These haplotypes were recovered from wild S. purpurea populations in western Central Mexico (Jalisco, Michoacan, and Nayarit; alleles O and P) and from wild populations in Guatemala and El Salvador (allele R) (Fig. 4A).
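The distinction between interior and tip haplotypes can be made concrete by counting connections in the network: a tip haplotype is joined to exactly one other node, whereas an interior haplotype is joined to two or more. The sketch below applies that criterion to an invented edge list; the node names and topology are hypothetical and do not reproduce the network in Fig. 2B.

```python
from collections import defaultdict


def node_degrees(edges: list) -> dict:
    """Count how many connections each node has in an undirected network."""
    degree = defaultdict(int)
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return dict(degree)


def interior_nodes(edges: list) -> list:
    """Return nodes with more than one connection (interior haplotypes)."""
    return sorted(n for n, d in node_degrees(edges).items() if d > 1)


# Hypothetical network: h2 joins three haplotypes, h4 joins two.
toy_edges = [("h1", "h2"), ("h2", "h3"), ("h2", "h4"), ("h4", "h5")]
interior = interior_nodes(toy_edges)
tips = sorted(set(node_degrees(toy_edges)) - set(interior))
```

In this toy network h2 and h4 are interior and h1, h3, and h5 are tips; under the coalescent prediction cited above, the interior haplotypes would be the candidates for older alleles.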
Multiple origins of cultivated S. purpurea. The trnG–trnS haplotype network reveals two groups of S. purpurea haplotypes, one spanning from southern Mexico through Central America and the other centered in western Central Mexico (Fig. 4 A and B). The first group comprises alleles recovered from wild and cultivated populations in southern Mexico and Central America (alleles Q, R, S, V, W, X, Y, Z, AA, and AB). The second group includes seven alleles, four of which were recovered exclusively in the wild populations of western Central Mexico (O, P, T, and AF). AC was the most common haplotype; it was detected in wild populations of western Central Mexico, as well as in cultivated and wild populations throughout Mexico and Central America. Alleles AD and AE were recovered from a backyard tree in Nicaragua and a living fence in Guatemala, respectively.
Geographical Structuring. A nested clade analysis (NCA) was used to test for statistically significant associations between clades and geographical locations (refs. 59 and 60; but see ref. 74) (Fig. 4C). The NCA rejected the null hypothesis of no association between geographical location and clades for clades 1-6, 2-3, 3-1, and 3-2 (Fig. 4C and Table 2). Clade 1-6 included alleles in cultivated populations from El Salvador, Costa Rica, and Panama, and wild populations from southern Mexico (Chiapas) to northern Costa Rica. The null hypothesis was rejected for the next level of nesting as well (clade 3-1), which includes alleles found in cultivated and wild populations in southern Mexico and Central America. In addition, statistically significant results were obtained for clades 2-3 and 3-2, which include alleles found exclusively in wild populations in western Central Mexico (alleles AF, T, O, and P), a widespread allele found in cultivated and wild populations throughout the region (AC), and two singletons from Central America (AD and AE). The NCA provides statistical support for two distinct groups of S. purpurea haplotypes, corroborating the inference that S. purpurea was domesticated more than once in Mesoamerica.
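The logic of testing a null hypothesis of no association between clade and geographical location can be sketched with a permutation test on a clade-by-location contingency table. This is a simplified stand-in and not the NCA procedure of refs. 59 and 60: it uses a chi-square statistic on categorical locations rather than geographical distances, and the function names, the number of permutations, and the example data are assumptions made for illustration.

```python
import random
from collections import Counter


def chi_square(clades: list, sites: list) -> float:
    """Chi-square statistic for a clade-by-site contingency table."""
    n = len(clades)
    clade_totals = Counter(clades)
    site_totals = Counter(sites)
    observed = Counter(zip(clades, sites))
    stat = 0.0
    for clade, clade_total in clade_totals.items():
        for site, site_total in site_totals.items():
            expected = clade_total * site_total / n
            stat += (observed[(clade, site)] - expected) ** 2 / expected
    return stat


def permutation_p_value(clades, sites, n_perm=999, seed=0):
    """Return the observed statistic and its permutation P value.

    Site labels are shuffled among individuals to simulate the null
    hypothesis that clade membership is unrelated to location.
    """
    if len(clades) != len(sites):
        raise ValueError("one clade and one site are needed per individual")
    rng = random.Random(seed)
    observed = chi_square(clades, sites)
    shuffled = list(sites)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small tolerance guards against floating-point summation order.
        if chi_square(clades, shuffled) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical data: two clades, each confined to one region.
toy_clades = ["A"] * 10 + ["B"] * 10
toy_sites = ["west"] * 10 + ["south"] * 10
statistic, p_value = permutation_p_value(toy_clades, toy_sites)
```

With complete geographical separation of the two toy clades, very few random reassignments of location match the observed statistic, so the null hypothesis of no association would be rejected.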