In our comparison of the Atropa and Nicotiana plastid genomes, the most variable regions that tentatively met the barcode criteria were nine intergenic spacers: trnK-rps16, trnH-psbA, rpl36-rps8, atpB-rbcL, ycf6-psbM, trnV-atpE, trnC-ycf6, psbM-trnD, and trnL-F (listed in order of decreasing variability; Table 1 and Fig. 1). By comparison, ITS had a much higher divergence value (13.6%) than any of the plastid regions, and rbcL was by far the lowest in divergence (0.83%). Although three spacers (atpB-rbcL, ycf6-psbM, and psbM-trnD) were slightly to moderately longer than our 800-bp cutoff, we included them in our further analysis because of their high interspecific variability.
The results of our intrageneric tests across eight genera in the first taxon set demonstrated conspicuous differences among the nine plastid regions with respect to our three barcoding criteria: amplification success, sequence length, and sequence divergence. Only three regions (trnH-psbA, rpl36-rps8, and trnL-F) were successfully amplified for all eight genera and 19 species; the other regions, including ITS, could not be amplified in one or more taxa (Table 2). Sequence length in the nine plastid regions ranged from 204 to 1,240 bp, with mean length in all but two (ycf6-psbM and psbM-trnD) falling within our 300- to 800-bp optimum length criterion (Table 2). ITS had the highest between-species sequence divergence values in four of the five genera successfully amplified (Table 2), with a mean sequence divergence of 2.81% across the five genera. Among the nine plastid regions, trnH-psbA ranked first in divergence value in six of the eight genera and in 11 of the 14 species pairs; trnV-atpE and trnC-ycf6 ranked highest for the remaining two genera and three species pairs (Table 2). trnH-psbA ranked highest (1.24%) in mean percent sequence divergence across all genera, whereas trnV-atpE (0.29%) and ycf6-psbM (0.30%) ranked lowest (Table 2).
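Two of the three criteria, sequence length and sequence divergence, can be made concrete with a short sketch. It is illustrative only: it assumes that percent divergence is the uncorrected pairwise difference (p-distance) over aligned positions, with gapped or ambiguous columns skipped, which this section does not specify. The function names and the toy sequences are hypothetical and are not data from Table 2.

```python
def percent_divergence(seq_a, seq_b):
    """Uncorrected p-distance (%) between two aligned sequences.

    Assumption: columns with a gap ('-') or an ambiguous base ('N') in
    either sequence are skipped; indels may be treated differently in
    the actual analysis.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue  # skip columns that cannot be compared
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * differences / compared


def within_length_criterion(length_bp, minimum=300, maximum=800):
    """True if a region length falls in the 300- to 800-bp optimum window."""
    return minimum <= length_bp <= maximum


# Toy aligned pair (hypothetical): one substitution in ten compared sites.
toy_a = "ACGTACGTAC"
toy_b = "ACGTTCGTAC"
toy_divergence = percent_divergence(toy_a, toy_b)
```

Under these assumptions, a region would be screened by checking its mean length with `within_length_criterion` and then ranked against other regions by the mean of `percent_divergence` over the species pairs within each genus.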
In our broader taxonomic sampling of the Plummers Island flora, in which only herbarium material was used, not all of the loci could be successfully amplified for all of the 83 species tested, which we suggest may be related to primer design or to more fundamental changes in gene structure during herbarium specimen preparation and storage (see ref. 33). Amplification success was highest for trnH-psbA (100%), followed by rbcL (5' half; 95%) and ITS (88%, although high-quality sequence data were not obtained from all ITS amplifications). We could not detect any general correlation between specimen age and amplification success, indicating that herbarium specimens in apparently good condition and as old as 20 years can be successfully used to establish DNA-sequence reference libraries. Moreover, amplification of full-length ITS was possible (results not shown) for the five specimens of Erysimum cheiranthoides collected between 1897 and 1997 (Fig. 2), indicating that considerably older specimens also may be used.
Because of its high sequence divergence in the majority of genera in the first taxon set and its high amplification success in all of our test samples, the trnH-psbA spacer became the focus of our further analyses of the barcoding potential of the plastid genome. The trnH-psbA amplicon ranged from 247 to 1,221 bp, whereas the intergenic spacer alone (excluding primer-binding regions and small regions of flanking exon) ranged from 119 to 1,094 bp across 53 families of flowering plants, including both the Plummers Island species and the taxonomic groups (extremes were Thalictrum and Trillium, respectively; see Table 2 and Table 5, which is published as supporting information on the PNAS web site). Most taxa (92%) had amplicons between 340 and 660 bp, which is within our suggested length criterion for successful barcoding. All species in our sampling had unique trnH-psbA spacer sequences, a result directly relevant to the use of this region for barcoding plants.
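The two summary checks in this paragraph, the share of amplicons within a length window and the uniqueness of each species' spacer sequence, can be sketched as follows. This is a minimal illustration, assuming exact string identity after gap removal as the test for a shared sequence; the function names, species labels, and values are hypothetical and are not taken from Table 5.

```python
from collections import defaultdict


def shared_sequences(spacers):
    """Return groups of species whose spacer sequences are identical.

    `spacers` maps a species label to its sequence. Assumption: sequences
    are compared as exact strings after removing alignment gaps and
    normalizing case. An empty result means every sequence is unique.
    """
    groups = defaultdict(list)
    for species, sequence in spacers.items():
        key = sequence.upper().replace("-", "")
        groups[key].append(species)
    return [sorted(names) for names in groups.values() if len(names) > 1]


def fraction_in_window(lengths_bp, low=340, high=660):
    """Fraction of amplicon lengths that fall within [low, high] bp."""
    if not lengths_bp:
        raise ValueError("no lengths supplied")
    inside = sum(1 for n in lengths_bp if low <= n <= high)
    return inside / len(lengths_bp)


# Hypothetical toy input: three species, all with distinct sequences.
toy_spacers = {
    "species_A": "ACGTTAGC",
    "species_B": "ACGTTAGA",
    "species_C": "ACG-TAGC",
}
toy_duplicates = shared_sequences(toy_spacers)  # empty list: all unique
```

An empty list from `shared_sequences` corresponds to the situation reported here, in which no two sampled species share a spacer sequence.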