Determining Suitable Regions of the Genome. To screen for appropriate levels of sequence divergence in the plastid genome, we chose two closely related flowering plant species for comparison, Atropa belladonna and Nicotiana tabacum (Solanaceae). Both species have complete sequence data available for their plastid genomes (35-37). Twenty-nine additional complete plastid genomes spread across a wide range of plant groups are also available for comparison: algae (five genera in various families), mosses and liverworts (three genera in different families), ferns and relatives (three genera in different families), gymnosperms (two species in the genus Pinus), and angiosperms (eight genera in eight different families, two genera in the Fabaceae, and four genera and several cultivars in the Poaceae). We selected Nicotiana and Atropa, even though they belong to different subfamilies of Solanaceae (38), because they represent the most closely related taxa among the genomes available in the angiosperms. The complete plastid genomes of the taxa in the Fabaceae and the Poaceae include cultivars, hybrids, and more distantly related genera. We aligned the Nicotiana and Atropa genomes, and raw divergence levels (i.e., number of base-pair discordances divided by length of sequence under consideration) were individually estimated across all genes, introns, and intergenic spacers (Fig. 1). Plastid regions with raw sequence differences >2% (Table 1) were categorized as the most variable segments, and therefore the most promising regions of the plastid genome for DNA barcoding when normalized for length. The nuclear ITS region and plastid rbcL gene were used as baseline comparisons for these chloroplast test regions (Table 1). To further narrow down the number of remaining regions usable for barcoding purposes, we applied two additional criteria: a sequence length of 300-800 bp and stable presence across multiple plastid genomes of both monocots and dicots.
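The raw divergence measure and the screening thresholds described above can be sketched in Python. This is a minimal illustration and not the authors' pipeline. The function names, the tuple layout used for regions, the choice to skip alignment columns containing gaps or ambiguous bases, the use of the ungapped length of the first sequence, and the treatment of 2% as a strict lower bound are all assumptions made here. The criterion of stable presence across monocot and dicot plastid genomes is not modeled.

```python
from typing import List, Tuple

VALID_BASES = set("ACGT")


def raw_divergence(seq_a: str, seq_b: str) -> float:
    """Mismatched positions divided by the number of positions compared."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Assumption: columns with a gap or ambiguous base are not counted.
        if a not in VALID_BASES or b not in VALID_BASES:
            continue
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable positions in alignment")
    return mismatches / compared


def ungapped_length(seq: str) -> int:
    """Length of a sequence once alignment gap characters are removed."""
    return sum(1 for base in seq if base != "-")


def candidate_regions(
    regions: List[Tuple[str, str, str]],
    min_divergence: float = 0.02,
    min_length: int = 300,
    max_length: int = 800,
) -> List[Tuple[str, float]]:
    """Keep regions within the length window whose divergence exceeds the threshold.

    Each entry of `regions` is (name, aligned_seq_a, aligned_seq_b); this
    layout is hypothetical. Results are sorted from most to least divergent.
    """
    hits = []
    for name, seq_a, seq_b in regions:
        length = ungapped_length(seq_a)
        if not (min_length <= length <= max_length):
            continue
        divergence = raw_divergence(seq_a, seq_b)
        if divergence > min_divergence:
            hits.append((name, divergence))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

In practice each region would be sliced out of the pairwise genome alignment using the annotation coordinates of the gene, intron, or spacer before being passed to `candidate_regions`.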
Selecting Taxa for Testing. To empirically test the regions identified as most appropriate for barcoding in our comparison of the plastid genomes of Atropa and Nicotiana (Table 1), we selected two sets of flowering plant taxa. The first taxon set consisted of 2 or 3 species in each of eight genera spread across seven families of plants for a total of 19 species (Table 2 and Table 3, which is published as supporting information on the PNAS web site). The second taxon set was a geographically circumscribed flora comprising taxa that are not closely related but represent a broad range of angiosperms in 50 plant families, including 83 species in 72 genera (Table 3). The two taxon sets were selected so as to test each locus for appropriate sequence length and divergence, primer success across a wide taxonomic spectrum, and the viability of routinely extracting DNA from dried herbarium specimens, compared with fresh or silica-dried tissue. The species in the first taxon set were selected because they represent a diverse set of species pairs across the angiosperms (including monocots and dicots) with various levels of phylogenetic distance as previously shown in research by the authors using other genetic markers (W.J.K. and K.J.W., unpublished data). In addition, high-quality DNA extractions from living plants, silica-dried tissue, and/or herbarium specimens were readily available for these taxa. The genera were not selected randomly, but they were not biased a priori toward low or high levels of interspecific divergence. The second taxon set was selected to represent a floristic sample that would be used in a typical plant DNA barcoding project. The samples were taken from Plummers Island, MD, a National Park Service habitat reserve in the Potomac River that has been studied and inventoried by biologists in the Washington, DC, area for >100 years, making it an appropriate test site for barcoding trials.
For the Plummers Island taxa, tissue samples were taken only from dried leaves of herbarium specimens housed in the U.S. National Herbarium (Smithsonian Institution) and collected between 1960 and 2000 (Table 3). These samples were used to compare ITS and rbcL as standards to the best plastid regions identified in the tests of taxon set one. A smaller set of older herbarium collections of Erysimum cheiranthoides (Brassicaceae), prepared as early as 1897, was compared with collections made as recently as 1997 from the same populations to empirically test the relationship between specimen preservation status, age, and DNA quality (see Fig. 2, which is published as supporting information on the PNAS web site).
DNA Analysis. New DNA extractions were performed with the DNeasy Plant Mini kit (Qiagen, Valencia, CA) after tissue disruption of 0.5-1 cm² of leaf tissue in a FastPrep FP-120 bead mill (Qbiogene, Carlsbad, CA). DNA extractions followed the manufacturer's protocols, except that buffer AP1 lysis conditions were modified by the addition of 0.4 mg of proteinase and 15 mg of DTT and by incubation at 42°C for 12 h on a rocking platform. This method can easily be scaled up to a 96-well format for large-scale (high-throughput) barcoding purposes. Amplification by PCR used puReTaq Ready-To-Go PCR beads (Amersham Pharmacia Biosciences) and direct sequencing of purified PCR products used BIGDYE 3.1 software on a 3100 sequencer, both from Applied Biosystems. Universal primers for selected genes and intergenic spacers were taken from investigations described in refs. 39-41 and Table 4, which is published as supporting information on the PNAS web site. Comparative rbcL data were generated for the Plummers Island flora by splitting the gene into two overlapping fragments (1f-724r and 636f-1368r), because test amplifications on a portion of the samples yielded only 31% success as a full-length fragment vs. 94% as two pieces.
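As a small worked check of the two-fragment rbcL design, the overlap and combined span of the fragments can be computed as follows. This sketch assumes that the numbers in the primer names (1f, 724r, 636f, 1368r) correspond to 1-based, inclusive coordinates on the rbcL gene, which is a common naming convention but is not stated explicitly here. The function name is hypothetical.

```python
from typing import Optional, Tuple

Fragment = Tuple[int, int]  # (start, end), 1-based inclusive; assumed from primer names


def overlap_and_span(frag_a: Fragment, frag_b: Fragment) -> Tuple[int, Optional[int]]:
    """Return (overlap length, combined span) for two fragments.

    The combined span is None when the fragments do not overlap, because
    they could not then be joined into one contiguous sequence.
    """
    (a_start, a_end), (b_start, b_end) = sorted([frag_a, frag_b])
    overlap = max(0, min(a_end, b_end) - max(a_start, b_start) + 1)
    if overlap == 0:
        return 0, None
    span = max(a_end, b_end) - min(a_start, b_start) + 1
    return overlap, span


# Fragments named in the text, read as coordinates (an assumption).
rbcl_overlap, rbcl_span = overlap_and_span((1, 724), (636, 1368))
```

Under that assumption the two amplicons share the region between positions 636 and 724, which gives the overlap needed to join the two reads into a single contiguous rbcL sequence.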