Taxonomic sampling
Cichlid samples were obtained from local animal dealers in Japan. We
combined these new mitogenomic data with 48 previously published
sequences from the DDBJ/EMBL/GenBank nucleotide sequence database. The
10 cichlid taxa that we analyzed (Table 1)
cover species from major Gondwana-origin landmasses. In addition, we
chose 31 other teleosts, nine basal actinopterygians, and two
sarcopterygians. Two sharks were sampled as an outgroup to root the
tree. Additional file 1 contains a complete list of the sampled taxa, along with the database accession numbers of their mitogenomic sequences.
Additional File 1. List of species used, with database accession numbers. Classifications follow Nelson [11].
Table 1. Cichlid taxa analyzed for mtDNAs
DNA extraction, PCR, and sequencing
Fish samples were excised from live or dead specimens of each
species and immediately preserved in 99.5% ethanol. Total genomic DNA
was extracted from muscle, liver, and/or fin clips using a DNeasy
tissue kit (Qiagen) or DNAzol Reagent (Invitrogen), following the
manufacturers' protocols. The mtDNA of each species was amplified using a
long-PCR technique with LA-Taq (Takara). Seven fish-versatile primers
for long PCR (S-LA-16S-L, L2508-16S, L12321-Leu, H12293-Leu,
H15149-CYB, H1065-12S, and S-LA-16S-H [21-26])
and the two cichlid-specific primers cichlid-LA-16SH
(5'-TTGCGCTACCTTTGCACGGTCAAAATACCG-3') and cichlid-LA-16SL
(5'-CGGAGTAATCCAGGTCAGTTTCTATCTATG-3') were used in various
combinations to amplify regions covering the entire mtDNA in one or two
reactions. The long-PCR products were used as templates for subsequent
short PCR.
Over 100 fish-versatile PCR primers [21-27] and 18 taxon-specific primers (Additional file 2)
were used in various combinations to amplify contiguous, overlapping
segments of the entire mtDNA for each of the six new cichlid species.
The long PCR and subsequent short PCRs were performed as described
previously [21,28]. The short-PCR reactions were performed using the GeneAmp PCR System 9700 (Applied Biosystems) and Ex Taq DNA polymerase (Takara).
Additional File 2. Cichlid-specific primers for PCR and sequencing. H and L indicate the orientation of the primers. The locations of the primers are shown with the names of the targeted genes.
Double-stranded PCR products, treated with ExoSAP-IT
(USB) to inactivate remaining primers and dNTPs, were directly used for
the cycle sequencing reaction, using dye-labeled terminators (Applied
Biosystems) with amplification primers and appropriate internal
primers. Labeled fragments were analyzed on Model 3100 and Model 377
DNA sequencers (Applied Biosystems).
Sequence manipulation
The DNA sequences obtained were edited and analyzed using EditView
1.0.1, AutoAssembler 2.1 (Applied Biosystems) and DNASIS 3.2 (Hitachi
Software Engineering Co. Ltd.). Individual gene sequences were
identified and aligned with their counterparts in 48 previously
published mitogenomes. Amino acid sequences were used to align
protein-coding genes, and standard secondary structure models for
vertebrate mitochondrial tRNAs [29] were consulted for the alignment of tRNA genes. The 12S and 16S rRNA sequences were initially aligned using ClustalX v. 1.83 [30] with default gap penalties and subsequently adjusted by eye using MacClade 4.08 [31].
The ND6 gene was excluded from the phylogenetic analyses because of
its heterogeneous base composition and consistently poor phylogenetic
performance [22].
The control region was also excluded because positional homology could
not be confidently established among such distantly related species. The
third codon positions of protein-coding genes were excluded because
their extremely accelerated rates of change may cause high levels of
homoplasy. After the exclusion of unalignable parts in the loop regions
of tRNA genes, as well as the 5' and/or 3' end regions of protein-coding
genes, all gene sequences were concatenated to produce a data set of
10,034 nucleotide positions (6962, 1402, and 1670 positions for
protein-coding, tRNA, and rRNA genes, respectively) for phylogenetic analyses.
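The site filtering and concatenation described above can be illustrated with a minimal Python sketch. It is not the procedure the authors used (the alignments were edited with the programs named above). It assumes an in-frame protein-coding alignment and a concatenation order of protein-coding, tRNA, then rRNA genes. The function names and the toy sequence are hypothetical, and only the three block lengths are taken from the text.

```python
from typing import Dict, Tuple


def drop_third_codon_positions(seq: str) -> str:
    """Remove every third nucleotide from an in-frame coding sequence."""
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    # Zero-based indices 2, 5, 8, ... are the third codon positions.
    return "".join(base for i, base in enumerate(seq) if i % 3 != 2)


def partition_ranges(sizes: Dict[str, int]) -> Dict[str, Tuple[int, int]]:
    """Return 1-based inclusive site ranges for blocks concatenated in dict order."""
    ranges = {}
    start = 1
    for name, length in sizes.items():
        ranges[name] = (start, start + length - 1)
        start += length
    return ranges


# Block lengths reported in the text; the ORDER of the blocks is an assumption.
PARTITION_SIZES = {"protein_coding": 6962, "tRNA": 1402, "rRNA": 1670}
RANGES = partition_ranges(PARTITION_SIZES)
TOTAL_SITES = sum(PARTITION_SIZES.values())

# Toy example (hypothetical sequence, not from the study): three codons in frame.
TOY_FILTERED = drop_third_codon_positions("ATGGCCTTA")
```

Under these assumptions the three block lengths sum to the 10,034 positions reported, and the 6962 protein-coding positions correspond to 3481 first and 3481 second codon positions.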
Phylogenetic analyses
Phylogenetic trees were reconstructed using partitioned Bayesian and
maximum likelihood analyses. Partitioned Bayesian phylogenetic analyses
were performed using MrBayes 3.1.2 [32].
We set four partitions (first codon, second codon, tRNA, and rRNA
positions). The general time-reversible model, with some sites assumed
to be invariable and variable sites assumed to follow a discrete gamma
distribution (GTR + I + Γ; [33]), was selected as the best-fit model of nucleotide substitution by MrModeltest 2.2 (http://www.abc.se/~nylander/) [34].
The Markov chain Monte Carlo (MCMC) process was set so that four chains
(three heated and one cold) ran simultaneously. We ran the program for
3,000,000 Metropolis-coupled MCMC generations in each analysis, sampling
trees every 100 generations and discarding the first 10,000 trees as burn-in.
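As a rough illustration of these settings, the Python sketch below assembles a MrBayes command block and works out how many sampled trees remain after burn-in. It is a sketch under stated assumptions, not the authors' input file: the character-set coordinates assume interleaved first and second codon positions followed by the tRNA and rRNA blocks, and the `unlink` and `prset` lines (separate parameters and rates per partition) are an assumption about how the partitioned model was specified. The function and set names are hypothetical.

```python
from typing import Tuple


def sampled_tree_counts(ngen: int, samplefreq: int, burnin_trees: int) -> Tuple[int, int]:
    """Return (trees sampled, trees kept after burn-in).

    Ignores any extra tree a program may record for the starting state.
    """
    sampled = ngen // samplefreq
    if burnin_trees > sampled:
        raise ValueError("burn-in exceeds the number of sampled trees")
    return sampled, sampled - burnin_trees


def mrbayes_block(ngen: int = 3_000_000, samplefreq: int = 100,
                  nchains: int = 4, burnin_trees: int = 10_000) -> str:
    """Build a MrBayes block for a four-partition GTR + I + gamma analysis."""
    lines = [
        "begin mrbayes;",
        # ASSUMPTION: site coordinates below are hypothetical.
        "  charset pos1 = 1-6962\\2;",
        "  charset pos2 = 2-6962\\2;",
        "  charset trna = 6963-8364;",
        "  charset rrna = 8365-10034;",
        "  partition by_class = 4: pos1, pos2, trna, rrna;",
        "  set partition = by_class;",
        # nst=6 with rates=invgamma corresponds to GTR + I + gamma.
        "  lset applyto=(all) nst=6 rates=invgamma;",
        # ASSUMPTION: parameters unlinked and rates variable across partitions.
        "  unlink revmat=(all) statefreq=(all) shape=(all) pinvar=(all);",
        "  prset applyto=(all) ratepr=variable;",
        f"  mcmc ngen={ngen} samplefreq={samplefreq} nchains={nchains};",
        f"  sumt burnin={burnin_trees};",
        "end;",
    ]
    return "\n".join(lines)


SAMPLED, KEPT = sampled_tree_counts(3_000_000, 100, 10_000)
BLOCK = mrbayes_block()
```

With the values given in the text, 30,000 trees are sampled per run and 20,000 remain after the first 10,000 are discarded.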
Partitioned maximum likelihood (ML) analyses were performed with RAxML ver. 7.0.3 [35],
a program implementing a novel, rapid-hill-climbing algorithm. For each
dataset, a rapid bootstrap analysis and a search for the best-scoring ML
tree were conducted in a single program run, with the GTR + I + Γ
nucleotide substitution model. The rapid bootstrap analyses were
conducted with 1000 replicates, using four threads running in parallel.
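A run of this kind might be set up as in the following Python sketch, which only builds the partition file text and the argument list without executing anything. The option names follow the RAxML 7.x command-line documentation as we understand it and should be checked against the installed version (later releases may require further options, such as a parsimony seed). The binary name, file names, seed, and partition coordinates are hypothetical and are not taken from the study.

```python
from typing import List


def raxml_partition_file() -> str:
    """Partition definitions in RAxML's 'DNA, name = range' format.

    ASSUMPTION: coordinates presume interleaved first/second codon
    positions followed by the tRNA and rRNA blocks.
    """
    return "\n".join([
        "DNA, pos1 = 1-6962\\2",
        "DNA, pos2 = 2-6962\\2",
        "DNA, trna = 6963-8364",
        "DNA, rrna = 8365-10034",
    ])


def raxml_rapid_bootstrap_cmd(alignment: str, partitions: str, run_name: str,
                              replicates: int = 1000, threads: int = 4,
                              seed: int = 12345) -> List[str]:
    """Argument list for rapid bootstrapping plus best-tree search in one run."""
    return [
        "raxmlHPC-PTHREADS",      # hypothetical name of the threaded binary
        "-f", "a",                # rapid bootstrap and ML search in one run
        "-x", str(seed),          # rapid bootstrap random seed (arbitrary here)
        "-#", str(replicates),    # number of bootstrap replicates
        "-m", "GTRGAMMAI",        # GTR + gamma + invariable sites
        "-q", partitions,         # partition definition file
        "-s", alignment,          # input alignment
        "-n", run_name,           # suffix for output files
        "-T", str(threads),       # number of threads
    ]


CMD = raxml_rapid_bootstrap_cmd("cichlid.phy", "partitions.txt", "cichlid_ml")
PARTS = raxml_partition_file()
```

The list could be passed to `subprocess.run(CMD, check=True)` once the file names and options have been verified for the local installation.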
Statistical evaluation of alternative phylogenetic hypotheses was performed with TREE-PUZZLE 5.2 [36], using the two-sided Kishino and Hasegawa (KH) [37] test, the Shimodaira and Hasegawa (SH) [38] test, and Bayes factors [39,40]. We used the GTR + I + Γ model and its parameters optimized by MrModeltest 2.2.
Divergence time estimation
For divergence time estimation, the multidistribute program package [41]
was used, assuming the topological relationships thus obtained but
without assuming the molecular clock (i.e., by allowing heterogeneity
in molecular evolutionary rate along branches). Upper and/or lower time
constraints at selected nodes were set for the Bayesian MCMC processes
to estimate divergence times (including means and 95% credibility
ranges) and relative rates at ingroup nodes. We set the partitioning as
described above and first used PAML [42] to optimize the parameters of the F84 model and of a gamma distribution with eight categories to account for site heterogeneity. The estbranches and multidivtime programs
were then used to estimate divergence times. We used 21 fossil-based
time constraints assignable to diverse teleostean lineages (Table 2).
Table 2. Maximum (U) and minimum (L) time constraints (MYA) used for dating at nodes in Fig. 2