The bioinformatic approach implemented in this study resulted in a
large set (154 loci for the zebrafish and torafugu comparison) of
candidate genes to infer high-level phylogeny of ray-finned fishes. The
actual number of candidate loci depended on the genomes being compared
and the fixed search parameters. Experimental tests of a smaller subset
(15 loci) demonstrate that a large fraction (2/3) of these candidates
are easily amplified by PCR from whole genomic DNA extractions in a
vast diversity of fish taxa. The assumption that these loci are
represented by a single copy in the fish genomes could not be rejected
by the PCR assays in the species tested (all amplifications resulted in
a single product), increasing the likelihood that the genetic markers
are orthologous and suitable to infer organismal phylogeny. Our method
is based on searching, under specific criteria, the available complete
genomic databases of organisms closely related to the taxa of interest.
Therefore, the same approach that is shown to be successful for fishes
could be applied to other groups of organisms for which two or more
complete genome sequences exist. Parameter values (L, S, and C) used
for the search (Figure 2)
may be altered to obtain fragments of different size or with different
levels of conservation (i.e., less conserved for phylogenies of more
closely related organisms).
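
As a rough illustration of how such thresholds could be tuned, the sketch below filters a list of candidate fragments by three cut-offs. The meaning given to each parameter here (L as a minimum fragment length, S as a maximum similarity to any other region of the same genome, C as a minimum similarity between the two genomes), the default values, and the `Candidate` record are assumptions made for this example only; the criteria actually used in the search are those defined in Figure 2.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record for one candidate fragment (not from the study)."""
    name: str
    length: int                # fragment length in bp
    within_genome_sim: float   # best similarity (%) to another region of the same genome
    between_genome_sim: float  # similarity (%) to the best hit in the second genome


def filter_candidates(cands, L=800, S=50.0, C=70.0):
    """Keep fragments that pass all three (assumed) thresholds.

    L: minimum fragment length (placeholder value)
    S: maximum within-genome similarity, a proxy for "single copy" (placeholder value)
    C: minimum between-genome similarity, a proxy for conservation (placeholder value)
    """
    return [
        c for c in cands
        if c.length >= L
        and c.within_genome_sim < S
        and c.between_genome_sim >= C
    ]
```

Under these assumptions, lowering L would admit shorter fragments, and lowering C would admit less conserved fragments, which may be preferable for phylogenies of more closely related organisms.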
An alternative way to develop nuclear gene markers for phylogenetic
studies is to construct a cDNA library or sequence several ESTs for a
small pilot group of taxa, and then to design specific PCR primers to
amplify the orthologous gene copy in all the other taxa of interest [19,46].
The major potential problem with this approach stems from the fact that
the method starts with a cDNA library or a set of EST sequences, with
no prior knowledge of how many copies a gene has in each genome. As
discussed above, this condition may lead to mistaken paralogy. In our
approach, we search the genomic database to find single-copy candidates
so that duplicate gene copies, if present, are not missed (see below).
Recent studies have proposed whole genome duplication events during
vertebrate evolution and also genome duplications restricted to
ray-finned fishes [31,32,47,48].
Our results indicate that many single-copy genes still exist in a wide
diversity of fish taxa (representing 28 orders of actinopterygian
fishes), in agreement with previous estimates that a vast majority of
duplicated genes are secondarily lost [34,35]. All 154 candidates were identified as single-copy genes in D. rerio and T. rubripes,
according to our search criteria. Our results also show the 154
candidate genes are randomly distributed in the fish genome (at least
among chromosomes of D. rerio). In the experimental tests, 10
out of 15 markers were found in single-copy condition in all successful
amplifications, including the tetraploid species, O. mykiss.
However, when the search criteria were relaxed to include targets less than
50% similar in a subsequent BLAST search against the zebrafish genome,
7 of the 10 genes were found to have "alignable paralogs" (the 3
exceptions were myh6, tbr1, and Gylt). Genomes of medaka, stickleback,
and fugu were also checked for these 3 genes, and no "paralogs" were
detected, suggesting the sequences of ray-finned fish collected for
these 3 genes are unambiguously orthologous to each other. Phylogenetic
analyses for each of the 7 genes that include the putative paralogs
found by this procedure produced tree topologies that strongly suggest
an ancient duplication event in the vertebrate lineage, before the
divergence of tetrapods from ray-finned fishes. Paralogous sequences
are placed at the base of the tetrapod-actinopterygian divergence, or
as part of a basal polytomy with the other tetrapod and ray-finned fish
sequences. In the terminology proposed by Remm et al.,
these would be considered out-paralogs. In no case are these sequences
nested among ingroup actinopterygian sequences (see Additional file 4), as would be expected for in-paralogs.
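
The topological distinction drawn above can be sketched in a few lines of code. The example is a minimal illustration, not the procedure used in this study: it assumes a rooted tree written as nested tuples with leaf names as strings, and all taxon labels are hypothetical. A putative paralog is flagged as "nested" when it falls inside the smallest clade that contains every ingroup (actinopterygian) sequence, the placement expected for an in-paralog; a paralog outside that clade is consistent with an out-paralog.

```python
def leaves(node):
    """Return the set of leaf names under a node (leaf = str, clade = tuple)."""
    if isinstance(node, str):
        return {node}
    result = set()
    for child in node:
        result |= leaves(child)
    return result


def smallest_clade(node, targets):
    """Return the smallest clade of a rooted tree containing every name in targets.

    Assumes every name in targets is present somewhere under node.
    """
    if isinstance(node, str):
        return node
    for child in node:
        if targets <= leaves(child):
            # All targets sit under this child, so descend into it.
            return smallest_clade(child, targets)
    return node


def is_nested_in_ingroup(tree, paralog, ingroup):
    """True if the paralog lies inside the smallest clade spanning the ingroup."""
    return paralog in leaves(smallest_clade(tree, set(ingroup)))


# Hypothetical labels, for illustration only.
ingroup = {"zebrafish", "medaka", "fugu"}
basal_tree = ("paralog", (("human", "chicken"), (("zebrafish", "medaka"), "fugu")))
nested_tree = (("human", "chicken"), (("zebrafish", "paralog"), ("medaka", "fugu")))

print(is_nested_in_ingroup(basal_tree, "paralog", ingroup))   # False
print(is_nested_in_ingroup(nested_tree, "paralog", ingroup))  # True
```

A check of this kind only describes topology; it does not account for branch support or unresolved polytomies, which would need to be inspected on the actual trees.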
Stringent search criteria implemented in our approach, followed by
phylogenetic analysis, can distinguish between orthologs and putative
out-paralogs. Although the method will not guarantee that single-copy
genes amplified by PCR in several taxa are orthologs as opposed to
in-paralogs, the existence and identification of genome-scale
single-copy nuclear markers should facilitate the construction of the
tree of life, even if the evolutionary mechanism responsible for
maintaining single-copy genes is poorly known.
Additional file 4. ML phylogenies based on
protein sequences of individual genes and their out-paralogs found by
relaxing our search criteria to include fragments with similarity <
The molecular evolutionary profiles of the 10 newly
developed markers are in the same range as RAG-1, a widely-used gene
marker in vertebrates. The genes with high treeness values have
intermediate substitution rates, suggesting that optimal rate and base
composition stationarity are important factors that determine the
suitability of a phylogenetic marker. The phylogeny based on individual
markers revealed incongruent phylogenetic signal among 6 of the 10
individual genes. This incongruence suggests that significant biases in
the data might obscure the true phylogenetic signal in some individual
genes, but the direction of the bias is rarely shared among genes
(Additional file 3), justifying the use of genome-scale gene markers to infer organismal phylogeny.
Finally, with respect to the phylogenetic results per se, there are two significant areas of discrepancy between the phylogeny obtained in this study (Figure 3a) and a consensus view of fish phylogeny (Figure 3b).
Although these differences could be due to poor taxonomic sampling, we
discuss them briefly. First, the traditional tree groups cichlids with
other perciforms, whereas our results showed the cichlid O. niloticus is
more closely related to atherinomorphs (Cyprinodontiformes +
Beloniformes) than to other perciforms. This result also was supported
by two recent studies analysing multiple nuclear genes [17,51]. The second difference is that the traditional tree groups Lycodes with other perciforms, while Lycodes was found closely related to Gasterosteus (Gasterosteiformes) in our results. Interestingly, the sister-taxa relationship between Lycodes and Gasterosteus also is supported by recent studies using mitochondrial genome data [38,52].
The difference between our "total evidence" tree and the classical
hypothesis is significant based on the new data, as indicated by a
one-tailed Shimodaira-Hasegawa (SH) test (p = 0.000).