A genome-comparison strategy to identifying nuclear gene markers for phylogenetic inference and …

- A practical approach to phylogenomics: the phylogeny of ray-finned fish (Actinopterygii) as a case study

Genome-scale mining for phylogenetic markers

Whole-genome sequences of Danio rerio and Takifugu rubripes were retrieved from the ENSEMBL database [54]. Exon sequences longer than 800 bp were then extracted from the genome databases. The extracted exons were compared in two steps: (1) within-genome sequence comparisons and (2) between-genome comparisons. The first step is designed to generate a set of single-copy nuclear gene exons (length > 800 bp) within each genome, whereas the second step identifies single-copy, putatively orthologous exons between D. rerio and T. rubripes (Figure 2). The BLAST algorithm was used for sequence similarity comparison. In addition to the parameters available in the BLAST program, we applied another parameter, coverage (C), to identify global sequence similarity between exons. Coverage was defined as the ratio of the total length of locally aligned sequences to the length of the query sequence. The similarity (S) was set to S < 50% for the within-genome comparison, which means that only genes with no counterpart more than 50% similar to themselves were kept. For the cross-genome comparison, the similarity was set to S > 70% and the coverage to C > 30%, which selected genes that are more than 70% similar and more than 30% aligned between D. rerio and T. rubripes. Subsequent comparisons were performed on the newly available genomes of stickleback (Gasterosteus aculeatus) and Japanese rice fish (Oryzias latipes), as described above. We implemented this procedure in Perl to automate the process and made the source code publicly available on our website [43]. We are working to make it applicable to other genomic sequences and parameter values.
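The two filtering steps can be illustrated with a minimal Python sketch. This is not the authors' Perl code: the `Hit` record, the function names, and the choice to merge overlapping alignments on the query before computing coverage are all assumptions made for illustration. Only the thresholds (S < 50% within a genome, S > 70% and C > 30% across genomes) come from the text.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One local alignment (hypothetical record, e.g. parsed from tabular BLAST output)."""
    query: str
    subject: str
    identity: float  # percent identity of the local alignment
    q_start: int     # 1-based, inclusive position on the query
    q_end: int


def coverage(hits, query_length):
    """Fraction of the query covered by local alignments.

    Assumption: overlapping alignments are merged so no query base is counted twice.
    """
    intervals = sorted((min(h.q_start, h.q_end), max(h.q_start, h.q_end)) for h in hits)
    covered = 0
    cur_start = cur_end = None
    for start, end in intervals:
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                covered += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start + 1
    return covered / query_length


def single_copy(exon_ids, within_hits, max_similarity=50.0):
    """Step 1: keep exons with no non-self hit above max_similarity in the same genome."""
    has_paralog = {
        h.query for h in within_hits
        if h.query != h.subject and h.identity > max_similarity
    }
    return [e for e in exon_ids if e not in has_paralog]


def orthologous_pairs(cross_hits, query_lengths, min_similarity=70.0, min_coverage=0.30):
    """Step 2: keep query/subject pairs with S > min_similarity and C > min_coverage."""
    grouped = defaultdict(list)
    for h in cross_hits:
        if h.identity > min_similarity:
            grouped[(h.query, h.subject)].append(h)
    return sorted(
        pair for pair, hs in grouped.items()
        if coverage(hs, query_lengths[pair[0]]) > min_coverage
    )
```

In this sketch, coverage is evaluated only over alignments that already pass the similarity threshold. Whether the original pipeline applies the two criteria in that order is not stated in the text.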

Experimental testing for candidate markers

PCR and sequencing primers were designed on aligned sequences of D. rerio and T. rubripes for 15 randomly selected genes. Primer3 was used to design the primers [55]. Degenerate primers and a nested-PCR design were used to ensure amplification of each gene in most of the taxa. Ten of the 15 genes tested were amplified as a single fragment in most of the 36 taxa examined. PCR primers for the 10 gene markers are listed in Table 1. The amplified fragments were sequenced directly, without cloning, using the BigDye system (Applied Biosystems). Sequences of the frequently used RAG1 gene were retrieved from GenBank for the same taxa for comparison with the newly developed markers [GenBank: AY430199, NM_131389, U15663, AB120889, DQ492511, AY308767, AF108420, EF033039-EF033043]. When a RAG1 sequence for the same taxon was not available, a taxon of the same family was used, i.e. Nimbochromis was used instead of Oreochromis and Neobythites was used instead of Brotula.
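To show what a degenerate primer site looks like when two aligned sequences differ, the following Python sketch merges two aligned primer-length windows into a single IUPAC-coded sequence and counts how many distinct oligonucleotides it represents. It is an illustration only: the function names and the example sequences are hypothetical, and the actual primers were designed with Primer3 and are listed in Table 1.

```python
# IUPAC ambiguity codes for every pair of distinct nucleotides.
IUPAC_PAIR = {
    frozenset("AG"): "R",
    frozenset("CT"): "Y",
    frozenset("CG"): "S",
    frozenset("AT"): "W",
    frozenset("GT"): "K",
    frozenset("AC"): "M",
}


def degenerate_consensus(seq_a, seq_b):
    """Merge two aligned, gap-free sequences into one IUPAC-coded sequence."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    out = []
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            raise ValueError("only unambiguous A, C, G, T are supported in this sketch")
        out.append(a if a == b else IUPAC_PAIR[frozenset((a, b))])
    return "".join(out)


def degeneracy(primer):
    """Number of distinct oligonucleotides encoded by a primer with two-base codes."""
    n = 1
    for base in primer.upper():
        if base in "RYSWKM":
            n *= 2
    return n


# Hypothetical aligned windows from two species.
site = degenerate_consensus("ATGCA", "ATACC")
print(site, degeneracy(site))
```

Lower degeneracy generally means a more specific primer, which is one reason to place primers in conserved stretches of the alignment.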

Phylogenetic analysis

Sequences of the 10 new markers in the 14 taxa were used in phylogenetic analysis to assess their performance. Sequences were aligned using ClustalX [56] on the translated protein sequences. Uncorrected genetic distances were calculated using PAUP [57]. The relative substitution rate of each marker was estimated using a Bayesian approach [58]. Relative composition variability (RCV) and treeness were calculated following Phillips and Penny [44]. ProtTest [45] was used to choose the best model for the protein sequence data, and the AIC criterion was used to determine the data-partitioning scheme. Bayesian analysis implemented in MrBayes v3.1.1 and maximum likelihood analysis implemented in TreeFinder [59] were performed on the protein sequences. One million generations with four chains were run for the Bayesian analysis, and the trees sampled before convergence were discarded as burn-in before computing the consensus tree and posterior probabilities. Two independent runs were used to provide additional confirmation of convergence of the posterior probability distribution. Given the biased base composition in the nucleotide data indicated by the RCV value (Table 2), we analyzed the nucleotide data under the RY-coding scheme (C and T = Y, A and G = R), partitioned by gene in TreeFinder, since RY-coded data are less sensitive to base compositional bias [44]. Alternative hypotheses were tested by a one-tailed Shimodaira and Hasegawa (SH) test [53] with 1000 RELL bootstrap replicates implemented in TreeFinder.
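The RY-coding scheme and the uncorrected distance mentioned above are simple enough to sketch in Python. This is an illustration, not the PAUP or TreeFinder implementation: the function names are hypothetical, and skipping sites that contain a gap, `?`, or `N` is an assumption about how missing data are handled.

```python
RY_MAP = {"A": "R", "G": "R", "C": "Y", "T": "Y"}


def ry_code(seq):
    """Recode purines (A, G) as R and pyrimidines (C, T) as Y; other symbols are kept."""
    return "".join(RY_MAP.get(base, base) for base in seq.upper())


def p_distance(seq_a, seq_b, skip_chars="-?N"):
    """Uncorrected distance: proportion of compared sites at which two sequences differ.

    Assumption: sites with a gap or missing-data symbol in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in skip_chars or b in skip_chars:
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


# Two hypothetical sequences that differ only by transitions (A<->G).
raw = p_distance("ACGT", "GCAT")
recoded = p_distance(ry_code("ACGT"), ry_code("GCAT"))
print(raw, recoded)
```

Under RY-coding only transversions remain visible as differences, so transitions no longer contribute to the signal. This recoding of each site into one of two classes is what the scheme quoted above (C and T = Y, A and G = R) describes.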

Updated on: 2 Jul 2008