Recent studies show there are a surprisingly large number of duplicated genes present in the genomes of all organisms that cannot be accounted for by the classic models of nonfunctionalization and neofunctionalization. The presence of large numbers of duplicated genes within the genomes of teleost fish, now widely presumed to have undergone a whole genome duplication event around 300-350 million years ago, provide an excellent opportunity for comparative studies to test the DDC model. Prior to the availability of large-scale genomic sequences, the ability to study regulatory subfunctionalization through identifying the regulatory elements responsible was limited due to a lack of appropriate identification strategies. The discovery of thousands of CNEs conserved across the vertebrate lineage, highly enriched for sequences likely to be distal cis-regulatory modules, allowed us to develop a strategy to begin to uncover this. We identified potential gene candidates that contain both CNEs in their vicinity and are likely to derive from fish-specific duplication events using data from the initial whole genome comparison of the Fugu and human genomes. CNEs that cluster in the same location in human but derive from two separate locations in the Fugu genome strongly indicate the presence of co-orthologous regions. We selected seven clusters of CNEs in the human genome, each in the vicinity of a single trans-dev gene that fulfilled these criteria. For each of these genes, we recreated a phylogeny using protein sequences identified in each Fugu region, confirming the genes are both orthologs (co-orthologs) of the single mammalian copy, and all topologies confirmed the genes derive from a duplication event following the split between ray-finned and lobe-finned fishes. This relationship was further confirmed by comparison of the genic environment around the trans-dev gene in both Fugu regions to that of the single region in human. Conserved gene order extends, in many cases, to one or more genes upstream and/or downstream of each co-ortholog, indicating a shared ancestral origin, although in several instances the neighboring genes have been partitioned between the co-orthologous regions, undergone rearrangement or have been lost.
The process of subfunctionalization, as described in the DDC model, is defined as the fixation of complementary loss of subfunctions that result in the joint preservation of duplicate loci [7]. For regulatory subfunctionalization, a subfunction may represent expression of a gene in a specific tissue, cell lineage or temporal stage. For genes with complex regulation these subfunctions are controlled by one or a combination of cis-regulatory modules. A proportion of these can be predicted through the comparative genomic approaches outlined in this study and for the purposes of this discussion are assumed to be represented by CNEs. Under the DDC model, subfunctionalization is thought to occur by two different routes: qualitative and quantitative [7]. Under qualitative subfunctionalization one duplicate copy undergoes one or more complete loss-of-subfunction mutations with the second copy subsequently acquiring null-mutations for a different set of subfunctions. Thus, each copy is required to recapitulate the full set of ancestral subfunctions. Under this route, a conserved element representing an independent cis-regulatory module that undergoes a null-mutation in one of the gene copies will no longer be under selective constraint, and will be 'lost' (that is, will not be detectable by sequence conservation) through the accumulation of degenerative mutations over evolution. This process should, therefore, be evident by the effective partitioning of conserved elements between co-orthologs. In contrast, quantitative subfunctionalization is more subtle and results from the fixation of reduction-of-expression mutations in both duplicates [7]. Here, both regulatory modules must be maintained in the genome once the summed activity for a particular subfunction in both copies has been reduced to the original level in the single ancestral gene.
By comparing each Fugu co-ortholog with its single orthologous region in mammals, we attempted to identify those 'ancestral' cis-regulatory modules present in the mammalian copy that are retained in only one of the Fugu copies and those that have to some extent been retained in both copies. This approach is particularly appropriate for early developmental regulators for which function and regulation are likely to be highly constrained across all vertebrates and the mammalian gene represents as close as possible the ancestral pre-duplication state. The probability of the preservation of gene duplicates through subfunctionalization is also assumed to be higher in genes with complex regulation that contain large numbers of independently mutable CRMs [60], a view reinforced by the overrepresentation of genes involved in development and cellular differentiation found in fish-specific gene duplicates [38]. As with trans-dev genes across the genome, overall numbers of CNEs between gene pairs differs substantially and is likely to reflect differences in regulatory complexity or the extent to which regulation in these genes has been conserved across vertebrate evolution. All seven pairs of co-orthologs contained a number of CNEs that have been partitioned into each co-ortholog along the lines of the qualitative subfunctionalization model. The extent to which CNEs have been partitioned between co-orthologs, however, varies widely. For some co-ortholog pairs, such as bcl11a and ebf1, the majority of CNEs appear to be completely partitioned between the two genes, whilst for other pairs, such as pax2, only a relatively small proportion is. In the DDC model, following initial subfunctionalization, the process of null-mutation fixation in persisting redundant subfunctions is thought to be random, leaving a roughly equal number of subfunctions in each gene copy. Although this appears to be true for the fign, pax2 and unc4.1 co-orthologs, partitioning in bcl11a.1/.2 and ebf1.1/.2 is highly asymmetrical. In both of these cases there is relatively little overlap between the complement of CNEs associated with each co-ortholog pair. It is possible, therefore, that the loss of some CNEs in one co-ortholog may have consequences for further loss of elements in that gene. CRMs may not all be functionally autonomous and may interact together to actuate their regulatory role [61]. The degeneration of one or more integral CRMs from a co-ortholog could accelerate further degeneration of other CRMs that are functionally dependant on them. Under this scenario, a gene duplicate may undergo substantial loss of elements, possibly influencing further asymmetrical loss.
In addition to CNEs that have undergone full partitioning between co-orthologs, some pairs have also retained a number of overlapping CNEs that have been preserved to some extent in both copies. For co-orthologs such as pax2.1/pax2.2, this type of CNE constitutes the majority of CNEs located around these genes, and is a common feature in most of the other co-ortholog pairs. Overlapping CNEs appear, in general, to be longer than distinct CNEs, although at this stage the relationship (if any) between element size/conservation and its functional importance/regulatory complexity is still unknown. While some of these elements have remained virtually identical in length since duplication, others have undergone major changes both at the edges and at the core of one of the elements, suggesting information loss in one or both copies. A significant proportion of overlapping CNEs have also undergone asymmetrical rates of substitution, suggesting one copy retained the ancestral function while the other was free to evolve, possibly to a novel function.
What explanations could account for this high level of CNE retention observed between co-orthologs? The first is the possibility that some CNEs have undergone quantitative subfunctionalization. Here, degenerative mutations in both CRMs lead to a partial loss of subfunction in each element (such as a reduction in the level of expression in a specific tissue) rather than complete loss. Therefore, both elements must be maintained in the genome once the summed activity for a particular subfunction in both copies has been reduced to the original level of the ancestral gene [7,60]. Alternatively, some of these elements may function as silencers or insulators or play roles in chromatin remodeling, ensuring correct regulatory compartmentalization and control of both gene duplicates. Another explanation could be the possible inter-relations of cis-regulatory modules. As previously mentioned, there is a possibility that some CRMs are interrelated and act in concert to perform their function. It is possible, therefore, that the loss of a CRM critical for the function of other CRMs could lead to large-scale loss in one gene copy. For example, the partitioning of two CRMs that are functionally independent but both dependent on another CRM for correct function would lead to the retention of that critical CRM in both gene copies. Finally, it is possible that although both elements have retained general sequence identity, small nucleotide changes between the elements (such as those seen in asymmetrically evolving CNEs) may have substantial consequences for element function. Indeed in a recent pioneering study, Tümpel et al. [62] pinpointed subtle sequence changes within well-defined enhancers responsible for divergent expression of hoxa2 co-orthologs (hoxa2(a) and hoxa2(b)) in Fugu. Sequences of the enhancers responsible for full expression of HOXA2 in mice and chicken within the hindbrain were found to be generally conserved in both Fugu copies. Nevertheless, it was demonstrated that a small number of base-pair differences between hoxa2(a) and hoxa2(b) enhancers within several known transcription factor binding sites was sufficient to erase enhancer activity and was shown to be responsible for the lack of expression of hoxa2(a) within certain expression domains. However, the authors could not explain why the non-functional enhancers in hoxa2(a) had remained partially conserved, although they postulated they may have regulatory roles in expression domains not covered by the survey. This study highlights the power of correlating known expression differences between co-orthologs with comparative sequence analysis, especially with previous knowledge of the binding sites involved. It also highlights, as functional assays on more ancient duplicated CNEs have demonstrated [29], that sequence similarity may not always extend to functional similarity. Indeed, it is equally plausible that some of the sequence evolution seen between some overlapping CNEs is indicative of neofunctionalization. It would be of great interest for future studies to correlate any novel expression of teleost co-orthologs compared to other vertebrate homologs with changes in these elements. Finally, CNEs may represent several independent or overlapping CRMs and the loss of sequences within the CNE may be due to loss of just one of the CRMs, constituting a form of qualitative subfunctionalization
The pattern of CNE retention and evolution in these Fugu co-orthologs is certainly consistent with both mechanisms of subfunctionalization inherent in the DDC model. However, the extent of the contribution of each mechanism to subfunctionalization is different for each gene pair. This could be a consequence of each co-ortholog pair having followed a different evolutionary path after duplication; each under a number of different selective pressures depending on their expression and/or function as well as the influence of stochastic evolutionary events following a relaxation of evolutionary constraint due to genetic redundancy. It is clear, therefore, that confirmation of regulatory subfunctionalization in these gene pairs will require both the characterization of expression patterns for both co-orthologs (through approaches such as in situ hybridization) and confirmation of the regulatory potential of their surrounding CNEs through rapid in vivo reporter-gene assays. Currently, due to the limitations of Fugu as an experimental model, none of the expression profiles for the genes in this study have been characterized, which could be used to assess the extent and type of regulatory change these gene duplicates have undergone. In the more commonly used zebrafish experimental model organism gene expression profiles of two gene-pairs from this study, pax2 and sox1, have been characterized. Expression patterns of PAX2 co-orthologs of PAX2 (Pax2a and Pax2b) [63] are highly similar, although absence of Pax2b expression in the developing kidney as well as differences in temporal expression confirms they have undergone a level of regulatory differentiation. This appears to corroborate the pattern of element retention/partitioning seen in Fugu pax2 co-orthologs, where the majority of CNEs are largely conserved in both copies with a smaller number of elements partitioned between each gene. This suggests that, similar to their zebrafish homologs, they may have largely overlapping expression domains and have undergone a more subtle form of quantitative subfunctionalization through changes in their temporal expression. A recent survey of expression of the SOX B family of genes identified a level of regulatory differentiation between the zebrafish sox1a and sox1b co-orthologs [64]. The main differences in expression are temporal (for example, sox1a expressed in the lens a number of hours before sox1b), although there are also spatial differences with sox1a expression initiated in hindbrain and forebrain whereas sox1b initiates only in the forebrain. The overall expression patterns of these co-orthologs correspond closely to SOX1 expression in other vertebrates, indicating that changes in their expression are due to subfunctionalization rather than neofunctionalization. Our study reveals that at least half of all CNEs identified around sox1 co-orthologs have been partitioned, indicative of a level of subfunctionalization, while only one of the overlapping CNEs has undergone a significant level of substitution; a possible reflection on the lack of neofunctionalization in these genes. As patterns of subfunctionalization are known to occur differently between fish species, it remains for further studies of the pax2 and sox1 co-orthologs in Fugu to discover whether expression differences are similar to those observed in zebrafish.
The majority of regions in this study, in addition to containing CNEs derived from a teleost-specific duplication have counterparts elsewhere in the genome that derive from more ancient vertebrate-specific duplications, reflecting the complex duplication histories of their associated genes. The fact that most of these sequences are retained not only between co-orthologs (for example, bcl11a.1 and bcl11a.2) but also between out-paralogs (for example, BCL11A and BCL11B) spanning over half a billion years of evolution is an indicator of the potentially critical nature of these sequences to the regulation of these genes. In addition, this dataset provides many candidates for further functional studies on the evolution of these sequences and the implications of these changes on their neofunctionalized paralogs and subfunctionalized co-ortholog targets.
The role the teleost-specific genome duplication has played in the evolution of this lineage remains unclear. It is now generally accepted that the genome duplication event(s) that occurred at the origin of vertebrates played a major role in species diversity and, in particular, the huge increase in vertebrate morphological complexity [65,66]. In contrast, the more recent teleost specific genome duplication does not appear to have had the same effect, with arguably less complexity in the teleost anterior-posterior axis than in tetrapods [5]. Speciation in teleosts though is unmatched among descendants of other vertebrate lineages, with over 22,000 known species, making up over half of all extant vertebrates species [67]. This has led to suggestions that the genome duplication event may be directly responsible [14,68]. Indeed, evidence presented in a review by Taylor et al. [13] indicates that ployploidized members of the Salmon family and Catostomidae (sucker fish) exhibit higher degrees of speciation than members of the same family that remain diploid. Subfunctionalization has been proposed as a likely mechanism for this increased rate of speciation since differential resolution of subfunctions in multiple gene pairs would lead to reproductive incompatibility due to a reduction in hybrid fitness [69]. Evidence for such lineage-specific subfunction partitioning has been demonstrated for a small number of genes (for example, divergent expression of sox9 in stickleback compared to zebrafish [18]), but large-scale studies will be required to resolve the degree of subfunctionalization that took place before and after divergence within the teleost lineage. Furthermore, if lineage speciation is driven by differential subfunctionalization, we might expect the pattern of CRM evolution and partition/retention for the Fugu genes discussed here to be different to those in other fish species. The recent release of a number of divergent draft teleost genomes, including those of zebrafish, medaka and stickleback, should allow further studies in this direction. Furthermore, the approach and analysis used in this study can be extended for use in any situation where genomic regions surrounding duplicated genes can be compared to an orthologous region that has remained single copy. This may be particularly useful for inter-teleost comparisons, where co-ortholog genes have been differentially retained since the whole-genome duplication prior to the teleost radiation.