One strategy for understanding the function of genes correlates events in their molecular evolution with events in the history of other genes in the same and/or neighboring lineages, and with events recorded in the geological and paleontological records. We incorporated a tool to date the divergence of two or more genes through an analysis of transitions at synonymous sites of two-fold redundant coding systems, where the encoded amino acid has been conserved. This analysis exploits the approach-to-equilibrium kinetic behavior displayed by these sites. The analysis yields a transition redundant exchange (TREx) distance for any gene pair where the synonymous sites have not equilibrated.
To calibrate the silent TREx clock, inter-taxa histograms relating pig (Sus scrofa) and ox (Bos taurus) were constructed for transitions at the silent sites of two-fold redundant codon systems where the encoded amino acid was conserved between the species. The major peak associated with the separation of these two lineages was observed at f2 = 0.87, corresponding to a TREx distance of kt = 0.332. As the fossil record constrains the date of divergence of these two lineages to be 60 ± 5 Ma [25-27], and the codon biases in modern Sus scrofa and Bos taurus project an equilibrium value of f2 = 0.54, the rate constant for transitions at the TREx silent sites was estimated to be ca. 2.8 × 10⁻⁹ transitions/silent site/year during the time interval that separates these lineages.
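The calibration arithmetic can be sketched in a few lines of Python. This is a minimal illustration, not the authors' software. It assumes the standard first-order approach-to-equilibrium form, in which f2 decays from 1 toward its equilibrium value. It also assumes that the single-lineage rate constant is obtained by dividing the pairwise distance by twice the divergence time. Neither formula is stated explicitly in the text, but both are consistent with the values quoted above. The function name `trex_distance` is hypothetical.

```python
import math

def trex_distance(f2, f2_eq):
    """Return the TREx distance kt for an observed f2.

    Assumed model (not spelled out in the text): f2 decays exponentially
    from 1.0 toward the equilibrium value f2_eq, so
        kt = -ln((f2 - f2_eq) / (1 - f2_eq)).
    """
    if not f2_eq < f2 <= 1.0:
        raise ValueError("f2 must lie above the equilibrium value and at most 1")
    return -math.log((f2 - f2_eq) / (1.0 - f2_eq))

# Values quoted in the text for the pig vs. ox comparison.
kt = trex_distance(0.87, 0.54)

# Assumption: both lineages accumulate changes over the 60 Ma since
# divergence, so the single-lineage rate constant is kt / (2 * t).
k = kt / (2 * 60e6)

print(f"kt = {kt:.3f}")
print(f"k  = {k:.2e} transitions/silent site/year")
```

Under these assumptions the sketch gives kt of about 0.332 and a rate constant close to the ca. 2.8 × 10⁻⁹ quoted above.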
Analogous f2 values were then obtained for other vertebrate aromatase pairs, including fish versus tetrapods (f2 = 0.56), birds versus mammals (f2 = 0.612), primates versus ungulates (f2 = 0.823), and horses versus artiodactyls (f2 = 0.828). Assuming a time-invariant single lineage first order rate constant of 3.6 × 10⁻⁹ changes/site/year and an equilibrium f2 of 0.54, the corresponding dates of divergence are calculated to be 435, 258, 67, and 65 Ma respectively, with the oldest dates being the least precise. The last three of these dates of divergence are similar to those suggested by the paleontological record, within the error of the calculation, which reflects the modest number of characters used to calculate the f2 values. A tree for the artiodactyl lineage was constructed from the corresponding TREx distances (Figure 2). This was found to be consistent with the tree constructed from other metrics.
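The conversion from f2 to a divergence date can be illustrated as follows. This is a sketch under stated assumptions rather than the authors' implementation: it uses the first-order approach-to-equilibrium form for the TREx distance and divides the pairwise distance by twice the single-lineage rate constant. The function name `divergence_date_ma` and the dictionary labels are hypothetical. The f2 values, rate constant, and equilibrium value are those given in the text.

```python
import math

K_SINGLE_LINEAGE = 3.6e-9  # changes/site/year, as stated in the text
F2_EQ = 0.54               # equilibrium f2, as stated in the text

def divergence_date_ma(f2, k=K_SINGLE_LINEAGE, f2_eq=F2_EQ):
    """Estimate a divergence date in Ma from an observed pairwise f2.

    Assumptions: kt = -ln((f2 - f2_eq) / (1 - f2_eq)), and the pairwise
    distance is spread over two lineages, so t = kt / (2 * k).
    """
    kt = -math.log((f2 - f2_eq) / (1.0 - f2_eq))
    return kt / (2 * k) / 1e6

pairs = {
    "fish vs tetrapods": 0.56,
    "birds vs mammals": 0.612,
    "primates vs ungulates": 0.823,
    "horses vs artiodactyls": 0.828,
}

dates = {name: round(divergence_date_ma(f2)) for name, f2 in pairs.items()}
for name, date in dates.items():
    print(f"{name}: {date} Ma")
```

The logarithm grows steeply as f2 approaches 0.54, so a small uncertainty in f2 for the fish versus tetrapod pair shifts the date far more than the same uncertainty does for the younger pairs. This is one way to see why the oldest dates are the least precise.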
The TREx clock is not widely used. It may, however, provide more accurate dates in regions where synonymous transitions have not equilibrated than conventional clocks that combine data from synonymous transitions and synonymous transversions, or from non-synonymous changes. A comparison of different clocks will be provided in detail elsewhere (Benner et al., in preparation). Briefly, the rate constants for transitions and transversions are more different than the two rate constants for purine-purine and pyrimidine-pyrimidine transitions. Further, nucleotide frequencies can be used to calibrate the end equilibrium points for two-fold redundant codon systems directly, and this permits an "approach to equilibrium" formalism, well known in chemical kinetics, to be applied [24,29-31].
From the tree, the TREx distances from the ancestor of fetal and placental aromatase to the modern enzymes are 0.079–0.113 (using an endpoint of 0.54 to reflect equilibration at the silent sites), corresponding to a range in the time of divergence of 26–38 Ma. The TREx distances from the divergence of all of the porcine aromatases to the modern forms range from 0.082 to 0.116, corresponding to dates of divergence in the range of 27–39 Ma. This suggests that the three aromatase paralogs diverged in the late Eocene to mid Oligocene.
To further correlate the duplication of the genes with the fossil record, genomic DNA was analyzed from relatives of Sus scrofa. Seminal plasma from both peccary and babirusa (Tayassu pecari, from the Center for Reproduction of Endangered Species, Zoological Society of San Diego; Babyrousa babyrussa, from the Bronx Zoo, New York) was probed by polymerase chain reaction (PCR) amplification using exon 4-specific primers. Bands having the sizes expected for the corresponding aromatases were observed by agarose gel electrophoresis. Based on sequence similarity, two isoforms of aromatase were obtained from both peccary and babirusa as clones derived from the PCR products (Figure 3). This establishes that at least one of the duplications occurred before the Tayassuidae (the peccaries) diverged from the Suidae (the true pigs) ca. 35 Ma [33,34].
These data are consistent with an evolutionary model that holds that the ancestor of pigs and oxen (approximated in the fossil record by Diacodexis, from the early Eocene ca. 55 Ma) contained a single aromatase gene, and that the paralogous genes in pig arose some 20 million years later. This suggests that the paralogs in pig can be explained neither in terms of the fundamentals of vertebrate reproductive endocrinology, nor as a consequence of swine domestication.
This does, however, suggest that the emergence of the aromatase paralogs was approximately contemporaneous with the emergence of a litter in the Suoidea larger than that found in the ancestral artiodactyl condition. While ruminant and camelid artiodactyls have only one to two young per litter, suoids in general have at least two young per litter (as seen in peccaries) and most suines (true pigs) routinely have three to four young (up to 12 in the domestic pig, Sus). Note that there has long been the tacit assumption that large litters in suoids represent the primitive artiodactyl condition. Large litters are primitive for mammals in general, and because suoids are plesiomorphic in some anatomical conditions relative to other artiodactyls (e.g., short legs, retention of four digits, bunodont cheek teeth), they have been assumed to be plesiomorphic in other respects.
Other data suggest that small litters are in fact the primitive artiodactyl condition. Tragulids (mouse deer or chevrotains) are surviving small, primitive ruminants that are not too dissimilar from Diacodexis in body form, but have only one to two young per litter. Additionally, fossil record data on pregnant oreodonts (an extinct group probably related to the ruminant/camelid artiodactyl lineage, but with a suoid-like plesiomorphic postcranial morphology) show that they also had only one to two young [36,37]. A cladogram of the Artiodactyla (Figure 4) illustrates the probable acquisition of multiparous versus uniparous reproductive strategies, and places the character of litters with typically more than two members emerging just before the divergence of Tayassuidae and Suidae.
The approximate correlation in time of the aromatase divergence in Suoidea with the enlargement of litters in Suoidea suggests, as a hypothesis, that the two are functionally related. To expand on this hypothesis, we sought genomic signatures of functional change within the aromatase paralogs. The number of non-synonymous changes in the gene divided by the number of synonymous changes, normalized for the number of non-synonymous and synonymous sites (the KA/KS value), strongly suggests functional change when the value is significantly greater than unity [38,39], and is also an indicator of hypothetical functional change when the value is high on a branch of a tree relative to other branches of the same tree [40-43]. KA/KS values were reconstructed for individual branches of the evolutionary tree derived from the Darwin bioinformatics workbench (see Methods) using a distance matrix and ancestral states constructed by the method of Messier and Stewart. The typical branch in the aromatase evolutionary tree has a KA/KS value of 0.35. A higher KA/KS value of 0.85 is found for the episodes of evolution near the divergence of the pig aromatases. While a KA/KS value of 0.85 does not require the conclusion that positive selection occurred during the emergence of these aromatase paralogs, an inference based on the magnitude of KA/KS in one branch, relative to the KA/KS value for typical branches [40-43], suggests that adaptive changes occurred during the duplications of the aromatase genes in pigs.
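The normalization behind a KA/KS value can be made concrete with a short sketch. It shows only the uncorrected ratio of per-site rates described above. It does not reproduce the ancestral-state reconstruction of Messier and Stewart or any multiple-hit correction, and the function name `ka_ks` and the counts passed to it are hypothetical values chosen purely to illustrate the arithmetic.

```python
def ka_ks(nonsyn_changes, nonsyn_sites, syn_changes, syn_sites):
    """Return an uncorrected KA/KS ratio for one branch.

    Each class of change is first normalized by the number of sites at
    which it could occur; the ratio of the two per-site values is KA/KS.
    """
    if nonsyn_sites <= 0 or syn_sites <= 0:
        raise ValueError("site counts must be positive")
    if syn_changes == 0:
        raise ValueError("KA/KS is undefined with no synonymous changes")
    ka = nonsyn_changes / nonsyn_sites
    ks = syn_changes / syn_sites
    return ka / ks

# Hypothetical counts, not data from this study.
ratio = ka_ks(nonsyn_changes=12, nonsyn_sites=900, syn_changes=10, syn_sites=300)
print(f"KA/KS = {ratio:.2f}")
```

In this hypothetical example there are more non-synonymous than synonymous changes, yet the ratio is well below unity because non-synonymous sites are three times as numerous. This is why the text normalizes by site counts rather than comparing raw counts.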
A complete maximum likelihood analysis of the aromatase gene family was performed using the PAUP and PAML programs. The resulting tree, generated in PAUP, is shown in Figure 5, with parameters estimated using PAML. Once more, the generation of paralogs in the pig was found to have occurred after the divergence of pigs from oxen. A high KA/KS value (0.93) was again found in the divergence of the swine isoforms on the branch leading to the ancestor of the placental and embryonic enzymes following their divergence from the pig ovarian enzyme. The distribution of substitutions along this branch is consistent with altered functional constraints for the placental and embryonic enzymes compared with their extinct and extant counterparts (Tables 1 and 2).
We correlated the episode of rapid sequence change during the emergence of the embryonic and placental paralogs with the structural biology of aromatase. A homology model of aromatase was built from progesterone 21-hydroxylase from rabbit liver (coordinates from PDB file 1DT6), a homologous cytochrome P450-dependent monooxygenase. Residues undergoing replacement during the episodes represented by branches in Figure 5 (branches 1–3) are highlighted in color on the 3D model using a program in prototype with HyperChem (Figure 6).
Multiple features within the pattern of amino acid replacement were apparent. First, the sites accepting amino acid replacements in the branches with low KA/KS values (as represented by branch 2 in Figure 5) were typically scattered without any obvious pattern over the surface of the protein. This is expected for neutral drift, although an adaptive role for these replacements is not excluded by this analysis.
In contrast, the distribution of sites accepting amino acid replacements during the episode of rapid sequence evolution of branch 1 (as indicated by a relatively high KA/KS value) involving pig paralogs was not random over the protein surface. Rather, the sites are clustered near the substrate binding pocket, and in a region of the surface believed to contact the co-reductant protein, as identified by mutagenesis experiments in the homolog [46,47].
The clustering of amino acid replacements near a substrate binding site during an episode of rapid sequence evolution suggests that the substrate specificity of the protein might be changing in correlation with a change in the detailed physiological role of the protein. Recent reports suggest that the substrate and product specificities of the placental and embryonic enzymes are indeed different from those of the ovarian enzyme [23,48-50]. Further, synthesis of estrogen by the ovarian enzyme is more dependent on the structure of the co-reductant than is synthesis by the placental enzyme. Our in silico analyses rationalize these experimental observations from a structural perspective. The coupling of an evolutionary analysis to a crystallographic analysis suggests that the amino acid changes are functionally significant.