Today, natural history holds some of the most intellectually challenging conundrums to ever fascinate the human mind. Further, natural history offers biological chemists the opportunity to place broad biological meaning on the detailed analysis of the structure and reactivity of isolated biological molecules studied in a reductionist setting. To do so, however, natural history must be connected to the physical and molecular sciences, both in subject matter and in culture.
In part to make this connection, natural historians have sought to change the research paradigm in their field to favor quantitative data directed towards the "proof" of hypotheses over "storytelling". Proving hypotheses is difficult in natural history (pace the philosophical reality that no significant statement in empirical science can ever be said to be "proven"). The events of interest (such as the extinction of dinosaurs) are frequently distant in time, or require a passing of time (as for speciation), making them difficult to reproduce in a laboratory. The scale of the concepts involved (species, environments, planets) also does not lend itself to laboratory models and laboratory-controlled tests. Further, a reductionist approach, even when available, will not necessarily generate data that are relevant to the big issue that concerns the natural historian. The emphasis on data and proof has ameliorated the worst excesses of storytelling in natural history, with enormous positive impact.
Just as natural historians were purifying their field of storytelling, however, whole genome sequences began to emerge. By dramatically increasing the quantity of chemical data concerning the molecular structures of proteins, genomics changed the limiting steps in biochemical and biomedical research. No longer was the typical researcher attempting to solve an organic chemical or biotechnological question (What is the sequence of my protein? How do I express it at high levels to get the sequence?) for a protein that had been selected for functional reasons. Today, the typical researcher knows the structure of many proteins, and wishes to select one for expression and study based on a hypothesis about its potential function.
Here, the natural historian is a natural source of hypotheses, because any definition of function must make reference to fitness, and therefore requires some systems, ecological, or planetary context. Biomedical researchers have a full reductionist armamentarium available in the laboratory to test and explore any hypothesis that the natural historian might provide. They may welcome some guidance from the natural historian to narrow the broad selection, or to shorten the random walk, if only slightly.
For this purpose, the forswearing by natural historians of storytelling has come at a most inopportune time. To the modern natural historian, creating hypotheses can easily be regarded as "storytelling". Many are reluctant to do so, and may criticize as atavistic those colleagues who do.
This has created a vacuum in the scientific community. Very few laboratories exist that can draw upon an expertise in natural history to generate stories that create hypotheses for the researcher working in experimental biochemistry and molecular biology.
This article is designed in part to illustrate how this vacuum might be filled. Here, we do not just tell a story based on natural history, or even a story based on natural history supplemented with physiology and molecular sequence data. Rather, we show how the addition of other data, including data from X-ray crystallography, can make a story sufficiently rich that it can be viewed as being internally consistent with a wide range of independent data drawn from independent sources. This creates a hypothesis that is more than a story, even if it is less than proven.
With aromatase, the congruence of our different analyses makes a compelling suggestion that the three aromatase paralogs in pigs arose by two duplication events in the late Eocene or early Oligocene. The emergence of the aromatase paralogs corresponded approximately in time to the emergence of larger litter size in suines. This implies that the two duplication events are functionally related to the larger litter sizes. This inference is consistent with the physiological impact of estrogen synthesis by these paralogs in Sus. Steroid production by the porcine embryo is tightly controlled by the transient expression of aromatase and 17-hydroxylase (P450C17) between days 10 and 13 [20,21,52]. In contrast, estrogen synthesis by the equine embryo begins as early as day 6 and increases with embryo age and diameter. The estrogen produced by the pig embryonic aromatase is believed to have an impact on the mobility, spacing, and implantation of the concepti [52-56]. Adequate spacing would appear to be required to manage a larger litter.
This is consistent with a structural biological analysis that correlates specific amino acid replacements with specific changes in the substrate and product specificity of the protein. Interestingly, the substrate specificity of human aromatase is reported to be more similar to that displayed by the pig placental enzyme than the ovarian form [48,49]. This is an unexpected similarity given that our evolutionary analysis suggests a change in biochemical function along the fetal/placental branch in the Suidae.
It should be noted that the hypothesis is supported by a combination of data, none of which would individually carry it past storytelling. Thus, the KA/KS ratio of 0.93 would not, by itself, compel any particular interpretation. Its implications are greater given the relatively low KA/KS ratios of other branches of the tree. The crystallographic information is likewise not compelling by itself, but adding it makes a combination that is more compelling.
Further, generating this hypothesis itself yields discoveries that might lead to hypotheses of their own. An analysis of the evolutionary branches separating pigs and humans suggests an additional episode of adaptive change. The branch leading to the ancestor of human aromatase (branch 3) has a remarkably high KA/KS ratio (13 non-synonymous and no synonymous changes; Figure 5). This is a KA/KS ratio greater than unity, and does (pending evaluation of its statistical significance) compel the inference of an episode of adaptive change. Intriguingly, these changes are also clustered in the same regions of the structure as those changing along branch 1 leading to the stem fetal/placental enzyme, near the substrate and co-reductant binding sites. This implies that the substrate/product specificity of the ancestral aromatase protein was not like that of either the human or the pig placental forms, but rather reflects features that arose convergently in these two species.
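The reasoning behind a KA/KS ratio can be illustrated with a minimal sketch in Python. It assumes a simple counting approach with a Jukes-Cantor correction, which is not necessarily the method used for the analysis reported here; the function names and the site counts in the usage note are hypothetical. The sketch also shows why a branch with no synonymous changes, such as branch 3, has a ratio that is formally unbounded rather than a finite number, which is one reason its statistical significance must be evaluated separately.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor corrected distance for a proportion p of changed sites."""
    if not 0 <= p < 0.75:
        raise ValueError("proportion of changed sites must lie in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


def ka_ks(nonsyn_changes, syn_changes, nonsyn_sites, syn_sites):
    """Ratio of non-synonymous (KA) to synonymous (KS) substitution rates.

    The site counts are inputs here; estimating them from codon sequences
    is a separate step that this sketch does not attempt.
    """
    ka = jukes_cantor(nonsyn_changes / nonsyn_sites)
    ks = jukes_cantor(syn_changes / syn_sites)
    if ks == 0:
        # No synonymous changes: the ratio is unbounded (or undefined
        # when there are no non-synonymous changes either).
        return math.inf if ka > 0 else math.nan
    return ka / ks
```

With the 13 non-synonymous and zero synonymous changes reported for branch 3, and purely hypothetical site counts, `ka_ks(13, 0, nonsyn_sites=1150, syn_sites=350)` returns `math.inf`. A ratio below unity results whenever the corrected synonymous rate exceeds the non-synonymous one.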
Notably, four of the sites (positions 47, 153, 219, 269) that undergo replacement during the emergence of pig placental aromatase from the last common ancestor are the same as four that arose in the emergence of the human aromatase from its last common ancestor. Of these, the amino acid replacements are identical at two sites (Thr → Met at site 153; His → Arg at site 269). The probability associated with randomly observing this pattern is extremely low (0.000021). An additional site is displaced by a single position in the sequence alignment (259/260). We hypothesize that these represent an example of adaptive parallel evolution.
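One way to think about how unlikely such an overlap is by chance can be sketched as a hypergeometric tail probability. This is an illustration under stated assumptions, not the calculation behind the value of 0.000021 given above: it assumes that every site in the alignment is equally likely to change, it considers only which sites change (not whether the replacement residues are identical), and the function name and any input values are hypothetical.

```python
from math import comb


def prob_shared_sites(total_sites, changed_a, changed_b, shared):
    """P(at least `shared` sites change on both branches) by chance alone.

    Assumes the `changed_b` sites on the second branch are drawn uniformly
    at random, without replacement, from `total_sites` alignment positions,
    of which `changed_a` changed on the first branch.
    """
    upper = min(changed_a, changed_b)
    all_draws = comb(total_sites, changed_b)
    # Sum the hypergeometric probabilities for k = shared .. upper overlaps.
    tail = sum(
        comb(changed_a, k) * comb(total_sites - changed_a, changed_b - k)
        for k in range(shared, upper + 1)
    )
    return tail / all_draws
```

A call such as `prob_shared_sites(total_sites, changed_a, 13, 4)` would use the 13 changes reported for branch 3 and the four shared sites; the alignment length and the number of changes along the pig placental branch would have to be taken from the underlying data. Requiring identical replacements at some of the shared sites would lower the probability further.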
It is important to point out that even an analysis this broad is likely to cover only a small part of a complicated reproductive endocrinology that must be associated with larger litter sizes. For example, the exact nature of the products produced by individual aromatases remains controversial, and may be different in laboratory studies depending on the conditions where they are studied [50,60-62]. This is especially the case with the 19-nortestosterone derivatives in Figure 1.
Further, an elegant recent study by Corbin et al. identified 1β-hydroxytestosterone as a novel product produced by recombinant pig ovarian aromatase that is absent from the products produced by the porcine placental paralog, or by either human or bovine aromatase. This testosterone derivative binds to an androgen receptor, consistent with physiological activity. This product was unknown until this year, suggesting that more endocrine novelties remain to be discovered. Any of these may be relevant to a test of this system. For example, these hypotheses make predictions about the product specificities of the two peccary aromatases reported here.
In fact, some data suggest that uterine exposure to androgens severely decreases litter size and embryonic survival during the time of maternal recognition of pregnancy. This is consistent with the hypothesis of Corbin et al. that the evolution of the placental paralog is associated with increased efficiency of testosterone aromatization. This is also consistent with the current data, and the argument presented here.
It goes without saying that still more factors might be associated with an increase in litter size from one to two (presumed in Diacodexis, see Figure 4) to 12 or more in domestic swine. Most trivially, this increase might be associated with an increase in ovulation rate, and/or an adjustment in the structures and binding specificities of estrogen receptors.
Nevertheless, the first aromatase duplication, shared by pigs and peccaries, appears to have happened in the late Eocene (recognizing the error associated with these dates), around 35 Ma (Figure 4). This was a time of great global change, with dramatic cooling in the higher latitudes. More archaic kinds of mammals (e.g., some earlier families of perissodactyls and artiodactyls) became extinct, while many modern families (including the Suidae and Tayassuidae) became established at this time. Suoids differed from other contemporaneous ungulates in their commitment to omnivory, even though a few forms, such as the modern warthog Phacochoerus aethiopicus, are more specialized herbivores. Perhaps the ability to bear a slightly larger litter than other artiodactyls was advantageous to them in this time of global ecological transition. However, it should be noted that larger litters usually mean altricial (i.e., relatively underdeveloped) young, a reproductive strategy apparently not available to larger, cursorial (running-adapted) ungulates, which give birth to precocial (i.e., well developed) young that are fully locomotory at birth.
The second aromatase duplication, with the ensuing capacity to produce multiple young, probably occurred within the family Suidae, some time during the Oligocene. The molecular data suggest that porcine fetal and placental aromatases diverged between 27 and 38 Ma, and the earliest known suid is of early Oligocene age, around 33 Ma (Figure 4). Large litters may have characterized the entire suid family. While the extant subfamily Suinae is primarily a Plio-Pleistocene radiation, during the Oligocene to Pliocene suids were exceedingly diverse taxonomically (with six other subfamilies known) as well as individually abundant as fossils [32,33,67]. In contrast, the predominantly North American tayassuids were never as diverse. It is possible that this tremendous Old World diversity of suids, which continues to this day, is related to their capacity for the production of large litters.
This type of speculation opens questions. For example, the babirusa (an Indonesian pig) is reported to have average litters of one to two individuals [68,69]. While it is possible that litters contain three to four individuals, the occurrence is low. If the common ancestor of babirusa with the African/Eurasian Suinae had a larger litter, then the babirusa must be hypothesized to represent a reversion to the more primitive condition. At present, however, relatively little is known of either the molecular biology or the natural history of babirusa. The date of divergence from modern swine is placed between 12 and 26 million years ago [71,72], while our TREx analysis using cytochrome b places this date at ca. 18 Ma (data not shown). Clearly, further study is warranted.