The emergence of complete genomes for many organisms, including humans, has created the need for hypotheses concerning the "function" of specific genes that encode specific proteins. While "function" is interpreted by different workers in different ways, Darwinian theory (by axiom) requires that the term be connected to fitness; natural selection is the only mechanism admitted by theory to generate functional behavior in a living system, macro or molecular. This, in turn, implies that hypotheses about function have a "systems" component, including the interaction of the protein with other proteins, its impact on the physiology (defined broadly) of the cell and organism, and the consequences of that physiology in a changing ecosystem in a planetary context.
Systems hypotheses can be supported by information from many areas. Geology, paleontology, and genomics, for example, provide three records that capture the natural history of past life on Earth. At the same time, structural biology, genetics, and organic chemistry describe the structures, behaviors, and reactivities of proteins that allow them to support present life. It has been appreciated that combining these six types of analysis provides insights into the functional behavior of proteins that none of them can provide alone. Over the long term, we expect that the histories of the geosphere, the biosphere, and the genosphere will converge to give a coherent picture of the relationship between life and the planet that supports it. This picture will be built, however, from individual cases that serve as paradigms for making the connection.
The aromatase family of proteins offers an interesting system to illustrate the power of this combination for creating hypotheses about protein function within a system. These hypotheses are not "proof", of course, but now that genomic data themselves are so abundant, the generation of such hypotheses has become the limiting step in genomics-inspired biological experimentation.
Aromatases are cytochrome P450-dependent enzymes that use dioxygen to catalyze a multistep transformation of an androgenic steroid (such as testosterone) to an estrogenic steroid (such as estradiol) (Figure 1). The protein plays a key role in normal vertebrate reproductive biology, a role that appears to have arisen before fish and tetrapods (land vertebrates, including mammals) diverged some 375 million years ago. Aromatase is important in modern medicine as well, especially in breast and other hormone-dependent cancers.
Different numbers of aromatase genes are found in different vertebrates. Two aromatase genes are known in teleost fish [6,7]. Only a single gene is known in the horse, rat, and mouse. Cattle have both a functional gene and a pseudogene built from homologs of exons 2, 3, 5, 8, and 9 of their functional gene; these are interspersed with a bovine repeat element [11,12]. In several mammalian species, including humans and rabbits, a single gene yields multiple forms of aromatase mRNA in different tissues via alternative splicing [13-16].
A still different phenomenology is observed in the pig (Sus scrofa). Three different mRNA molecules had been reported in different tissues from pig [17-21]. Compelling evidence then emerged that these three mRNA variants, identified in cDNA studies, arose from three paralogous genes rather than from a single, differentially spliced gene. Because the other mammals examined have only a single functional aromatase gene, this suggests that the three pig paralogs arose through gene duplications that occurred relatively recently in geologic time.
Hypotheses relating to the function of the three aromatase paralogs depend in part on when those duplications took place. If they were very recent, the three genes might have helped pigs adapt to domestication. If they pre-dated the divergence of pigs and fish, the paralogs may have distinct roles that are fundamental to reproductive endocrinology in vertebrates. Here we apply a series of tools to generate better hypotheses concerning the family of aromatase paralogs in swine.