The size of the functional non-redundant genome can be quantified using Shannon's information measure [8],
but currently this approach is not feasible because it would require
unavailable information on the fitness weight of each nucleotide in the
genomic DNA. A simpler method is to take the functional genome to be the
total size of coding and regulatory regions as they are known today.
The size of the functional and non-redundant fraction of genomes gradually increased in evolutionary time (Fig. 1). Mammals (mouse, rat, and human), which appeared only recently in Earth's history, have a genome of ca. 3.2 × 10⁹ bp; however, only 5% of it is conserved between species [13].
Conserved regions are definitely functional but there may be additional
functional regulatory regions that are species-specific. These regions
can be identified based on the absence of transposons, because
transposons that are inserted in functional regions would interfere
with normal gene regulation and eventually disappear due to natural
selection [14]. Transposon-free regions of 5 and 10 kb account for 20% and 12% of the genome size, respectively [14]. If we take 15% as a rough estimate, then the size of the functional and non-redundant genome in mammals is ca. 4.8 × 10⁸ bp. Fish existed 0.5 billion years ago [15]. The genome size of the fugu fish is 4 × 10⁸ bp, and 1/3 of it is occupied by gene loci [16]. Worms have existed for at least 1 billion years [17]. The genome of the worm Caenorhabditis elegans has a size of 9.7 × 10⁷ bp, and ca. 75% of its length is functional [18]. Eukaryotic cells with mitochondria appeared between 2.3 and 1.8 billion years ago [19], and prokaryotes existed on Earth as early as 3.5 billion years ago [20]. The date of eukaryote origin was estimated rather precisely (± 250 million years) based on the homology of protein sequences [17]. Although there is abundant information on the size of genomes in contemporary prokaryotes and unicellular eukaryotes [21],
most of it is not suitable for assessing the genome size of their early
ancestors because the majority of these organisms have increased
their genome size since the origin of the first prokaryotes and eukaryotes.
Thus we were interested in the most primitive representatives of these
groups. The smallest eukaryote genome (2.9 × 10⁶ bp) was found in the microsporidian Encephalitozoon cuniculi [22], and the smallest prokaryote genome size (5 × 10⁵ bp) was found in Nanoarchaeum equitans [23] and Mycoplasma genitalium [24].
Prokaryotes and eukaryotes with the smallest genomes are parasitic and
may have a reduced genome size due to parasitism. However, I selected
them to get the most conservative estimate for the time elapsed since
the origin of life. It is also possible that the first prokaryotes and
eukaryotes indeed had a genome size comparable to that of contemporary
parasitic species. Comparison of protein sequences indicates that the
divergence of archaebacteria, eubacteria, and eukaryotes occurred 3.1 to
3.8 billion years ago [25].
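The functional genome sizes quoted above all follow from one multiplication: total genome size times the fraction treated as functional. The sketch below reproduces that arithmetic. The dictionary layout and the name `functional_size` are illustrative, and treating the two smallest genomes as entirely functional (fraction 1.0) is an assumption made here, since the text uses their whole size.

```python
# Functional, non-redundant genome size = total genome size x functional fraction.
# Sizes and fractions are the figures quoted in the text.
GENOMES = {
    # name: (total genome size in bp, fraction treated as functional)
    "mammals": (3.2e9, 0.15),
    "fugu fish": (4.0e8, 1 / 3),
    "C. elegans": (9.7e7, 0.75),
    # Smallest known genomes, used whole (fraction 1.0 is an assumption).
    "smallest eukaryote": (2.9e6, 1.0),
    "smallest prokaryote": (5.0e5, 1.0),
}


def functional_size(total_bp, fraction):
    """Return the functional genome size in base pairs."""
    return total_bp * fraction


for name, (total_bp, fraction) in GENOMES.items():
    print(f"{name}: {functional_size(total_bp, fraction):.2e} bp")
```

For mammals this gives the 4.8 × 10⁸ bp stated in the text.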
I have not included plants in this graph for the following two
reasons. First, their genomes are often highly redundant due to
polyploidy [4],
which makes it difficult to estimate the size of the functional
non-redundant fraction. Second, functional non-redundant genomes in
plants did not increase as fast as in vertebrates, and our goal was to
trace the genomes in the best-performing groups of organisms. For example,
the functional genome in Arabidopsis thaliana is ca. 3 times smaller and has more redundancy than in mammals [26], even though flowering plants appeared at about the same time as mammals, ca. 125 million years ago [27].
The increase of genome size approximately follows an exponential pattern (linear on a log scale) (Fig. 1).
Because our estimates of the genome size of the first prokaryotic and
eukaryotic cells are based on extrapolation rather than direct
measurement, this regression cannot be viewed as proof of the
exponential hypothesis. We can only say that the regression is consistent
with this model and that the functional fraction of the genome
increased approximately 7.8-fold per 1 billion years. Because the two
earliest points on the graph are the most uncertain, we did a sensitivity
analysis by varying these points within the limits of uncertainty (±
300 million years and ± 0.3 log bp). The rate of increase of the functional
genome then ranged from 4.6- to 15.3-fold per 1 billion years.
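The regression described here can be sketched as an ordinary least-squares fit of log10(functional genome size) against time. The snippet below is a minimal illustration, not the original analysis. The dates come from the text, except that the eukaryote date is taken as the midpoint of the quoted 2.3 to 1.8 billion year range, which is an assumption; the exact points used for Fig. 1 may differ, so the fitted rate should only be expected to land near the reported value.

```python
import math

# (time in billions of years ago, functional genome size in bp)
POINTS = [
    (3.5, 5.0e5),            # prokaryotes
    (2.05, 2.9e6),           # eukaryotes; midpoint of 2.3-1.8 (assumption)
    (1.0, 9.7e7 * 0.75),     # worms
    (0.5, 4.0e8 / 3),        # fish
    (0.125, 3.2e9 * 0.15),   # mammals
]


def fit_log_linear(points):
    """Least-squares line through (time, log10 size); time is negative in the past.

    Returns (slope, intercept): slope in log10 units per billion years,
    intercept as log10 size at the present (time = 0).
    """
    xs = [-age for age, _ in points]
    ys = [math.log10(size) for _, size in points]
    n = len(points)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_log_linear(POINTS)
fold_per_gyr = 10 ** slope  # multiplicative increase per billion years
print(f"slope: {slope:.3f} log10 bp per billion years")
print(f"increase: {fold_per_gyr:.1f}-fold per billion years")
```

Shifting the two earliest points by the stated uncertainties and refitting is one way to repeat the sensitivity analysis.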
The strong version of the exponential hypothesis is that the rate of
genome increase can be extrapolated to the early (pre-prokaryotic)
evolution of life. If this hypothesis is true, then the origin of life
should be dated to ca. 10 billion years ago, i.e., before the formation of
Earth and the solar system, which implies panspermia (i.e., interstellar
passive transport of living bacterial spores). Considering our
sensitivity analysis, the date of the origin of life may vary from 7 to 13
billion years ago, which is still greater than the age of Earth.
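A back-of-envelope check of these dates: under constant exponential growth, the time needed to reach a genome of size N from a starting size N0 is log10(N / N0) / log10(rate). The sketch below assumes a starting size of 1 bp and anchors the line at the present-day mammalian functional genome; both choices are assumptions made for illustration, and the original extrapolation from the fitted regression line may differ in detail. The function name `years_to_reach` is hypothetical.

```python
import math


def years_to_reach(size_bp, fold_per_gyr, start_bp=1.0):
    """Billions of years of exponential growth needed to go from start_bp to size_bp."""
    return math.log10(size_bp / start_bp) / math.log10(fold_per_gyr)


# Present-day mammalian functional genome from the text (15% of 3.2e9 bp).
PRESENT_BP = 3.2e9 * 0.15

# Central rate and the two extremes from the sensitivity analysis.
for rate in (4.6, 7.8, 15.3):
    age = years_to_reach(PRESENT_BP, rate)
    print(f"{rate}-fold per billion years: {age:.1f} billion years")
```

With these assumptions the three rates give ages of roughly 13, 10, and 7 billion years, in line with the range quoted in the text.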