Reviewer's report 1
Eugene V. Koonin, National Center for Biotechnology Information, NIH, Bethesda Maryland, USA
What determines the total size of genomes and their effective complexity
(sensu Adami), and how genome size evolved throughout life's
evolution, are genuinely exciting and fundamental biological issues.
Potentially, a lot of information can be extracted from comparative
analysis of genome size and complexity. This paper is an attempt to
cast this analysis in the simplest possible terms, i.e., to
back-extrapolate the maximum genome size attained on earth at different
times (I believe this is what is being used to produce the plot in Fig. 1;
the corresponding language in the paper is not very precise) to the
origin of the first organisms. The inferred dates for the origin of
life are very early and, under a straightforward interpretation favored
by the author, suggest that life did not begin on earth but rather
elsewhere in the Universe some 10 billion years ago, after which it
spread by panspermia.
I am not at all a priori prejudiced against the panspermia
hypothesis and actually agree with the author's concluding sentence in
that panspermia should be considered "on equal basis with alternative
hypotheses of de-novo life origin on earth". However, I think that the
approach used in this work provides no support for an early date of
life's origin. The main problem, as I see it, lies with the fact that
the key plot in Fig. 1 combines two worlds with very different
evolutionary trends, the
prokaryotes and the eukaryotes (especially, complex, multicellular
eukaryotes). The exponential law very well might hold for the portion
of the curve that corresponds to complex eukaryotes (or, possibly,
eukaryotes in general), and the reasons why this is so would be
interesting to discuss in some depth (more data points would be
required, though). The problem is, however, that, for the first 1.5–2
billion years of life's evolution on this planet, all existing life
forms were prokaryotes. There is just one point corresponding to
prokaryotes in Fig. 1, and there is, indeed, an excellent reason for
that: we have no evidence
whatsoever that the maximum genome size of prokaryotes increased during
that enormous time span or in the time elapsed since.
Author's reply (1)
I have addressed this problem in the
Discussion by estimating the average rates of increase in genome
complexity in Archaea and Eubacteria, which appear lower than the rate
of complexity increase in eukaryotes. Then I discuss 2 possible
scenarios: (a) initial rates of complexity increase in prokaryotes were
similar to those observed in eukaryotes and then slowed down due to
organization constraints, or (b) rates of complexity increase in
prokaryotes were always slower than in eukaryotes. With scenario (a),
the expected origin of life is ca. 10 billion years ago according to
regression (Fig. 1), and with scenario (b), life originated even
earlier than that. Thus,
separate handling of prokaryotes and eukaryotes does not bring the
predicted date of life origin closer to present.
For all we know, the characteristic complexity of the prokaryotic
genomes had been reached very early on during life's evolution
(considering the geochemical and paleontological evidence of more or
less modern-like microbiota ~3.5 billion years ago) and remained in
equilibrium ever since. Thus, to the best of our understanding, there
was an early explosive phase of evolution of complexity, which was
followed by stasis (the prokaryotic phase of life's history) and then
by another burst associated with eukaryogenesis. The author dismisses,
very lightly, the notion of punctuated equilibrium. This is not the
place to assess the validity of the specific theory of Gould and
Eldredge (it might indeed have its problems); however, I believe that,
in general, major non-uniformity of the tempo of life's evolution
cannot be denied.
Author's reply (2)
If the rate of evolution is
measured by numerical expansion of some taxonomic groups and numerical
decline of other groups, then it is definitely non-uniform. However, in
the paper I discuss the rate of increase in genome complexity which is
an entirely different process. So far there is no evidence that the
rate of complexity increase fluctuated considerably over time. In
particular, there is no evidence of "early explosive phase of
evolution" of prokaryotes and "another burst associated with
eukaryogenesis". Genome complexity can increase even if direct
adaptations to the environment remain stable (due to increasing
reliability, modularity, and adaptability).
In the general epistemological sense, the approach to
back-extrapolation of life's history taken in this paper can be
characterized as ultra-uniformitarianism, a worldview championed by the
great geologists Hutton and Lyell and strongly embraced by Darwin (this
work even might be considered something of an extension of this view
but the spirit is definitely the same). In that vein, I believe that
what is done here is an interesting exercise because it showcases the
kind of conclusions to which ultra-uniformitarianism can lead. If the
entire discussion and conclusions were rewritten along these lines,
this could turn into a sound piece.
There are two issues in this paper that are not as germane to its
main conclusions as the above but are important and deserve comment
because they are not, I believe, adequately addressed. The first issue
is the nature of the constraints that affect the evolution of genome
complexity/size. The author dismisses Lynch and Conery's
population-genetic concept of genome complexity evolution (his ref. [12]) by citing the comment of Charlesworth and Barton [13].
This is, I think, disingenuous because Charlesworth and Barton's note
(regardless of whether or not their arguments are compelling) does not
even seek to invalidate Lynch's theory as a whole but rather addresses
specific issues of mobile element propagation. I strongly believe that
Lynch's concept has a lot going for it and explains an important, if
not the central, aspect of these constraints.
Author's reply (3)
I have removed most of my
criticism of the Lynch and Conery paper because I agree that their data are
valid. However, I disagree with their evolutionary interpretation, and
suggest another interpretation: that large Ne was one of the constraints in the evolution of prokaryotes.
Another, complementary source of these constraints that is not at
all covered is the faster than linear scaling of the number of
regulatory genes with genome size (van Nimwegen E. Trends Genet. 2003
Sep;19(9):479–84; Konstantinidis KT, Tiedje JM. Proc Natl Acad Sci U S
A. 2004 Mar 2;101(9):3160–5).
Author's reply (4)
I agree that the proportion of
regulatory genes may change in evolution. However I don't think that
this can substantially affect the regression line which I discuss in
the paper.
Another issue is that of the "minimal genome": equating minimal
genomes reconstructed by comparative-genomic approaches with ancestral
life forms is incorrect and does not reflect the original view of the
authors of the minimal genome notion (of which ref. 27 in the present
manuscript is a proper reflection).
Author's reply (5)
I have removed the reference to
the "minimal genome" paper in the paragraphs where I discuss the
complexity of ancestral life forms and the possibility of spontaneous
self-assembly of complex systems.
Again, all this is not to claim, with confidence, that the only form
of life we are aware of evolved on earth rather than elsewhere in the
universe. The latter is quite a possibility. The only claim I am making
is that the data analyzed in this paper and, for that matter, any
comparative-genomic data I can think of do not provide any evidence in
support of an early, extraterrestrial origin of life. Accordingly, I
believe that terrestrial origin around 4 billion years ago should be
taken as the null hypothesis.
Author's reply (6)
I do not claim to have a proof
for the exponential hypothesis, but offer available supporting
evidence. In addition, I suggest (a) mechanisms of positive feedback
that can cause the exponential increase in genome complexity and (b)
a possible test for panspermia if life is found on any planets or
satellites in the solar system. Testing multiple null hypotheses may
appear more productive than testing a single one.
Reviewer's report 2
Chris Adami, Keck Graduate Institute, California Institute of Technology, Pasadena, USA
In this contribution, the author attempts to characterize the functional
form of the relationship between the sizes of the functional genome of
organisms and their appearance in the fossil record. Using five data
points (prokaryotes, eukaryotes, worms, fish, and mammals), the author
deduces an exponential increase in functional size with time. He then
uses this functional relationship to hypothesize an origin of life that
exceeds the age of the Earth by a factor of two. From this he concludes
that the origin of life cannot have taken place on Earth, but points
towards hypotheses of the panspermia type.
This paper is an example of how not to analyze data. First, there is
no doubt that a much more sophisticated analysis of whole genome data
can be performed. For example, the author claims that 1/3 of the Fugu
rubripes genome is functional (this is one of his datapoints), but the
original publication only states that "gene loci occupy about one-third
of the genome". There is some evidence that non-coding but functional
(likely regulatory) DNA increases with the complexity of the organism
(see, e.g., [1]), so that taking just the gene loci into account is very likely to be misleading, more so for complex metazoans.
Author's reply (7)
I believe that my estimate of functional genome size of Fugu rubripes as
1/3 of genome is realistic. Gene loci contain more than coding
sequence; they also include introns and untranslated regions. Although
I did not explicitly include promoter sequences, they may be similar in
size to the non-functional portion of introns. This analysis is not
sensitive to small variations in functional non-redundant genome size
(±20–30%). This level of uncertainty is inevitable because we do not have
an exact quantitative measure of genome complexity.
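The insensitivity claimed in this reply can be checked with simple arithmetic, assuming (as an exponential fit implies) that genome size enters the regression on a logarithmic scale. The sketch below is illustrative only: the function name `log10_shift` is hypothetical, and no values from the paper are used.

```python
import math

def log10_shift(relative_error):
    """Shift in log10(genome size) caused by a relative error in the size estimate."""
    return math.log10(1.0 + relative_error)

# Relative errors spanning the +/-20-30% range mentioned in the reply.
for err in (-0.3, -0.2, 0.2, 0.3):
    print(f"{err:+.0%} error -> {log10_shift(err):+.3f} log10 units")
```

Under this assumption, an error of 30% in either direction moves a data point by no more than about 0.15 log10 units. Whether that is negligible depends on the range of genome sizes spanned by the fitted points.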
Even were we to accept the five data points at face value, they
would not allow us to reach any conclusion about the origin of life.
This is a classical case of "allowing the data to suggest a model". For
example, I have a time series of personal Marathon finishing times
versus date that very much suggests a linear (decreasing) relationship
(with four, rather than five, data points). But I am not so foolish as
to predict from these data points the date when I will break the world
record (or the speed of sound, or light, for that matter). The author
advances some arguments for his exponential model, but many more
arguments speak against it. For example, while an approximately
exponential growth could be argued for in any particular period, major
changes in organization (for example from unicellular to multicellular)
are likely to affect the rate of growth, so that a piecewise
exponential would be a more reasonable assumption.
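The marathon analogy can be made concrete with a small sketch. The finishing times below are invented for illustration (the report gives no numbers), and `fit_line` and `crossing` are hypothetical helper names. The point is only that a well-fitting line can be extrapolated to any target, whether or not the target is physically plausible.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def crossing(slope, intercept, target):
    """Value of x at which the fitted line reaches the target y."""
    return (target - intercept) / slope

# Invented data: four race years and finishing times in minutes.
years = [2001, 2002, 2003, 2004]
minutes = [240.0, 230.0, 220.0, 210.0]

slope, intercept = fit_line(years, minutes)
print(crossing(slope, intercept, 120.0))  # extrapolated year of a two-hour finish
print(crossing(slope, intercept, 0.0))    # extrapolated year of a zero-minute finish
```

With these made-up numbers the fit is exact (ten minutes faster per year), and the same line that predicts a two-hour finish around 2013 also predicts a zero-minute finish around 2025.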
Author's reply (8)
see reply #1 to Eugene Koonin
Even more dramatic, it is inconceivable that life began with just a
few nucleotides. Instead, there must have been an initial step–from
zero to finite–in the complexity of organisms (as measured by their
functional genomes). The size of this step will then be crucial in
determining the point of origin.
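Why the size of the initial step matters can be seen from a minimal sketch of a pure exponential model, in which the time needed to grow from an initial size to a final size equals the doubling time multiplied by the base-2 logarithm of their ratio. All numbers below are arbitrary placeholders in arbitrary units, not estimates from the manuscript, and the function names are hypothetical.

```python
import math

def doublings_needed(final_size, initial_size):
    """Number of doublings separating the initial and final genome sizes."""
    return math.log2(final_size / initial_size)

def elapsed_time(final_size, initial_size, doubling_time):
    """Time required under pure exponential growth with a fixed doubling time."""
    return doubling_time * doublings_needed(final_size, initial_size)

# Placeholder values: same end point and doubling time, two assumed starting sizes.
final_size = 1024.0
doubling_time = 1.0
for initial_size in (1.0, 32.0):
    t = elapsed_time(final_size, initial_size, doubling_time)
    print(f"initial size {initial_size:g}: {t:g} doubling times back to the origin")
```

In this toy case, starting from a size of 32 rather than 1 halves the extrapolated time to the origin (5 doubling times instead of 10), so the inferred date shifts with whatever initial step is assumed.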
Author's reply (9)
I have added more discussion on
why it is more likely that genomes evolved gradually from single coding
elements (paragraphs 5–7 of Discussion).
But as we have no information about the minimal genome size of
living organisms, an extrapolation with a pure exponential simply makes
no sense. Thus, while a thorough analysis of the evolution of
functional genome size would certainly be welcome, the data presented
here do not warrant any conclusion, except perhaps that the size of
functional DNA has been increasing in evolution, something we should
not be terribly surprised to learn.
Reviewer's report 3
Arcady Mushegian, Stowers Institute, Kansas City, USA
I agree with the Author on the following:
1. If there is evidence supporting panspermy, it should be considered seriously.
2. Panspermy, if it occurred, should not prevent us from attempting
to reconstruct ancestral genomes, using comparative genomics and the
knowledge of planetary chemistry.
3. Early stages of evolution of Life seem to have been overloaded
with evolutionary innovation, which asks for explanations. Panspermy
may be one such explanation; periods of accelerated evolution, prompted
in part by Lynch-Conery considerations of Ne, are another.
Having said that, I do not see any striking arguments for panspermy
in this work. The "genome size as a clock" approach is, in my opinion,
qualitatively correct, and it shows what we already knew, i.e., that
the earliest stages of life appear to have had precious little time to
progress to what are currently our best estimates of genome size and
the number of protein-coding genes (on the latter, see also below).
Whether the dependency is of the exponential form, however, remains to
be seen.
Author's reply (10)
see reply #6 to Eugene Koonin
Discussion of minimal genome in this regard is a red herring. First,
the Author misreads what is in the minimal-genome literature (e.g.
Mushegian and Koonin, 1997; later reviews both by myself and by Koonin;
and experimental work of Hutchison, Smith and Venter, most recently
Glass et al., 2006; Pubmed 16407165). Minimal genome is a construct of
biochemical engineering, predicted or directly manipulated to sustain
life in a rich medium with the smallest number of genes. It is not
purported to model the ancestor, even though it, same as the ancestral
genomes, may be constructed using methods of comparative genomics, and
even though minimal genome may be enriched in ancestral genes. Second,
no one ever said that the minimal or ancestral genomes have evolved by
spurious assembly of 300 genes – any paper, including our own, that
speculates about origins of Life, understands the problem of earlier
stages clearly.
Author's reply (11)
see reply #5 to Eugene Koonin
Third and most important, all this is not relevant to the Author's own
argument: the genome to discuss is not the minimal one, but that of LUCA
(last universal common ancestor). The latest reconstructions of LUCA
gene content, notably Pubmed 12515582 and 16431085, come up with
600–1000 genes, which is in fact even better for the early-overload
argument, so why not stick to these estimates?
Author's reply (12)
In this paper I used existing
genomes, and LUCA is only mentioned for discussion purposes. Also I
tried to make my estimates for predicted life origin as conservative as
possible.
Ultimately, the question is not whether "early genomes were way too
complex", but, in the likely case that they were, whether panspermy
better explains these observations than other hypotheses. I find it
counterproductive to dismiss the Lynch-Conery theory in one sentence –
at least in the sentence that directs to the Charlesworth-Barton paper,
as if it is the last word on the subject. In fact, said paper is rather
supportive of many observations and explanations presented by Lynch and
Conery, arguing mostly with the idea of subfunctionalization (where
Charlesworth's argument is an overly general one, which is
understandable: coming up with any specifics here will require a lot of
quite subtle analysis of the data that are not there yet) and, in a
technically involved way, with the ideas of transposon dynamics (which,
I think, are addressed in part by M.Lynch in Pubmed 16280547). If the
author has a substantive disagreement with Lynch-Conery, let us hear
it, but we haven't yet.
Author's reply (13)
see reply #3 to Eugene Koonin
The "viral hypothesis", in the meantime, exists in many
modifications, not all of which require modern-type viruses: see for
example, Woese (series of essays in 1998–2002) and Koonin-Martin
(Pubmed 16223546). With regard to absolute time scale, however, these
theories may not be even that helpful, because the step from these
general hypotheses to constant vs variable evolutionary rate would not
be trivial.
Author's reply (14)
Even if early viruses were
different (e.g., non-parasitic) there is no evidence that their rate of
complexity increase was higher than in eukaryotes.