The origin of spliceosomal introns – "extra" DNA sequences that disrupt the coding regions in nuclear genes of eukaryotes – is still a mystery. Since the evolution of introns is closely related to the evolution of eukaryotic genomes, understanding the origin of introns is vital for understanding the evolution of eukaryotes. There are currently two opposing theories of intron origin. The introns-early theory proposes that introns already existed in the progenote (i.e., the last common ancestor of prokaryotes and eukaryotes) and facilitated the construction of the first genes [1-4]. The introns-late theory, on the other hand, holds that genes in the progenote were intronless, like those of present-day prokaryotes, and that introns were gained late, after the emergence of eukaryotes [5-7]. The debate has not been decisively resolved, and each theory has supporting arguments that have not been satisfactorily refuted.
Introns can be located in one of three phases: phase-0, -1, and -2 introns are defined as introns located before the first, after the first, and after the second nucleotide of a codon, respectively. The phase of an intron is conserved during evolution, because a change in intron phase is possible only through simultaneous mutations that alter the 5' and 3' ends of the intron in a complementary manner. The distribution of intron phases is non-uniform: phase-0 introns occur most frequently and phase-2 introns occur least frequently [8-10].
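Because phase depends only on where the intron falls relative to the reading frame, it can be computed from the number of coding nucleotides that precede the intron, taken modulo 3. The sketch below illustrates this and tallies a phase distribution for one gene. It assumes the input is the ordered list of coding-exon lengths (UTRs excluded, reading frame starting at the first coding nucleotide); the function names `intron_phase` and `phase_distribution` and the example exon lengths are hypothetical, chosen for illustration only.

```python
from collections import Counter
from typing import Sequence


def intron_phase(coding_nt_upstream: int) -> int:
    """Phase of an intron given the number of coding nucleotides before it.

    0 -> between codons (before the first nucleotide of a codon)
    1 -> after the first nucleotide of a codon
    2 -> after the second nucleotide of a codon
    """
    if coding_nt_upstream < 0:
        raise ValueError("position must be non-negative")
    return coding_nt_upstream % 3


def phase_distribution(exon_lengths: Sequence[int]) -> Counter:
    """Tally intron phases for one gene from its ordered coding-exon lengths.

    Assumption: lengths cover coding sequence only, so each intron lies
    after the cumulative coding length of the exons upstream of it.
    """
    counts = Counter({0: 0, 1: 0, 2: 0})
    upstream = 0
    for length in exon_lengths[:-1]:  # one intron between each pair of exons
        upstream += length
        counts[intron_phase(upstream)] += 1
    return counts


# Hypothetical gene with four coding exons (three introns).
print(phase_distribution([99, 100, 50, 60]))
```

Here the introns follow 99, 199, and 249 coding nucleotides, giving phases 0, 1, and 0. Summing such counts over many genes yields the kind of phase distribution discussed above.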
The introns-early theory explains the non-uniform distribution by speculating that 35% of modern introns are ancient, i.e., existed in the progenote and facilitated the assembly of the first genes [4,11]. Because, under this theory, exons are remnants of primordial minigenes that were joined at codon boundaries, most of these ancient introns must lie in phase 0, resulting in the current excess of phase-0 introns. However, this theory does not satisfactorily explain why phase-1 introns are more common than phase-2 introns. In contrast, the introns-late theory proposes that the non-uniformity of intron phase distribution may have arisen from nonrandom intron insertion. Introns have been proposed to be inserted only into a fixed sequence pattern, termed a "proto-splice site". Several potential patterns for proto-splice sites have been proposed, for example MAG|R; G|G, AG|G, AG|GT; and MAG|GT [13,14]. (In these patterns, M is A or C, R is A or G, and the vertical line represents the intron insertion site.) However, there is still no clear evidence that the observed distribution of intron phases is caused by nonrandom intron insertion [10,15].
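To make the insertion-model argument concrete, one can scan a coding sequence for a given proto-splice site pattern and record the phase each candidate insertion point would produce. The sketch below does this for patterns written in the notation above (bases A, C, G, T plus the ambiguity codes M and R). It is an illustrative assumption, not the procedure used in this paper: the function name `proto_splice_phases`, the toy sequence, and the choice to count overlapping sites are all hypothetical, and the sequence is assumed to start in frame at its first nucleotide.

```python
import re
from collections import Counter

# IUPAC codes needed for the proto-splice patterns discussed in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "M": "[AC]", "R": "[AG]"}


def proto_splice_phases(cds: str, pattern: str = "MAG|R") -> Counter:
    """Count the intron phase implied by every match of a proto-splice pattern.

    The '|' in the pattern marks the hypothetical insertion point; the phase
    is the number of coding nucleotides before that point, modulo 3.
    """
    left, right = pattern.split("|")
    regex = "".join(IUPAC[base] for base in left + right)
    # A zero-width lookahead lets overlapping sites all be reported.
    finder = re.compile(f"(?=({regex}))")
    counts = Counter({0: 0, 1: 0, 2: 0})
    for match in finder.finditer(cds.upper()):
        insertion_point = match.start() + len(left)
        counts[insertion_point % 3] += 1
    return counts


# Hypothetical in-frame toy sequence containing two MAG|R sites.
print(proto_splice_phases("AAGAATGCAGGCC"))
print(proto_splice_phases("CAGG", "AG|G"))
```

In the first call, AAG|A yields an insertion point after 3 coding nucleotides (phase 0) and CAG|G one after 10 (phase 1). Applied to real coding sequences, the resulting counts give the phase distribution expected if introns were inserted only at such sites, which can then be compared with the observed distribution.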
In this paper, we tested the introns-early and introns-late theories using two independent approaches: (i) by inferring the evolution of intron phase distribution and (ii) by retesting whether intron phase distribution reflects the nonrandomness of intron insertions. The results show a general evolutionary trend toward an increasing preponderance of phase-0 introns and that the observed phase distribution of introns can indeed be explained by an intron insertion model. Consequently, our results seem to support the explanation provided by the introns-late theory for the non-uniformity of intron phase distribution.