Identification of horizontal gene transfer
Experimental data on operons in organisms other than E. coli and, to a lesser extent, B. subtilis are scarce. We therefore used conserved gene pairs, and the connected gene neighborhoods associated with them, as an approximation of the operon organization of genes in other prokaryotic genomes. Several studies have strongly suggested that all gene pairs that are conserved in multiple genomes belong to the same operon [7,25,26]. Here we used an extremely conservative threshold (conservation of a gene pair in 10 genomes) to ensure that only genuine operons were analyzed. BLASTP searches for potential horizontal gene transfer (HGT) identified 729 candidate genes (9% of all genes comprising conserved neighborhoods in the 41 analyzed genomes), that is, genes whose encoded protein sequences were more similar to homologs from phylogenetically distant taxa than to those from the reference taxon. Throughout this analysis, we treated genes as atomic units and did not consider the relatively unlikely possibility of HGT for portions of genes. Phylogenetic analysis of these genes and their neighbors revealed different types of evolutionary events, some of which involve whole operons, whereas others seem to reflect operon mosaicity.
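The two screening steps described above can be illustrated with a minimal sketch. It is not the pipeline used in this study: the input formats, the function names `conserved_gene_pairs` and `hgt_candidates`, and the plain score comparison are assumptions made for illustration. In particular, the exact criteria for calling a taxon "phylogenetically distant" are not specified in this section, so the sketch simply compares the best score inside the reference taxon with the best score outside it.

```python
from collections import defaultdict


def conserved_gene_pairs(genomes, min_genomes=10):
    """Return gene pairs that are adjacent in at least `min_genomes` genomes.

    `genomes` maps a genome name to its gene order, given as a list of
    ortholog-family identifiers (hypothetical input format; strand and
    intergenic distance are ignored in this sketch).
    """
    seen_in = defaultdict(set)
    for name, order in genomes.items():
        # Each pair of neighbors is stored without regard to orientation.
        for left, right in zip(order, order[1:]):
            seen_in[frozenset((left, right))].add(name)
    return {pair for pair, names in seen_in.items() if len(names) >= min_genomes}


def hgt_candidates(hits, reference_taxon):
    """Flag genes whose best hit outside the reference taxon beats the best hit inside it.

    `hits` maps a gene ID to {taxon: best BLASTP bit score} and
    `reference_taxon` maps a gene ID to its own taxon (both hypothetical).
    """
    flagged = []
    for gene, by_taxon in hits.items():
        native_taxon = reference_taxon[gene]
        native = by_taxon.get(native_taxon, 0.0)
        foreign = max(
            (score for taxon, score in by_taxon.items() if taxon != native_taxon),
            default=0.0,
        )
        if foreign > native:
            flagged.append(gene)
    return flagged
```

Candidates flagged in this way are only a starting point; as in the text, each would still need to be examined by phylogenetic analysis together with its neighbors.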
Probable horizontal transfer of whole operons or large portions of operons, inferred when the phylogenetic trees for all genes in a predicted operon had the same topology (which, however, was incompatible with the species tree), was identified in 35 neighborhoods, approximately one third of all analyzed neighborhoods. These events were classified into three categories: acquisition of a new (for the given lineage) operon, paralogous operon acquisition and xenologous operon displacement. Examples of all these classes of apparent operon transfer events are given in Table 1. These 35 neighborhoods generally represented functional classes of genes known to be prone to HGT: transporters, general metabolism-related genes and signal transduction systems [13,15,17]. This seems to be a relatively low level of horizontal transfer in view of the purported selfish behavior of operons [9,10]. However, the strict threshold for the detection of conserved gene pairs, described above, undoubtedly caused many horizontally transferred operons to be missed. Thus, the present analysis gives a conservative lower bound on operon transfer.
In addition, 19 predicted operons with different phylogenetic affinities of the constituent genes, that is, apparent mosaic operons, were identified (Table 2). Again, this is definitely a lower bound, not only because of the high threshold set for the identification of conserved gene pairs, but also because this number includes only cases that were clearly resolved by phylogenetic tree analysis. In addition, we detected many uncertain cases where the different phylogenetic affinities of genes within an operon were not strongly supported (data not shown); at least some of these are probably also mosaic operons.
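The distinction drawn above between whole-operon transfer, mosaic operons and uncertain cases can be sketched as a simple decision rule. This is an illustrative simplification rather than the procedure used in this study: it assumes that each gene tree has already been reduced to a single taxon label (or `None` when the affinity is not strongly supported), whereas the actual analysis compared tree topologies. The function name and the category strings are hypothetical.

```python
def classify_neighborhood(affinities, native_taxon):
    """Classify a predicted operon from the phylogenetic affinity of each gene.

    `affinities` maps a gene name to the taxon its tree groups it with,
    or to None when the affinity is unresolved (hypothetical encoding).
    """
    if not affinities:
        raise ValueError("neighborhood has no genes")
    resolved = {taxon for taxon in affinities.values() if taxon is not None}
    # Two or more well-supported, different affinities: a mosaic operon.
    if len(resolved) > 1:
        return "mosaic"
    # Nothing resolved, or some genes unresolved: no firm call is made.
    if not resolved or None in affinities.values():
        return "uncertain"
    # All genes agree; compare the shared affinity with the species tree.
    (taxon,) = resolved
    return "vertical" if taxon == native_taxon else "whole-operon transfer"
```

For example, encoding the rfb genes of the methanogen described below as `{"rfbA": "clostridia", "rfbB": "clostridia", "rfbC": "clostridia", "rfbD": "archaea"}` with `native_taxon="archaea"` yields `"mosaic"`.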
Below we describe in greater detail several case studies of putative mosaic operons; in each of these cases, in addition to the basic set of 41 species, we included in the analysis the apparent orthologs of the respective proteins from all prokaryotic species in which they were detected, in order to control for possible effects of taxon sampling. We found that, although the details of tree topology inevitably depended on the set of species analyzed, the conclusions regarding HGT were not affected by the inclusion of additional species.
Case studies of mosaic operons
Ribosomal protein L29 gene
In the previous study that prompted this work, we analyzed the phylogeny of several ribosomal proteins and found several cases of apparent horizontal transfer resulting in mosaic operon organization. Horizontal transfer "in the heart of the ribosome" has also been described independently by others [21,22]. Here we report another case of a ribosomal protein operon with apparent in situ gene displacement (that is, displacement without change of the local gene arrangement) via HGT. Figure 1a shows the highly conserved gene arrangement around the gene for the large subunit protein L29. The phylogenetic trees for the flanking L16 and S17 genes showed largely congruent topologies without any indications of HGT (Figure 1b,d). In contrast, the L29 tree shows unexpected clustering for Aquifex aeolicus and both Rickettsia species: the Aquifex branch is within the archaeal cluster, whereas the Rickettsia group is with Chlamydia rather than with the rest of the alpha-proteobacteria, the taxon to which Rickettsia belongs (Figure 1c). In situ displacement is the most likely mechanism behind this observation given that the structure of this operon is conserved in the majority of bacteria. The nature of the selective advantages conferred by this gene substitution is unclear, but the apparent sources of the transferred genes suggest that the displacements might indeed be adaptive. Aquifex apparently acquired the L29 gene from archaea, which could be related to adaptation to hyperthermal conditions, whereas Rickettsia probably captured the gene from other parasitic bacteria, such as Chlamydia. However, these observations also allow a non-adaptationist interpretation, under which the apparent source of the acquired genes simply reflects the increased likelihood of gene exchange between the respective organisms due to co-habitation, with chance fixation of some of the transferred genes.
The ruvB gene of Mycoplasma
The genes for Holliday junction resolvase subunits RuvA and RuvB form an operon that is conserved in most of the sequenced bacterial genomes (Figure 2a). In the phylogenetic trees for RuvA and RuvB, the branch that includes Ureaplasma and Mycoplasma occupies drastically different positions. In contrast to RuvA, which belongs to the Gram-positive clade as expected (Figure 2b), mycoplasmal RuvB clusters with the epsilon-proteobacteria (Helicobacter and Campylobacter) and the mycoplasma-epsilon-proteobacteria clade further joins alpha-proteobacteria (Figure 2c). This clustering is strongly supported by bootstrap analysis and was shown to be robust using statistical tests of tree topology (Table 3). Thus, the ruvB gene seems to have undergone xenologous displacement in situ after the divergence of the mycoplasmal branch from the rest of Gram-positive bacteria. Notably, the gene exchange seems to have occurred between phylogenetically distant parasitic bacteria.
Undecaprenyl pyrophosphate synthase gene in the lipid biosynthesis operon of Rickettsia
In Rickettsia, the undecaprenyl pyrophosphate synthase gene (uppS), which belongs to a highly conserved doublet of lipid biosynthesis genes embedded in functionally diverse operons (Figure 3a), clusters with an unexpected assemblage of bacterial orthologs, including those from the spirochete Treponema pallidum and Fusobacterium nucleatum, but not with the 'native' taxon, alpha-proteobacteria (Figure 3b,c). Statistical testing of the tree topology showed that clustering of rickettsial uppS with those from other alpha-proteobacteria is highly unlikely (Table 3). The apparent in situ gene displacement of the uppS gene in Rickettsia was accompanied by a breakdown of the operon into three fragments (Figure 3a). The topology of the uppS tree suggests the possibility of multiple HGT events, although only the rickettsial genomes show evidence of gene displacement in situ. Once again, the gene displacement is observed in bacterial parasites.
NADH:ubiquinone oxidoreductase subunits in Halobacterium sp.
Gene organization in the NADH:ubiquinone oxidoreductase operon is highly conserved in all sequenced archaeal genomes and those of several groups of bacteria (Figure 4a). The nuoI gene of Halobacterium sp. shows an unexpected phylogenetic affinity with proteobacteria (Figure 4c), whereas the neighboring genes have the regular archaeal affinities (Figure 4b,d). The unusual phylogeny of halobacterial NuoI, which was strongly supported by statistical tests (Table 3), suggests in situ displacement by a proteobacterial gene. Notably, all three NADH:ubiquinone oxidoreductase subunits of the cyanobacteria unexpectedly grouped within the archaeal clusters of the respective trees (Figure 4b-d). These observations point to a complex history of HGT for the genes encoding all subunits of NADH:ubiquinone oxidoreductase.
Lipopolysaccharide biosynthesis operon in Methanothermobacter thermoautotrophicus and Deinococcus radiodurans
The genes of the lipopolysaccharide biosynthesis (rfbABCD) operon appear to have been extensively and independently shuffled in many prokaryotic genomes and might have undergone multiple horizontal transfers. This conclusion is supported both by examination of the operon organization (Figure 5a) and by phylogenetic tree analysis (Figure 5b-e). The trees showed a clear affinity between the rfbA, rfbB and rfbC genes of Methanothermobacter thermoautotrophicus and Clostridium acetobutylicum (Figure 5b-d), with Fusobacterium nucleatum and Listeria monocytogenes joining the cluster in the case of rfbB (Figure 5b), whereas M. thermoautotrophicus RfbD clustered with its archaeal orthologs as expected (Figure 5e). The genes of the rfbABCD operon in Methanothermobacter are shuffled compared to the probable ancestral order, which is found in many bacteria; C. acetobutylicum also shows a rearrangement (Figure 5a). One likely scenario in this case is that M. thermoautotrophicus acquired the rfbABCD operon with the typical gene order from a bacterium of the clostridial lineage, which was followed by displacement of three resident genes and loss of one of the invading genes, accompanied by operon rearrangement. An alternative scenario is that the rearrangement occurred in the source bacterium of the clostridial group and Methanothermobacter acquired only the rfbACB portion, which might have inserted head-to-tail downstream of the original operon, followed by elimination of the resident rfbABC (Figure 5a).
Another interesting case of mosaic structure of the same operon is seen in Deinococcus radiodurans (Figure 5a). Deinococcus RfbA shows clear affinity with proteobacteria (Figure 5d), whereas RfbD is of archaeal descent (Figure 5e), with RELL analysis revealing no competing topologies (Table 3). The remaining two genes of this operon in Deinococcus, rfbB (DRA0041) and rfbC (DRA0043), have uncertain phylogenetic affinities (Figure 5b,c). Thus, as in the case of M. thermoautotrophicus, this operon in Deinococcus was apparently formed through at least two events of xenologous gene displacement in situ and gene shuffling.
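For readers unfamiliar with the RELL (resampling of estimated log-likelihoods) approach mentioned above, the following minimal sketch shows its general idea: per-site log-likelihoods computed once for each candidate topology are resampled with replacement, and the fraction of replicates in which each topology has the highest total is reported. This is a generic illustration and not the exact procedure behind Table 3. The per-site log-likelihoods are assumed to come from a separate maximum-likelihood program, and the function name, input format and default number of replicates are hypothetical.

```python
import random


def rell_bootstrap(site_lnl, replicates=10000, seed=0):
    """Estimate RELL bootstrap proportions for competing tree topologies.

    `site_lnl` maps a topology name to its list of per-site log-likelihoods;
    all lists must cover the same alignment columns (hypothetical format).
    """
    names = list(site_lnl)
    if not names:
        raise ValueError("no topologies given")
    n_sites = len(site_lnl[names[0]])
    if n_sites == 0 or any(len(values) != n_sites for values in site_lnl.values()):
        raise ValueError("all topologies need the same non-zero number of sites")

    rng = random.Random(seed)
    wins = dict.fromkeys(names, 0)
    for _ in range(replicates):
        # Resample alignment columns with replacement; the same columns
        # are used for every topology within one replicate.
        sample = [rng.randrange(n_sites) for _ in range(n_sites)]
        totals = {name: sum(site_lnl[name][i] for i in sample) for name in names}
        wins[max(names, key=totals.get)] += 1
    return {name: wins[name] / replicates for name in names}
```

A topology with a proportion close to 1 has no serious competitor among the alternatives tested, which corresponds to the statement above that no competing topologies were revealed.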
Leucine/isoleucine biosynthesis operon
Perhaps the most prominent case of mosaic operon organization is the leucine/isoleucine biosynthesis operon of several bacteria and archaea, particularly Thermotoga maritima. This is the only known branched-chain amino acid biosynthesis operon, and it is partly conserved in a wide range of bacteria (Figure 6a). Following initial indications from the analysis of taxon-specific BLAST hits, we constructed phylogenetic trees for each of the genes of this operon. Unlike other bacteria, Thermotoga has two leuA paralogs, which are adjacent in the operon. The proteins encoded by these paralogous genes show clearly distinct phylogenetic affinities: TM0552 belongs to a distinct clade within the archaeal domain, whereas TM0553 is part of a Gram-positive bacterial cluster (Figure 6b). This phylogenetic mosaic in Thermotoga extends further, with LeuB (TM0556) clustering with proteobacterial orthologs (Figure 6c), and LeuC (TM0554) and LeuD (TM0555) with archaeal orthologs (Figure 6d,e). All these affinities were strongly supported by two versions of bootstrap analysis (Table 3). The genes encoding LeuA, LeuC and LeuD from Thermotoga, Clostridium, Aquifex and both Pyrococcus abyssi and P. furiosus belong to a well-defined clade, which also includes a medley of alpha-proteobacteria and cyanobacteria, within the archaeal domain in the respective trees (Figure 6b-e). Thus, this sub-operon apparently has spread horizontally among these organisms relatively recently. Pyrococcus abyssi and P. furiosus probably acquired these genes after the divergence from the common ancestor with P. horikoshii because the latter has only the typical archaeal operon (Figure 6a).
Given the apparent propensity of Thermotoga (and other hyperthermophilic bacteria) for acquisition of archaeal genes via HGT, it seems most likely that the archaeal version of the leuACD sub-operon originally entered the bacterial domain via Thermotoga or a related thermophilic bacterium. Formally, in Thermotoga these events could be classified as a combination of paralogous (sub)operon acquisition (TM0554-TM0555 in addition to another paralogous archaeal gene pair TM0291-TM0292) and xenologous gene displacements (genes TM0553, TM0556). In Clostridium, xenologous operon displacement seems to have occurred because the ancestral operon of the Gram-positive type apparently had been lost. The subsequent evolution of this operon in the four organisms proceeded along different paths. Aquifex has lost the operon structure even for the two subunits of 3-isopropylmalate dehydratase (LeuC, LeuD). Different genes in the operons of P. abyssi and C. acetobutylicum have been translocated and several genes probably have been independently accrued (Figure 6a). In both P. abyssi and Thermotoga, the original leuA and leuB genes within the leuABDC core seem to have been independently displaced by bacterial orthologs without a clear affinity with any specific bacterial lineage (Figure 6a). The most likely scenario for the evolution of this operon in Thermotoga is that it originated as a Gram-positive type operon, after which many genes (or sub-operons) were displaced in situ through multiple horizontal transfers and a few additional genes were inserted into the preexisting structure. The alternative but less likely hypothesis involves independent, de novo operon assembly from genes of different phylogenetic affinities. Several other apparent HGT events were detected during the analysis of the phylogenetic trees for leucine biosynthesis genes (DR1614 in the LeuD tree, DR1610 in the LeuC tree (Figure 6d,e)) but, in these cases, the acquired genes do not belong to conserved operons.