Patchy distribution in eukaryotes
In a previous study we identified diplomonad genes potentially derived from LGT. Here we test whether alternative hypotheses and/or phylogenetic artefacts could account for these observations. We broadened the eukaryotic taxon sampling of four genes with a limited distribution among eukaryotes, both by cloning and sequencing new genes and by mining the available sequence databases. This approach also allows us to refine the timing of putative LGT events with respect to organismal divergences and to gain insights into the evolution of gene families with a patchy distribution in general. All four genes were obtained from Entamoeba invadens, Entamoeba moshkovskii and Entamoeba terrapinae, and priS, fprA and nagB (partial) sequences were obtained from Mastigamoeba balamuthi by PCR using genomic DNA as template. Two T. vaginalis cDNA clones (nagB and fprA) were also completely sequenced. A T. vaginalis priS sequence and a M. balamuthi adhE sequence had appeared in the databases since the previous analysis. Furthermore, a N. gruberi priS cDNA clone was completely sequenced (see Additional File 1 for a complete listing of the datasets). To further investigate the distribution of these genes in complete or nearly complete genome sequences, we performed similarity searches against available data from ongoing eukaryotic genome projects and retrieved the significant BlastP hits. We combined these results with information from published genomes and mapped the occurrences of the genes onto the current hypothesis of organismal relationships among eukaryotes (Figure 1) [29-31]. The four genes show a very patchy distribution, often with both presences and absences within the same eukaryotic "super group".
Two extreme alternative explanations may be invoked to explain these distribution patterns within eukaryotes: (i) presence of all four genes in the last common eukaryotic ancestor followed by many differential losses within the "super groups", or (ii) absence of the genes in the ancestor followed by independent gene acquisitions in all divergent lineages that possess the genes. The gene loss scenario (i) becomes less likely the more independent gene loss events need to be postulated. Phylogenetic analyses of the individual genes should therefore help to distinguish between these hypotheses.
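The trade-off between the two extreme scenarios can be illustrated with a minimal sketch that counts events on a fixed organismal tree. It assumes a rooted tree written as nested tuples and a set of taxa that carry the gene; the tree, the taxon names and the function names are hypothetical and are not part of our analyses. Under scenario (i) every maximal clade that lacks the gene costs one loss; under scenario (ii) every maximal clade that has the gene costs one acquisition.

```python
def leaves(tree):
    """Return the set of leaf names in a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    result = set()
    for child in tree:
        result |= leaves(child)
    return result


def count_maximal_clades(tree, members):
    """Count maximal subtrees whose leaves all belong to `members`."""
    if leaves(tree) <= members:
        return 1
    if isinstance(tree, str):
        return 0
    return sum(count_maximal_clades(child, members) for child in tree)


def min_losses(tree, present):
    """Scenario (i): gene in the root ancestor, only losses allowed."""
    absent = leaves(tree) - present
    return count_maximal_clades(tree, absent)


def min_gains(tree, present):
    """Scenario (ii): gene absent from the root, only acquisitions allowed."""
    return count_maximal_clades(tree, present)


# Hypothetical eight-taxon tree with a patchy gene distribution.
TREE = ((("A", "B"), ("C", "D")), (("E", "F"), ("G", "H")))
PRESENT = {"A", "D", "E"}

print(min_losses(TREE, PRESENT), min_gains(TREE, PRESENT))
```

The counts alone do not decide between the scenarios, since losses and acquisitions need not be equally probable; they only make explicit how many independent events each extreme requires for a given distribution.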
Additional File 1. Lists accession numbers, keys to short names used in alignment files, taxonomic descriptions, and the basis for exclusion from the phylogenetic analyses for all datasets used in the study.
In our previous study we excluded sequences that showed indications of a biased aa composition, to reduce where possible the impact of phylogenetic artefacts due to compositional heterogeneity; the methods available at the time assumed aa compositional homogeneity. Here we approach this potential problem by including analyses with methods and models that are designed to mitigate the potentially misleading effects of compositional heterogeneity. Each aa in the alignments was recoded into one of six groups of chemically related aa that commonly replace one another [32,33], an approach identical to that used in the recent analyses of the NuoF protein. Previously, we were also limited by the size of the datasets, since maximum likelihood (ML) methods were computationally very demanding at the time. The release of the PHYML software solves this problem, since it can perform bootstrap analyses of a large number of sequences (>100) in a reasonable computational time. The recently released ModelGenerator software also ensures that the optimal available model of aa substitution is used in the ML analyses. These advances in the field of phylogenetics enabled us to perform more detailed analyses that include all available members of each gene family. Information about the datasets and the parameters of the phylogenetic analyses is listed in Additional File 2, and the phylogenetic trees with support values from the two methods are shown in Figures 2, 3, 4, 5, 6.
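The recoding step can be sketched as follows. The grouping shown is an assumption: the text above does not list the six groups, so the sketch uses the commonly cited Dayhoff classes (AGPST, C, DENQ, FWY, HKR, ILMV). The names `GROUPS`, `RECODE` and `recode` are illustrative only.

```python
# Assumed six-state grouping (Dayhoff classes); not taken from the text above.
GROUPS = ["AGPST", "C", "DENQ", "FWY", "HKR", "ILMV"]

# Map every amino acid to the 1-based index of its group, as a character.
RECODE = {aa: str(index) for index, group in enumerate(GROUPS, start=1) for aa in group}


def recode(sequence):
    """Recode an aligned aa sequence; gaps and ambiguity codes are left unchanged."""
    return "".join(RECODE.get(aa, aa) for aa in sequence.upper())


print(recode("MKT-AC"))
```

Because substitutions within a group become invisible after recoding, the recoded alignment carries less signal per site but is less sensitive to compositional bias and saturation.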
Additional File 2. Information about the datasets and parameters of the phylogenetic analyses.
All datasets in the phylogenetic analyses with grouped aa using the Metropolis-coupled Markov Chain Monte Carlo (MCMC) strategy showed convergence, indicated by good agreement between the split support values of the duplicate runs (Additional File 3). Two of the alignments (the long version of glucosamine-6-phosphate isomerase and the prismane protein) also showed a good model composition fit, indicated both by posterior predictive simulations and by tests for homogeneity using χ2 statistics with simulations to obtain the null distribution (pt > 0.05 and Psim > 0.05, respectively), while the original datasets did not (Psim < 0.05) (Additional File 2). This indicates that the recoding procedure reduced the potentially misleading effects of compositional heterogeneity in these two analyses. The other three datasets (A-type flavoprotein, the short version of glucosamine-6-phosphate isomerase, and alcohol dehydrogenase E) showed low pt and Psim values (<0.05), suggesting that compositional heterogeneity might still be a source of artefactual results in these datasets (Additional File 2). Nevertheless, none of these grouped aa datasets failed the test of model composition when the χ2 curve was used to obtain the null distribution, while two of the three original datasets did, suggesting that the recoding procedure had improved the model fit (Additional File 2) and reduced the potential for estimation biases. At the very least, these analyses complement the more "standard" ML analyses by showing which aspects of the phylogenies are robust to aa recoding and by reducing any potential effects of saturation.
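As an illustration of the statistic behind the homogeneity tests, the following minimal sketch computes a χ2 value from the state counts of each sequence in an alignment. It is not the software used in our analyses, and the function name and toy sequences are hypothetical. It returns only the statistic; a P value would come either from the χ2 curve with (sequences − 1) × (states − 1) degrees of freedom or, as for Psim, from a null distribution obtained by simulation.

```python
from collections import Counter


def composition_chi2(sequences, alphabet):
    """Chi-square statistic for homogeneity of state composition across sequences."""
    # Observed counts per sequence, ignoring gaps and any other symbols.
    counts = [Counter(c for c in seq if c in alphabet) for seq in sequences]
    row_totals = [sum(c.values()) for c in counts]
    col_totals = {state: sum(c[state] for c in counts) for state in alphabet}
    grand_total = sum(row_totals)

    chi2 = 0.0
    for observed, row_total in zip(counts, row_totals):
        for state in alphabet:
            expected = row_total * col_totals[state] / grand_total
            if expected > 0:
                chi2 += (observed[state] - expected) ** 2 / expected
    return chi2


# Toy recoded sequences: identical composition, then maximally different.
print(composition_chi2(["1122", "2211"], "12"))
print(composition_chi2(["1111", "2222"], "12"))
```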
Additional File 3. Figures showing the split support for the two runs in the grouped aa analyses plotted one against the other as indicators of convergence.
The updated phylogenetic analyses show sequences highly scrambled with respect to expected organismal relationships (Figures 2, 3, 4, 5, 6), as previously observed for these genes. Thus, the earlier finding that these proteins produce phylogenetic trees that are incompatible with organismal phylogenies is robust with respect to improved taxon sampling and more detailed phylogenetic analyses; the number of separate eukaryotic groups in the trees (polyphyly) has increased, rather than decreased. In all analyses the eukaryotic sequences are found in at least two distinct regions of the trees nested with prokaryotic sequences (A and B boxes in Figures 2, 3, 4 & 6), which are separated with strong support values. These strong separations could, in principle, be due to ancient duplication events followed by a large number of differential losses. Indeed, the presence of the same prokaryotic group in both regions in several of the phylogenetic analyses (low G+C Gram positives, for example, are found in both box A and B in Figures 2 and 6) superficially supports ancient duplications. However, such scenarios are expected to result in phylogenetic relationships for each paralog that mirror the organismal relationships. This is not observed in our analyses (Figures 2, 3, 4 & 6). Furthermore, duplication and loss scenarios require that the gene was present in multiple copies in the last common universal ancestor and retained for a long evolutionary time. Thus, to explain the patterns we observe in the phylogenies (Figures 2, 3, 4, 5, 6), a eukaryotic ancestral genome that encoded a larger number of distantly related paralogs of the four genes than are present in any of the extant eukaryote genomes would have to be inferred (Figure 1). To our knowledge, no data exist supporting a universal trend for drastic genome shrinkage in relatively recent evolutionary time. Therefore, gene duplication and differential losses alone do not seem sufficient to explain the unexpected phylogenetic relationships observed in our analyses (Figures 2, 3, 4, 5, 6).
However, the number of independent gene losses has to be weighed against the possibility of a later introduction of the genes into eukaryotes by LGT events. Yet, as none of the eukaryotic groups is found nested within a natural prokaryotic group with strong bootstrap support (Figures 2, 3, 4, 5, 6), it is difficult to identify the donor and recipient lineages involved in the putative LGT events. Thus, the presence of these genes in a subset of the sampled eukaryotes is easily explained neither by vertical inheritance of the genes from the common ancestor of all eukaryotes, nor by a distinct number of easily identified gene transfer events. These phylogenies need to be carefully interpreted in combination with analysis of gene distribution patterns, as well as in the context of the biology of the available organisms.
Hybrid-cluster protein
Genes for the hybrid-cluster protein (priS) have been identified from a large number of prokaryotes, as well as several eukaryotes (Figures 1 & 2). However, the cellular function of the protein is not well established; potential roles in the biological nitrogen cycle [37,38] and in the adaptive response to oxidative stress have been suggested. Although the gene is found in all three domains of life, its distribution within the domains is patchy; for example, it is relatively widespread among proteobacterial genomes, while it has only been found in a single high G+C Gram positive species and a single cyanobacterium (Figure 2). The occurrence of the gene in a large number of unrelated lineages, in combination with its absence from more closely related species, is most simply explained by cross-species transmission via gene transfer. Indeed, the phylogeny of the hybrid-cluster protein strongly suggests a number of intra- and inter-domain prokaryotic LGT events, with sequences from organismal groups such as proteobacteria, low G+C Gram positives, and euryarchaeota branching in several distinct regions of the tree, often with unrelated lineages and with strong support values (Figure 2). The eukaryotes are found within two large groups of sequences that include both archaeal and bacterial homologs and are separated by a long and strongly supported branch (box A and B in Figure 2). Two Trichomonas sequences are the sole eukaryotes in one of these groups (box B in Figure 2). A prokaryote-to-eukaryote LGT event affecting the parabasalid lineage after its divergence from other eukaryotes, including diplomonads, is a more parsimonious explanation for the position of the T. vaginalis sequences than loss of this version of the gene in all other eukaryotic species, provided T. vaginalis is not basal to all the other eukaryotes included in this analysis [18,25-27]. However, the prokaryotic donor lineage for the T. vaginalis sequences is difficult to determine from the current data and analyses.
The eukaryotic sequences found in the second group within the hybrid-cluster protein phylogeny fall into four polyphyletic groups (box A in Figure 2). However, only one of these groups is separated from the others with significant statistical support; the Entamoeba sequences form a weakly supported group with a cyanobacterial sequence, and this group is found as a sister group of three δ-proteobacterial sequences with a posterior probability of 1.00 in the grouped aa analysis. This is suggestive of a eubacteria-to-Entamoeba LGT event, perhaps with cyanobacteria or δ-proteobacteria as the donor lineage (Figure 2). Thus, at the very least, the phylogeny of the hybrid-cluster protein suggests two transfer events from prokaryotic donors into protists. Taken at face value, the tree also supports additional transfers into various protist lineages. Indeed, the diplomonad sequences are nested within proteobacterial sequences in both the ML and grouped aa Bayesian analyses, although with weak statistical support in both cases. This observation is nevertheless suggestive of an LGT event from a proteobacterium to the diplomonad lineage. Two α-proteobacterial sequences are nested within the other eukaryotic priS sequences in box A with weak bootstrap support (Figure 2), which could indicate an origin via endosymbiotic gene transfer. However, the absence of the gene from mitochondrial genomes, in combination with its absence from the nuclear genomes of most eukaryotes related to pelobionts, diatoms, heterolobosea, and green algae (Figure 1), makes such an origin doubtful. Still, the weakly supported separation of the diplomonad sequences from these eukaryotic sequences may be artefactual; in reality they could represent a monophyletic group that inherited this gene from a common ancestor. If so, at least eight independent losses of priS in the apicomplexan/ciliate, oomycete, land plant, parabasalid, kinetoplastid, opisthokont, mycetozoan, and Entamoeba lineages would have to be invoked (Figure 1).
Since such widespread and relatively recent independent losses appear unlikely, we favour a scenario where also the N. gruberi, M. balamuthi, T. pseudonana, and C. reinhardtii sequences have been distributed by an unknown number of gene transfer events from unsampled prokaryotic lineages or between microbial eukaryotic lineages. So far the priS gene has only been found in microbial eukaryotes. This circumstantially supports the hypothesis that the absence of a germ/soma separation in unicellular organisms increases their chance of acquiring genes by LGT . We predict that additional taxon sampling will confirm the current trend of preferential presence in unicellular eukaryotes and will further clarify the origins of the eukaryotic priS genes.
A-type flavoprotein
The fprA gene encodes A-type flavoprotein, a protein recently inferred to play a role in the detoxification of nitric oxide and/or oxygen in E. histolytica and was suggested to derive from a relatively recent LGT event from a prokaryotic donor . Furthermore, it has been demonstrated that T. vaginalis is able to degrade nitric oxide under microaerophilic conditions, an activity proposed to be associated with the presence of A-type flavoproteins in these parasites . Again, the gene is only found in a subset of the sequenced prokaryotes – mostly species able to grow in oxygen-poor environments (Figure 3). One exception is the widespread presence of the A-type flavoprotein within cyanobacteria, possibly indicating that the protein has evolved a different function within this group. Consistent with this hypothesis the cyanobacterial sequences are well separated from the other sequences in the tree and have a unique alignment feature; they all share a ~160 aa highly conserved C-terminal extension that is absent from all other sequences. The phylogenetic analyses of the A-type flavoprotein strongly indicate that the fprA gene has been distributed between the prokaryotic groups via LGT, rather than by vertical inheritance, since many groupings of unrelated prokaryotic taxa are observed and supported by strong support values from both analyses (Figure 3).
The eukaryotes are found in two clearly separated clusters. The two diplomonad sequences are found together with a Trichomonas sequence among a mixture of eubacterial and archaeal species, indicating a prokaryote-to-eukaryote gene transfer event to a hypothetical uniquely shared ancestor of diplomonads and parabasalids (box B in Figure 3), unless several independent gene losses are inferred among a broad range of eukaryotic lineages (Figure 1). In the ML analysis, three additional Trichomonas homologs are found weakly associated with the strongly supported grouping of the Mastigamoeba and Entamoeba sequences (box A in Figure 3), while the Entamoeba/Mastigamoeba clade is found as a sister clade to the Clostridium perfringens sequences in the grouped aa analysis with a posterior probability of 0.97 (data not shown). Thus, the relationships between the amoebozoan, parabasalid and clostridial sequences are uncertain. However, Trichomonas does not share a recent common ancestor with amoebozoa (Figure 1), suggesting that the fprA gene has been acquired by separate gene transfer events in these two eukaryotic lineages. Alternatively, following an LGT from a prokaryote to one of the two eukaryotic lineages, a second LGT took place between an ancestor of Entamoeba and a parabasalid. Interestingly, the three T. vaginalis sequences share a ~450 aa C-terminal extension of about 39% identity with the Clostridium perfringens 3 fprA homolog (Figure 3 and Additional File 4). This sequence and a 433 aa long Clostridium tetani sequence (an FAD-dependent pyridine nucleotide-disulphide oxidoreductase:Rubredoxin-type, 38% identical to the T. vaginalis sequences) are the most similar prokaryotic sequences in the public databases, while the most similar eukaryotic sequence, an NADH dehydrogenase from E. histolytica previously identified to be of prokaryotic origin, is only 25% identical (Additional File 4).
Such a taxonomic distribution of this protein domain links the Trichomonas C-terminal extensions with the Clostridium sequences rather than with the eukaryotic sequences, suggesting that they originated via a gene transfer event from a prokaryotic donor. Thus, both the N-terminal and C-terminal domains of the T. vaginalis A-type flavoproteins likely have prokaryotic origins (see Additional File 4 for a discussion of plausible scenarios).
Additional File 4. Figure showing the structural organization of the T. vaginalis A-type flavoproteins together with additional discussion.
The eukaryotic lineages that encode fprA are micro-aerophilic organisms that most likely have evolved from aerobic eukaryotes, and the prokaryotes found closest to the eukaryotic sequences in the tree live in oxygen-poor environments. These observations indicate that the transfer of the gene occurred in such an environment. The putative functional role of fprA in nitric oxide detoxification indicates that these gene transfers might represent metabolic adaptations that allowed these different eukaryotes to survive better in anoxic environments. fprA could be part of a gene pool shared between distantly related organisms (prokaryotic or eukaryotic) that occupy the same ecological niche.
Glucosamine-6-phosphate isomerase
The nagB gene encodes glucosamine-6-phosphate isomerase, an enzyme that is usually about 260 aa residues in length and is required for the biosynthesis of the cyst wall in Giardia. Apart from low G+C Gram positives and γ-proteobacteria, the nagB gene is only sparsely represented in eubacteria and has not yet been detected in archaea (Figure 4). It is also absent from several eukaryotic lineages (Figure 1). In the phylogenetic tree of nagB, a strongly supported group including the Entamoeba, ciliate, mycetozoan, pelobiont, parabasalid and several eubacterial sequences was detected (box B in Figure 4). All these sequences, with the exception of one of the Rhodopirellula baltica paralogs, have a homologous C-terminal extension of roughly 500 aa residues with pair-wise identities above 48%, which confirms the common ancestry of these sequences (Figure 4). To increase the resolution of this group, a separate analysis was performed that included only the sequences of the long version of the protein and was therefore based on a larger number of alignment positions: 560 unambiguously aligned aa residues compared to 229 (Figure 5). Interestingly, in both analyses, the pelobiont and Entamoeba sequences form a group with the ciliate sequences. In the ML analyses the mycetozoan Dictyostelium discoideum is found as a sister to these sequences with a bootstrap support of 56% (Figure 5), while the Rhodopirellula baltica 4 sequence is found as the immediate outgroup to the ciliate/Mastigamoeba/Entamoeba sequences with a posterior probability of 0.45 in the grouped aa analysis (data not shown).
These weakly supported and partly incongruent phylogenies could be rationalized in the following ways: (i) a phylogenetic artefact splitting the amoebozoan sequences, in combination with differential gene loss in all sampled eukaryotic genomes, with only the currently sampled ciliates and amoebozoa retaining nagB (Figure 1); (ii) inter-domain gene transfer events from closely related, but as yet unsampled, prokaryotes to the amoebozoan and ciliate lineages; or (iii) the presence of the long version of nagB in the common amoebozoan ancestor followed by a transfer event to the ciliate lineage. Although none of the alternatives can be excluded, we favour the third explanation, since the expected topology within the amoebozoa is recovered, albeit with only weak bootstrap support from the ML analysis (Figure 5), if a single intra-domain LGT event is inferred. Furthermore, ciliates are known to eat other protists [43,44], indicating that a gene transfer event from an amoebozoan to a ciliate is feasible, at least in principle. Further taxonomic sampling of eukaryotic and prokaryotic genomes is obviously needed, especially within the two eukaryotic groups concerned, to distinguish between the different plausible scenarios. In any case, the Mastigamoeba and Entamoeba sequences form a strongly supported group that indicates the presence of the gene in their common ancestor.
In contrast, the Trichomonas homolog, which encodes the long version of the enzyme, and the diplomonad homologs, which encode the short version, have distinct origins. While the parabasalid gene likely originated via a gene transfer event, possibly from a eubacterium within the Bacteroidetes/Chlorobi group (Figure 5), the source of the diplomonad genes remains uncertain. Their separation from the other eukaryotes in box A appears robust, with strong bootstrap support from both of the analyses with all taxa (Figure 4), as well as from an additional analysis in which the box B sequences and prokaryotic long branches were excluded (Additional File 5). Thus, the separation of the diplomonad sequences from the eukaryotic sequences in box A is unlikely to be a result of long-branch attraction; an LGT event from an unsampled eubacterial lineage seems a more likely explanation (Figure 4).
Additional File 5. Figure showing a phylogenetic analysis of the short version of glucosamine-6-phosphate isomerase with the long version and some prokaryotic long branches excluded.
The topology of the tree relating the opisthokont sequences to other eukaryotic lineages and prokaryotes is not easy to explain simply by vertical inheritance (box A in Figure 4); the metazoan sequences are grouped together as expected, but the fungi are split into one main group and a smaller group with three budding yeast sequences. The separation between the two fungal groups is supported by both analyses (Figure 4). In fact, the budding yeasts are found with the other fungi only in 1% of the bootstrap replicates in the ML analyses (including the analysis without long branches) and never among the 2000 sampled trees in the grouped aa analysis. Furthermore, the eukaryotes within box A are never found as a monophyletic group among the 500 bootstrap replicates in any of the ML analyses, and with a posterior probability of only 0.02 in the grouped aa analysis (data not shown). Collectively, these results indicate that the fungal nagB genes likely have separate origins; a recent introduction of a nagB gene into a common ancestor of the three budding yeast lineages Debaryomyces, Candida, and Yarrowia seems like a reasonable scenario.
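Statements such as "found together in only 1% of the bootstrap replicates" refer to the fraction of replicate trees in which a given set of sequences forms a clade. A minimal sketch of that count is given below, assuming rooted replicate trees written as nested tuples; real bootstrap trees are usually unrooted and would be compared as bipartitions instead. The function names and the four-sequence toy replicates are hypothetical.

```python
def collect_clades(tree, found):
    """Add the leaf set of every subtree to `found`; return this subtree's leaf set."""
    if isinstance(tree, str):
        tips = frozenset([tree])
    else:
        tips = frozenset().union(*(collect_clades(child, found) for child in tree))
    found.add(tips)
    return tips


def clade_support(replicates, group):
    """Fraction of replicate trees in which `group` forms a clade."""
    target = frozenset(group)
    hits = 0
    for tree in replicates:
        found = set()
        collect_clades(tree, found)
        if target in found:
            hits += 1
    return hits / len(replicates)


# Four toy replicates with two "yeast" (y) and two "other fungal" (f) sequences.
REPLICATES = [
    (("y1", "y2"), ("f1", "f2")),
    (("y1", "f1"), ("y2", "f2")),
    ((("f1", "f2"), "y1"), "y2"),
    (("f1", "f2"), ("y1", "y2")),
]

print(clade_support(REPLICATES, {"f1", "f2"}))
print(clade_support(REPLICATES, {"y1", "f1"}))
```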
The Dictyostelium sequence is found as a sister to a Fusobacterium sequence in the two ML analyses (Figure 4 and 5), while the sequence is nested between the three budding yeast sequences and the other sequences in box A in Figure 4 in the grouped aa analysis with posterior probabilities for the separations of 0.95 and 0.90, respectively (data not shown). The separation to the metazoan/fungi/euglenozoan group is strong also in the ML analysis; the Dictyostelium sequence indeed never branches with this group in any of the bootstrap replicates in the full analysis or the analysis where long branches were excluded (data not shown). Accordingly, the phylogenetic analyses indicate that a gene acquisition from a prokaryotic lineage is a plausible explanation for the origin of the Dictyostelium sequence, rather than a shared ancestry with the other eukaryotic sequences within box A.
Alcohol dehydrogenase E
Alcohol dehydrogenase E is a key enzyme in the energy metabolism of type I "amitochondriate" protists (i.e. those that lack energy-producing mitochondria or hydrogenosomes), since it catalyzes the conversion of acetyl-CoA to ethanol in a two-step reaction that oxidizes two molecules of NADH to NAD+. We expanded the dataset with adhE genes from three additional Entamoeba species. The failure to detect an adhE homolog in T. vaginalis in the ongoing genome project was expected, since this organism contains hydrogenosomes (type II "amitochondriate" protist) and therefore utilizes a different set of enzymes in its energy metabolism. Interestingly, alcohol dehydrogenase E genes have been detected in the anaerobic chytrid fungus Piromyces sp. E2, which does contain hydrogenosomes. However, the energy metabolism of chytrids is clearly different from that of type II amitochondriate protists such as Trichomonas; chytrids exhibit a bacterial-type mixed-acid fermentation. The finding of alcohol dehydrogenase E in two green algal species, where the protein functions in aerobic mitochondria, indicates that the diversity of its functional role in eukaryotes is not fully understood.
The phylogenetic tree supports our earlier interpretations that LGT has played an important role in the evolution of this gene [10,49], with a number of strongly supported prokaryotic relationships that most easily are explained by gene transfer events (Figure 6) – the gene is only rarely found outside low G+C Gram positives and γ-proteobacteria. The additional Entamoeba sequences form a group with the E. histolytica sequence, indicating the presence of the gene in the common ancestor of these Entamoeba species, while the Mastigamoeba sequence clearly has a distinct origin from that of its amoebozoan sisters, as observed previously . The position of the Entamoeba sequences within a eubacterial group strongly suggests a prokaryote-to-eukaryote LGT event. Sequences from two diplomonads, two green algae, a single apicomplexan, and a chytrid fungus are found in the same region of the tree as the Mastigamoeba sequence (box B in Figure 6). However, relatives of these organisms are known to lack the gene (Figure 1), arguing in favour of recent independent introductions of the gene, rather than an ancestral presence followed by differential gene loss. The green algal sequences are found as sisters to the single cyanobacterial sequence (Figure 6) with moderate to strong statistical support from both analyses, indicating a transfer event between the lineages. This seems ecologically reasonable since ancestors of these lineages could have been found in the same environment. Although the observed topology could be explained by endosymbiotic gene transfer from the plastid, the fact that land plants are lacking the gene, the absence of the gene from extant plastid genomes, and the localization of the protein in green algal mitochondria  makes a gene transfer independent of the plastid endosymbiosis somewhat more likely. 
Also the Mastigamoeba sequence is separated from the other eukaryotic sequences in this region with strong and moderate support from the grouped aa and ML analyses, respectively (box B in Figure 6), clearly suggesting an origin via LGT, maybe from an unsampled prokaryotic lineage.
The diplomonad, fungal, and apicomplexan alcohol dehydrogenases are found in a weakly supported cluster in both analyses. This suggests eukaryote-to-eukaryote gene transfer events, although the donor and recipient lineages are difficult to infer. Indeed, it has earlier been suggested that the green algal adhE could have been acquired by the algae from parasitizing chytrid fungi or transferred from foraminiferan hosts to endosymbiotic algae, and similar interactions between these lineages could be invoked to explain the exchange of adhE genes. Cryptosporidium, Piromyces and many diplomonads are all anaerobes or microaerophiles, and many share similar, if not identical, environments: the digestive tracts of various mammals. This could have facilitated gene sharing via LGT between these distantly related eukaryotic lineages, although independent acquisitions from unsampled prokaryotes cannot be excluded. Interestingly, these three distantly related eukaryotic lineages most likely have adapted to an anaerobic lifestyle independently, and the putative acquisitions of adhE likely represented independent metabolic adaptations to this environment. As for the priS gene (Figure 2), we predict that future genome sampling will uncover adhE genes only among microbial taxa, since the known distribution of adhE is so far restricted to microbial eukaryotes.
LGT events as phylogenetic markers
LGT is usually expected to confound efforts to reconstruct organismal relationships, since it decouples the historical signals in the gene sequences from organismal lineages . However, gene transfer events can also be informative in a specific case; the shared possession of a transferred gene may indicate a phylogenetic relationship between the lineages that possess the transferred gene to the exclusion of the lineages that lack it. There certainly are limitations for such interpretations; the gene could have been lost in some of the descendants of the recipient lineage and additional transfers can complicate the correct identification of donor and recipient lineages. In any case, gene transfer events are a potentially very important source of information about organismal relationships [18,51], especially for protists where the molecular data are scarce and phylogenetic reconstructions are difficult .
For example, the phylogenetic positions of pelobionts and Entamoeba have been difficult to resolve with molecular markers. Analyses of ribosomal RNA only weakly grouped these together , while more recently, based on a number of protein markers, it was conclusively shown that these two groups share a common ancestor to the exclusion of other eukaryotes [23,24]. Interestingly, the Entamoeba sequences strongly group together with the Mastigamoeba sequence in two of the four analyses discussed here, fprA and nagB (Figures 3 &5). This suggests that these genes were present in the ancestor of Mastigamoeba and Entamoeba, providing further support for a specific relationship between these two eukaryotic lineages. Furthermore, the higher eukaryotic taxon Amoebozoa [23,24] is reflected in the phylogeny of one of these genes, nagB (Figure 5), provided one accepts the recovered gene phylogeny that indicates a possible gene transfer from within this group to a ciliate lineage. If robust, this branching pattern could allow one to make inferences about the relative timing of divergences within Amoebozoa and the Ciliophora. However, improved taxonomic sampling of the nagB gene within both groups of protists will be needed to solidify such inferences. Finally, the absence of fprA in the Dictyostelium discoideum genome suggests that the presence of this eubacterial gene within various Amoebozoa lineages might be used as a synapomorphy for discerning phylogenetic relationships within the group.
Similarly, diplomonads and parabasalids have been suggested to share a common ancestor, initially mainly based on weak evidence from molecular data [53,54]. The case for this relationship was recently strengthened by the identification of two aminoacyl-tRNA synthetase genes that appear to have been transferred to a common ancestor of the two lineages  and recent phylogenetic analyses of concatenated protein alignments [25-27]. The observation of a transfer of a gene encoding A-type flavoprotein to a uniquely shared ancestor of the two lineages (Figure 3) further supports their specific relationship. The identification of these three genes of prokaryotic origin shared between diplomonads and parabasalids in other lineages within Excavata should be useful to pinpoint relationships within this poorly resolved and diverse group of eukaryotes.
The timing of transfers relative to eukaryote diversification
The relative timing of the transfers can be addressed in more detail with our increased taxon sampling (Figure 7). For reasons outlined above, probably none of the four genes was present in the last common eukaryotic ancestor, indicating that all putative transfers happened more recently in evolutionary time (Figure 7). However, all four genes were transferred to the diplomonad lineage before the split between Giardia and Spironucleus, since the sequences from these two genera branch together in the phylogenetic reconstructions (Figures 2, 3, 4 and 6). With Trichomonas homologs sampled for three of the genes and the fourth absent, we can now date the transfers of priS, nagB and adhE to after the split between diplomonads and parabasalids [18,25-27], but before the divergence of the two major groups of diplomonads (Figure 7). The fourth gene (fprA) was most likely introduced into the diplomonad lineage before the split from parabasalids, but after their divergence from the other eukaryotic lineages.
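The bracketing logic used above (a transfer is placed after the recipient clade split from its sister group, and before the first divergence within that clade) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the four-taxon tree, the tip label "Other eukaryotes" and the function names `leaves` and `stem_branch` are hypothetical simplifications, not the datasets or methods used in the analyses.

```python
# Toy rooted species tree as nested tuples. The topology and the
# "Other eukaryotes" tip are simplifying assumptions for illustration.
SPECIES_TREE = ("Other eukaryotes", ("Trichomonas", ("Giardia", "Spironucleus")))


def leaves(node):
    """Return the set of tip labels below a node."""
    if isinstance(node, str):
        return {node}
    return set().union(*(leaves(child) for child in node))


def stem_branch(tree, recipients):
    """Place a single transfer on the stem of the smallest clade that
    contains all recipient taxa.

    Returns (clade, sister): the transfer is inferred to have happened
    after `clade` split from `sister` and before `clade` diversified.
    """
    recipients = set(recipients)
    node, sister = tree, set()
    while not isinstance(node, str):
        # Descend into the child subtree that still holds every recipient.
        hit = [child for child in node if recipients <= leaves(child)]
        if not hit:
            break
        sister = leaves(node) - leaves(hit[0])
        node = hit[0]
    return leaves(node), sister


# Gene whose diplomonad sequences group together without Trichomonas:
print(stem_branch(SPECIES_TREE, {"Giardia", "Spironucleus"}))
# Gene shared by diplomonads and the parabasalid (the fprA-like case):
print(stem_branch(SPECIES_TREE, {"Giardia", "Spironucleus", "Trichomonas"}))
```

In the first call the sister group is Trichomonas, so the transfer falls between the diplomonad/parabasalid split and the Giardia/Spironucleus split; in the second the sister group is the remaining eukaryotes, which pushes the event back to the shared ancestor of the two lineages. The sketch assumes a single transfer per gene and a correct species tree, both of which are hypotheses rather than observations.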
Similarly, the putative transfers previously found to affect the Entamoeba lineage can be dated in more detail with our updated datasets. The Entamoeba sequences branch together in the phylogenetic reconstructions for all genes, indicating that the genes were present in the common ancestor of the four species included in the analysis (Figures 2, 3, 4, 5 and 6). For adhE, the separation of the Entamoeba and Mastigamoeba sequences is strongly supported (Figure 6), indicating that the transfer to the Entamoeba lineage probably happened after the split between Entamoeba and pelobionts, but before the divergence of the Entamoeba species. A similar pattern was observed for the gene encoding alanyl-tRNA synthetase, where the ancestral eukaryotic version was replaced in Entamoeba by a homolog from the parabasalid lineage after the split from Mastigamoeba. The timing of the transfer of the priS gene is more difficult to pinpoint, since the separation of the Entamoeba and Mastigamoeba sequences is only weakly supported by the bootstrap analyses (Figure 2). As mentioned above, nagB and fprA were most likely present in the common ancestor of Mastigamoeba and Entamoeba, and the recipient of the nagB gene was most likely also an ancestor of Dictyostelium, indicating that the transfers of these genes were probably more ancient events than the transfers of priS and adhE (Figure 7). The multiple copies of priS and fprA found in one or more of the Entamoeba species are most likely due to recent gene duplications within the Entamoeba lineages (Figures 2 and 3), a pattern also observed in the analysis of the partial genome sequence of E. invadens. Interestingly, one of the two E. invadens priS sequences has a frameshift caused by an eight-nucleotide deletion in the middle of the gene (this is unlikely to be a methodological artefact, as several different PCR products gave identical sequences).
This frameshift probably reflects the dynamics of the evolution of gene families in the Entamoeba lineage with frequent gene duplication followed by inactivation of some of the paralogs by accumulation of deleterious mutations.
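Why an eight-nucleotide deletion disrupts the reading frame can be shown with a short Python sketch: any insertion or deletion whose length is not a multiple of three shifts every downstream codon. The sequence below is a made-up toy example, not the E. invadens priS sequence, and the helper names are hypothetical.

```python
def causes_frameshift(indel_length):
    """An indel shifts the reading frame unless its length is a multiple of 3."""
    return indel_length % 3 != 0


def codons(seq):
    """Split a sequence into complete codons, reading from position 0."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def delete(seq, start, length):
    """Remove `length` nucleotides beginning at 0-based index `start`."""
    return seq[:start] + seq[start + length:]


# Toy open reading frame (an assumption for illustration only):
# start codon, six alanine codons, stop codon.
wild_type = "ATG" + "GCT" * 6 + "TAA"
mutant = delete(wild_type, 6, 8)

print(causes_frameshift(8))   # True
print(codons(wild_type))      # ends with the in-frame stop codon TAA
print(codons(mutant))         # downstream codons change and the stop is lost
```

A deletion of nine nucleotides at the same position would remove three codons but leave the downstream frame, including the stop codon, intact.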
Among the four sampled genes, the absence of gene transfers within the Entamoeba and diplomonad groups probably indicates that the evolutionary time since the split of each of these groups is short in comparison to the time since the last common eukaryotic ancestor, rather than that the rate of inter-domain transfers has decreased in more recent evolutionary time. Indeed, one of the fifteen genes in the previous analysis, pyrG, which encodes CTP synthetase, was probably introduced independently into the Giardia and Spironucleus lineages. However, the data are still scarce, and additional sampling of genes from diverse protist lineages could change the inferences about the timing of the transfers of individual genes presented here (Figure 7). Nevertheless, the data from our four genes, in combination with previously published data [10,18], support a scenario where prokaryotic genes from various lineages have been transferred into eukaryotic lineages continuously over time (Figure 7).
A link between gene transfers and feeding habits in phagotrophic protists and their shared ecological niche?
An interesting pattern was observed in which the studied protists mostly acquired genes from prokaryotes (Figure 7). This may be explained by the preference of the four groups of phagotrophic protists investigated in this study (diplomonads, parabasalids, pelobionts and Entamoeba) for growing in prokaryote-rich environments and consuming prokaryotes, since uptake of DNA from ingested cells is possibly an important mechanism enabling LGT in eukaryotes [4,8]. Indeed, diplomonads generally feed on prokaryotes, and several prokaryote-to-eukaryote gene transfer events have been described for this eukaryotic group (Figure 7) [10,41,58,59], while, to our knowledge, no strong case of a gene transfer from a eukaryotic lineage to diplomonads has yet been described. Entamoeba, on the other hand, is able to ingest both prokaryotes and eukaryotes; it can be maintained in monoxenic cultures with bacteria as well as with trypanosomatid flagellates [60,61]. The Entamoeba lineage was recently suggested to be the recipient in a eukaryote-to-eukaryote transfer of the alanyl-tRNA synthetase gene from the parabasalid lineage (Figure 7), although most gene transfer events affecting Entamoeba seem to involve prokaryotic donor lineages (Figures 2, 3, 4, 5 and 6) [5,10].
In this study, the donor and recipient lineages could be inferred with reasonable support in one putative eukaryote-to-eukaryote gene transfer event: ciliates were hypothesized to have acquired a gene from an Amoebozoa lineage (Figures 5 and 7). Interestingly, ciliates were also previously shown to represent the recipient lineage in an intra-domain transfer of the alanyl-tRNA synthetase gene (Figure 7). It is possible that the recipient lineage of these two transfers, an ancestor of Paramecium and Tetrahymena, tended to graze on eukaryotic protists rather than bacteria and was therefore exposed to eukaryotic DNA, leading to LGT events; ciliates are indeed known to eat both prokaryotes and eukaryotes [43,44]. Similarly, dinoflagellates are known to graze on eukaryotes and have been identified as the recipient lineage in eukaryote-to-eukaryote gene transfer events [63,64]. If this pattern holds up in light of more data, it suggests a link between genome evolution and food content in phagotrophic protists, indicating that an understanding of feeding habits is important to our understanding of gene transfer in the evolution of phagotrophic protists, as postulated by Doolittle. Global proteome phylogenies from 144 prokaryotes indicate that LGT has created pools of shared genes between distantly related prokaryotes occupying the same niche, such as mammalian mucosa. Our present analyses extend these observations to microbial eukaryotes, with genes shared between microorganisms thriving on mammalian mucosa, such as the trichomonads, diplomonads, apicomplexans, Piromyces and Entamoeba.