During the past five years, a number of reports have indicated that protists acquire genes via lateral gene transfer (LGT) [1-4]. Recently, phylogenomic analyses of the complete genome sequences of Entamoeba histolytica and Cryptosporidium parvum indicated that several genes of these human parasites, including some key metabolic enzymes, were most likely acquired from prokaryotes: 96 cases of relatively recent LGT from prokaryotic sources were reported for the former and 24 for the latter [5,6]. There are reasons to believe that LGT does influence protist genome evolution, since foreign genetic material constantly enters the cell via food organisms. In addition, many protists harbour prokaryotes or eukaryotes as endosymbionts (such as those that gave rise to secondary and tertiary plastids). Over evolutionary time, the occasional incorporation of genes from engulfed cells into the nucleus may therefore drive a directional transfer of genes from food organisms to phagotrophic eukaryotes [4,8]. A growing body of data is consistent with this hypothesis. For instance, LGT has mostly been detected in phagotrophic lineages [4,9]. Moreover, the introduced genes in these lineages seem to have originated from organisms that share the recipient's environment: the anaerobic diplomonad lineage was found to have acquired genes from anaerobic prokaryotes in most cases; 22% of candidate donor lineages in LGT cases for Entamoeba histolytica involve relatives of the Bacteroides group, which are abundant in the human digestive tract; and the alga Bigelowiella natans has acquired genes mostly from other algae.
These observations are consistent with the idea that physical proximity of donor and recipient lineages in the environment may greatly enhance the probability of a successful gene transfer event, a notion recently supported by phylogenetic analyses of 144 prokaryotic proteomes that identified gene pools shared between organisms (including distantly related ones) occupying the same ecological niche.
Most claims of LGT in protists are based on unexpected phylogenetic relationships between protist and prokaryotic sequences [2,4]. However, phylogenetic methods are susceptible to systematic errors that can lead to false inferences of transfer events. For example, a recent phylogenetic analysis found that the hydrogenosomal NuoF protein from Trichomonas vaginalis (a subunit of respiratory chain complex I) branched outside a clade of mitochondrial homologs, leading the authors to propose a separate (non-mitochondrial) origin for this protein. However, these analyses did not take into account the heterogeneity of amino acid (aa) composition among the sequences in this dataset. When the dataset is analysed with methods designed to avoid this potential artefact, the T. vaginalis sequence branches within the mitochondrial cluster, in agreement with the well-supported hypothesis that Trichomonas hydrogenosomes share an evolutionary origin with mitochondria [15,16]. Similarly, Cpn60 phylogenies based on different taxon samplings yielded markedly different relationships amongst anaerobic protists, including E. histolytica and two diplomonads (Giardia and Spironucleus), eliminating the possibility of an LGT event between the Entamoeba and Giardia lineages [2,17]. In both cases, extreme divergence coupled with compositional biases in these sequences correctly suggested that their unexpected branching patterns were due to phylogenetic artefacts. By contrast, phylogenetic analyses of the alanyl- and prolyl-tRNA synthetases recover the expected relationships amongst prokaryotes and eukaryotes, except that several protist sequences were found nested within Archaea as sisters to the Nanoarchaeota sequences. In this case, the observations could not be attributed to any known phylogenetic artefact and were most easily explained as gene transfer events from the archaeal lineage to the protists.
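Compositional heterogeneity of the kind that misled the NuoF analysis is commonly screened for with a chi-square test that treats an alignment as a sequences-by-amino-acids contingency table and compares each sequence's composition with the composition pooled over the dataset. The sketch below is a minimal illustration of that generic test, not necessarily the procedure used in the studies cited above; the function name and toy sequences are hypothetical. It ignores shared ancestry among sequences, so it is best treated as a rough screen rather than a formal test.

```python
from collections import Counter

# The 20 standard amino acids; gaps and ambiguity codes are ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_homogeneity(seqs: dict[str, str]) -> tuple[dict[str, float], float, int]:
    """Chi-square test of amino acid compositional homogeneity (hypothetical helper).

    Returns (per-sequence chi-square contributions, total statistic, degrees of freedom).
    """
    # Residue counts per sequence (rows of the contingency table).
    counts = {
        name: Counter(ch for ch in seq.upper() if ch in AMINO_ACIDS)
        for name, seq in seqs.items()
    }
    # Pooled counts over all sequences (column totals).
    pooled = Counter()
    for c in counts.values():
        pooled.update(c)
    grand_total = sum(pooled.values())
    used = [aa for aa in AMINO_ACIDS if pooled[aa] > 0]  # drop empty columns

    per_seq = {}
    for name, c in counts.items():
        n = sum(c.values())
        stat = 0.0
        for aa in used:
            expected = n * pooled[aa] / grand_total  # row total * column share
            if expected > 0:
                stat += (c[aa] - expected) ** 2 / expected
        per_seq[name] = stat

    df = (len(seqs) - 1) * (len(used) - 1)
    return per_seq, sum(per_seq.values()), df


# Toy data: seq3 is a hypothetical K/N-rich sequence with a skewed composition.
toy = {
    "seq1": "ACDEFGHIKLMNPQRSTVWY",
    "seq2": "ACDEFGHIKLMNPQRSTVWY",
    "seq3": "KKKKKKKKKKNNNNNNNNNN",
}
per_seq, total, df = composition_homogeneity(toy)
print(per_seq, total, df)
```

Sequences with large individual contributions (here seq3) are the candidates for compositional bias; if SciPy is available, an overall p-value can be obtained with `scipy.stats.chi2.sf(total, df)`.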
Phylogenetic analyses of proteins with a patchier distribution in the tree of life are more challenging to interpret than the cases described above. For example, gene duplication followed by differential gene loss can also yield the unexpected phylogenetic relationships that are hallmarks of LGT. In addition, genes with a patchy distribution may be present in only one or a few lineages of each organismal group. This can make it more difficult to identify the donor and recipient lineages of a gene transfer event, because such assignments require the recipient lineage to be nested within the donor group. Fortunately, the number of complete genome sequences is steadily growing and should clarify patterns of gene distribution within the tree of life. Combined with thorough phylogenetic studies, analyses of the presence and absence of genes in completely sequenced genomes should help to distinguish putative cases of gene transfer in gene families with a patchy phylogenetic distribution from alternative scenarios.
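One generic way to weigh differential loss against transfer using presence/absence data is Dollo parsimony: assume the gene arose only once, at the last common ancestor of all taxa that carry it, and count the minimum number of losses needed to explain its absence elsewhere. A patchy gene shared by two distantly related lineages then requires many losses, whereas a transfer scenario would need only one gain plus one transfer. The sketch below is an illustrative assumption, not the analysis performed in this study; the nested-tuple tree encoding, function names, and taxon labels are hypothetical.

```python
from typing import Union

# A rooted species tree as nested tuples; leaves are taxon-name strings.
Tree = Union[str, tuple]


def leaf_set(node: Tree) -> set[str]:
    """Return the names of all leaves in the subtree rooted at node."""
    if isinstance(node, str):
        return {node}
    names: set[str] = set()
    for child in node:
        names |= leaf_set(child)
    return names


def dollo_losses(species_tree: Tree, carriers: set[str]) -> int:
    """Minimum number of losses if the gene arose only once (Dollo parsimony)."""
    if not carriers:
        return 0
    # Walk down from the root while all carriers sit in a single child subtree;
    # the node where they split is their last common ancestor (LCA).
    node = species_tree
    while not isinstance(node, str):
        with_gene = [child for child in node if leaf_set(child) & carriers]
        if len(with_gene) != 1:
            break
        node = with_gene[0]

    def losses(n: Tree) -> int:
        if not (leaf_set(n) & carriers):
            return 1  # whole subtree lacks the gene: one loss on its stem
        if isinstance(n, str):
            return 0  # a carrier leaf
        return sum(losses(child) for child in n)

    return losses(node)


# Hypothetical tree: four prokaryotes (P1-P4) and four eukaryotes (E1-E4).
tree = ((("P1", "P2"), ("P3", "P4")), (("E1", "E2"), ("E3", "E4")))
print(dollo_losses(tree, {"E1", "E2"}))  # 0: gene restricted to one clade
print(dollo_losses(tree, {"P1", "E1"}))  # 4: single origin needs four losses
```

How many losses outweigh a single transfer depends on the relative rates assumed for loss and transfer, which this sketch leaves to the user.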
To investigate whether phylogenetic artefacts and/or unappreciated gene duplication and loss events have influenced previous interpretations of LGT, we broadened the taxon sampling of four gene families with patchy phylogenetic distributions that were previously implicated in gene transfer events in diplomonads and E. histolytica. The updated datasets, priS (encoding a hybrid-cluster protein), fprA (A-type flavoprotein), nagB (glucosamine-6-phosphate isomerase), and adhE (alcohol dehydrogenase E), were also analysed using more sophisticated phylogenetic methods. On the basis of phylogenetic analyses, we previously argued that these four genes were introduced into the genomes of diplomonads and Entamoeba from different sources. If these earlier observations genuinely reflected LGT, increased sampling of eukaryotic and prokaryotic taxa should yield an equal or greater number of distinct eukaryotic groups in the phylogenetic analyses (i.e. the eukaryotes would remain polyphyletic), together with stronger support for tree topologies consistent with LGT. A different pattern is expected if the inferred gene transfers were instead due to phylogenetic artefacts and/or differential losses. In the case of artefacts, increased taxonomic sampling should, if anything, provide evidence for a common ancestry of the diplomonad and Entamoeba sequences, since improved within-clade taxonomic sampling tends to improve phylogenetic accuracy, thereby reducing the number of independent eukaryotic groups observed. If instead the 'polyphyletic eukaryotes' pattern was due to ancient duplications and poor sampling of paralogs, we would expect newly sampled sequences to cluster within each of the different eukaryotic clades, recovering mirrored eukaryotic phylogenies within each paralogous clade.
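The predictions above hinge on counting how many distinct eukaryotic groups a gene tree contains. One simple way to operationalize this, assuming a rooted tree and ignoring branch support, is to count the maximal clades made up only of eukaryotic leaves: a count of 1 means the eukaryotes are monophyletic, and larger counts mean they are split into independent groups. The sketch below is a hypothetical helper for illustration, not the procedure used in this study; the tree encoding and taxon labels are assumptions, and on unrooted trees the count depends on where the root is placed.

```python
from typing import Union

# A rooted gene tree as nested tuples; leaves are taxon-name strings.
Tree = Union[str, tuple]


def count_eukaryote_groups(tree: Tree, eukaryotes: set[str]) -> int:
    """Count maximal clades consisting solely of eukaryotic leaves."""

    def visit(node: Tree) -> tuple[bool, int]:
        # Returns (are all leaves eukaryotic?, groups counted inside a mixed subtree).
        if isinstance(node, str):
            return node in eukaryotes, 0
        results = [visit(child) for child in node]
        if all(all_euk for all_euk, _ in results):
            return True, 0  # still a purely eukaryotic clade; keep growing it
        # Mixed node: each purely eukaryotic child is one maximal group.
        return False, sum(1 if all_euk else n for all_euk, n in results)

    all_euk, n = visit(tree)
    return 1 if all_euk else n


# Hypothetical topologies with three eukaryotic (Euk*) and three prokaryotic (Prok*) leaves.
euks = {"Euk1", "Euk2", "Euk3"}
split = ((("Euk1", "Euk2"), "Prok1"), ("Euk3", ("Prok2", "Prok3")))
joined = ((("Euk1", "Euk2"), "Euk3"), ("Prok1", ("Prok2", "Prok3")))
print(count_eukaryote_groups(split, euks))   # 2 independent eukaryotic groups
print(count_eukaryote_groups(joined, euks))  # 1: eukaryotes are monophyletic
```

Comparing this count before and after adding taxa gives a direct, if crude, check of whether broader sampling merges or preserves the separate eukaryotic groups.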
To test these alternative hypotheses, we focused our active taxon sampling on relatives of Entamoeba (the amphizoic E. moshkovskii, the turtle parasite E. terrapinae, the snake parasite E. invadens [21,22], and the more distantly related free-living amoeboflagellate Mastigamoeba balamuthi [23,24]) and on a putative relative of diplomonads, the parabasalid Trichomonas vaginalis [18,25-27], which causes trichomoniasis, a sexually transmitted disease in humans. In addition, we updated our datasets with all currently available homologous sequences from public databases and from a number of ongoing eukaryotic genome sequencing projects. Our updated phylogenies, which use more sophisticated models of amino acid substitution, together with analyses of the distribution patterns of these genes, indicate that gene transfer hypotheses currently best explain the data.