Amino acid sequences from 41 completely sequenced prokaryotic genomes were extracted from the Genome division of the Entrez retrieval system and used as the master species set for this analysis. Bacterial species abbreviations: Aquifex aeolicus (Aae), Bacillus halodurans (Bha), Bacillus subtilis (Bsu), Streptococcus pyogenes (Spy), Staphylococcus aureus (Sau), Clostridium acetobutylicum (Cac), Borrelia burgdorferi (Bbu), Campylobacter jejuni (Cje), Chlamydia trachomatis (Ctr), Chlamydophila pneumoniae (Cpn), Deinococcus radiodurans (Dra), Escherichia coli (Eco), Haemophilus influenzae (Hin), Helicobacter pylori (Hpy), Lactococcus lactis (Lla), Mesorhizobium loti (Mlo), Mycoplasma genitalium (Mge), Mycoplasma pneumoniae (Mpn), Mycobacterium tuberculosis (Mtu), Mycobacterium leprae (Mle), Pasteurella multocida (Pmu), Neisseria meningitidis (Nme), Pseudomonas aeruginosa (Pae), Rickettsia prowazekii (Rpr), Rickettsia conorii (Rco), Synechocystis PCC6803 (Ssp), Thermotoga maritima (Tma), Treponema pallidum (Tpa), Vibrio cholerae (Vch), Xylella fastidiosa (Xfa), Buchnera sp. (Bsp), Caulobacter crescentus (Ccr), and Ureaplasma urealyticum (Uur). Archaeal species abbreviations: Aeropyrum pernix (Ape), Archaeoglobus fulgidus (Afu), Halobacterium sp. (Hsp), Methanothermobacter thermoautotrophicum (Mth), Methanococcus jannaschii (Mja), Pyrococcus horikoshii (Pho), Pyrococcus abyssi (Pab), Thermoplasma volcanium (Tvo), Thermoplasma acidophilum (Tac), and Sulfolobus solfataricus (Sso). In addition, the following species were included in the case studies described in the text. Bacteria: Agrobacterium tumefaciens (Atu), Bifidobacterium longum (Blo), Brucella melitensis (Rso), Chlorobium tepidum (Cte), Enterococcus faecalis (Efa), Fusobacterium nucleatum (Fnu), Lactobacillus plantarum (Lpl), Leptospira interrogans serovar (Lint), Listeria innocua (Lin), Listeria monocytogenes (Lmo), Nitrosomonas europaea (Neu), Nostoc sp. (Nsp), Oceanobacillus iheyensis (Oih), Ralstonia solanacearum (Rso), Sinorhizobium meliloti (Sme), Streptomyces coelicolor (Sco), Thermoanaerobacter tengcongensis (Tte), Thermosynechococcus elongatus (Tel), Xanthomonas campestris (Xca), and Shewanella oneidensis (Son). Archaea: Methanopyrus kandleri (Mka), Methanosarcina acetivorans (Mac), Pyrobaculum aerophilum (Pae), and Pyrococcus furiosus (Pfu).
Reconstruction of gene neighborhoods
Gene neighborhoods for the 41 compared genomes were reconstructed as previously described. Briefly, the collection of clusters of orthologous groups of proteins from complete genomes (COGs) was used as the source of information on orthologous relationships for detecting conserved gene pairs. For the purpose of this analysis, only 'highly conserved' gene pairs were considered, that is, those formed by genes from two COGs that were present in the same orientation and separated by fewer than three genes in at least 10 of the compared genomes. This conservative approach was adopted to make it likely that all analyzed gene pairs belong to the same operon. At the next step, overlapping gene pairs were joined into triplets; each triplet was required to exist in at least one genome. Overlapping triplets were used to construct gene arrays by run search in an oriented graph; a gene array may or may not be found in its entirety in any available genome. Finally, gene arrays that shared at least three COGs were clustered into neighborhoods using a single-linkage clustering algorithm. Conserved gene pairs that did not belong to the reconstructed gene arrays were also analyzed.
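The pair-detection and clustering criteria above can be illustrated with a minimal Python sketch. This is not the code used in the study. The input format (each genome as an ordered list of `(cog_id, strand)` tuples), the function names `conserved_pairs` and `single_linkage`, and the treatment of chromosomes as linear are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical input format: each genome is a list of (cog_id, strand) tuples
# in chromosomal order; cog_id is None for genes not assigned to a COG.
MAX_INTERVENING = 2   # "separated by fewer than three genes"
MIN_GENOMES = 10      # a pair must recur in at least this many genomes


def conserved_pairs(genomes, max_intervening=MAX_INTERVENING, min_genomes=MIN_GENOMES):
    """Return COG pairs found co-oriented and close together in enough genomes."""
    support = defaultdict(set)
    for name, genes in genomes.items():
        for i, (cog_a, strand_a) in enumerate(genes):
            if cog_a is None:
                continue
            # Downstream genes with at most max_intervening genes in between.
            # Assumption: the chromosome is treated as linear (no wrap-around).
            for j in range(i + 1, min(i + max_intervening + 2, len(genes))):
                cog_b, strand_b = genes[j]
                if cog_b is None or cog_b == cog_a or strand_b != strand_a:
                    continue
                # Record the pair in the direction of transcription.
                pair = (cog_a, cog_b) if strand_a == "+" else (cog_b, cog_a)
                support[pair].add(name)
    return {pair for pair, names in support.items() if len(names) >= min_genomes}


def single_linkage(arrays, min_shared=3):
    """Group gene arrays (sets of COG ids) linked by >= min_shared common COGs.

    Single linkage is implemented as connected components (union-find) of the
    graph whose edges join arrays that share enough COGs.
    """
    parent = list(range(len(arrays)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for i, j in combinations(range(len(arrays)), 2):
        if len(set(arrays[i]) & set(arrays[j])) >= min_shared:
            parent[find(i)] = find(j)

    clusters = defaultdict(list)
    for i in range(len(arrays)):
        clusters[find(i)].append(i)
    return sorted(clusters.values())
```

Under single linkage, two arrays can end up in the same neighborhood without sharing three COGs directly, provided a chain of intermediate arrays connects them.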
Searching for candidate horizontally transferred genes
The protein sequences encoded by the genes of each neighborhood were searched against the non-redundant protein sequence database (NCBI, NIH, Bethesda) using the BLASTP program. The BLAST hits were analyzed to identify their potential phylogenetic affinity. For each protein, the best hits to the taxon to which the given species belongs (hereinafter, the reference taxon) and to other major taxa were identified; hits to closely related species were disregarded (see Table 1S in the additional data file). Proteins that had more significant (lower E-value) hits to a non-reference taxon than to the reference taxon were considered candidates for horizontal transfer, and the respective orthologous protein clusters were subjected to further phylogenetic analysis as described in the next section. If phylogenetic analysis indicated that a particular gene was likely to have been horizontally transferred, phylogenetic trees were also built for the genes predicted to belong to the same operon. When different phylogenetic affinities were found for genes of the same predicted operon, this operon was considered to be 'mosaic'.
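The E-value comparison used to flag candidates can be sketched as follows. This is an illustrative reading of the criterion, not the original pipeline. The hit record layout (dictionaries with `species`, `taxon`, and `evalue` keys), the taxon labels, and the function names are hypothetical. Treating a protein with no reference-taxon hit as a candidate is also an assumption.

```python
def best_evalue_by_taxon(hits, excluded_species=frozenset()):
    """Map each major taxon to the lowest (most significant) E-value among its hits."""
    best = {}
    for hit in hits:
        if hit["species"] in excluded_species:
            continue  # hits to closely related species are disregarded
        taxon, evalue = hit["taxon"], hit["evalue"]
        if taxon not in best or evalue < best[taxon]:
            best[taxon] = evalue
    return best


def is_hgt_candidate(hits, reference_taxon, excluded_species=frozenset()):
    """True if some non-reference taxon has a lower best E-value than the reference taxon."""
    best = best_evalue_by_taxon(hits, excluded_species)
    # Assumption: with no reference-taxon hit at all, any other hit qualifies.
    reference = best.get(reference_taxon, float("inf"))
    others = [e for taxon, e in best.items() if taxon != reference_taxon]
    return bool(others) and min(others) < reference


# Hypothetical hits for a Bacillus subtilis protein; taxon labels are illustrative.
hits = [
    {"species": "Bha", "taxon": "Firmicutes", "evalue": 1e-120},  # close relative
    {"species": "Spy", "taxon": "Firmicutes", "evalue": 1e-40},
    {"species": "Mja", "taxon": "Archaea", "evalue": 1e-80},
]
print(is_hgt_candidate(hits, "Firmicutes", excluded_species={"Bha"}))
```

The example shows why the exclusion of close relatives matters: with the Bha hit retained, the reference taxon has the best hit and the protein would not be flagged.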
Multiple protein sequence alignments were constructed using the T-Coffee program, and positions containing >70% gaps were excluded. Distance trees were constructed using the least-squares method as implemented in the FITCH program of the PHYLIP package [30,31]. The least-squares trees were subjected to maximum-likelihood local rearrangement using the ProtML program of the MOLPHY package, with the JTT-F model of amino acid substitutions [32,33]. The resulting trees are a surrogate for maximum-likelihood phylogenies; exhaustive maximum-likelihood tree construction is impractical for the number of species analyzed here. Bootstrap analysis was performed for each maximum-likelihood tree using the Resampling of Estimated Log-Likelihoods (RELL) method as implemented in MOLPHY [32-34]. Alternative placements of selected clades in maximum-likelihood trees were compared using the rearrangement optimization (Kishino-Hasegawa) method as implemented in the ProtML program.
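Two of the steps above, removal of gap-rich alignment positions and RELL resampling, are simple enough to sketch in Python. The sketch is illustrative only and does not reproduce T-Coffee or MOLPHY. The function names, the representation of the alignment as equal-length strings, and the assumption that per-site log-likelihoods for each candidate tree are already available are all hypothetical.

```python
import random


def drop_gappy_columns(alignment, max_gap_fraction=0.7, gap_chars="-."):
    """Remove columns in which more than max_gap_fraction of the rows are gaps."""
    n_rows = len(alignment)
    keep = [
        col
        for col in range(len(alignment[0]))
        if sum(row[col] in gap_chars for row in alignment) / n_rows <= max_gap_fraction
    ]
    return ["".join(row[col] for col in keep) for row in alignment]


def rell_support(site_lnl, n_replicates=1000, seed=0):
    """RELL bootstrap support for competing trees.

    site_lnl[t][s] is the log-likelihood of site s under tree t, estimated once
    on the original data. Each replicate resamples sites with replacement and
    sums the stored values instead of re-optimizing the likelihood.
    """
    rng = random.Random(seed)
    n_trees, n_sites = len(site_lnl), len(site_lnl[0])
    wins = [0] * n_trees
    for _ in range(n_replicates):
        sites = [rng.randrange(n_sites) for _ in range(n_sites)]
        totals = [sum(tree[s] for s in sites) for tree in site_lnl]
        wins[totals.index(max(totals))] += 1  # ties go to the first tree
    return [w / n_replicates for w in wins]


alignment = ["MKLA", "M--A", "M--C", "MR-C"]
print(drop_gappy_columns(alignment))  # third column is 75% gaps and is removed
```

Because RELL reuses the per-site log-likelihoods rather than refitting each replicate, it is far cheaper than a full bootstrap, which is consistent with its use here on trees with many species.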