Genome-wide identification of RNase H genes
To identify RNase HI, HII, and HIII coding sequences in a genome, two strategies (a remote homology search and a protein domain search) were applied to ensure maximum coverage of the genes (see Methods). Using the complete genomes of 326 strains from 235 bacterial species and 27 strains from 27 archaeal species, we retrieved 342 RNase HI genes, 333 RNase HII genes, and 76 RNase HIII genes (see Additional file 1). Almost all genomes contained one or more RNase H genes, and the types and numbers of RNase H genes differed little among strains of the same species, with the exception of Buchnera aphidicola and Xanthomonas campestris. The RNase HI-related gene of B. aphidicola str. APS (Acyrthosiphon pisum) contained a frameshift mutation that resulted in a loss of RNase H activity (Dr. Naoto Ohtani, Keio University, personal communication), whereas non-frameshifted RNase HI genes were identified in B. aphidicola str. Bp (Baizongia pistaciae) and B. aphidicola str. Sg (Schizaphis graminum). A frameshift mutation was also found in the RNase HII-related gene of X. campestris pv. vesicatoria str. 85-10, whereas two strains of X. campestris pv. campestris (str. 8004 and str. ATCC 33913) carried a non-frameshifted RNase HII gene. Therefore, at the species level, we regarded B. aphidicola as having an RNase HI gene and X. campestris as having an RNase HII gene. On this basis, we counted the RNase H genes in the 27 archaeal and 235 bacterial species listed in Table 1. The RNase HI gene was present in 33% (9/27) of the archaeal species and 89% (210/235) of the bacterial species; the RNase HII gene was present in all archaeal species and in 94% (220/235) of the bacterial species; and the RNase HIII gene was identified in 4% (1/27) of the archaeal species and 17% (40/235) of the bacterial species. This result is consistent with a previous report that RNase HII is the most universal RNase H gene in prokaryotes.
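One way to state the species-level rule applied above is: a species is credited with a gene if at least one of its sequenced strains carries an intact (non-frameshifted) copy. The Python sketch below expresses that reading of the rule under the assumption that strain-level calls are available as a flat list; the data layout and the function name species_level_genes are hypothetical, and the entries only restate the two cases described in the text.

```python
from collections import defaultdict

# Hypothetical strain-level calls: (species, strain, gene, intact?).
# "intact" means no frameshift was detected; entries restate the cases above.
STRAIN_CALLS = [
    ("Buchnera aphidicola", "APS", "RNase HI", False),         # frameshifted
    ("Buchnera aphidicola", "Bp", "RNase HI", True),
    ("Buchnera aphidicola", "Sg", "RNase HI", True),
    ("Xanthomonas campestris", "85-10", "RNase HII", False),  # frameshifted
    ("Xanthomonas campestris", "8004", "RNase HII", True),
    ("Xanthomonas campestris", "ATCC 33913", "RNase HII", True),
]

def species_level_genes(calls):
    """Credit a species with a gene if any of its strains has an intact copy."""
    genes = defaultdict(set)
    for species, _strain, gene, intact in calls:
        if intact:
            genes[species].add(gene)
    return dict(genes)

print(species_level_genes(STRAIN_CALLS))
```

Under this rule a single frameshifted strain does not remove the gene from the species, which matches how B. aphidicola and X. campestris were counted.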
Most species had a single copy of a given gene, but multiple genes encoding RNase HI were found in 11% (3/27) of the archaeal species and 16% (37/235) of the bacterial species.
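The prevalence figures in the two preceding paragraphs are simple proportions of species. As a consistency check, the Python sketch below recomputes them from the reported counts; it introduces no new data, and rounding to whole percentages is an assumption about how the figures were reported.

```python
# Counts reported above: (species carrying the gene, species examined)
COUNTS = {
    "RNase HI, archaea": (9, 27),
    "RNase HI, bacteria": (210, 235),
    "RNase HII, archaea": (27, 27),
    "RNase HII, bacteria": (220, 235),
    "RNase HIII, archaea": (1, 27),
    "RNase HIII, bacteria": (40, 235),
    "multiple RNase HI, archaea": (3, 27),
    "multiple RNase HI, bacteria": (37, 235),
}

def percent(carriers: int, total: int) -> int:
    """Percentage of species carrying a gene, rounded to the nearest integer."""
    return round(100 * carriers / total)

for label, (k, n) in COUNTS.items():
    print(f"{label}: {k}/{n} = {percent(k, n)}%")
```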
Alteration of RNase H combinations in closely related species
Contrary to the situation in eukaryotes, in which RNase H1 and RNase H2 tend to coexist, various combinations of RNase H genes have been found in prokaryotes. To compare the presence and absence of RNase H genes among prokaryotes, we examined the combination of RNase H genes in each genome. Three types of RNase H genes can theoretically yield eight presence/absence combinations, as shown in the Venn diagram in Figure 1. Because no species lacked all three genes (Group H), every species was assigned to one of the remaining seven combinations (Table 2). No prokaryotic genome contained RNase HI and HIII alone (Group D), which confirms the results of a previous study at the genome-wide level. Although most archaeal species contained either the RNase HII gene alone (Group F) or both the RNase HI and HII genes (Group B), in agreement with previous reports [22,23], one euryarchaeon, Methanosphaera stadtmanae DSM 3091, instead combined RNase HII with RNase HIII (Group C). Among bacteria, 189 of the 235 species (80%) combined the RNase HI and HII genes (Group B), and 16 of the 235 species (7%) combined the RNase HII and HIII genes (Group C). The RNase H combinations in bacteria were also more varied than those in archaea and appeared to differ even among related species, especially in the firmicutes. Interestingly, species with all three RNase H genes (Group A) were found only in the firmicutes.
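The grouping can be expressed as a lookup from a three-way presence/absence vector to a group label. The Python sketch below assumes the labels used in Table 2 and Figure 1; because the text names Groups A, B, C, D, F, G, and H but not E, assigning E to "RNase HI only" is inferred by elimination and is an assumption. The function name rnase_h_group is hypothetical.

```python
from itertools import product

# Presence flags (RNase HI, RNase HII, RNase HIII) -> group label (Table 2 / Figure 1).
# "E" (RNase HI only) is inferred by elimination; the text does not name it.
GROUPS = {
    (True, True, True): "A",
    (True, True, False): "B",
    (False, True, True): "C",
    (True, False, True): "D",
    (True, False, False): "E",
    (False, True, False): "F",
    (False, False, True): "G",
    (False, False, False): "H",
}

# Three binary presence/absence states give 2**3 = 8 combinations, each labelled once.
assert set(GROUPS) == set(product([True, False], repeat=3))

def rnase_h_group(genes):
    """Return the group label for a set of gene names such as {"HI", "HII"}."""
    return GROUPS[("HI" in genes, "HII" in genes, "HIII" in genes)]

print(rnase_h_group({"HII", "HIII"}))  # C, as for Methanosphaera stadtmanae
print(rnase_h_group({"HI", "HII"}))    # B
```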
To elucidate the relationship between RNase HI and HIII, we examined the genomic constitution of RNase H genes in 49 species of firmicutes from an evolutionary perspective, because RNase HIII is especially common in this group (Groups A, C, D, and G in Table 2). First, we constructed a Bayesian tree based on the nucleotide sequences of the DNA gyrase subunit B (gyrB) genes of the firmicutes, which have been used to infer phylogenetic relationships among prokaryotes, and mapped the RNase H combination of each species onto it (Figure 2). This phylogenetic analysis indicates that RNase H combinations differ even among closely related species. For example, species in the mollicutes were classified into Groups B (RNase HI and HII), C (RNase HII and HIII), and G (RNase HIII only), showing that the RNase HIII gene is absent from the mollicutes that retain the RNase HI gene. In addition, species with all three RNase H genes (Group A) were found only in the bacillales and lactobacillales; as Table 2 shows, this combination does not occur outside the firmicutes.
More noteworthy is the fact that the RNase HI genes of species that also have RNase HII genes (Group B) often encode an additional conserved protein domain, as indicated by Group B' in Figure 2. This non-RNase H domain was first identified in the N-terminal portion of eukaryotic RNase H1 and was designated the double-stranded RNA (dsRNA) and RNA-DNA hybrid-binding domain (dsRHbd) because it binds dsRNA as well as RNA-DNA hybrids [26-28]. In prokaryotes, the RNases HI of Bacillus halodurans and Shewanella sp. SIB1, which carry dsRHbd at the N-terminus, have been reported to possess RNase H activity. In contrast, no such domain was identified in the RNase HI of species that had all three types of RNase H (Group A). Interestingly, RNase HI of B. subtilis [REFSEQ: NP_390082], a member of Group A, exhibited neither RNase H activity nor any other nuclease activity, even though the RNase HII and HIII of this species possess RNase H activity [22,31]. This may indicate a difference in RNase H activity between RNase HI without dsRHbd (Group A) and RNase HI with dsRHbd (Group B').
To identify differences in primary and secondary structure between RNase HI without dsRHbd (Group A) and RNase HI with dsRHbd (Group B'), we performed multiple alignments of the amino acid sequences of the RNase HI domains from the bacillales and lactobacillales together with the E. coli RNase HI domain. For species with multiple RNase HI genes, the gene most similar to E. coli RNase HI was selected (see Additional file 2). The RNase HI sequences fell into three groups (Figure 3). The amino acid sequences of RNase HI in Group A were similar to that of B. subtilis RNase HI, which exhibited no nuclease activity. In contrast, the RNase HI sequences with dsRHbd formed two groups: Group B'1, in which the lactobacillales RNase HI sequences resembled E. coli RNase HI, whose nuclease activity has been demonstrated, and Group B'2, in which the B. halodurans and B. clausii RNase HI sequences showed little similarity to other RNase HI sequences but nonetheless exhibited RNase H activity. The secondary structures also differed markedly. RNase HI in Group A lacked the basic protrusion handle (alpha-helix 3), which is involved in substrate binding by E. coli RNase HI [33,34]. By contrast, all of the lactobacillales RNase HI with dsRHbd in Group B'1 had the basic protrusion handle. Although the basic protrusion handle is absent from B. halodurans and B. clausii RNase HI (Group B'2), it has been proposed that dsRHbd could functionally compensate for it. From the relationship between structural similarity and RNase H activity, we infer that RNase HI with dsRHbd in Group B' exhibits RNase H activity. Whether Group A RNase HI does so remains unclear, because the archaeal RNase HI proteins of Halobacterium sp. NRC-1 and Sulfolobus tokodaii 7 exhibited weak RNase H activity despite lacking the basic protrusion handle.
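The selection rule for species with several RNase HI genes can be sketched as "keep the paralog with the highest similarity to the E. coli reference". The similarity measure actually used is defined in Methods; the ungapped percent identity over pre-aligned sequences below is a stand-in assumption, and the sequences and identifiers (paralog_1, paralog_2) are toy, hypothetical values rather than real RNase HI data.

```python
from typing import Dict

def percent_identity(a: str, b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Columns in which either sequence has a gap ('-') are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    return 100.0 * sum(x == y for x, y in columns) / len(columns)

def pick_representative(reference: str, paralogs: Dict[str, str]) -> str:
    """Return the ID of the paralog most similar to the reference sequence."""
    return max(paralogs, key=lambda pid: percent_identity(reference, paralogs[pid]))

# Toy aligned sequences (hypothetical, not real RNase HI data)
REFERENCE = "MKTVEIYTDGSCL"
PARALOGS = {
    "paralog_1": "MKTVEIYTDGACL",  # 12 of 13 columns identical
    "paralog_2": "M--VAIFTDGSAL",  # 8 of 11 ungapped columns identical
}
print(pick_representative(REFERENCE, PARALOGS))  # paralog_1
```

In practice an alignment tool would produce the aligned strings; the sketch only shows the ranking step that follows.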
However, because a double knockout of the RNase HII and HIII genes in B. subtilis is lethal, the Group A RNase HI genes encoded in the B. subtilis genome evidently cannot compensate for the functions of RNase HII and HIII. Therefore, our finding that RNase HIII is absent from Group B' species, whose RNase HI carries dsRHbd, but present in Group A species, whose RNase HI lacks it, suggests a relationship between protein function and gene constitution.
Phylogenetic distribution of dsRHbd sequences
Our results (Table 2) clearly showed that the combination of RNase HI and HIII genes alone (Group D) was not found in prokaryotes and that most bacterial species had either RNase HI and HII (Group B) or RNase HII and HIII (Group C). Moreover, the combination of RNase H genes has changed even among closely related species in such a way that functional RNase HI and HIII genes do not coexist in a single genome; in other words, our results provide evidence that RNase HI and HIII tend to evolve in a mutually exclusive manner. This avoidance of co-inheritance of the RNase HI and HIII genes is most striking in the firmicutes when RNase HI contains dsRHbd: dsRHbd sequences were found in 15 of the 18 species that combined the RNase HI and HII genes (Group B) but in none of the 15 species that had all three RNase H genes (Group A; see Figure 2). Therefore, dsRHbd appears to be a key domain in the evolutionary process that has led to the current distribution of RNase H genes. Although the characteristics of dsRHbd, such as its enzymatic features [25,27] and its secondary structure, have been compared with those of eukaryotic RNase H1, little is known about the number and types of dsRHbd in prokaryotes. We therefore searched for dsRHbd sequences in the complete genomes of 326 strains from 235 bacterial species and 27 strains from 27 archaeal species, using the same approach as for the RNase H sequences (see Methods).
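The text does not report a statistical test for the association between dsRHbd and the absence of RNase HIII. Purely as an illustration, the firmicute counts given above (dsRHbd in 15 of 18 Group B species and in 0 of 15 Group A species) can be arranged as a 2x2 table and evaluated with a two-sided Fisher's exact test. The sketch below implements the test with the standard library only; scipy.stats.fisher_exact computes the same statistic. Because related species are not independent observations, any p-value obtained this way should be interpreted with caution.

```python
from math import comb

def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Enumerates every table with the observed row and column totals and sums
    the hypergeometric probabilities that do not exceed the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x, given fixed margins
        return comb(row1, x) * comb(n - row1, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # A small relative tolerance guards against floating-point ties
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))

# Rows: Group B, Group A firmicutes; columns: dsRHbd present, absent (counts from the text)
p_value = fisher_exact_two_sided(15, 3, 0, 15)
print(f"two-sided p = {p_value:.2e}")
```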
The search revealed that the genomes of 30 bacterial species (one of which was represented by two strains) and 1 archaeal species encoded dsRHbd (Table 3), and that the distribution of dsRHbd in prokaryotes did not appear to follow the phylogeny. Most dsRHbds are fused to an RNase HI domain, but Lactobacillus delbrueckii has two genes encoding dsRHbd: one associated with an RNase HI domain and the other with a resolvase domain. It is also notable that the dsRHbds of Gloeobacter violaceus, Bdellovibrio bacteriovorus, and Myxococcus xanthus were identified at the C-terminus of RNase HI, whereas most dsRHbds were at the N-terminus, as in eukaryotes. Multiple alignments of the amino acid sequences of prokaryotic dsRHbds showed that the C-terminal dsRHbd sequences were similar to one another (Figure 4). The fact that almost half of the RNase HI genes with dsRHbd were found in firmicutes, which are able to acquire new genes through lateral gene transfer, suggests that dsRHbd may have been acquired by this route. In addition, RNase HIII genes were not found in any of the genomes of the 31 species that encoded RNase HI with dsRHbd (Additional file 3), supporting the hypothesis that RNase HI and HIII evolve in a mutually exclusive manner.
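Whether a dsRHbd is N- or C-terminal can be decided from the coordinates of the domain hits on the protein. The sketch below assumes that the domain search reports a start and end position for each hit; the DomainHit record, the dsrhbd_position function, and all coordinates are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

@dataclass(frozen=True)
class DomainHit:
    """One domain hit on a protein (hypothetical record format)."""
    name: str   # e.g. "dsRHbd" or "RNase HI"
    start: int  # 1-based first residue of the hit
    end: int    # 1-based last residue of the hit

def dsrhbd_position(hits: Iterable[DomainHit]) -> Optional[str]:
    """Classify dsRHbd as N- or C-terminal relative to the RNase HI domain.

    Returns None when either domain is missing (e.g. a dsRHbd fused to a
    resolvase domain rather than to RNase HI).
    """
    by_name = {hit.name: hit for hit in hits}
    dsrhbd, rnase = by_name.get("dsRHbd"), by_name.get("RNase HI")
    if dsrhbd is None or rnase is None:
        return None
    # The domain that starts first lies toward the N-terminus.
    return "N-terminal" if dsrhbd.start < rnase.start else "C-terminal"

# Hypothetical coordinates for two proteins
n_type = [DomainHit("dsRHbd", 5, 45), DomainHit("RNase HI", 80, 230)]
c_type = [DomainHit("RNase HI", 1, 150), DomainHit("dsRHbd", 170, 210)]
print(dsrhbd_position(n_type))  # N-terminal
print(dsrhbd_position(c_type))  # C-terminal
```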
Redundant RNase HI genes in a single genome
We also found that 10 of the 31 species listed in Table 3 had multiple RNase HI genes (see Additional file 3). If the RNase HI gene with dsRHbd influences the presence of the RNase HIII gene in a genome, how does it affect the other RNase HI genes? To address this question, we examined the amino acid sequences of the RNase HI genes without dsRHbd in these 10 species. In five firmicute species and one deltaproteobacterial species other than B. bacteriovorus, the RNase HI without dsRHbd resembled Group A RNase HI in structure (e.g., it lacked the basic protrusion handle; see Figure 3). By contrast, in three gammaproteobacterial species, the primary structure of RNase HI without dsRHbd resembled that of E. coli RNase HI, with few amino acid differences. Because the RNase HI with dsRHbd in these same gammaproteobacterial species was also similar to E. coli RNase HI, the redundant RNase HI genes are difficult to distinguish on the basis of amino acid similarity.
To identify the differences among redundant RNase HI sequences of the gammaproteobacteria (see Additional file 4), we constructed a Bayesian tree based on the nucleotide sequences of the RNase HI domains from 12 species in the gammaproteobacteria (Figure 5). This analysis divided the RNase HI domains into four gene clusters: orthologous RNase HI, including E. coli RNase HI (Group I); RNase HI with dsRHbd (Group II); and other two groups of additional RNase HI (Groups III and IV). Because RNase HI genes in Group I appear to have been inherited by vertical descent from a common ancestor, we defined them as orthologous RNase HI genes. On the other hand, RNase HI genes of Group II to IV seem to have been provided by gene duplication or lateral gene transfer in addition to the original RNase HI genes. Interestingly, orthologous RNase HI was not found in Saccharophagus degradans that contains RNase HI with dsRHbd (Group II). In contrast, Pseudoalteromonas atlantica contains orthologous RNase HI (Group I) instead of RNase HI with dsRHbd, though the presence of Group III RNase HI is common to S. degradans and P. atlantica. In addition, orthologous RNase HI was not found in the genome of Colwellia psychrerythraea, which contains only RNase HI with a dsRHbd gene (Group II). The same statement applies to 21 other prokaryotic species that have only RNase HI with dsRHbd (see Additional file 3). On the other hand, we also found that orthologous RNase HI (Group I) and RNase HI with dsRHbd (Group II) had both been retained in two genomes of Photobacterium profundum and Shewanella denitrificans. These results suggest that RNase HI with dsRHbd may be capable of replacing the original RNase HI. A lineage-specific characterization such as the mapping of gene trees onto species trees using a soft parsimony algorithm  is necessary for more precise analysis of the transition of RNase HI genes during the course of evolution.