The E. histolytica strain HM1:IMSS, which is available from the American Type Culture Collection [ATCC (http://www.lgcpromochem.com/atcc/)] and is used in laboratories worldwide, served as the DNA source for the genome project. Ultimately, 12.5-fold coverage of the nearly 24-Mb genome was achieved, with identification of 9938 predicted genes, averaging 1.17 kb in size and together comprising ~49% of the genome. Introns were originally thought to be rare in E. histolytica genes, but they are present in 25% of the putative genes of this parasite. A recurring theme of the E. histolytica genome is repetition and redundancy. Almost 10% of the sequence reads consisted of tandem arrays of one of 25 different types of repeating tRNA unit, each unit containing between one and five tRNA types. Because all but four of the tRNAs required for translation were found exclusively within these arrays, the arrayed tRNA genes must be functional. Redundancy also characterizes many of the genes encoding suspected or proven E. histolytica virulence factors. New cysteine proteinase genes were identified (there are now at least 20), some of which contain putative transmembrane domains, a feature lacking in the cathepsins of higher eukaryotes [2,3]. Three new amebapore genes were identified (bringing the total to six), as were 30 homologs of a gene encoding the intermediate subunit of an E. histolytica surface lectin. These findings raise the obvious questions of how many copies of these redundant genes are expressed within E. histolytica; whether any exhibit stage-specific (trophozoite versus cyst), host-specific or tissue-specific expression (e.g. expression in trophozoites residing in the human colon but not in cultured amebae); and whether there are functional differences between members of these large gene families.
The issue of gene expression has been partially explored for the cysteine proteinase genes: trophozoites in culture seem to express only eight of the 20 identified cysteine proteinase genes, and only six of those eight are expressed at significant levels. It remains to be determined whether similar ratios exist for the other large gene families identified by the genome project. The repetitive nature of the genome and its high AT content, which made the cloning of large-insert libraries difficult, led to the most important limitation of the study: the failure to generate a map of the genome. It is anticipated that such a map will be generated in the future; when completed, it should provide important insights into the organization of the multigene families and clues as to how their expression is regulated.