
Complete genome sequences of important bacterial pathogens and industrial organisms hold significant …


Why sequence multiple species and strains?
- A Flood of Microbial Genomes–Do We Need More?

The many microbial sequencing projects completed or under way throughout the world have created a rich and diverse ‘mega-database’ of microbial genomes. However, to fully gauge the prevailing diversity and stratification patterns of all bacterial species, it will be necessary to sequence hundreds and thousands of genomes representing all branches and lineages within the bacterial and archaeal parts of the tree of life, in which each phylum provides an opportunity to capture the evolutionary footprints of billions of years. It is estimated that there are at least 35 different phyla of bacteria according to the rRNA gene sequence based tree of life [12]. The genome sequences of bacteria that have accumulated so far represent only three phyla, leaving major gaps in the genomic representation of the bacterial diversity of our biosphere. It is therefore urgent to sequence genomes from underrepresented phyla and to improve the resolution of deep branches in the bacterial tree, so as to enable biological studies of important lineages and to decipher their novel functions. In view of these facts, more systematic approaches to sequencing microbial genomes are needed, both to leverage data for the interpretation of environmental surveys and to facilitate comparative genomic analyses and annotation of different genomes and microbiomes.

The GEBA (Genomic Encyclopedia of Bacteria and Archaea) project is one such ‘community phylogenomics’ initiative, being implemented at the Joint Genome Institute. This program aims to fill the genomic gaps in the bacterial and archaeal branches of the tree of life, using the tree itself as a guide to identify which target microorganisms need to be sequenced completely.
The potential benefits of the GEBA project include the identification of new protein families across different lineages of bacterial phyla, which would provide a comparative genomics and proteomics platform for annotating forthcoming genomes and microbiomes of the same or different phyla. It will also improve the phylogenetic anchoring of metagenomic data sets and give a better understanding of the processes underlying the evolutionary diversity and functional stratification of microbes inhabiting different environmental niches.

Many pathogenic bacterial species are monomorphic, meaning that they show very little diversity upon genetic fingerprinting or limited sequence profiling. Gaining insights into their dispersal patterns, evolutionary genetics, and emergence and reemergence in different communities and catchments poses a great challenge for molecular epidemiologists. Multiple genome sequences from across strains of a single species offer finer-scale resolution of genetic differences, which enables tracking and identification of species and the development of additional genetic markers.

Prokaryotes evolve largely by horizontal gene acquisition, vertical genome reduction and in-situ gene duplication, strategies that shape an optimal repertoire of genes and elements to support a successful lifestyle [7]. Lateral gene flow is widespread among different strains of a single species, and most bacterial organisms acquire novel functions by harnessing the functional attributes of some of the genes gained through such recombinational processes. One important message that has emerged from the analyses of complete genomes is that microbes are diverse and highly adaptable. To know why this is so, we need further insights through individual and community level genomics. Such federated genomics approaches are also likely to help us answer several outstanding questions: how virulence evolves as a function of genome optimization under the different compulsions offered by a colonized niche; how microbes regulate their genomic streamlining; what environmental stimuli are responsible for the diversification and stratification of microbial lineages; what the functional significance of prokaryotic genomic diversity is, especially in the context of host and tissue tropism and of understanding parasitism versus commensalism; and how microbial genome data and the observed diversity can be experimentally harnessed for the generation and selection of optimally adapted microorganisms. These questions clearly underpin the case for sequencing additional representatives from different pathogenic microbial species.

Novel genes constantly emerge from newly sequenced replicate genomes [13], [14], giving rise to the concept of a ‘dockyard’ of genes (of presumably unknown functions) that each strain harbors. This paradigm was supported by analyses in which the pan-genome of a true bacterial species is described as ‘open’: each new genome sequence would identify dozens of new genes in the existing pan-genome of Streptococcus agalactiae, for example [14]. It is also clear from previous studies that such a pool of strain specific genes in pathogens such as Helicobacter pylori, termed the ‘plasticity region cluster’, could be useful in adaptation to a particular host population [15]. This pathogen shows very strong geographic adaptation and is known to harbor up to 45% strain specific genes, most of them gained through horizontal gene transfer [7], [15]. Recently, members of the plasticity region cluster were shown to be likely involved in promoting the proinflammatory potential of some strains, thus providing a survival advantage [16], [17].
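The idea of an ‘open’ pan-genome can be illustrated with a small calculation. The sketch below is only an illustration under stated assumptions: each genome is reduced to a set of gene family identifiers, and the strain names, gene labels and the function name `pan_genome_growth` are hypothetical rather than taken from the studies cited above. It reports, for each genome added in turn, how many previously unseen genes it contributes and how the pan-genome and core genome sizes change.

```python
from typing import List, Set, Tuple


def pan_genome_growth(
    genomes: List[Tuple[str, Set[str]]]
) -> List[Tuple[str, int, int, int]]:
    """Return (strain, new genes, pan-genome size, core size) per added genome."""
    pan: Set[str] = set()   # union of all genes seen so far
    core: Set[str] = set()  # genes present in every genome seen so far
    rows = []
    for index, (name, genes) in enumerate(genomes):
        new_genes = genes - pan  # genes this strain adds to the pan-genome
        pan |= genes
        core = set(genes) if index == 0 else core & genes
        rows.append((name, len(new_genes), len(pan), len(core)))
    return rows


# Hypothetical toy gene content, for illustration only.
toy_genomes = [
    ("strain_A", {"g1", "g2", "g3", "g4"}),
    ("strain_B", {"g1", "g2", "g3", "g5"}),
    ("strain_C", {"g1", "g2", "g6", "g7"}),
]

for row in pan_genome_growth(toy_genomes):
    print(row)
```

In this toy data every added strain still contributes new genes while the core shrinks, which is the pattern described as an open pan-genome. Real analyses first have to cluster predicted genes into families, a step this sketch takes as given.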

Another important reason to sequence replicate genomes of a prokaryotic species is the need to study the chronological evolution of bacterial pathogens within their hosts. The nature and extent of the genetic polymorphisms that accumulate in the genomes of bacterial pathogens across wide timescales and during the colonization of different host niches are not known. The fitness advantages that such polymorphisms confer on pathogens or commensals need additional in-depth study. While some studies have explored chronological strain diversity through genetic fingerprinting [18], microarrays [19] and limited sequencing [20], whole genome profiling of isolates obtained at different time points and sampled from different sites is required to investigate the frequency and timing of the emergence of small insertions, deletions and substitutions, and their functional significance in terms of adaptive mechanisms.

With complete genomes of multiple variants of a closely related group (genus or species), it is possible to test evolutionary hypotheses based on the core genes of the group. The phylogenetic relatedness of such core genes can then be harnessed to examine a larger collection of strains by multilocus sequence typing (MLST). This genome sequence based approach has already revolutionized the molecular epidemiology and evolutionary genetics of many bacterial pathogens, as previously reviewed [21]. The most noteworthy case is that of Leptospira interrogans, whose genome sequences gave significant insights into how virulence evolves as pathogens pass from one intermediate host to another. This has been facilitated through comparative genomics with the saprophytic L. biflexa genome sequence [22], through genome guided insights into the phylogeny of various species of the pathogen [23], and through differences between saprophytic and pathogenic species [22]. Based on the core genome of pathogenic and saprophytic strains, a sensitive and accurate MLST method [24] was developed to track and analyze individual strains of different species at the population level, a task that was otherwise impossible using traditional serotyping approaches. This is because the serotype is often influenced by frequent lateral gene transfer events within the loci that determine the repertoire of cell surface antigens.
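The principle behind MLST can be sketched in a few lines. The example below is a minimal illustration, not the published Leptospira scheme [24]: the locus names, sequences and the function `assign_sequence_types` are hypothetical, and allele and sequence type numbers are assigned locally in order of first appearance, whereas real schemes take them from a curated reference database. Each distinct sequence at a locus receives an allele number, and each distinct combination of allele numbers across loci receives a sequence type (ST).

```python
from typing import Dict, List, Tuple

Profile = Tuple[int, ...]


def assign_sequence_types(
    isolates: Dict[str, Dict[str, str]], loci: List[str]
) -> Dict[str, Tuple[Profile, int]]:
    """Map each isolate to its (allelic profile, sequence type number)."""
    allele_numbers: Dict[str, Dict[str, int]] = {locus: {} for locus in loci}
    st_numbers: Dict[Profile, int] = {}
    typed: Dict[str, Tuple[Profile, int]] = {}
    for name, sequences in isolates.items():
        profile = []
        for locus in loci:
            sequence = sequences[locus].upper()
            known = allele_numbers[locus]
            if sequence not in known:
                known[sequence] = len(known) + 1  # next free allele number
            profile.append(known[sequence])
        key = tuple(profile)
        if key not in st_numbers:
            st_numbers[key] = len(st_numbers) + 1  # next free ST number
        typed[name] = (key, st_numbers[key])
    return typed


# Hypothetical core-gene fragments, for illustration only.
toy_loci = ["locusA", "locusB"]
toy_isolates = {
    "isolate_1": {"locusA": "ATGC", "locusB": "GGTA"},
    "isolate_2": {"locusA": "ATGC", "locusB": "GGTA"},
    "isolate_3": {"locusA": "ATGT", "locusB": "GGTA"},
}

for name, (profile, st) in assign_sequence_types(toy_isolates, toy_loci).items():
    print(name, profile, f"ST{st}")
```

Isolates that share a profile fall into the same ST, so a single nucleotide difference at one locus is enough to separate isolate_3 from the other two. Because the typed loci are core genes rather than surface antigen loci, the assignment is not directly affected by the lateral transfer events that confound serotyping.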

Leaving aside the genetic diversity of naturally occurring populations, differences among isolates of even a single laboratory strain can be highly significant in genetic experiments. Using whole genome sequence determination, several important polymorphisms were detected in replicate genomes of a single strain of Bacillus subtilis [25]. Such approaches allow rapid identification and mapping of single nucleotide polymorphisms and mutations linked to different phenotypes, and they are less laborious and cheaper than genetic mapping experiments.
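As a rough illustration of what detecting polymorphisms between replicate genomes involves, the sketch below compares two sequences position by position. It assumes the sequences are already aligned, of equal length and free of insertions or deletions; the sequences and the function name `find_snps` are hypothetical. It is not the method used in the Bacillus subtilis study [25], and practical pipelines map sequencing reads to a reference and use dedicated variant callers.

```python
from typing import List, Tuple


def find_snps(reference: str, isolate: str) -> List[Tuple[int, str, str]]:
    """Return (1-based position, reference base, isolate base) for each mismatch."""
    if len(reference) != len(isolate):
        raise ValueError("sequences must be aligned to the same length")
    snps = []
    for position, (ref_base, iso_base) in enumerate(
        zip(reference.upper(), isolate.upper()), start=1
    ):
        # Skip ambiguous calls such as N; report only clear base substitutions.
        if ref_base != iso_base and ref_base in "ACGT" and iso_base in "ACGT":
            snps.append((position, ref_base, iso_base))
    return snps


# Hypothetical aligned fragments from two isolates of one strain.
toy_reference = "ATGCCGTA"
toy_isolate = "ATGTCGAA"

print(find_snps(toy_reference, toy_isolate))
```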

Updated on: 30 Jun 2009