Selective Isolation and Characterization of Novel Microorganisms
Analysis of DNA extracted from environmental samples has shown that molecular genetic diversity is much greater in natural habitats than was previously recognized (117). Such studies show that there are many microbial taxa to be discovered and isolated in pure culture. Despite the inherent problems faced in selectively isolating and characterizing microbes from environmental samples, steady progress continues to be made, as exemplified by advances made in unravelling the systematics of extremophiles (169), lactic acid bacteria (21), legume nodule nitrogen-fixing bacteria (87), rhodococci (172), sphingomonads (116), microbial pathogens of insects (225), and protozoa (85). Nevertheless, substantial difficulties remain in sampling and characterizing representative members of the microbial populations found in natural habitats.
The spatial distribution of microorganisms in soil (200) and the need to overcome a range of microbe-soil interactions (426) seriously limit quantitative and representative sampling of soil microorganisms (352). Procedures used to promote the dissociation of microorganisms from particulate matter include the use of buffered diluents (348), chelating agents (300), elutriation (219), mild ultrasonication (379), and repeated homogenization of soil in several buffers followed by separation of extract from residue (122); these procedures address the problems outlined above to varying degrees. Several of these physicochemical procedures were incorporated into a multistage dispersion and differential centrifugation procedure (220) that was shown to be effective for representative sampling of bacteria, including actinomycetes, from soils with different textures (220, 300).
The dispersion and differential centrifugation (DDC) method has been shown to be 3 to 12 times more effective in extracting actinomycete propagules from a range of soils than the standard procedure of shaking soil in diluent (17). There was also evidence that representatives of different streptomycete taxa were isolated at different stages of the extraction procedure and that certain organisms were only found on isolation media seeded with inocula obtained by using the DDC procedure. These observations suggest that persistent associations between soil particles and actinomycete propagules may be one of the major limitations to quantitative and representative sampling of actinomycete communities in soils and that the DDC method can be used to effectively break down such interactions.
The technique of extinction (or dilution) culture also warrants greater attention from microbiologists wishing to isolate microorganisms, in particular from oligotrophic habitats. The theory and practical procedures of extinction culture were developed by Don Button and his colleagues (65) in attempts to recover numerically abundant but difficult-to-culture marine picobacteria. Cultures are produced by diluting the original environmental sample to near extinction of the ability to grow; in Button's experiments sterilized seawater provided both the diluent and the culture medium, but organic amendments may be added, or other appropriately dilute media may be used. The technique has two important advantages: (i) it provides a means of studying organisms that may be abundant in a particular habitat but, because of their oligotrophic nature, are outcompeted by kinetically more versatile organisms in conventional enrichment methods, and (ii) dilution to extinction offers the prospect of isolating pure cultures. In the latter regard, extinction culture is a valuable method for obtaining pure cultures of oligotrophic microorganisms and of marine bacteria that frequently grow poorly on solid media. For recovering marine oligobacteria, Button et al. (65) recommended the use of unamended sterilized seawater and monitoring the developing populations at least three times a week over a 9-week period. Growth should be evaluated with sensitive techniques such as epifluorescence microscopy and flow cytometry. Examples of the successful use of extinction culture are few, but the work of Schut et al. (408) on the marine ultramicrobacterium Sphingomonas sp. strain RB2256 and that of Button et al. (64) on Cycloclasticus oligotrophus (see later section) are model investigations of this type.
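Dilution to extinction is commonly reasoned about by assuming that cells are randomly (Poisson) distributed in the diluted sample; that assumption is introduced here, not taken from Button et al. The minimal sketch below, with hypothetical function names and illustrative numbers, estimates how far a sample must be diluted and what fraction of tubes receiving cells would have been seeded by a single cell.

```python
import math

# Sketch assuming cells are Poisson-distributed among tubes after dilution.
# Function names are hypothetical, chosen for illustration only.

def dilution_factor(cells_per_ml: float, inoculum_ml: float, target_mean: float) -> float:
    """Fold dilution so that each tube receives `target_mean` cells on average."""
    return cells_per_ml * inoculum_ml / target_mean

def poisson_tube_fractions(mean_cells: float) -> dict:
    """Expected fractions of tubes receiving 0, exactly 1, or 2+ cells."""
    p0 = math.exp(-mean_cells)
    p1 = mean_cells * p0
    return {"empty": p0, "single": p1, "multiple": 1.0 - p0 - p1}

def single_cell_fraction_of_positives(mean_cells: float) -> float:
    """Among tubes that receive at least one cell, the fraction seeded by
    exactly one cell (expected to be pure), assuming every cell grows."""
    f = poisson_tube_fractions(mean_cells)
    return f["single"] / (1.0 - f["empty"])

# Example with hypothetical numbers: 1e6 cells/ml in the sample, 1 ml per tube.
for mean in (1.0, 0.1):
    print(f"mean {mean}: dilute {dilution_factor(1e6, 1.0, mean):.0e}-fold, "
          f"{single_cell_fraction_of_positives(mean):.2f} of positive tubes pure")
```

Lowering the mean inoculum per tube raises the expected purity of positive tubes at the cost of more empty tubes; in practice, not every cell will grow, so these figures are upper bounds under the stated assumption.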
Another constraint on quantitative and representative sampling of microorganisms from natural habitats is the lack of suitable selective isolation procedures. The selectivity of isolation media is influenced by nutrient composition, pH, and the presence of selective inhibitors, as well as by other incubation conditions. Innumerable medium formulations have been recommended for the selective isolation of microorganisms, but the ingredients have been chosen empirically, and hence the basis of selectivity is not clear (281, 489). It is now possible, using computer-assisted procedures, to objectively formulate and evaluate selective isolation media (60). Indeed, numerical taxonomic databases, which contain extensive information on the nutritional, physiological, and inhibitory sensitivity profiles of the constituent taxa, are ideal resources for the formulation of new selective media designed to isolate rare and novel organisms of biotechnological importance.
The streptomycete database generated by Williams et al. (488) has been used to formulate isolation media designed either to favor the growth of members of uncommon Streptomyces species known to be promising sources of new bioactive compounds or to inhibit the growth of the ubiquitous Streptomyces albidoflavus, which tends to predominate on standard media used for the selective isolation of streptomycetes (460, 490). These studies showed that a medium based on raffinose and histidine as the major carbon and nitrogen sources, respectively, led, as predicted, to a reduction in the numbers of S. albidoflavus strains on isolation plates, thereby facilitating the growth of rare and novel streptomycetes. In a continuation of these studies, large numbers of two putatively novel streptomycete species were isolated from hay meadow plots at Cockle Park Experimental Farm, Northumberland, United Kingdom (17).
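The principle behind database-driven medium design, ranking candidate substrates by how well they separate wanted from unwanted taxa, can be illustrated with a deliberately simplified sketch. The scoring rule, the function name `rank_substrates`, and the toy growth table are all hypothetical; the computer-assisted procedures cited above (60) are not reproduced here, and no values are taken from the streptomycete database.

```python
from typing import Dict, List, Tuple

# Presence/absence growth table: taxon -> {substrate: 1 (growth) or 0 (no growth)}.
GrowthTable = Dict[str, Dict[str, int]]

def rank_substrates(table: GrowthTable, targets: List[str],
                    unwanted: List[str]) -> List[Tuple[str, float]]:
    """Score each substrate as (fraction of target taxa that use it) minus
    (fraction of unwanted taxa that use it); higher scores are more selective."""
    substrates = sorted({s for profile in table.values() for s in profile})
    scores = []
    for s in substrates:
        hit_targets = sum(table[t].get(s, 0) for t in targets) / len(targets)
        hit_unwanted = sum(table[u].get(s, 0) for u in unwanted) / len(unwanted)
        scores.append((s, hit_targets - hit_unwanted))
    # Highest score first; ties broken alphabetically for reproducibility.
    return sorted(scores, key=lambda item: (-item[1], item[0]))

# Toy, hypothetical data: two rare taxa and one dominant taxon.
toy = {
    "rare_1":   {"sub_A": 1, "sub_B": 1, "sub_C": 0},
    "rare_2":   {"sub_A": 1, "sub_B": 0, "sub_C": 1},
    "dominant": {"sub_A": 0, "sub_B": 1, "sub_C": 1},
}
print(rank_substrates(toy, ["rare_1", "rare_2"], ["dominant"]))
```

A real formulation would also weigh pH, inhibitor sensitivities, and nitrogen sources jointly, as the text notes for selectivity in general.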
Another way of optimizing the search and discovery of new bioactive compounds is to ensure that organisms growing on selective isolation plates represent novel or previously uninvestigated centers of taxonomic variation (177). The choice of organisms for pharmacological screening programs, especially those with a low throughput, is primarily a problem of distinguishing among known organisms and recognizing new ones. It is now relatively easy to detect rare and novel microorganisms due to the increasing availability of sound classifications based on the integrated use of genotypic and phenotypic data (85, 176, 239, 454). This approach, which is known as polyphasic taxonomy, was introduced by Colwell (82) to signify successive or simultaneous studies on groups of organisms using a combination of taxonomic methods designed to yield good-quality genotypic and phenotypic data. A range of powerful methods are available for the acquisition of taxonomic data (Table 2).
The polyphasic approach to the detection of rare and novel taxa of biotechnological importance only became practicable with the availability of rapid data acquisition procedures, improved data handling systems, and associated microbiological databases (66, 67). The application of polyphasic taxonomy has led to profound changes in bacterial systematics, especially with respect to industrially significant groups, such as the actinomycetes, for which traditional taxonomies based on form and function made it impossible to select a balanced set of strains for industrial screens (172a, 175). The reclassification of several actinomycete taxa, notably the genera Microtetraspora (508), Mycobacterium (472), Nocardia (175), Rhodococcus (172a), and Streptomyces (262), and the delineation of new actinomycete genera, such as Beutenbergia (191), Ornithinicoccus (190), Tessaracoccus (312), and Williamsia (246), are all products of the polyphasic approach. Similarly, a host of new actinomycete species, for instance, Amycolatopsis thermoflava (74), Gordonia desulphuricans (264), Nocardioides nitrophenolicus (506), and Streptomyces thermocoprophilus (262), have been described using a combination of genotypic and phenotypic data. Corresponding integrated approaches are increasingly being used to circumscribe protozoal (139) and fungal (53, 239, 326) taxa, notably yeasts (393, 435).
Polyphasic taxonomy is now well established, though little attempt has been made to recommend which methods are the most appropriate for generating consensus classifications. At present, polyphasic taxonomic studies tend to reflect the interests of the individual research groups and the equipment and procedures they have at their disposal. It is not possible to be too prescriptive about the methods which should be used, as those selected need to reflect the taxonomic ranks under consideration (Table 2). However, it is clear that small-subunit rRNA sequence analysis is a powerful tool for highlighting new centers of taxonomic variation (56, 85, 195, 498), though the technique does not always allow the separation of members of closely related species. In contrast, DNA-DNA relatedness, molecular fingerprinting, and phenotypic studies provide valuable data for the detection of groups at and below the species level (418, 473).
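How small-subunit rRNA sequences flag potential new centers of taxonomic variation can be sketched as a nearest-reference comparison. This is a minimal illustration under stated assumptions: sequences are already aligned, identity is computed only over ungapped columns, and the 97% cut-off is a commonly used convention assumed here rather than a value given in the text. Function names are hypothetical.

```python
from typing import Dict, Tuple

def percent_identity(seq1: str, seq2: str) -> float:
    """Percent identity over aligned columns, skipping columns where either
    sequence has a gap ('-'). Inputs must already be aligned."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    compared = matches = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        matches += a == b
    if compared == 0:
        raise ValueError("no overlapping aligned positions")
    return 100.0 * matches / compared

def closest_reference(query: str, references: Dict[str, str]) -> Tuple[str, float]:
    """Return the name of, and identity to, the most similar reference."""
    name = max(references, key=lambda r: percent_identity(query, references[r]))
    return name, percent_identity(query, references[name])

def is_putatively_novel(query: str, references: Dict[str, str],
                        threshold: float = 97.0) -> bool:
    """Flag a query whose best reference identity falls below the assumed cut-off."""
    return closest_reference(query, references)[1] < threshold
```

Consistent with the caveat in the text, a query scoring above the cut-off is not thereby shown to belong to a known species; DNA-DNA relatedness or fingerprinting data would be needed for that.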
The polyphasic approach to circumscribing microbial taxa can be expected to meet several of the primary challenges facing microbial systematists, notably the need to generate well-defined taxa, a stable nomenclature, and improved identification procedures. However, most of the methods used in such studies are demanding in terms of time, labor, and materials and hence fail to meet the requirement for the rapid and unambiguous characterization of large numbers of isolates, a crucial step in screening for natural products or biocatalytic activities of industrial interest. In this context, the ability to exclude previously screened organisms and to recognize microbial colonies on primary isolation plates that have developed from identical environmental propagules (dereplication) (60) greatly assists the selection of biological material for large commercial screening operations.
It is also important for screening programs to discriminate between microorganisms at the infraspecies level, that is, to examine the genetic diversity within a defined species, as it is well known that the capacity to produce primary and secondary metabolites is frequently a property expressed by members of infraspecific taxa rather than species per se (60). Some widely used molecular techniques, such as small-subunit rRNA gene sequencing, lack the power to distinguish between strains below the species level or between members of recently diverged species (79, 141), while others that have this resolving power (amplified and restriction fragment length polymorphisms and single-strand conformation polymorphism) are laborious and time-consuming.
Given the objectives and constraints outlined above, the ideal procedure for microbial characterization should be universally applicable, require small, easily prepared samples, provide rapid and highly reproducible data, be capable of automation, and handle high throughputs. All of these requirements are met by physicochemical whole-organism fingerprinting methods (173, 303), the most widely employed being Curie point pyrolysis mass spectrometry (PyMS). Other methods of this type are Fourier-transform infrared spectroscopy (FT-IR) and dispersive Raman spectroscopy; the three procedures have recently been compared for the phenotypic discrimination of urinary tract pathogens (172).
Curie point PyMS has been shown to be of value in rapidly grouping microorganisms isolated from environmental samples (92), for defining pyrogroups (clusters) of commercially significant actinomycetes (132, 399), and for recognizing subtle phenotypic differences between strains of the same species (171). Good congruence has been found between numerical phenetic, molecular fingerprinting, and PyMS data, as exemplified by a polyphasic study on clinically significant actinomadurae (446). Similarly, it has been shown that the taxonomic integrity of three putatively novel species of Streptomyces highlighted in a polyphasic study was supported by PyMS data (17). These observations make it possible to develop an objective strategy to determine the species richness of cultivable streptomycetes isolated from natural habitats. Thus, putatively novel streptomycetes can be grouped together on the basis of their easily determined pigmentation characteristics, and the taxonomic status of the resultant color groups can then be determined by characterizing selected strains by PyMS and comparing the pyrogroups with the original color groups. If required, more exacting taxonomic studies can be carried out on representative strains using more sophisticated procedures, notably small-subunit rRNA sequencing.
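The grouping of strains into pyrogroups can be illustrated with a minimal sketch: each spectrum is normalised to its total ion count, and strains whose spectra lie within a distance threshold of one another, directly or through a chain of neighbours, are placed in the same group (single-linkage clustering). Published PyMS studies typically apply multivariate methods such as principal components analysis before clustering; this simplified version, its threshold, and its function names are assumptions for illustration only.

```python
import math
from typing import Dict, List

def normalise(spectrum: List[float]) -> List[float]:
    """Scale ion intensities so they sum to 1 (normalisation to total ion count)."""
    total = sum(spectrum)
    if total <= 0:
        raise ValueError("spectrum has no signal")
    return [x / total for x in spectrum]

def euclidean(a: List[float], b: List[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pyrogroups(spectra: Dict[str, List[float]], threshold: float) -> List[List[str]]:
    """Single-linkage grouping of strains by normalised spectral distance."""
    names = sorted(spectra)
    norm = {n: normalise(spectra[n]) for n in names}
    parent = {n: n for n in names}  # union-find forest

    def find(n: str) -> str:
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving keeps trees shallow
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if euclidean(norm[a], norm[b]) <= threshold:
                parent[find(a)] = find(b)  # merge the two groups

    groups: Dict[str, List[str]] = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values())
```

Comparing the resulting groups with colour groups, as the text describes for streptomycetes, is then a matter of cross-tabulating the two sets of labels.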
A strategy similar to the one outlined above was used to circumscribe novel, industrially significant rhodococci selectively isolated from deep-sea sediments in the northern Pacific Ocean close to Japan (79, 80). Subsequently, excellent congruence was found in double-blind numerical phenetic and PyMS analyses of representative rhodococcal isolates, indicating that the delineated pyrogroups were directly ascribable to the observed phenotypic variation and, in consequence, of real value in screening programs (81). The results of this study affirmed the value of PyMS in characterizing microorganisms, discriminating organisms at the infraspecies level, and enabling rapid and effective dereplication of strains prior to screening. This approach can be applied directly to target strains growing on isolation plates, thereby obviating the requirement for time-consuming laboratory testing to distinguish duplicate colonies and permitting the rational collection of colonies from such plates for subsequent screening. These attributes, coupled with the speed of analysis (approximately 2 min per sample), the very small sample size required (50 to 100 μg), the high reproducibility, and the high automated throughput, commend PyMS as a method of choice for industrial screening programs based on microorganisms.
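Dereplication of colonies before screening can be sketched with a greedy "leader" procedure, swapped in here purely for illustration and not attributed to the study cited (81): colonies are visited in order, a colony whose fingerprint lies within a chosen distance of an existing representative is treated as a duplicate of it, and otherwise it becomes a new representative to pick for screening. Fingerprints are assumed to be already normalised vectors; the function name and distance threshold are hypothetical.

```python
import math
from typing import Dict, List, Tuple

def dereplicate(fingerprints: List[Tuple[str, List[float]]],
                max_distance: float) -> Dict[str, List[str]]:
    """Greedy leader dereplication. Returns {representative: [duplicates]}.
    The result depends on visiting order, a known property of this approach."""
    reps: Dict[str, List[str]] = {}
    rep_vectors: Dict[str, List[float]] = {}
    for colony, vec in fingerprints:
        for rep, rep_vec in rep_vectors.items():
            if math.dist(vec, rep_vec) <= max_distance:
                reps[rep].append(colony)  # duplicate of an existing representative
                break
        else:
            reps[colony] = []  # no close representative: keep this colony
            rep_vectors[colony] = vec
    return reps

# Hypothetical colonies from one isolation plate.
plate = [("c1", [1.0, 0.0]), ("c2", [1.0, 0.05]), ("c3", [0.0, 1.0])]
picks = dereplicate(plate, 0.1)
print(f"{len(picks)} colonies to screen: {sorted(picks)}")
```

Only the representatives need to be carried forward, which is the practical saving the text attributes to rapid dereplication on isolation plates.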
Detection of Uncultured Prokaryotes: Molecular Approaches
Traditionally, members of established and novel microbial taxa isolated from natural habitats were recognized using phenetic methods which drew upon available genotypic and phenotypic data. An alternative approach to the estimation of prokaryotic diversity in natural habitats was initiated by the application of molecular methods (355), most of which allowed the recognition of uncultured organisms based on the use of 16S rRNA sequences. It was apparent even from the initial studies that spectacular patterns of prokaryotic diversity had gone undetected using standard cultural and characterization procedures. The molecular approaches also confirmed observations from direct microscopy that the number of prokaryotes which can be readily cultivated from environmental samples is only a small and skewed fraction of the diversity present (471). The inability to cultivate even the most numerous microorganisms from natural habitats has been referred to as the “great plate count anomaly” (423).
Several procedures have been used to estimate prokaryotic diversity based on the examination of DNA extracted from environmental samples (118, 205, 352). Environmental DNA samples have been analyzed using reassociation kinetics to estimate community complexity and the number of constituent genomes (444, 445), but the procedure lacks the precision to identify individual genomes or to place them within a hierarchical taxonomic framework. In contrast, analyses of 16S rRNA sequences can be applied to specific uncultured prokaryotes and the position of the resultant phylotypes can be interpreted in terms of inferred common ancestry.
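The reassociation-kinetics approach can be summarized with the idealized second-order model, introduced here as an assumption: community DNA is treated as a single kinetic component, and its complexity is expressed in genome equivalents of a reference genome (such as E. coli DNA) by the ratio of half-reassociation C0t values measured under identical conditions. Function names and values are hypothetical; real community DNA often deviates from single-component kinetics.

```python
def fraction_single_stranded(c0t: float, k: float) -> float:
    """Ideal second-order reassociation: C/C0 = 1 / (1 + k * C0t)."""
    return 1.0 / (1.0 + k * c0t)

def c0t_half(k: float) -> float:
    """C0t at which half of the DNA has reassociated (C/C0 = 0.5), i.e. 1/k."""
    return 1.0 / k

def genome_equivalents(c0t_half_sample: float, c0t_half_reference: float) -> float:
    """Community complexity in reference-genome equivalents, assuming complexity
    scales linearly with C0t1/2 and genomes of similar size to the reference."""
    return c0t_half_sample / c0t_half_reference
```

As the text points out, such a number summarizes overall complexity but cannot identify individual genomes or place them in a taxonomic framework.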
In the bulk DNA cloning approach (360, 406), total DNA extracted from environmental samples is partially digested using a restriction enzyme and cloned with a lambda vector. Genomic libraries generated in this way supposedly do not impose any selective bias on the recovery of rRNA genes from members of different taxa. The major practical disadvantage of this approach is that most clones in the DNA library will not contain rRNA genes; only about 0.5% are predicted to do so (406).
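The practical cost of the 0.5% figure can be made concrete with simple binomial arithmetic, assuming each clone independently carries an rRNA gene with that probability (the independence assumption and function names are ours): on average about 200 clones must be examined per rRNA-containing clone, and roughly 600 to be 95% sure of finding at least one.

```python
import math

def prob_at_least_one(p: float, n_clones: int) -> float:
    """Probability that n randomly picked clones include at least one rRNA-gene clone."""
    return 1.0 - (1.0 - p) ** n_clones

def clones_needed(p: float, confidence: float) -> int:
    """Smallest n for which P(at least one hit) >= confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))

p_rrna = 0.005  # proportion predicted in the text (406)
print(f"mean clones per hit: {1 / p_rrna:.0f}")
print(f"clones for 95% chance of >= 1 hit: {clones_needed(p_rrna, 0.95)}")
```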
A quicker and more effective way of unravelling the composition of prokaryotic communities is based upon PCR-mediated amplification of 16S rRNA genes or gene fragments (using either rRNA or rDNA isolated from environmental samples) with 16S rRNA gene-specific primers, followed by segregation of individual gene copies by cloning into Escherichia coli (165). This procedure generates a library of community 16S rRNA genes, the composition of which can be estimated by sampling clones and comparing them by restriction endonuclease digestion, by their reaction with specific probes, or by full or partial sequencing (468a). The resultant information can be analyzed to infer the abundance and representation of each sequence type in the library. Unique clones can be completely sequenced and their relationship to corresponding sequences from cultured taxa determined within a taxonomic hierarchy based on 16S rRNA. As with other molecular approaches, the success of this procedure depends on the quality of the extracted DNA and whether it is representative of natural prokaryotic diversity in the environmental sample.
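Abundance and representation in a clone library can be summarized in several ways; the sketch below uses relative clone frequencies and Good's coverage estimator, C = 1 - n1/N (n1 = phylotypes seen only once, N = clones examined). Good's coverage is one common estimator assumed here, not one named in the text, and the phylotype labels stand for hypothetical groupings produced by restriction patterns, probes, or sequencing.

```python
from collections import Counter
from typing import Dict, Iterable

def relative_abundance(labels: Iterable[str]) -> Dict[str, float]:
    """Fraction of sampled clones assigned to each phylotype."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {phylotype: n / total for phylotype, n in counts.items()}

def goods_coverage(labels: Iterable[str]) -> float:
    """Good's coverage C = 1 - n1/N: an estimate of how well the sampled clones
    represent the library (values near 1 mean few unseen phylotypes)."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no clones sampled")
    singletons = sum(1 for n in counts.values() if n == 1)
    return 1.0 - singletons / total

# Hypothetical phylotype assignments for seven sampled clones.
clones = ["A", "A", "A", "B", "B", "C", "D"]
print(relative_abundance(clones), goods_coverage(clones))
```

Note that coverage describes the library only; whether the library itself reflects the community depends on the extraction and amplification biases discussed next.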
A number of potential sources of bias exist in DNA-based analyses of natural microbial communities. These have been extensively reviewed elsewhere (184, 205, 352, 464, 468a) and include preferential amplification of specific templates due to PCR primer choice (432), differential cell lysis (147, 327), the GC content of DNA sequences (387), the formation of chimeric PCR products (293, 467), genome size and rRNA gene copy number (123), and the presence of free DNA or DNA in spores (447). It is because of factors such as these that the results of studies based on PCR amplification of small-subunit rRNA genes should be compared with those derived from contemporary selective isolation and characterization methods. However, it is very encouraging that in comparable analyses of soil-derived 16S rRNA sequences (42, 279, 289, 293, 419) the same groups of prokaryotes were detected despite the use of different DNA extraction, cloning, and PCR techniques.
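Of the biases listed, variation in rRNA gene copy number is the simplest to illustrate numerically. The sketch below, with hypothetical taxa, clone counts, and copy numbers, divides clone counts by an assumed number of rRNA operons per genome and renormalises to estimate relative cell abundance. It corrects only that one bias and assumes that every template was amplified and cloned with equal efficiency.

```python
from typing import Dict

def copy_number_corrected(clone_counts: Dict[str, int],
                          rrn_copies: Dict[str, float]) -> Dict[str, float]:
    """Estimated relative cell abundance after dividing clone counts by the
    assumed rRNA operon copy number of each taxon and renormalising."""
    per_cell = {taxon: clone_counts[taxon] / rrn_copies[taxon] for taxon in clone_counts}
    total = sum(per_cell.values())
    return {taxon: value / total for taxon, value in per_cell.items()}

# Hypothetical: taxon X has six operons per genome, taxon Y has one.
print(copy_number_corrected({"X": 60, "Y": 40}, {"X": 6, "Y": 1}))
```

In this toy case the taxon contributing most clones turns out to be the minority in terms of cells, which is the kind of distortion the cited studies warn about.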
The analysis of uncultured prokaryotic communities in natural habitats based on 16S rRNA sequences has been extensively reviewed (8, 118, 119, 205, 358, 468a). A number of general conclusions can be drawn from surveys of uncultured prokaryotic communities in marine sediments (107, 149, 184, 254, 382, 392, 452, 459), seawater (34, 166, 381, 495), Yellowstone hot springs (28, 29, 223, 224), rhizosphere (301) and nonrhizosphere (279, 292, 314a, 351, 419) soil, termite guts (367), the rumen (484), and the human gut (430), notably the enormous wealth of microbial diversity, the fact that many of the novel sequences are only distantly related to those known for cultivable species, and the limitations of traditional cultural techniques in retrieving this diversity. It is possible that some of the new phylotypes are artifacts of the PCR procedure, but most appear to be genuine; for example, Barnes et al. (29) reported that 4 of 98 clones were chimeras, and Choi et al. (73) found 7 chimeras among 81 clones analyzed.
rDNA sequence analyses of uncultured prokaryotic communities are also casting light on the geographical distribution of specific phylotypes. There is evidence that samples taken from the oceans tend to contain sequences of monophyletic groups, for example, archaeal groups I and II and SAR 7 and SAR 11 bacterial clusters (104, 318, 333). Similarly, sequence-based studies from different geographical locations show considerable overlap of sequence types (42, 279, 289, 292, 419). In addition, the perceived ecological boundaries between archaeal habitats (extreme environments such as hot springs and hypersaline waters) and bacterial habitats (temperate soils and waters) are becoming increasingly blurred. Members of the Archaea previously considered to be restricted to high temperatures (division Crenarchaeota) are now known to be abundant in many temperate environments (40, 104, 209), whereas members of the Bacteria appear to play an important role in extreme environments, such as hot springs, commonly considered the province of Archaea (224).
The relative abundance of a sequence in an environmental sample can be estimated by using oligonucleotide probes to analyze total rRNA extracts (104, 165, 382). This approach has some limitations, not least being the fact that different prokaryotes may contain different numbers of ribosomes and hence variable amounts of probe target (468a). A more direct measure of cell abundance can be obtained using fluorescent probes to identify microorganisms in situ (103). This approach can be used to link sequences with morphotypes and to highlight samples that contain cells from which a sequence of particular taxonomic interest originates, thereby providing a tool for use in isolation strategies (222).
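One common way to express such hybridization data, assumed here rather than described in the text, is to convert each probe's signal into an amount of rRNA using a calibration with reference RNA, then report the group-specific amount as a fraction of the amount detected by a universal probe. The function name and calibration values below are hypothetical. The result is a fraction of rRNA, not of cells, for the reason given above.

```python
def rrna_fraction(specific_signal: float, specific_signal_per_ng: float,
                  universal_signal: float, universal_signal_per_ng: float) -> float:
    """Group-specific rRNA as a fraction of total rRNA. Each probe's signal is
    converted to ng of rRNA using its own (assumed linear) calibration slope."""
    specific_ng = specific_signal / specific_signal_per_ng
    universal_ng = universal_signal / universal_signal_per_ng
    if universal_ng <= 0:
        raise ValueError("universal probe detected no rRNA")
    return specific_ng / universal_ng

# Hypothetical membrane readings and calibration slopes.
print(rrna_fraction(500.0, 50.0, 2000.0, 20.0))
```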
Easier and much faster alternatives to the cloning procedures involve the examination of complex microbial populations by either denaturing gradient gel electrophoresis (DGGE) (340) or temperature gradient gel electrophoresis (TGGE) (394) of PCR-amplified genes coding for 16S rRNA. These methods have been used to analyze 16S rRNA genes from environmental samples (129, 134, 340, 341) and allow PCR products of the same length but different sequence to be separated on polyacrylamide gels. Separation is based on the decreased electrophoretic mobility of partially melted double-stranded DNA molecules in polyacrylamide gels containing a linear gradient of DNA denaturants (a mixture of urea and formamide) or a linear temperature gradient. Individual bands may be excised, reamplified, and sequenced (134, 339) or challenged with a battery of oligonucleotide probes (340) to give an indication of the composition and diversity of the microbial community.
DGGE and TGGE are relatively easy to perform and allow many samples to be run simultaneously. They are particularly well suited for examining time series and population dynamics. Once the identity of an organism associated with a particular band has been determined, fluctuations of individual components of microbial communities due to seasonal variations or environmental perturbations can be assessed. Heuer et al. (212) used DGGE and TGGE to determine the genetic diversity of actinomycetes in different soils and to monitor shifts in their abundance in the potato rhizosphere. Sequencing of the individual DGGE bands demonstrated the presence of organisms closely related to members of the genera Clostridium, Frankia, and Halomonas. A comprehensive account of the theoretical basis, strengths, and weaknesses of the two methods is given by Muyzer and Smalla (338). The successful application of DGGE has revived interest in genetic fingerprinting of microbial communities. Lee et al. (288) described the use of single-strand conformation polymorphism (357) of PCR-amplified 16S rRNA genes for examining the diversity of natural bacterial communities. Amplified rDNA restriction analysis (ARDRA) has been used to determine the genetic diversity of mixed microbial populations (310, 311) and to monitor community shifts after environmental perturbation, such as copper contamination (413).
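Band patterns from a DGGE or TGGE time series are often compared with a similarity coefficient. The sketch below uses the Dice coefficient, 2|A∩B| / (|A| + |B|), as one common choice rather than as the method of the studies cited. Lanes are assumed to have already been scored for shared band classes (hypothetical labels such as "b1"), since matching band positions across lanes and gels is itself a nontrivial step.

```python
from itertools import combinations
from typing import Dict, Set, Tuple

def dice(bands_a: Set[str], bands_b: Set[str]) -> float:
    """Dice similarity of two lanes scored as sets of band classes."""
    if not bands_a and not bands_b:
        return 1.0  # two empty lanes are treated as identical
    return 2 * len(bands_a & bands_b) / (len(bands_a) + len(bands_b))

def pairwise_similarity(lanes: Dict[str, Set[str]]) -> Dict[Tuple[str, str], float]:
    """Dice similarity for every pair of lanes, e.g. successive sampling dates."""
    return {(a, b): dice(lanes[a], lanes[b]) for a, b in combinations(sorted(lanes), 2)}

# Hypothetical band scoring for three sampling dates.
series = {
    "day00": {"b1", "b2", "b3"},
    "day07": {"b2", "b3", "b4"},
    "day14": {"b4", "b5"},
}
for pair, similarity in pairwise_similarity(series).items():
    print(pair, round(similarity, 2))
```

A drop in similarity between successive samples would point to a community shift worth following up by excising and sequencing the bands involved.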
Comparison of Molecular and Cultural Techniques
Culture-independent molecular approaches are tending to replace culture-based methods for comparing the composition, diversity, and structure of microbial communities. Investigations based on these approaches have led to the conclusion that traditional methods of culturing natural populations have seriously underestimated archaeal and bacterial diversity. Samples of DNA extracted from seawater, soil, and cyanobacterial mats of hot springs appear to represent predominant populations in these ecosystems, while the species that grow on culture plates are numerically unimportant in intact natural communities. These findings are not surprising, since the vast majority of organisms counted microscopically in samples from these environments have not been grown. One reason for this inadequacy is that cultivation conditions used to isolate organisms do not reflect the natural conditions in the environment examined and thereby select fast-growing prokaryotes that are best adapted to the growth medium (189). However, greater success in bacterial isolation can be achieved by using culture conditions that more closely approximate natural environments (407) or by using novel tools, such as optical tweezers, to physically isolate bacterial propagules (222). There is also molecular evidence that some readily cultivable bacteria are abundant in the environment from which they are isolated (388). These trends suggest that innovative isolation procedures combined with the identification of phylotypes provide a powerful means of addressing the great plate count anomaly.
Relatively few studies have involved a twin-track approach whereby both cultivation and direct recovery of bacterial 16S rRNA gene sequences have been used to gain insight into the microbial diversity of natural bacterial communities (114, 207, 430). Comparative studies such as these are needed not least because both plating and 16S rDNA cloning (147) suffer from biases that can distort community composition, richness, and structure. The molecular approaches provide a new perspective on the diversity of prokaryotes in nature but do not yield the organisms themselves. This means that potentially valuable biotechnological traits can, at best, only be inferred from phylogenetic affinities (8, 102, 207). The need to bring into culture representatives of phyletic lines known so far only from uncultured prokaryotes poses a major challenge for microbiologists.
A somewhat mixed picture emerges from comparative studies of natural microbial ecosystems. Chandler et al. (70) found close correlation at the genus level between the cultivable portion of aerobic, heterotrophic bacteria and data derived from the 16S rDNA approach when examining deep subsurface sediment. However, these correlations were detected after aerobic treatment of sediment samples at the in situ temperature but not with the untreated sediment core. It is possible that the treatments caused a selective shift towards enrichment of specific bacterial groups in the samples analyzed compared with the original sediment core. Studies of hot spring microbial mats highlighted several close matches between the 16S rDNA of organisms obtained by culture methods and directly recovered 16S rDNA, but only after several liquid dilutions of the inoculum were used for cultivation instead of direct enrichment based on undiluted inoculum (469, 470). Two major conclusions were drawn from these studies. (i) For the most part, direct enrichment techniques select for populations which are more fit under the chosen enrichment conditions and may not be numerically significant, and (ii) the growth of numerically dominant populations may be favored by using an inoculum diluted to extinction, especially in growth medium which reflects the conditions in the habitat under study. The conclusions drawn by Ward and his colleagues are consistent with the results of a comparative analysis in which bacterial isolates and environmental 16S rDNA clones were recovered from the same sediment sample (433). The corresponding data sets showed little overlap, possibly due to direct plating of the undiluted inoculum onto solidified medium with the subsequent isolation of community members that were not numerically significant. 
In contrast, a close correlation was found between most-probable-number estimates of isolates and environmental 16S rDNA clones taken from the bacterial community of rice paddy soil (207). In a comparative study of the bacterial community diversity of four arid soils, similar relationships were found between 16S rDNA results and cultivation, though significant differences were also observed (114).
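The most-probable-number estimates mentioned above come from a standard maximum-likelihood calculation, sketched here under its usual assumptions: organisms are Poisson-distributed and a tube is positive if it receives at least one viable organism. For dilutions i with n_i tubes, p_i positive tubes, and v_i units of original sample per tube, the estimate λ solves Σ p_i v_i / (1 - e^(-λ v_i)) = Σ n_i v_i, found below by bisection. The function name is hypothetical, and published MPN tables may differ slightly because of rounding and bias corrections.

```python
import math
from typing import Sequence

def mpn(positives: Sequence[int], tubes: Sequence[int], volumes: Sequence[float]) -> float:
    """Maximum-likelihood MPN (organisms per unit of original sample)."""
    if sum(positives) == 0:
        return 0.0
    if all(p == n for p, n in zip(positives, tubes)):
        return math.inf  # every tube positive: the estimate is unbounded
    target = sum(n * v for n, v in zip(tubes, volumes))

    def lhs(lam: float) -> float:
        # -expm1(-x) equals 1 - exp(-x) but stays accurate for small x.
        return sum(p * v / -math.expm1(-lam * v)
                   for p, v in zip(positives, volumes) if p > 0)

    lo, hi = 1e-12, 1.0
    while lhs(hi) > target:  # lhs falls as lam rises; widen until the root is bracketed
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if lhs(mid) > target:
            lo = mid  # root lies at a larger lam
        else:
            hi = mid
    return (lo + hi) / 2.0

# Hypothetical three-dilution series: 5 tubes each, 1, 0.1 and 0.01 g of soil per tube.
print(mpn([5, 3, 0], [5, 5, 5], [1.0, 0.1, 0.01]))
```

For a single dilution the equation reduces to λ = -ln(1 - p/n)/v, which is a convenient check on the numerical solution.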
The human intestinal tract microbiota presents a somewhat different situation, as extensive past investigations have characterized this ecosystem in more detail than most other natural communities (134, 215, 324). This means that optimal cultural methods are available for comparative studies of the complex microbial communities that reside in the human gut. Wilson and Blitchington (492) analyzed the composition of the microbiota of human fecal samples and concluded that the bacterial species detected by nonselective culture, when anaerobic bacteriological methods were of high quality, gave a good representation of the bacterial types present relative to that revealed by 16S rDNA sequence analyses. The main discrepancy between the two methods was in the detection of gram-positive groups. In a similar study, 95% of rDNA amplicons generated directly from a single human fecal sample were assigned to three major phylogenetic lineages, namely the Bacteroides, Clostridium coccoides, and Clostridium leptum groups (430). However, an in-depth phylogenetic analysis showed that the great majority of the observed rDNA diversity was attributable to unknown dominant microorganisms within the human gut.
It can be concluded that both innovative cultural procedures and culture-independent methods have a role to play in unravelling the full extent of prokaryotic diversity in natural habitats, especially since there are a number of instances where taxa have only been detected using cultural methods (430, 492). Although the two approaches sometimes provide different assessments of relative community diversity, the discrepancies may be attributed to sampling different subsets of the microbial community and to limitations inherent in each of the two approaches. In addition, highlighting consistent relationships between environments based on the dual approach may be highly habitat dependent due to the limited ability of a single cultural method to survey the full extent of the bacterial communities and the influence of bacterial physiology in situ on the success of cultivation in the laboratory.
Genomics is the activity of sequencing genomes and leads to the derivation of theoretical information from the analysis of such sequences with computational tools. In contrast, functional genomics defines the transcriptome and proteome status of a cell, tissue, or organism under a prescribed set of conditions. The term transcriptome describes the transcription (mRNA) profile, whereas proteome describes the translation (protein) complement derived from a genome, including posttranslational modifications of proteins, and provides information on the distribution of proteins within a cell or organism in time, space, and response to the environment. Together, genomics and functional genomics provide a precise molecular blueprint of a cell or organism, and in this and the following section we examine how they can reveal novel targets for search-and-discovery developments.
Introduction. Improvements in sequencing technology have enabled large-scale whole-genome sequencing (136). The general strategy is to fragment the whole chromosomal DNA into large clones, e.g., bacterial, plasmid, and yeast artificial chromosomes, cosmids, λ phage clones, or long-range PCR products (414), followed by a selection strategy from a large, highly redundant library, usually using a mix of random and directed selection (11, 142). For well-studied bacteria, such as Bacillus subtilis and Streptomyces coelicolor, ordered yeast artificial chromosomes (22), ordered overlapping cosmids (385), and physical and genetic maps may enable directed selection. However, for many whole-genome sequencing projects, high-throughput random shotgun sequencing produces new sequence data most efficiently, at least initially, though the accumulation of new data decreases exponentially with the number of clones sequenced (285). Selection strategies such as seeding or parking (275, 411), followed by walking, gap closing, and finishing (180), are used to fill in the gaps. The choice of initial strategies has consequences for the costs involved in these later stages (391), but the costs of selection strategies themselves are also significant. Nevertheless, sequencing at rates of 23 Mb per month in the human genome project (391) indicates the capacity to overwhelm some of these efficiency considerations by brute-force sequencing and computational power. This latter strategy, advocated by Venter (458), has been used in successively larger projects, namely Haemophilus influenzae (136) and Drosophila melanogaster (397), and has been proposed and implemented for the human genome (187, 458, 474). In the case of bacteria, 22 complete genomes have been published and 87 are in progress (of which 12 were complete as of 11 May 2000) (TIGR Microbial Database, www.tigr.org/tdb/mdb/mdb.html), thereby demonstrating the rapid deployment of sequencing technology.
Using a combination of sequencing technology and strategy, whole-genome sequencing can even be a single-laboratory exercise, as in the sequencing of Lactococcus lactis (41), though at only twofold coverage it would barely be considered draft quality in the human genome project. The number of prokaryotic whole-genome sequences can be expected to rise rapidly as funding for additional genome sequencing (e.g., http://www.beowulf.ac.uk/) increases.
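The diminishing returns of random shotgun sequencing, and what twofold coverage means in practice, can be illustrated with the idealized Lander-Waterman model, in which reads start at random positions along the genome. The sketch below is an illustration under that assumption, not the procedure of any project cited above; the genome size and read length are hypothetical values chosen only for the example. Under the model, the expected fraction of bases never sequenced at c-fold coverage is e^(-c), so at twofold coverage roughly 1 - e^(-2), or about 86%, of the genome is expected to be covered.

```python
import math


def shotgun_stats(genome_len, read_len, n_reads):
    """Idealized Lander-Waterman expectations for random shotgun sequencing.

    Assumes reads start uniformly at random and ignores the minimum overlap
    needed to detect a join, so the contig count is optimistic.
    """
    c = n_reads * read_len / genome_len  # fold coverage
    uncovered = math.exp(-c)             # expected fraction of bases never read
    return {
        "coverage": c,
        "fraction_covered": 1.0 - uncovered,
        "expected_contigs": n_reads * uncovered,
        # expected number of previously unseen bases added by one more read
        "new_bases_next_read": read_len * uncovered,
    }


# Hypothetical example: a 2.4-Mb genome and 500-base reads.
GENOME, READ = 2_400_000, 500
for fold in (1, 2, 4, 8):
    s = shotgun_stats(GENOME, READ, fold * GENOME // READ)
    print(f"{fold}x: covered={s['fraction_covered']:.3f} "
          f"contigs={s['expected_contigs']:.0f} "
          f"new bases per read={s['new_bases_next_read']:.1f}")
```

The last column shows why the yield of new data falls off exponentially: each doubling of coverage sharply reduces the new sequence gained per read, which is why the later stages of gap closing and finishing turn to directed strategies.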
Searching for drug targets. Clearly the Human Genome Project (115) will have a major impact on the identification of potential drug targets, and these targets will influence the design of specific screens for therapeutic drugs. Therapeutic areas such as Alzheimer's disease, angiogenesis, asthma, stroke, and cystic fibrosis, which are specific to the human genome and multifactorial and often involve complex signal cascades, may continue to dominate technology development. Specific and sensitive molecular screens are readily derived using the same molecular biology technologies that are driving the genome programs, and the sequence data from those programs can be used to build high-throughput robotic screens. Initial success in the rational design of inhibitors for targets such as HIV-1 protease (243, 461, 482, 496, 497) has led to rational design strategies involving gene identification (78, 280); metabolic pathway analysis (252); determination of protein-protein interactions using affinity methods such as the yeast two-hybrid system, phage display (363), or fluorescent-protein biosensors (167); structure prediction (CASP, http://PredictionCenter.llnl.gov/) (161, 242, 305, 503, 507); and modelling (63).
Rational design strategies have not been as rapidly successful as predicted, but other current strategies that involve semirational design and high-throughput screening of massive libraries (26) owe much to rational design strategies. Recently, the move has been away from combinatorial chemical libraries to biological libraries, such as those based on peptides and antibodies, again directed by the role of such molecules in human disease processes. Leads identified by direct selection from initial libraries, by high-throughput screening or biopanning, are usually not optimal for the selected properties and hence are subject to further rounds of modification or mutation to generate derivative libraries. Even then, selecting, for example, the peptides that bind thrombopoietin receptors with the highest affinity, a readily selectable property, may not yield the highest biological activity, which is the property actually required (91, 296). Also, many human diseases of interest to the pharmaceutical industry involve multiple gene pathways, environmental interactions, and genetic predisposition rather than simply direct causal effects (269). These factors also mediate adverse drug reactions and dictate the effectiveness of drug treatments. These considerations are resulting in extensive comparative genome studies of ethnic populations and human disease states (269) and expectations of personal genetic profiles. “By 2035 we will have the ability to sequence the genome of every individual on the planet…” (classified advertisement for SmithKline Beecham published in Nature in 1999).
Whole-genome sequencing provides data for such rational strategies (108, 152, 403) and has become the chosen approach of many large pharmaceutical companies. The annotation of genes and their functional identification provide a list of all potential targets (78). These targets need to be essential for some vital function in the microbial pathogen, conserved across a clinically relevant range of organisms, and significantly different or absent in humans (5). The combination of whole-genome sequences and bioinformatics tools allows rapid searches for specific genes with these characteristics. Potential targets can be identified even for functions not previously identified in specific pathogens, on the basis of DNA and protein sequence identification of gene function, and the required essential nature of genes or their products can be established through gene knockouts (294) or gene expression studies in host-pathogen interactions (72, 304). With whole-genome sequencing making possible DNA microarrays of (i) whole-genome ORFmers (complete arrays of DNA oligonucleotides representing all the open reading frames [ORFs] identified in the whole genome) (380, 404, 493) or (ii) specific signature oligomers, and their controls, for whole classes of genes (295, 297), the generation of expression data from such studies (98, 135) is likely to be on a scale to compete with and overtake sequencing. Genomics has contributed to this rational search for drug targets by providing a large set of almost complete catalogues of genes, across a wide range of organisms, which can be compared at many levels. Conservation of genes across a wide range of organisms may prove to be a good indication of an essential function (15), and a minimal set of essential genes for life can be identified (337). Transposon mutagenesis and PCR can be used to directly screen for essential genes (3), and signature-tagged mutagenesis can be used to analyze multiple pools of mutants for loss of function (208).
Identification of probable targets in silico allows these experimental molecular techniques to be used to search a smaller set of target genes, making them more directed.
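The three criteria above (essential in the pathogen, conserved across a clinically relevant range of organisms, and absent from or significantly different in humans) amount to a simple filter over an annotated gene list. The sketch below assumes those properties have already been established, for example from knockouts and homology searches; the gene identifiers, the table layout, and the 80% conservation threshold are hypothetical choices for illustration.

```python
# Hypothetical annotation table: properties assumed to have been established
# beforehand (e.g. essentiality from knockouts, homology from database searches).
CANDIDATES = {
    "orf0001": {"essential": True,  "pathogens_with_homologue": 10, "human_homologue": False},
    "orf0002": {"essential": True,  "pathogens_with_homologue": 10, "human_homologue": True},
    "orf0003": {"essential": False, "pathogens_with_homologue": 10, "human_homologue": False},
    "orf0004": {"essential": True,  "pathogens_with_homologue": 3,  "human_homologue": False},
    "orf0005": {"essential": True,  "pathogens_with_homologue": 9,  "human_homologue": False},
}


def prioritize_targets(genes, n_pathogens, min_conservation=0.8):
    """Return genes that are essential, conserved in at least `min_conservation`
    of the pathogens examined, and lack a close human homologue."""
    hits = []
    for name, g in genes.items():
        conserved = g["pathogens_with_homologue"] / n_pathogens >= min_conservation
        if g["essential"] and conserved and not g["human_homologue"]:
            hits.append(name)
    return sorted(hits)


print(prioritize_targets(CANDIDATES, n_pathogens=10))
```

Raising `min_conservation` narrows the list toward broad-spectrum targets; lowering it admits targets restricted to fewer pathogens.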
These search strategies can be applied to characterized or uncharacterized genes (14), and the chance of identifying a novel target may well be higher for uncharacterized genes. Uncharacterized gene targets may be identified in databases such as COG (274) and PROSITE (214) as those that are conserved across groups such as microbial pathogens. Such targets must still be shown to be absent from, or nonessential in, humans, and since sequencing of the human genome is not yet complete, this involves an extensive search through other, surrogate eukaryotic genomes (e.g., Saccharomyces cerevisiae and Caenorhabditis elegans) and human expressed sequence tags. The alternative approach is to characterize the target after it has been identified as novel. Undecaprenyl pyrophosphate synthetase (14), for example, was first identified as an unknown potential drug target and then characterized and identified as part of a specifically bacterial pathway.
Characterized gene targets can be sought using subtractive techniques that identify taxon-specific genes, most directly between a specific pathogen and the human genome; until the complete human genome is available, however, this is likely to be a complex and incomplete strategy. Other criteria can nevertheless be used to define subsets of genes to search using subtractive techniques. In concordance analysis, the sequences present in one set of genomes and absent from others are determined, for example, bacterial genomes compared to eukaryotic genomes (57). Similarly, in differential genome analysis (229), a different algorithm has been used to compare the genomes of pathogens and their free-living relatives in order to identify the genes present only in the pathogen. In a comparison of Haemophilus influenzae with Escherichia coli (229), 40 potential drug targets were identified. Likewise, in a comparison of Helicobacter pylori with E. coli and H. influenzae, 594 genes were found specifically in H. pylori; only 196 of these were of known function, and 123 of these were responsible for known host-pathogen interactions, leaving 73 potential novel targets (228).
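Concordance analysis, as described above, reduces to set operations once genes have been grouped into families of homologues. In this minimal sketch the family labels and genome names are hypothetical, and exact label matching stands in for the sequence-similarity searches a real comparison would require.

```python
def concordant_families(genomes, in_group, out_group):
    """Gene families present in every in-group genome and absent from every
    out-group genome."""
    shared = set.intersection(*(genomes[name] for name in in_group))
    excluded = set().union(*(genomes[name] for name in out_group))
    return shared - excluded


# Hypothetical gene-family content of four genomes.
GENOMES = {
    "pathogen_1": {"F1", "F2", "F3", "F4"},
    "pathogen_2": {"F1", "F2", "F4", "F5"},
    "free_living_relative": {"F2", "F6"},
    "eukaryote": {"F4", "F7"},
}

print(concordant_families(GENOMES, ["pathogen_1", "pathogen_2"],
                          ["free_living_relative", "eukaryote"]))
```

Choosing a free-living relative as the out-group mirrors differential genome analysis, while choosing eukaryotic genomes mirrors the bacterial-versus-eukaryotic comparison.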
The combination of past knowledge of the biochemistry and physiology of microorganisms and new insights into biological function derived from genome and functional genomic studies can guide more specific search strategies. Metabolic databases such as EcoCyc (252) and KEGG (http://www.genome.ad.jp/kegg/kegg2.html) may enable the identification of pathways specific for microbial pathogens; the genes contributing to these pathways can then be used as potential drug targets (251). As well as these taxon-specific pathways, different phylogenetic lineages may contain nonhomologous enzymes catalyzing common reactions (272, 273). Typically, differences are found between prokaryotes and eukaryotes, though specific enzymic variants are found in more specific lineages, e.g., the ure locus in mycobacteria (4) and targets in Chlamydia (245, 424). These nonhomologous enzymes provide attractive potential targets, as they catalyze essential functions by different mechanisms and so can be inhibited without the risk of inhibiting the analogous functions in humans. Missing genes from known pathways can be indicative of such targets, while the presence of genes of unknown function in gene clusters can help identify these nonhomologous counterparts. Other strategies can direct specific searches in areas of expected drug targets such as virulence genes (315), membrane transporters (500), or homologues of known drug targets in other organisms (111).
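The idea that missing genes in known pathways, together with unknown genes in nearby clusters, can point to nonhomologous enzymes can be sketched as two small functions. The pathway, reaction names, gene identifiers, and neighbourhood window below are all hypothetical; real analyses would draw pathways from databases such as EcoCyc or KEGG and use far richer evidence than chromosomal proximity alone.

```python
def pathway_holes(pathway, reaction_to_gene):
    """Reactions of a known pathway for which no gene has been annotated."""
    return [rxn for rxn in pathway if rxn not in reaction_to_gene]


def neighbouring_unknowns(reaction_to_gene, gene_order, unknown, window=1):
    """Genes of unknown function within `window` positions of any annotated
    pathway gene; candidates for nonhomologous replacements of the holes."""
    position = {gene: i for i, gene in enumerate(gene_order)}
    found = set()
    for gene in reaction_to_gene.values():
        if gene not in position:
            continue
        i = position[gene]
        for neighbour in gene_order[max(0, i - window): i + window + 1]:
            if neighbour in unknown:
                found.add(neighbour)
    return sorted(found)


# Hypothetical four-step pathway in which step R3 has no annotated gene.
PATHWAY = ["R1", "R2", "R3", "R4"]
ANNOTATION = {"R1": "g10", "R2": "g11", "R4": "g13"}
GENE_ORDER = ["g08", "g09", "g10", "g11", "u1", "g13", "g14", "u2", "g20"]
UNKNOWN = {"u1", "u2"}

print(pathway_holes(PATHWAY, ANNOTATION))
print(neighbouring_unknowns(ANNOTATION, GENE_ORDER, UNKNOWN))
```

Widening `window` trades specificity for sensitivity: more unknown genes are proposed, but fewer of them are likely to belong to the pathway.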
Genome studies both confirm the concept of pathogenicity islands (193) and reveal the rapid divergence of these genes in the evolution of pathogens (369), making them attractive but difficult targets. Similarly, an essential function of pathogens is evasion of host defense mechanisms: pathogens such as Haemophilus influenzae, Helicobacter pylori, Escherichia coli, and Plasmodium falciparum (99, 465) all show extreme variation in the targets of the immune system. The presence of simple repeats in prokaryotic DNA sequences has been associated (217, 218) with the concept of contingency genes linked to phase variation of gene expression in pathogens (328). Strategies that combine algorithms for detecting such repeats with the display of genome annotation, specifically locating the repeats relative to ORFs of known function, can identify targets that are critical to virulence (403).
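A minimal scan for simple sequence repeats, reporting where each lies relative to annotated ORFs, illustrates the kind of search described above. The sequence, ORF coordinates, and thresholds (units of 1 to 6 bases, at least five copies, a 100-bp upstream window) are hypothetical; only perfect repeats on the forward strand are considered, whereas real tools also tolerate mismatches and handle both strands.

```python
import re


def is_primitive(unit):
    """False if `unit` is itself a repeat of a shorter unit (e.g. 'ATAT')."""
    n = len(unit)
    return all(unit != unit[:d] * (n // d) for d in range(1, n) if n % d == 0)


def tandem_repeats(seq, max_unit=6, min_copies=5):
    """Yield (start, end, unit, copies) for perfect tandem repeats of short units."""
    seq = seq.upper()
    for k in range(1, max_unit + 1):
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_copies - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            if is_primitive(unit):
                yield m.start(), m.end(), unit, (m.end() - m.start()) // k


def relative_to_orfs(repeats, orfs, flank=100):
    """Classify each repeat as lying within an ORF or within `flank` bp upstream.

    ORFs are (name, start, end) tuples on the forward strand only (a simplification).
    """
    results = []
    for start, end, unit, copies in repeats:
        for name, orf_start, orf_end in orfs:
            if orf_start <= start and end <= orf_end:
                results.append((unit, copies, name, "within ORF"))
            elif 0 <= orf_start - end <= flank:
                results.append((unit, copies, name, "upstream"))
    return results


# Hypothetical sequence: a CAAT repeat just upstream of a hypothetical ORF.
SEQ = "ACGTTGCAGTCA" + "CAAT" * 6 + "GGTACCTAGA" * 3
ORFS = [("orfX", 40, 66)]

reps = list(tandem_repeats(SEQ))
print(reps)
print(relative_to_orfs(reps, ORFS))
```

Repeats inside an ORF can cause frameshifts when the copy number changes, whereas upstream repeats can alter promoter spacing; distinguishing the two is why locating repeats relative to ORFs matters.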
Plasmodium falciparum is an example of a major human pathogen for which new insights and strategies for drug development are emerging. The full genome sequence of P. falciparum, 30 Mb in 14 chromosomes (http://www.sanger.ac.uk/Projects/P_falciparum/who&what.shtml), is being completed (48, 155). Searching DNA sequence databases for targets homologous to known drug targets in other organisms has revealed an aspartic protease (93), cyclophilin (38), and calcineurin (111), explaining the antimalarial activity of cyclosporin A. The full genome can be expected to provide many more potential targets (479).
Treponema pallidum, the causative agent of syphilis, is difficult to culture, and little is known of the molecular biology of its virulence mechanisms. Its complete genome has been sequenced (143) and analyzed for virulence factors, revealing several classes of predicted protein-coding sequences that are potential virulence factors (478). Whole-genome studies are resulting in significant progress in understanding these and other infectious agents.
Natural products. Nevertheless, it is unlikely that some of the most successful drugs could have been discovered by any process of rational or semirational design. The mode of action of the immunosuppressants cyclosporin A, FK506, and rapamycin, which bind to cis-trans prolyl isomerases (cyclophilin in the case of cyclosporin A, FKBP12 in the case of FK506 and rapamycin) but then inhibit further steps in critical signal transduction cascades (69, 206), e.g., through calcineurin in the case of cyclosporin A and FK506, would be too complex to design. Not only is the mode of action indirect, but the molecules themselves are complex. The drug targets might have been identified by comparative genomics, since they are conserved from unicellular eukaryotes to humans, but the drugs themselves required the massive library generation and screening activity of natural selection to evolve. Similarly, two of the most successful antimalarial drugs, quinine and chloroquine, exert their effect by inhibiting host-encoded functions (389) rather than activities encoded by P. falciparum itself. Chloroquine resistance in P. falciparum resides in a 36-kb DNA segment whose genes are all of unknown function (429), as is the case for about 40% of the P. falciparum genome (155).
However, in the search for new classes of antibiotics over the last 20 years, traditional approaches have also failed to deliver new drugs fast enough to keep up with the loss of effectiveness of existing drugs against increasingly resistant pathogens (95% of Staphylococcus aureus isolates are penicillin resistant and 60% are methicillin resistant, and there are cases in China, Japan, Europe, and the United States of vancomycin resistance [http://www.promedmail.org]). The development of resistance may be followed by compensatory mechanisms to adjust for reduced fitness, which may then lock in the resistance mechanism (96). Although there are 150 antibiotics approved in the United States and 27 in clinical development (http://www.phrma.org/), only 1 antibiotic was approved in 1993, none in 1994, and only a few since (51, 428). Thus, random-screening search strategies are being abandoned in favor of rational, target-based approaches.
Molecular biology, robotics, miniaturization, massively parallel preparation and detection systems, and automatic data analysis dominate the search for drug discovery leads. Natural-product extracts and bacterial culture collections are not easy partners in this drug discovery paradigm. The separation, identification, characterization, scale-up, and purification of natural products for large-scale libraries suitable for these high-throughput screens are daunting, and rational arguments for the selection of organisms and/or natural-product molecules are often absent, especially given the poor taxonomic characterization of strains in natural-product bacterial strain collections (A. C. Horan, M. Beyazova, T. Hosted, B. Brodsky, and M. G. Waddington, Abstr. 11th Int. Symp. Biol. Actinomycetes 11:89, 1999).
Many of these screening systems are not sufficiently robust to handle complex mixtures of natural products from ill-defined biological systems (Horan et al., ibid.) and may be inhibited by interactions with uncontrolled physicochemical conditions, simple toxic chemicals, and known bioactive compounds. This has led to significant efforts in rational drug design, combinatorial chemistry, peptide libraries, antibody libraries, and combinatorial biosynthesis (27, 89) and other synthetic and semisynthetic methods to provide clean inputs to screens. However, natural products are still unsurpassed in their ability to provide novelty and complexity. In chemical screening of natural products (216), complex mixtures of metabolites from growth and fermentation are separated, purified, and identified using high-pressure liquid chromatography, diode array UV/visible spectroscopy, and mass spectrometry. Novel chemical structures are passed on for screening, now uncontaminated with background interference from the original complex mixture, and built up into high-quality, characterized natural-product libraries. This strategy suffers from poorly characterized culture collections, which make the choice of organisms to screen difficult, and from the inability to control the expression of metabolic potential. These issues are specific examples of the requirement for better systematics, physiology, conservation of microbial diversity, and data integration. For example, typical commercial collections of actinomycetes might consist of 20,000 to 40,000 organisms classified at genus level on the basis of morphology and simple phenotypic characters. This identification may guide the choice of media and conditions for growth but will not aid the selection of strains, predict metabolites, or optimize expression for drug discovery. These issues can be tackled using the same tools and technologies that are driving the search for new drug targets.
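The dereplication step in chemical screening, separating known metabolites from potentially novel ones, can be sketched as matching observed masses against a library within a mass tolerance. This sketch assumes neutral monoisotopic masses and a 5-ppm tolerance; in practice retention time, UV/visible spectra, adduct ions, and fragmentation data would also be compared. The two library entries use the molecular formulas of clavulanic acid (C8H9NO5) and streptomycin (C21H39N7O12); the third observed mass is a hypothetical unknown.

```python
# Monoisotopic masses (u) of the most abundant isotope of each element.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}


def monoisotopic_mass(formula):
    """Monoisotopic mass from an element-count dict such as {'C': 8, 'H': 9}."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


def dereplicate(observed, library, ppm=5.0):
    """Split observed neutral masses into known hits and candidate novel masses."""
    reference = {name: monoisotopic_mass(f) for name, f in library.items()}
    known, novel = {}, []
    for mass in observed:
        hits = [name for name, ref in reference.items()
                if abs(mass - ref) / ref * 1e6 <= ppm]
        if hits:
            known[mass] = hits
        else:
            novel.append(mass)
    return known, novel


LIBRARY = {
    "clavulanic acid": {"C": 8, "H": 9, "N": 1, "O": 5},
    "streptomycin": {"C": 21, "H": 39, "N": 7, "O": 12},
}
OBSERVED = [199.0481, 581.2657, 412.2000]  # the last value is a hypothetical unknown

known, novel = dereplicate(OBSERVED, LIBRARY)
print(known)
print(novel)
```

Only masses in `novel` would be passed on for further purification and screening; a mass match alone does not prove identity, since isomers share the same formula.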
Searching for new drugs. The advent of the complete Streptomyces coelicolor genome (http://www.sanger.ac.uk/Projects/S_coelicolor/) provides the opportunity to explore the evolutionary and functional relationships of one of the best studied and industrially and medically significant groups of organisms, the genus Streptomyces. This advance will provide new information to aid search and discovery of novel organisms and new bioactive natural products (R. Brown, H. C. Choke, S. B. Kim, A. C. Ward, and M. Goodfellow, Abstr. 11th Int. Symp. Biol. Actinomycetes 11:149, 1999), identify roles in ecosystems (493), and lead to improvements in bioprocess control (20, 231, 248) for existing products. The extent to which the information from the S. coelicolor genome can be utilized across such a broad spectrum depends upon how representative it is of other streptomycetes.
The streptomycetes form a distinct clade within the radiation encompassed by the high-GC gram-positive bacteria in the 16S rDNA tree. This taxonomic group is a major source of bioactive natural products (60). As a result, major collections of poorly characterized actinomycete strains are held by most large pharmaceutical companies. However, the relationship between metabolic potential and taxonomic or phylogenetic relationships is poorly understood. Within the streptomycete clade there are well-characterized groups at all levels of taxonomic variation from suprageneric (Streptomyces should probably be more than one genus [S. B. Kim, C. N. Seong, and M. Goodfellow, Abstr. 11th Int. Symp. Biol. Actinomycetes 11:55, 1999]) to infraspecific. At the genus and species levels, fundamental questions arise about biodiversity in the prokaryotic world (314, 362, 468, 499). At the molecular level, this diversity is poorly represented. Estimates suggest that only a tiny fraction of prokaryotes have been isolated, and representatives of only about 10 to 15% of described species are held in service culture collections. Selective isolation of streptomycetes from the rhizosphere of a common tropical tree, Paraserianthes falcataria, revealed extensive diversity around the Streptomyces violaceusniger clade (L. Sembiring, M. Goodfellow, and A. C. Ward, Abstr. 11th Int. Symp. Biol. Actinomycetes 11:69, 1999).
Full 16S rDNA sequences are available in the ribosomal database (http://www.cme.msu.edu/RDP/html/) for less than 100 of the 513 validly described streptomycete species. There is evidence that specific metabolites, such as clavulanic acid, may be synthesized by strains in a specific clade (unpublished data) and that the ability to synthesize, for example, streptomycin and related metabolites appears to be randomly distributed across the whole genus. However, these conclusions are tentative given the poor taxonomy, random screening, and limited chemical characterization of metabolites. Compounding this uncertainty is the complexity of regulatory controls; genetic (71) and genomic methods have the potential to unravel some of this complexity.
Currently, genomics has little to say at these levels (species and subspecies strains); most whole-genome sequencing studies have taken representatives of the major groups (TIGR website cited above) or compared very closely related strains, such as Helicobacter pylori (6). The specific and infraspecific relationships in the streptomycetes and the way they are reflected in the biosynthetic potential to produce novel, bioactive compounds could significantly influence strategies for search and discovery, screening, and bioprocess development. Extending whole-genome studies to more streptomycetes would reveal these relationships comprehensively, enabling validation of current methodologies (from 16S rDNA phylogenies to DNA-DNA pairing), and would lead to new understanding of speciation, phylogenetic relationships, and genome function in secondary metabolism. However, whether to sequence more streptomycete genomes remains an open question involving difficult choices, and any small number of strains would only begin to address the questions above. Nevertheless, it should be possible to begin to address these problems using the S. coelicolor genome as a template for whole-genome comparisons across the streptomycete clade.
The functional analysis of the S. coelicolor genome would gain significant benefit from a greater understanding of the ecological niche and role of Streptomyces violaceoruber-like streptomycetes [S. coelicolor A3(2) is formally a synonym of S. violaceoruber]. There is considerable current interest in the role of actinomycetes, and streptomycetes in particular, in natural ecosystems, especially grassland. Their role in carbon turnover and their response to land management practices could be important in maintaining soil fertility and productivity during a shift to sustainable land management. However, the ecological role of S. coelicolor (S. violaceoruber) is poorly understood, and knowledge of that role would help in attempting to identify the functions of its unknown genes (B. D. Kell, personal communication, 1999). Clearly the whole genome has a significant role in identifying the metabolic potential for activity and interactions in the soil ecosystem (493). The identification of strains related to S. coelicolor A3(2) and their detection using molecular ecological methods and selective isolation would complement functional analysis.
Although the discrepancy between organisms isolated and those identified by molecular methods is often striking, careful studies reveal biases in both approaches and show that, with appropriate techniques, many organisms can be cultured from specific habitats (170, 470). The importance of cultivation conditions has been emphasized, and the use of techniques such as extinction culture for abundant oligotrophic fractions of the microbial community points the way forward (65) without the need for the concept of uncultivatable bacteria. Nevertheless, the description of a specific bacterial cytokine required for resuscitation of Micrococcus luteus (329) illustrates a case in which neither medium development nor extinction dilution would be expected to resuscitate dormant M. luteus (although M. luteus itself is not difficult to isolate). The discovery of the M. luteus resuscitation-promoting factor (rpf) was the result of careful microbiology (463). Its rapid identification across the whole of the actinomycete clade, including mycobacteria and streptomycetes, with implications for clinical and ecological isolation (250), was the result of genomic studies, and the identification of multiple rpf genes in Mycobacterium tuberculosis and Streptomyces coelicolor was the direct result of the availability of whole-genome sequences. The whole area of stress response, signaling, and global regulatory mechanisms is now being dissected in organisms like S. coelicolor (309, 344, 356, 486) and has important implications for growth and antibiotic expression, affecting isolation and screening strategies for natural products.
The ecology of streptomycetes is of considerable interest for search and discovery of natural products. Currently, novel products are sought from organisms isolated from extreme or novel environments (79, 106). However, variation even within the known range of streptomycetes is diverse and complex (Sembiring et al., abstr.), and understanding it is bound up with problems of isolating and cultivating the full diversity, of speciation, and of expressing the full metabolic potential. Understanding the extent of genes of known function in streptomycete genomes, identifying the role of genes of unknown function, and understanding regulatory and stress response networks will enable rational design of isolation methods and screening strategies.
Bioprocess control. Controlling gene expression is essential in exploiting new drugs, both in research and development and in production by fermentation. In bioprocess control, whole-ORFmer arrays of Streptomyces coelicolor could be used to monitor gene expression and physiological responses of streptomycetes such as S. fradiae and S. clavuligerus in large-scale fermentations. Sequence similarities across the streptomycete clade may mean that virtual-expression arrays (156) can be used to monitor gene expression for research and development and for optimization and control of antibiotic production. Many metabolic switches are reflected in networks of signaling and response genes, so even incomplete and qualitative coverage of the gene response of these industrially significant organisms would still enable those switches to be identified and interpreted from knowledge of the S. coelicolor genome. These data will enable software sensing (323) of important physiological shifts in bioprocess operation: shifts identified by transcriptome analysis of representative fermentations could then be estimated from secondary measurements using current on-line and off-line process measurements. However, current measurements (substrate feeds; physicochemical measurements such as pH and dissolved O2; substrate concentrations; carbon dioxide evolution rate; and oxygen uptake rate) would need to be supplemented with multivariate measurements that are sensitive to biological state, such as FT-IR (437), dielectric spectroscopy (501), and PyMS (317). The problem with many of the multivariate methods is that, for the complex samples from fermentation processes, they are black-box techniques; by combining them with transcriptome analysis, patterns detected by these techniques could be interpreted using the power of genomics.
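Software sensing, in its simplest form, means calibrating a model that predicts a hard-to-measure quantity from routine on-line measurements. The sketch below uses a one-variable least-squares calibration and the respiratory quotient (RQ, the ratio of carbon dioxide evolution rate to oxygen uptake rate) as illustrations; the calibration data are hypothetical, and a practical soft sensor would more likely use multivariate methods applied to FT-IR, dielectric, or PyMS data.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def respiratory_quotient(cer, our):
    """RQ = carbon dioxide evolution rate / oxygen uptake rate (same units)."""
    return cer / our


# Hypothetical calibration: an on-line signal (x) against an off-line assay (y)
# taken from earlier, representative fermentations.
ONLINE = [1.0, 2.0, 3.0, 4.0]
OFFLINE = [2.1, 3.9, 6.2, 7.8]
a, b = fit_line(ONLINE, OFFLINE)

# The soft sensor then estimates the off-line quantity between samples.
print(f"estimate at x=2.5: {a + b * 2.5:.2f}")
print(f"RQ: {respiratory_quotient(12.0, 10.0):.2f}")
```

A change in RQ is one of the on-line signals such a sensor might combine with others; interpreting those shifts with transcriptome data is the step the text proposes.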
One application of comparative genome analysis would be to identify specific DNA sequences which could be assembled into either specific (for individual strains) or generic arrays to monitor gene expression in streptomycete bioprocesses.
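One way to choose such sequences is to compare short subsequences (k-mers) across genomes: those shared by all strains are candidates for a generic array, and those found in only one strain are candidates for a strain-specific array. This is a minimal sketch with hypothetical sequences and a toy value of k; practical probe design would also consider reverse complements, melting temperature, and cross-hybridization.

```python
def kmers(seq, k):
    """All distinct substrings of length k in `seq`."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def probe_candidates(genomes, k):
    """Return (generic, specific): k-mers shared by every genome, and for each
    genome the k-mers found in no other genome."""
    sets = {name: kmers(seq.upper(), k) for name, seq in genomes.items()}
    generic = set.intersection(*sets.values())
    specific = {
        name: ks - set().union(*(other for n, other in sets.items() if n != name))
        for name, ks in sets.items()
    }
    return generic, specific


# Hypothetical strain sequences; k = 4 only to keep the example small.
STRAINS = {"strain_A": "ACGTACGGT", "strain_B": "ACGTTCGGA"}
generic, specific = probe_candidates(STRAINS, k=4)
print(sorted(generic))
print({name: sorted(ks) for name, ks in specific.items()})
```

With whole genomes and realistic probe lengths, the same two set operations separate the conserved core (generic arrays) from strain-specific regions.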
Whole-genome sequencing and rapid biotechnological developments in the field of molecular biology mean that the gene is seen as the drug lead and rational design as the route to drug development. However, natural products are the result of a massively parallel experiment in combinatorial gene shuffling, mutagenesis, and screening for the generation of bioactive metabolites. Genomics and new technology (160) can promote the search for new natural products by increasing understanding of biodiversity and the factors that regulate microbial growth and expression, complementing the synthetic and semisynthetic routes to drug development.
Proteome analysis comprises three sequential steps: sample preparation, protein separation and mapping, and protein characterization. Sample preparation may entail cell fractionation and preliminary removal of more abundant proteins in order to detect those present in low concentration. Analysis is very dependent upon effective protein separation, and two-dimensional (2-D) gel electrophoresis (most usually isoelectric focusing in an immobilized pH gradient followed by separation by molecular mass) is the present method of choice. Between 2,500 and 10,000 proteins are claimed to be resolvable on such gels (204), and, in addition to determining protein inventories, the analyses can be made quantitative with respect to individual proteins (detection is possible at 1 ng with silver staining and at less than 1 pg with fluorescent dyes). It is important to note that posttranslational modifications (PTM) will significantly increase the number of separate proteins expressed from a genome and will not be revealed by genome annotation; the estimates are 1.2- to 1.3-fold for bacteria and 3-fold for eukaryotic microorganisms like S. cerevisiae. Protein characterization is achieved by mass spectrometric amino acid sequencing and identification of PTMs, followed by interrogation of protein databases. In turn, this reverse genetics enables identification of the genes responsible for producing a particular protein expression profile (see below).
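The PTM estimates above can be set against the quoted gel capacity with simple arithmetic. The ORF counts below are hypothetical round figures for a bacterium and a yeast-sized eukaryote, and the calculation assumes, unrealistically, that every ORF is expressed at once, so the results are upper bounds rather than predictions.

```python
GEL_CAPACITY = 10_000  # upper end of the 2,500-10,000 spots quoted per 2-D gel


def protein_species(n_orfs, ptm_low, ptm_high):
    """Range of distinct protein species if every ORF were expressed."""
    return n_orfs * ptm_low, n_orfs * ptm_high


# Hypothetical ORF counts with the PTM factors quoted in the text.
for label, orfs, low, high in [("bacterium", 4_000, 1.2, 1.3),
                               ("yeast-sized eukaryote", 6_000, 3.0, 3.0)]:
    lo_n, hi_n = protein_species(orfs, low, high)
    verdict = "within" if hi_n <= GEL_CAPACITY else "beyond"
    print(f"{label}: {lo_n:,.0f}-{hi_n:,.0f} species, {verdict} one gel's capacity")
```

On these assumptions a bacterial proteome could in principle fit on one high-capacity gel, whereas a yeast-sized proteome could not, which is consistent with the emphasis above on fractionation during sample preparation.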
The usual approach to proteome analysis is first to produce a 2-D map of all the proteins expressed under so-called normal conditions in order to define the constitutive proteome of a cell or organism. Thereafter, qualitative and/or quantitative changes in the proteome can be charted as responses to different conditions (or reflections of different physiological states) induced by stress, growth environment, pathological state, and so on. Thus, reference maps and databases of identified and unidentified proteins are established.
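Charting qualitative and quantitative changes against a reference map can be sketched as a comparison of spot volumes. In this sketch the spot identifiers and volumes are hypothetical, spots are assumed to be already matched between gels, and a twofold threshold is used for quantitative change; real image-analysis software also handles normalization and replicate statistics.

```python
def compare_to_reference(reference, condition, fold=2.0):
    """Classify spot changes between a reference 2-D map and a test map.

    Each map is a dict of matched spot ID -> normalized spot volume.
    Spots that change less than `fold` are omitted from the result.
    """
    changes = {}
    for spot in reference.keys() | condition.keys():
        ref = reference.get(spot, 0.0)
        test = condition.get(spot, 0.0)
        if ref == 0.0 and test == 0.0:
            continue
        if ref == 0.0:
            changes[spot] = "new"    # qualitative: detected only under the test condition
        elif test == 0.0:
            changes[spot] = "lost"   # qualitative: detected only on the reference map
        elif test / ref >= fold:
            changes[spot] = "up"     # quantitative increase
        elif ref / test >= fold:
            changes[spot] = "down"   # quantitative decrease
    return changes


# Hypothetical spot volumes for a reference map and a stress-condition map.
REFERENCE = {"s1": 1.0, "s2": 4.0, "s3": 2.0, "s4": 1.0}
STRESS_MAP = {"s1": 1.1, "s2": 1.0, "s3": 5.0, "s5": 0.8}
print(sorted(compare_to_reference(REFERENCE, STRESS_MAP).items()))
```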
In this new, fast-moving field, acceptance of an agreed terminology is crucial, and the recent proposals made by VanBogelen et al. (453) are very helpful in this respect. Protein expression profile is the quantitative catalogue of proteins synthesized by a cell or organism under defined circumstances; protein phenotype defines the character or state of a specific protein under defined conditions (e.g., quantity, rates of synthesis and turnover, and extent of PTM); a regulon is a set of proteins whose synthesis is regulated by the same regulatory protein; a stimulon is a set of proteins whose synthesis responds to a single stimulus; and the protein signature is a subset of proteins whose altered expression is characteristic of a response to a defined condition or genetic change—they may relate to specific metabolic pathways or cell functions. The last cannot be distinguished simply by comparing two protein expression profiles; rather, signatures are recognized only after reviewing numerous profiles obtained under similar or different conditions. Various signatures have been identified that are associated with microbial growth rate, ribosome function, and protein secretion (453). These authors conclude that phenotypes and signatures will develop as tools for addressing the functions of unknown proteins and for evaluating the mode of action of physical and chemical agents. Put another way, proteomics provides a very powerful means of revealing epigenetic effects, i.e., effects that involve multiple genes.
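The point that a signature emerges only from many profiles can be illustrated by counting how consistently each protein changes across several profiles for one condition and then discarding proteins that also change under unrelated conditions. The fold-change threshold, the consistency fractions, and the protein names below are hypothetical choices, not criteria proposed by VanBogelen et al.

```python
def consistently_altered(profiles, min_fold=2.0, min_fraction=1.0):
    """Proteins whose fold change is >= min_fold or <= 1/min_fold in at least
    `min_fraction` of the profiles (each profile: protein -> fold change)."""
    counts = {}
    for profile in profiles:
        for protein, fc in profile.items():
            if fc >= min_fold or fc <= 1.0 / min_fold:
                counts[protein] = counts.get(protein, 0) + 1
    needed = min_fraction * len(profiles)
    return {p for p, c in counts.items() if c >= needed}


def protein_signature(condition_profiles, unrelated_profiles, min_fold=2.0):
    """Proteins altered in every profile for the condition of interest,
    excluding those also altered in half or more of the unrelated profiles."""
    specific = consistently_altered(condition_profiles, min_fold, 1.0)
    general = consistently_altered(unrelated_profiles, min_fold, 0.5)
    return sorted(specific - general)


# Hypothetical fold changes relative to matched controls.
CONDITION = [
    {"P1": 3.0, "P2": 0.4, "P3": 1.1, "P4": 2.5},
    {"P1": 2.2, "P2": 0.3, "P3": 2.4, "P4": 3.0},
    {"P1": 4.0, "P2": 0.45, "P3": 1.0, "P4": 2.1},
]
UNRELATED = [{"P1": 1.0, "P4": 2.8}, {"P2": 1.2, "P4": 2.2}]
print(protein_signature(CONDITION, UNRELATED))
```

In the example, P4 changes consistently under the condition of interest but also under unrelated conditions, so it is excluded; a single pairwise comparison could not have made that distinction.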
At present proteomics is being applied most actively in pharmaceutical research and development (16, 90) in two principal areas: drug discovery and target selection (e.g., via proteome difference analysis of pathogenic versus nonpathogenic organisms, normal versus dysfunctional states, and disruption of stress-induced protein synthesis) and drug mode of action, toxicological screening, and the monitoring of disease progression during clinical trials. The latter group of applications, which are directed at gaining a fuller understanding of the pharmacological mechanisms of drugs, is driving the new field of pharmacoproteomics. On the one hand natural-product discovery and combinatorial synthesis can generate an enormous repertoire of candidate drugs; on the other hand the demonstration of their mode of action, efficacy, and safety is hugely demanding in resources and time. The advent of pharmacoproteomics is set to transform these aspects of pharmaceutical development.
Although to date proteomics has attracted the greatest interest from the pharmaceutical industry, its potential for application in other areas of biotechnology is being recognized. Moreover, the application of proteomics is not restricted to groups of microorganisms that are well characterized in terms of genome sequencing. Exploration of the biochemistry and physiology of extremophilic and extremotolerant organisms by proteome analysis, for example, could reveal much that is relevant to biotechnological exploitation. Proteome expression profiling has already begun for some hyperthermophiles (164, 259), and studies such as these open the way for discovering stable enzymes and other proteins. For example, the unusual group of tungstoenzymes is found largely, though not uniquely, in thermophilic and hyperthermophilic microorganisms; it has been suggested (267) that these enzymes evolved to catalyze reactions of very low redox potential at extreme temperatures, and the same organisms contain an unusually high abundance of chaperonins (241). Equally exciting opportunities may arise from the discovery of proteome signatures in extremophiles as a means of detecting novel metabolism. In this context our interest is in the growth of marine bacteria under deep-sea conditions (high pressure, low temperature, medium to high salinities, and oligotrophic nutrient status) and in applying proteomics to detect novel epigenetic phenomena of potential exploitability. One final illustration of the power of proteomics is in the area of decontamination and sanitization within the food, pharmaceutical, and other hygiene-sensitive bioindustries. Proteome analysis of stress responses is important here because it reveals global regulation of gene expression under different stress conditions. Thus, a recent analysis of the psychrotolerant food spoilage organism Pseudomonas fragi revealed overexpression of 91 stress proteins in response to challenge from cleaning-disinfection treatments in food plants (457).
Such information is highly germane to the development of effective treatment procedures where organisms are known to counteract simultaneous adverse conditions by coordinated changes in gene expression.
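The core of a proteome difference analysis is a spot-by-spot comparison of two gels. The sketch below is a minimal illustration of that comparison, assuming each gel has already been reduced to a dictionary of matched, normalized spot intensities. The function name `induced_proteins`, the 2-fold cutoff, and the spot data are all hypothetical; they are not the criteria or data of the P. fragi study cited above.

```python
def induced_proteins(control, treated, min_fold=2.0):
    """Return IDs of protein spots whose intensity rises at least min_fold-fold.

    control, treated: dicts mapping a (hypothetical) spot ID to a normalized
    spot intensity from two gels of the same organism, e.g. untreated cells
    and cells challenged with a disinfectant. min_fold is an illustrative cutoff.
    """
    hits = []
    for spot, level in treated.items():
        base = control.get(spot, 0.0)
        if base == 0.0:
            # Spot absent from the control gel: count it as newly induced.
            if level > 0.0:
                hits.append(spot)
        elif level / base >= min_fold:
            hits.append(spot)
    return sorted(hits)


# Hypothetical intensities, for illustration only.
control = {"spotA": 100.0, "spotB": 50.0, "spotC": 80.0}
treated = {"spotA": 250.0, "spotB": 55.0, "spotC": 30.0, "spotD": 40.0}
print(induced_proteins(control, treated))  # ['spotA', 'spotD']
```

Here spotA rises 2.5-fold and spotD appears only after treatment, so both are flagged; a real analysis would also need replicate gels and a statistical test rather than a single fixed ratio.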
The development and application of proteomics constitute a very recent field of technology. Present limitations and areas in need of improvement include the resolution and characterization of hydrophobic proteins, which include major targets for pharmaceutical intervention (membrane enzymes and receptors) (90); the quality of protein separation (368); the ability to detect proteins present at very low copy numbers (226, 451); and throughput and automation (90, 226).
Biogeography is the branch of biology that deals with the geographic distribution of organisms; it has developed almost exclusively with reference to animal and plant ecology. We speak of endemic species as those that are restricted to a particular geographic region, and of “hot spots” as regions characterized by a high proportion of endemic species (342). In contrast, species that have a worldwide distribution are termed cosmopolitan. Is biogeography relevant in the microbial world? In their seminal article on the biogeography of sea ice bacteria, Staley and Gosink (422) offer three reasons why microbial biogeography is a critical topic for enquiry. Knowledge of biogeography will assist in (i) determining the extent of microbial diversity, (ii) identifying threatened microbial taxa, and (iii) identifying the ecological function of a particular species. We add two further reasons: assisting search and discovery (knowing where to look) and helping to resolve the dilemma of how to conserve microbial gene pools (see later). However, the first question to address is whether biogeography applies to microorganisms at all.
Microbial ecologists have tended to accept somewhat uncritically the pronouncements of Beijerinck and Baas-Becking (see reference 422 for citations) that bacteria (and by extension all microorganisms) are cosmopolitan: in Beijerinck's terms, “everything is everywhere,” to which Baas-Becking added “the environment selects.” A number of microbiologists are now challenging this assertion of cosmopolitan geographic distribution. Tiedje (441) has asked what genotypic level corresponds to “everything”: is it the species, as in the case of animals and plants, the variety, or the DNA sequence? And what geographic scale corresponds to “everywhere”: a sand grain, a soil aggregate, a square meter, or a catena? Questions such as these can now be addressed rigorously with the molecular biological and high-resolution chemometric approaches that are available. We would argue that microbial biogeographic studies should focus on the infraspecific genotypic level because of the intimate relationship between environmental and geographic factors and the speciation of microorganisms. Consequently, we adopt the term geovar (422) for a geographic variety of a microorganism that is endemic to a specific area or host. Moreover, definition at the varietal or infraspecific level is crucial in the context of biotechnology discovery because many sought-after properties are known to be determined at the strain rather than the species level.
In the remainder of this section, we examine the case supporting microbial endemism, while acknowledging that the cosmopolitan hypothesis has its strong adherents. In our opinion, the application of rigorous analytical methods is of paramount importance in resolving this issue. For example, solely on the basis of microscopic recording of cryptic ciliates in a freshwater lake and a shallow marine sediment, it was concluded that a substantial fraction of all known free-living ciliates were represented in these two habitat types and that such ciliates had a cosmopolitan distribution (130). From these observations, it was extrapolated that “in the case of microorganisms ‘everything is everywhere’.” Such a statement may be valid for the particular taxon studied and the limited regional and environmental range examined, but the restricted analytical approach (see below for discussion) complicates interpretation, and in our view the extrapolation to microorganisms in general is quite unjustified. Other protozoologists consider that many soil ciliate species show restricted geographic or ecological ranges (138), although the percentage of endemics is low. Data on infraspecific variation within ciliates are very sparse but presently indicate limited genetic diversity (44). In contrast, a study of the diversity of Vibrio anguillarum isolates employed a large battery of typing methods, including ribotyping, serotyping, lipopolysaccharide profiling, plasmid typing, and biotyping (API, BIOLOG, and BioSys) (278). This study revealed high genetic diversity within the species that correlated with geographic distribution and host species. The authors remarked that such relationships could be obtained only by analyzing a large number of isolates and deploying a multityping approach.
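Why a multityping approach resolves more than any single method can be shown with a small calculation. The sketch below is a minimal illustration, assuming each isolate has been assigned a type code by each method; it combines the codes into a composite profile and scores discrimination with the Hunter-Gaston (Simpson's) index of discrimination, a standard typing statistic that we have swapped in for illustration and that is not necessarily the measure used in the V. anguillarum study. All isolate names and type codes are hypothetical.

```python
from collections import Counter


def composite_profiles(isolates, methods):
    """Map each isolate to a tuple of its type codes under the chosen methods."""
    return {name: tuple(profile[m] for m in methods) for name, profile in isolates.items()}


def discrimination_index(profiles):
    """Hunter-Gaston (Simpson's) index of discrimination.

    D = 1 - sum(n_j * (n_j - 1)) / (N * (N - 1)), where n_j is the number of
    isolates sharing type j and N is the total number of isolates.
    """
    n = len(profiles)
    if n < 2:
        raise ValueError("need at least two isolates")
    counts = Counter(profiles.values())
    same_type_pairs = sum(c * (c - 1) for c in counts.values())
    return 1.0 - same_type_pairs / (n * (n - 1))


# Hypothetical isolates typed by three methods.
isolates = {
    "iso1": {"ribotype": "R1", "serotype": "O1", "plasmid": "P0"},
    "iso2": {"ribotype": "R1", "serotype": "O1", "plasmid": "P1"},
    "iso3": {"ribotype": "R1", "serotype": "O2", "plasmid": "P0"},
    "iso4": {"ribotype": "R2", "serotype": "O2", "plasmid": "P1"},
}
for methods in (["ribotype"], ["ribotype", "serotype", "plasmid"]):
    d = discrimination_index(composite_profiles(isolates, methods))
    print(methods, round(d, 2))
# ['ribotype'] 0.5
# ['ribotype', 'serotype', 'plasmid'] 1.0
```

Ribotyping alone lumps three of the four isolates together, whereas the composite profile separates all four, which is the kind of added resolution needed before geographic or host correlations can emerge.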
Similar geographic distinctions are known to occur within phytopathogenic organisms, one of the best-documented examples being Ralstonia solanacearum, which infects crops such as potato and banana. The most recent assessment of genetic diversity within this bacterium used PCR-restriction fragment length polymorphism analysis of the hrp (hypersensitive reaction and pathogenicity) gene region (374). The analysis confirmed the separation of the species into two major groups, the Americanum and Asiaticum divisions, and revealed finer geographic distinctions, e.g., southern African (VII) and northern African (I and II) clusters and a Reunion Island cluster (VIb).
An interesting case of restricted geographic range has been reported for bacteria capable of degrading the xenobiotic chemical 3-chlorobenzoate (3CBA) (150). 3CBA degraders were isolated from soils in Australia, California, Canada, Chile, South Africa, and Russia by gross enrichment culture. Isolates were characterized by repetitive extragenic palindromic PCR genome fingerprinting and by amplified ribosomal DNA restriction analysis (ARDRA). All of the genotypes were referable to the Alcaligenes-Burkholderia group of β-Proteobacteria; 91% of the genotypes were unique to the geographic location from which they were isolated, and 98% of the ARDRA types were found at only one location. These data strongly indicate that 3CBA-degrading genotypes are endemic to the geographic regions examined. At a finer geographic scale, endemism has been claimed within natural communities of Achromatium oxaliferum (185): sediments from three freshwater sites in northern England contain genetically distinct populations of A. oxaliferum, as judged by sequence analysis of PCR-amplified 16S rRNA genes, and identical sequences were not recovered from the different sites. The sequence evidence for distinct populations has been corroborated by differences in nutritional and energy conservation characteristics (186).
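Figures such as “91% of the genotypes were unique to one location” come from a simple tally over isolate records. The sketch below shows one way to compute that proportion, assuming each isolate has been reduced to a (genotype, location) pair; the function name and the records are hypothetical and are not the data of the 3CBA study.

```python
from collections import defaultdict


def fraction_single_location(records):
    """Fraction of distinct genotypes recovered from exactly one location.

    records: iterable of (genotype, location) pairs, one pair per isolate.
    """
    locations = defaultdict(set)
    for genotype, location in records:
        locations[genotype].add(location)
    if not locations:
        raise ValueError("no isolates supplied")
    single = sum(1 for sites in locations.values() if len(sites) == 1)
    return single / len(locations)


# Hypothetical isolates; g3 is recovered at two sites, so it is not counted.
records = [
    ("g1", "Chile"), ("g1", "Chile"),
    ("g2", "Canada"),
    ("g3", "Russia"), ("g3", "Australia"),
]
print(round(fraction_single_location(records), 3))  # 0.667
```

The denominator is the number of distinct genotypes, not the number of isolates, so repeated recovery of the same genotype at one site (g1 above) does not inflate the result.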
Extremophiles might be expected to be particularly suitable organisms with which to test the endemic versus cosmopolitan hypothesis. Kristjansson et al. (276) have commented that it is not known to what extent geographically distinct extreme sites differ and to what extent such sites harbor endemic and cosmopolitan taxa. These authors reiterate, however, that without robust and refined taxonomic databases this question will not be resolved. Supporting evidence for the cosmopolitan distribution of thermophiles and hyperthermophiles has come from work on bacteria (e.g., Thermobrachium celere), cyanobacteria (e.g., Microcoleus chthonoplastes), and archaea. Stetter and his colleagues (425) used DNA-DNA pairing to show that Alaskan and European hyperthermophilic archaea were cosmopolitan, and such evidence is far more convincing than that produced from partial 16S rDNA sequences (154). However, the evidence for endemism among this group of extremophiles is particularly strong. There are examples of unique isolations of prokaryotes (e.g., Methanothermus sociabilis from Iceland) and of others that are geographically restricted (e.g., Thermus aquaticus in the USA, T. filiformis in New Zealand, and thermophilic fermentative anaerobes in New Zealand). A final example, also from Stetter's laboratory (409), again demonstrates the value, and desirability, of using DNA-DNA pairing analysis in this type of research. The archaeon Thermoplasma volcanicum can be differentiated into three geographically distinct DNA groups, restricted respectively to Vulcano Island (Italy), to Indonesia, and to Iceland together with Yellowstone.
The foregoing evidence regarding microbial biogeography is in large measure anecdotal. It is imperative, therefore, that a framework be established for determining whether an organism is endemic or cosmopolitan. The Staley-Gosink postulates (422) offer a major stimulus for conducting further research in this field. Fulfillment of the following postulates would be necessary to categorize an organism as cosmopolitan: (i) at least four strains of the organism should be isolated from different samples of the ecosystem under consideration; (ii) the strains must be demonstrably indigenous to the ecosystem or host; (iii) at least four strains of a putatively identical organism must be recovered from one or more geographic locations remote from the one from which the first strains were obtained; and (iv) the two or more groups of strains from such separate geographic locations must be subjected to phylogenetic analyses by sequencing two or more appropriate genes. If the strains show no evidence of forming geographically distinct clades, they can be considered cosmopolitan; otherwise, they can be designated endemics or geovars. Staley and Gosink (422) also proposed a fifth, optional postulate to establish the species identity of putative geovars. Polyphasic taxonomic analysis, in which DNA-DNA pairing is de rigueur, must be employed in such a test: thus, if two or more groups of strains show geographic clustering and fulfill the criteria for being different species, they should be named and described as separate, endemic species.
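Postulates (i) to (iv) amount to a checklist that must be satisfied before a cosmopolitan or endemic verdict is drawn. The sketch below encodes that checklist, assuming the phylogenetic analysis has already been carried out elsewhere and is summarized as a single yes/no answer on geographic clustering. The `StrainGroup` record, the function name, and the verdict strings are hypothetical, and the optional fifth postulate (polyphasic species delineation) is deliberately not modeled.

```python
from dataclasses import dataclass


@dataclass
class StrainGroup:
    location: str     # geographic location the strains came from
    n_strains: int    # strains recovered from separate samples at that location
    indigenous: bool  # postulate (ii): shown to be indigenous to the ecosystem or host


def apply_postulates(groups, genes_sequenced, geographic_clades):
    """Checklist form of postulates (i) to (iv); the phylogeny itself is done elsewhere.

    genes_sequenced: number of appropriate genes sequenced for all strains.
    geographic_clades: True if the phylogenetic analysis shows the strains
    clustering by location.
    """
    if len({g.location for g in groups}) < 2:
        return "insufficient: strains from at least two separate locations are needed"
    if any(g.n_strains < 4 for g in groups):
        return "insufficient: at least four strains per location are needed"
    if not all(g.indigenous for g in groups):
        return "insufficient: strains must be shown to be indigenous"
    if genes_sequenced < 2:
        return "insufficient: two or more genes must be sequenced"
    return "endemic (geovar)" if geographic_clades else "cosmopolitan"


# Hypothetical example: two polar strain groups analyzed with two genes.
groups = [StrainGroup("Arctic", 5, True), StrainGroup("Antarctic", 4, True)]
print(apply_postulates(groups, genes_sequenced=2, geographic_clades=True))
# endemic (geovar)
```

Treating every unmet postulate as “insufficient” rather than as evidence either way mirrors the caution that failing to find cosmopolitan species does not prove they are absent.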
Research on the sea ice microbial community is yielding further strong evidence for microbial endemism and has been the subject of an excellent review by Jim Staley and John Gosink (422). Here we highlight a few features of this work that are especially germane to our overall critique of microbial biogeography. Sea ice covers at least 7% of the earth's surface, provides a range of microenvironments, and sustains a diverse microbial community. Among the sea ice bacteria, for example, are some of the most psychrophilic organisms so far described. Work on sea ice communities by research groups in Australia and the United States in recent years has led to many new bacterial genus and species descriptions: Polaromonas (233), Gelidobacter and Psychroserpens (46), Octadecobacter (181), Colwellia spp. (45), Polaribacter (182), Psychroflexus (47), and “Iceobacter” (422). Strains of Octadecobacter, Polaribacter, and “Iceobacter” were isolated from both Arctic and Antarctic sea ice, and species identities for Octadecobacter and Polaribacter have been verified by DNA-DNA pairing. The data indicate that none of these species has a bipolar distribution. The strains of “Iceobacter” have not been circumscribed by DNA-DNA pairing, but on the basis of major phenotypic differences, distinct north and south polar species that again lack a bipolar distribution have been proposed. Nevertheless, the authors prudently advise that “Not finding cosmopolitan (sea ice) species does not mean that they do not exist.” In this context, it will be interesting to test the recently described Antarcticobacter heliothermus gen. nov., sp. nov. (282) for bipolar distribution. It remains only to emphasize that 16S rDNA sequences are too highly conserved to permit rigorous detection of endemic microbial taxa and that other phylogenetic markers and high-resolution discriminatory procedures need to be applied to such questions.